I’d like to add something (sadly, with a little concern):

Usually I agree with Doug that the number of time points is not so relevant (e.g. whether it is 5 or 6, who cares), especially if it differs only in individual cases. If it differs consistently across groups, however, it is a source of bias, for two reasons:

1. In the statistical model, linear fits become more reliable the more time points are available. This is a general issue, and I think it is the smaller problem.

2. In your case it is 2 vs. 3 time points, and there it does make a bigger difference. The reason is that we compute a voxel-wise median image for the template, on which surfaces etc. are estimated. For 3 or more time points we compute a median, but for two it reduces to the mean, and those images look noticeably more blurry. So I do expect a bias in surface placement.
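The median-versus-mean point can be illustrated with a minimal Python sketch. It treats each "image" as a flat list of voxel intensities that are assumed to be already co-registered (the real template step in mri_robust_template also performs the registration, which is ignored here); the function `voxelwise_template` and all intensity values are hypothetical. With two time points the median of each voxel is the average of its two values, so it equals the mean; with three, the median returns one of the observed values per voxel and ignores a single aberrant one, while the mean is pulled toward it.

```python
from statistics import mean, median


def voxelwise_template(images, method="median"):
    """Combine co-registered images (equal-length lists of voxel
    intensities) into one template, voxel by voxel.

    Hypothetical helper for illustration only; it is not FreeSurfer code.
    """
    combine = median if method == "median" else mean
    # zip(*images) groups the values of the same voxel across time points
    return [combine(voxel_values) for voxel_values in zip(*images)]


# Hypothetical 4-voxel "images"; voxel 3 of tp3 is corrupted (e.g. by motion).
tp1 = [100.0, 50.0, 80.0, 20.0]
tp2 = [102.0, 52.0, 78.0, 22.0]
tp3 = [101.0, 51.0, 160.0, 21.0]

# Two time points: the median averages the two values, i.e. it is the mean.
print(voxelwise_template([tp1, tp2], "median"))  # [101.0, 51.0, 79.0, 21.0]
print(voxelwise_template([tp1, tp2], "mean"))    # [101.0, 51.0, 79.0, 21.0]

# Three time points: the median ignores the outlier in voxel 3, the mean does not.
print(voxelwise_template([tp1, tp2, tp3], "median"))  # [101.0, 51.0, 80.0, 21.0]
print(voxelwise_template([tp1, tp2, tp3], "mean"))    # [101.0, 51.0, 106.0, 21.0]
```

The same function also shows the effect of the third option below: calling it with `method="mean"` for every subject gives both groups the same kind of template, at the cost of losing the median's robustness.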

Potential ideas how to approach this:

- You can compare this to a cross-sectional analysis (but that loses a lot of sensitivity).

- You could also drop a time point from the 3-tp dataset and use longitudinal processing (also losing valuable information, but at least removing the potential bias).

- Or you could force the template creation to always use the mean (even for 3 time points). But I think this option does not even have a flag in recon-all; you would have to change it (it is hard-coded) in the template creation step (the mri_robust_template call in one of recon-all’s sub-scripts). If you do that, you do not solve the first issue, only the second (but that is probably the bigger one). If you want to try that, I can see if I can point you to the changes you need to make in the specific script.

Other sources of bias are:

- different cohorts (e.g. I often see people with a disease dataset and no controls, so they “take” the controls from a publicly available dataset). This does not make sense, as that other dataset will come from a very different distribution (e.g. regarding age, sex, ethnicity or population differences, education, …).

- consistent hardware differences: different scanners, field strengths, sequences, resolutions, head coils, …; even head padding or pillows can lead to different motion levels.

- some sources of bias are hard to remove, e.g. head motion (disease groups usually move more than controls, older participants move more, participants with higher BMI move more, etc.). If you collect fMRI or diffusion data, you can use the motion scores from those scans in your statistics as a surrogate confounder.

- hydration levels (these also differ across disease groups, and in longitudinal settings they can differ with time of day or between summer and winter).

The more you can control, the better. Sometimes this can be done in the statistics, but that is difficult if the confound is correlated with your group effect.
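The difficulty with a confound that is correlated with group can be quantified with the variance inflation factor (VIF). For a model with an intercept and exactly two predictors, both coefficients share VIF = 1 / (1 - r^2), where r is the correlation between the two predictors, and the standard error of each coefficient grows by a factor of sqrt(VIF) compared with uncorrelated predictors. The minimal Python sketch below computes this for a group indicator and a per-subject motion score (for example one taken from fMRI or diffusion scans, as suggested above); the function names and all numbers are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """Variance inflation factor of either coefficient in a model with
    an intercept and exactly two predictors: 1 / (1 - r**2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical data: 0 = control, 1 = patient; motion is a per-subject score.
group = [0, 0, 0, 0, 1, 1, 1, 1]
motion_confounded = [0.10, 0.12, 0.15, 0.11, 0.30, 0.25, 0.35, 0.28]
motion_balanced = [0.10, 0.30, 0.20, 0.20, 0.30, 0.10, 0.20, 0.20]

# Patients move more: the group effect and motion are hard to separate (VIF ~ 10).
print(vif_two_predictors(group, motion_confounded))
# Motion unrelated to group: VIF ~ 1, no inflation of the group standard error.
print(vif_two_predictors(group, motion_balanced))
```

A large VIF does not make the covariate wrong to include; it signals that the data contain little information to tell the group effect and the confound apart.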

On 11. Jun 2025, at 21:59, Douglas N. Greve <dgreve@mgh.harvard.edu> wrote:

answers below

On 6/6/2025 5:43 AM, Keller, Lara wrote:

Dear FreeSurfer Community,

I am currently analyzing structural MRI data using FreeSurfer 8 and have a specific question regarding the longitudinal pipeline. My dataset consists of two groups:

One group has 3 follow-up timepoints

The other group has 2 follow-up timepoints

I have processed the data using both the cross-sectional and longitudinal recon-all pipelines, and I am now wondering whether it is appropriate to proceed with the longitudinal output, despite the unequal number of timepoints between groups.

Key Concerns:

Since FreeSurfer creates subject-specific within-subject templates, could the differing number of sessions lead to relevant systematic differences between the groups? Specifically, will the segmentation accuracy or signal-to-noise ratio be noticeably different between the groups due to these variations?

I'm not so worried about this. The base template is only used to initialize the longitudinal time points. Once the initialization is done, it plays no further role. In a longitudinal analysis, there is noise caused by initializing the analysis at different points and noise from the time point itself. The FS long stream only reduces the first kind, and this is only going to be modestly affected by a different number of time points (if at all). All data sets, regardless of the number of follow-ups, will see the same time-point-specific noise, so in the end I don't think having one group with two time points and another with three will matter much.

Would it be valid to continue using the longitudinal output but statistically account for the unequal number of sessions (e.g., via Linear Mixed Models)?

Yes
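Linear mixed models take the data in long format (one row per subject and visit), so subjects with two and three visits can sit in the same table without dropping anything. The Python sketch below does not fit an LMM; as a simpler, hypothetical stand-in it fits an ordinary least-squares slope per subject and reports its standard error under an assumed, known per-visit noise level `SIGMA`. With equally spaced yearly visits, the three-visit subject's slope has half the standard error of the two-visit subject's, which is the reliability difference behind point 1 of the reply above. Note also that a line through two points fits them exactly, so two visits leave no residual degrees of freedom. All subject IDs, values, and helper names are illustrative.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical long-format table: (subject, group, years_since_baseline, thickness_mm)
ROWS = [
    ("s01", "A", 0.0, 2.50), ("s01", "A", 1.0, 2.48), ("s01", "A", 2.0, 2.45),
    ("s02", "B", 0.0, 2.60), ("s02", "B", 1.0, 2.57),
]
SIGMA = 0.02  # assumed per-visit measurement noise (mm), purely illustrative


def ols_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    tbar = sum(times) / len(times)
    ybar = sum(values) / len(values)
    sxx = sum((t - tbar) ** 2 for t in times)
    sxy = sum((t - tbar) * (y - ybar) for t, y in zip(times, values))
    return sxy / sxx


def slope_se(times, sigma):
    """Standard error of the OLS slope for independent noise of known sigma."""
    tbar = sum(times) / len(times)
    sxx = sum((t - tbar) ** 2 for t in times)
    return sigma / sqrt(sxx)


def per_subject_slopes(rows, sigma):
    """Group long-format rows by subject and fit one slope per subject."""
    by_subject = defaultdict(list)
    for subject, group, t, y in rows:
        by_subject[subject].append((group, t, y))
    result = {}
    for subject, visits in by_subject.items():
        times = [t for _, t, _ in visits]
        values = [y for _, _, y in visits]
        result[subject] = {
            "group": visits[0][0],
            "n_visits": len(visits),
            "slope": ols_slope(times, values),
            "se": slope_se(times, sigma),
        }
    return result


for subject, info in per_subject_slopes(ROWS, SIGMA).items():
    print(subject, info)
```

An LMM fitted to the same table would use every row and account for the differing precision per subject, rather than treating the per-subject slopes as equally reliable.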

Alternatively, would it be preferable to use cross-sectional outputs to avoid potential biases introduced by differing template quality?

I don't think so. See above

If I analyze only one specific timepoint for certain research questions, should I still use the longitudinal output, or would cross-sectional data be more appropriate?

The longitudinal stream is probably going to give you the most reliable and accurate surfaces and segmentations, so I would use that.

I find myself torn between the improved segmentation accuracy of the longitudinal approach and the potential systematic differences introduced by the varying number of timepoints. Would a viable solution be to proceed with the longitudinal output while ensuring robustness by replicating key findings using cross-sectional data?

I would greatly appreciate your insights on whether it is methodologically sound to proceed with the longitudinal pipeline in this case and, if so, how best to control for the unequal timepoints in statistical analysis.

Thank you very much for your time and guidance!

Best regards,

Lara
_______________________________________________
Freesurfer mailing list -- freesurfer@nmr.mgh.harvard.edu
To unsubscribe send an email to freesurfer-leave@nmr.mgh.harvard.edu
https://mail.nmr.mgh.harvard.edu/mailman3/lists/freesurfer@nmr.mgh.harvard.edu/