External Email - Use Caution
Hello,
I have a question about how to graph results from a vertex-wise analysis run with mri_glmfit and mri_glmfit-sim.
I was interested in investigating an interaction effect between groups and my (continuous) variable of interest while covarying for three nuisance variables. After running mri_glmfit, and mri_glmfit-sim to correct for multiple comparisons, I visualized the results and found significant clusters.
I'm interested in graphing these results. Based on archived questions on this mailing list, the individual values for each cluster can be found in the ocn.dat file and the cluster information in the summary file. My analysis looked at volume, thickness and surface area. Since mean volume and mean area are difficult to interpret, I want to convert the values to a total measure. It has been suggested in the past that this can be done by multiplying each individual's value in the ocn.dat file by the number of vertices in the cluster. My understanding is that it can also be done by adding the --accumulate option to the mri_segstats command that mri_glmfit-sim runs automatically.
When I use these two methods to convert mean area and volume to total area and volume, the results differ.
My questions are: 1) Shouldn't these values be identical? Multiplying mean volume by the number of vertices gives values of roughly 3500, whereas using --accumulate in mri_segstats gives values of roughly 2500. What could be causing this discrepancy?
2) If my cluster has a size of 1500 mm^2 (in a model for area), does it make sense that every individual's value, after extraction and conversion to total area, is larger than the cluster size?
3) The ocn.dat files contain the raw input values, so wouldn't they need to be corrected statistically (in a way similar to how I modeled them in FreeSurfer) before graphing?
On 9/3/19 12:05 PM, cody samth wrote:
- 1) Shouldn't these values be identical? Multiplying mean volume by the number of vertices gives values of roughly 3500, whereas using --accumulate in mri_segstats gives values of roughly 2500. What could be causing this discrepancy?
They should not be, but the reason is fairly convoluted. When you get a cluster after running mri_glmfit-sim, that cluster is on fsaverage, which is an average of 40 subjects. The area of a vertex is computed as the average of the areas of the vertices from the 40 subjects that map into that vertex. This is the number used to compute the surface area of the cluster in the summary file. Now, when you map your subjects into fsaverage space, they may have more or less surface area mapping into that cluster relative to the 40 (from #2 below, it looks like more). Also, you probably smoothed the surface area, which could have an unpredictable effect.
- 2) If my cluster has a size of 1500 mm^2 (in a model for area), does it make sense that every individual's value, after extraction and conversion to total area, is larger than the cluster size?
Yes, see above.
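A toy illustration of the explanation above, using made-up vertex values (nothing here comes from real data, and smoothing is ignored): the cluster size in the summary file is a sum over fsaverage vertex areas, whereas a subject's total is a sum over that subject's own area values mapped onto the same vertices. A subject with more area mapping into the cluster therefore ends up above the cluster size.

```python
# Hypothetical 5-vertex cluster; all numbers are made up for illustration.
# Area of each fsaverage vertex (mm^2), i.e. the average over the 40 fsaverage subjects.
fsaverage_vertex_area = [0.75, 1.0, 1.25, 1.0, 0.5]

# Surface area of two of *your* subjects mapped onto the same vertices (mm^2).
subject_vertex_area = {
    "subj01": [1.0, 1.25, 1.5, 1.25, 0.75],
    "subj02": [1.0, 1.0, 1.5, 1.25, 0.5],
}

# Cluster size as used in the summary file: sum of the fsaverage vertex areas.
cluster_size = sum(fsaverage_vertex_area)

# Each subject's total area inside the cluster: sum of that subject's mapped values.
subject_total = {s: sum(v) for s, v in subject_vertex_area.items()}

for s, total in subject_total.items():
    print(f"{s}: total {total} mm^2 vs cluster size {cluster_size} mm^2")
```

Both hypothetical subjects exceed the cluster size simply because more of their own surface maps into those vertices than the fsaverage average.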
- 3) The ocn.dat files contain the raw input values, so wouldn't they need to be corrected statistically (in a way similar to how I modeled them in FreeSurfer) before graphing?
Not sure what you mean by "corrected" here. In general, you need to be very careful when you extract data from a cluster. It would be circular to do the same test that you used to generate the cluster, though this happens a lot (see "VooDoo correlations" by Ed Vul).
Freesurfer mailing list Freesurfer@nmr.mgh.harvard.edu https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer
Hi Douglas, thanks for your response.
Thanks, that makes sense.
My apologies, "corrected" wasn't the best way to phrase that question. My understanding is that each row of the ocn.dat file contains a subject's average input value before controlling for covariates. Therefore, to graph these results, wouldn't these values need to be adjusted for covariates such as ICV, sex or age so that they better reflect the clusters observed in the GLM?
Or do the values in the ocn.dat file already reflect the test/GLM used to generate the cluster?
Right, the data in the ocn.dat files are uncorrected in that sense, and the nuisance factors might need to be removed before plotting. There is a design matrix in there (Xg.dat). You can load that into MATLAB along with ocn.dat and compute beta = inv(X'*X)*(X'*ocn) to get the betas. You can then compute yhat = X2*beta2, where X2 is X with the nuisance columns removed and beta2 is beta with the same nuisance coefficients removed, and treat yhat as the data to be plotted.
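A minimal sketch of the recipe above, written in plain Python rather than MATLAB so it is self-contained. It assumes X holds the rows of Xg.dat and Y the rows of ocn.dat (one row per subject, one column per cluster), and that you already know which columns of X are nuisance columns; `fit_and_adjust` and `nuisance_cols` are hypothetical names, and the column indices depend on your own FSGD/design. Besides the yhat described above, it also returns a nuisance-adjusted copy of the data (Y minus only the nuisance part of the fit), a common alternative that keeps each subject's residual scatter; that second output is an addition for comparison, not something described in this thread.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a @ x = b for square a by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + brow[:] for row, brow in zip(a, b)]  # augmented matrix [a | b]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        pivot = m[i][i]
        m[i] = [v / pivot for v in m[i]]
        for r in range(n):
            if r != i:
                f = m[r][i]
                m[r] = [vr - f * vi for vr, vi in zip(m[r], m[i])]
    return [row[n:] for row in m]


def fit_and_adjust(X, Y, nuisance_cols):
    """Fit Y = X*beta, then split the fit into non-nuisance and nuisance parts."""
    Xt = transpose(X)
    # Same betas as inv(X'*X)*(X'*Y), but solved without forming the inverse.
    beta = solve(matmul(Xt, X), matmul(Xt, Y))
    keep = [j for j in range(len(X[0])) if j not in nuisance_cols]

    # yhat = X2*beta2: fitted values from the non-nuisance columns only.
    X2 = [[row[j] for j in keep] for row in X]
    yhat = matmul(X2, [beta[j] for j in keep])

    # Alternative: subtract only the nuisance part of the fit from the data.
    Xn = [[row[j] for j in nuisance_cols] for row in X]
    nuis = matmul(Xn, [beta[j] for j in nuisance_cols])
    yadj = [[y - v for y, v in zip(yr, nr)] for yr, nr in zip(Y, nuis)]
    return beta, yhat, yadj


# Hypothetical example: intercept, variable of interest, one nuisance covariate.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
age = [1.0, 0.0, 2.0, 1.0, 3.0]
X = [[1.0, xi, ai] for xi, ai in zip(x, age)]
noise = [0.1, -0.2, 0.0, 0.2, -0.1]
Y = [[2.0 + 3.0 * xi + 0.5 * ai + e] for xi, ai, e in zip(x, age, noise)]
beta, yhat, yadj = fit_and_adjust(X, Y, nuisance_cols=[2])
```

The difference yadj - yhat is exactly the GLM residual for each subject, which is why plotting yhat alone shows no scatter while plotting yadj does.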
Note that plotting the results is still a form of voodoo correlation, because your eye will compute the correlation even if you don't explicitly do so (though that generally does not stop anyone :).
On 9/8/2019 7:37 PM, cody samth wrote:
Hi Douglas,
That's good to know that it's still a form of voodoo correlation. If researchers wanted to avoid this when testing for an interaction effect in a DODS model with a continuous variable, but still wanted to know the direction of the relationships, how could that be done? Currently, a result of group 1 > group 2 could theoretically mean that 1) both groups' slopes are negative, 2) both groups' slopes are positive, or 3) one group's slope is positive and the other's is negative. Would this be done by looking at the beta values for the slope of that variable?
Regarding graphing the results: I'm not very familiar with MATLAB, but after running the steps you suggested and plotting the yhat values for 2 of my clusters, the R2 is 1.000 for both, which leads me to believe I somehow ended up saving the predicted values rather than the actual thickness values for each participant. Did I load everything into MATLAB correctly? These were my inputs:
X = load('Xg.dat');          % design matrix
Y = load('ocn.dat');         % per-subject cluster values
beta = inv(X'*X)*(X'*Y);
beta2 = load('beta2');       % file where I saved the betas for the mean thickness and the slope of my variable of interest, computed in the previous step
X2 = load('X2.dat');         % Xg.dat with the nuisance columns removed
yhat = X2*beta2;             % saved these values and plotted them against my variable of interest
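One hedged way to see why R2 came out as 1.000, assuming that the plotted yhat is built only from fitted coefficients (for a single group, an intercept plus a slope times the variable of interest, as it would be if X2 kept only those columns). Fitted values lie exactly on a line, so their correlation with that variable is ±1 whatever the data were; the per-subject scatter lives in the residuals. All numbers below are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical numbers for one group (illustration only).
x = [23.0, 31.0, 45.0, 52.0, 60.0]            # variable of interest
intercept, slope = 2.6, -0.01                  # hypothetical fitted coefficients
yhat = [intercept + slope * xi for xi in x]    # fitted values: exactly on a line
resid = [0.05, -0.03, 0.02, -0.04, 0.01]       # hypothetical per-subject residuals
y_with_scatter = [f + e for f, e in zip(yhat, resid)]

r2_fitted = pearson_r(x, yhat) ** 2            # 1 up to rounding, whatever the data
r2_data = pearson_r(x, y_with_scatter) ** 2    # below 1 once residuals are kept
print(round(r2_fitted, 3), round(r2_data, 3))
```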
On Mon, Sep 9, 2019 at 10:37 AM Greve, Douglas N., Ph.D. <DGREVE@mgh.harvard.edu> wrote:
It is voodoo to do the same test that you used to generate the cluster (or to show a graph that implies such a test). A post-hoc test, on the other hand, is totally fair. E.g., if you do an unsigned test between the two groups, you could then go back and do a signed test on the extracted values.
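A minimal sketch of a signed post-hoc follow-up in this spirit, aimed at the question above about the direction of each group's relationship: fit a separate slope per group on the extracted cluster values and look at its sign and t statistic. Assumptions: the values have already been adjusted for nuisance variables, and a per-group simple regression stands in for the full DODS model, so the slopes will not exactly match the GLM betas. The function names and data are hypothetical. A p-value would come from a t distribution with n - 2 degrees of freedom (e.g. via a statistics package such as scipy.stats).

```python
from math import sqrt


def slope_and_t(x, y):
    """Least-squares slope of y on x, its t statistic, and df = n - 2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se, n - 2


def per_group_slopes(groups, x, y):
    """Split subjects by group label and fit one slope per group."""
    out = {}
    for g in sorted(set(groups)):
        xs = [a for a, lab in zip(x, groups) if lab == g]
        ys = [b for b, lab in zip(y, groups) if lab == g]
        out[g] = slope_and_t(xs, ys)
    return out


# Hypothetical, already nuisance-adjusted values for two groups (illustration only).
groups = ["g1"] * 5 + ["g2"] * 5
x = [1.0, 2.0, 3.0, 4.0, 5.0] * 2
y = [3.1, 4.9, 7.2, 8.8, 11.1, 2.1, 0.9, 0.1, -1.1, -1.9]
result = per_group_slopes(groups, x, y)
for g, (slope, t, df) in result.items():
    print(f"{g}: slope={slope:.3f}, t={t:.2f} (df={df})")
```

Here the hypothetical data give one positive and one negative slope, i.e. case 3) from the question; the same check would distinguish cases 1) and 2).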
On 9/9/19 1:31 PM, cody samth wrote: