[Mne_analysis] Temporal Generalization - Different results with and without using cross validation

Wed Mar 25 11:38:15 EDT 2020

Hi MNE experts,

I am using the temporal generalization
<https://mne.tools/dev/auto_tutorials/machine-learning/plot_sensors_decoding.html#temporal-generalization>
approach.
I have plotted the scores returned by cross_val_multiscore
<https://mne.tools/dev/generated/mne.decoding.cross_val_multiscore.html#mne.decoding.cross_val_multiscore>
with a RepeatedStratifiedKFold
<https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RepeatedStratifiedKFold.html#sklearn.model_selection.RepeatedStratifiedKFold>
cv parameter. I have also plotted the training scores (fitting and scoring
on the same data, without cross-validation).

I assumed that the two results would be the same, the only difference being
the diagonal, where the training scores would all be 1. However, the overall
results are quite different (although a faint underlying pattern common to
both is still visible).

Any idea why plotting cross-validated scores vs. plotting only the
fitting/training scores gives different results?

This is my understanding of what should be going on: in the training case,
without any cross-validation, a classifier/decoder is trained at each time
point on all EEG channels' data over all epochs at that training time point,
so it gives a perfect score when tested at that same time point. However, a
different (testing) time point has different data, which can be seen as a
test set for this decoder. Right? (Even if the EEG data are autocorrelated
over time, a meaningful pattern in the temporal generalization matrix would
still mean that the EEG data carry task-related information over time.)
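
To make this comparison concrete, here is a minimal sketch that can be run
without MNE. Everything in it is an assumption for illustration and not my
actual pipeline: the data are pure noise with AR(1) autocorrelation over
time, the labels are unrelated to the data, a minimum-norm least-squares
classifier stands in for the StandardScaler + LinearSVC pipeline, accuracy
stands in for roc_auc, and the helper names (fit, generalization_matrix) are
hypothetical. It computes a time-by-time generalization matrix once from
training scores and once with a hand-rolled stratified 5-fold split.

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_channels, n_times, rho = 60, 150, 10, 0.9

# Pure noise, autocorrelated over time within each epoch (AR(1) process).
X = np.empty((n_epochs, n_channels, n_times))
X[:, :, 0] = rng.standard_normal((n_epochs, n_channels))
for t in range(1, n_times):
    innovation = rng.standard_normal((n_epochs, n_channels))
    X[:, :, t] = rho * X[:, :, t - 1] + np.sqrt(1 - rho**2) * innovation
y = np.repeat([-1.0, 1.0], n_epochs // 2)  # labels carry no information


def fit(X_t, y_train):
    # Minimum-norm least-squares weights (assumed stand-in for the SVM).
    return np.linalg.pinv(X_t) @ y_train


def generalization_matrix(X_train, y_train, X_test, y_test):
    # Train at each time point, then score at every time point (accuracy).
    n = X_train.shape[2]
    scores = np.empty((n, n))
    for t_train in range(n):
        w = fit(X_train[:, :, t_train], y_train)
        for t_test in range(n):
            pred = np.sign(X_test[:, :, t_test] @ w)
            scores[t_train, t_test] = np.mean(pred == y_test)
    return scores


# Training scores: fit and score on the same epochs.
train_scores = generalization_matrix(X, y, X, y)

# Cross-validated scores: stratified 5-fold, test epochs never seen in fit.
n_splits = 5
folds = [[] for _ in range(n_splits)]
for label in (-1.0, 1.0):
    idx = rng.permutation(np.flatnonzero(y == label))
    for k, part in enumerate(np.array_split(idx, n_splits)):
        folds[k].extend(part)

cv_scores = np.zeros((n_times, n_times))
for k in range(n_splits):
    test = np.array(folds[k])
    train = np.setdiff1d(np.arange(n_epochs), test)
    cv_scores += generalization_matrix(X[train], y[train], X[test], y[test])
cv_scores /= n_splits

print("train, diagonal:          ", np.diag(train_scores).mean())
print("train, first off-diagonal:", np.diag(train_scores, k=1).mean())
print("cv, diagonal:             ", np.diag(cv_scores).mean())
```

In this sketch the training-score diagonal is 1 by construction, because
there are more channels than epochs. Comparing the first off-diagonal of
train_scores with the diagonal of cv_scores gives a rough idea of how much
of an off-diagonal training pattern could come from noise that is shared
across time points within the same epochs, as opposed to task-related
information.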

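---------
I have also put my code here:

Scores using cross-validation:

from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from mne.decoding import (GeneralizingEstimator, LinearModel,
                          cross_val_multiscore)

# X: epochs data, shape (n_epochs, n_channels, n_times); y: labels
clf_SVC = make_pipeline(
    StandardScaler(),
    LinearModel(LinearSVC(random_state=0, max_iter=10000)))

temp_gen = GeneralizingEstimator(clf_SVC, scoring='roc_auc',
                                 n_jobs=1, verbose=True)

cv = StratifiedKFold(n_splits=5, shuffle=True)
scores = cross_val_multiscore(temp_gen, X, y, cv=cv, n_jobs=1)

Only fitting scores:

temp_gen.fit(X=X, y=y)
scores = temp_gen.score(X=X, y=y)  # scores without cv
-----------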

I would appreciate any comments.
Thanks