[Mne_analysis] CSP with resting-state EEG data for predicting stage of disease

Wed Aug 28 04:31:20 EDT 2019


Greetings.
It would be really helpful if an expert here could shed some light on using
CSP for resting-state EEG data. I am trying to predict the early and final
stages of a brain disease. I thought I could use CSP for this task, as I
found some recent papers that apply CSP to resting-state EEG data. Each
instance (i.e. each patient whose EEG was recorded) is a 19 × n_times
matrix, where 19 is the number of channels and n_times is the number of
sampled time points.
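
For reference, here is a minimal NumPy sketch of classic two-class CSP applied to epochs shaped like the ones described above (n_epochs × 19 channels × n_times). It averages each class's channel covariance, whitens the composite covariance, then eigendecomposes the whitened class-0 covariance, so the first filters maximise class-0 variance relative to class 1 and the last filters do the opposite. The helpers `csp_filters` and `log_var_features` are hypothetical names for illustration. This is a textbook version, not `mne.decoding.CSP`, and it assumes zero-mean (band-pass-filtered) epochs and no regularisation.

```python
import numpy as np


def csp_filters(X, y):
    """Two-class CSP spatial filters (textbook sketch, no regularisation).

    X: array (n_epochs, n_channels, n_times), assumed zero-mean per channel.
    y: array of labels in {0, 1}, one per epoch.
    Returns W (n_channels, n_channels); each row is a spatial filter, sorted
    by decreasing share of class-0 variance.
    """
    covs = []
    for label in (0, 1):
        # average of per-epoch channel covariance matrices for this class
        c = [e @ e.T / e.shape[1] for e in X[y == label]]
        covs.append(np.mean(c, axis=0))
    c0, c1 = covs
    # whitening matrix P such that P (c0 + c1) P^T = I
    evals, evecs = np.linalg.eigh(c0 + c1)
    P = np.diag(evals ** -0.5) @ evecs.T
    # eigendecompose the whitened class-0 covariance
    d, B = np.linalg.eigh(P @ c0 @ P.T)
    order = np.argsort(d)[::-1]  # largest class-0 share first
    return B[:, order].T @ P


def log_var_features(W, X, n_pairs):
    """Log-variance of the first and last n_pairs CSP-filtered signals."""
    n = W.shape[0]
    picks = np.concatenate([np.arange(n_pairs), np.arange(n - n_pairs, n)])
    # project every epoch: (filters, channels) x (epochs, channels, times)
    Z = np.einsum('fc,ect->eft', W[picks], X)
    return np.log(Z.var(axis=2))  # shape (n_epochs, 2 * n_pairs)
```

With real data, X would be the stacked band-pass-filtered epochs and y the per-epoch stage labels; `mne.decoding.CSP` additionally provides the `reg`, `log` and `norm_trace` options used in the grid search below.
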
I use a 70/30 train-test split and would eventually like to learn from a
50/50 split. For each of the train and test sets, a band-pass filter is
applied for each frequency range, and events are created with
make_fixed_length_events(). We tried 30 s events with 15 s overlap, and
smaller values; I also tried 5 s events with 2 s overlap, and so on. I
understand that resting-state data contain no actual events, but I did it
this way to be able to use the CSP API. Of course, in this case all epochs
of a given instance share the same binary label (early/final).
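
The fixed-length epoching described above (make_fixed_length_events() followed by Epochs) can be sketched in plain NumPy to show what ends up in each epoch array. `fixed_length_epochs` is a hypothetical helper, not an MNE function, and it only approximates MNE's boundary handling: it keeps only windows that fit entirely inside the recording.

```python
import numpy as np


def fixed_length_epochs(data, sfreq, duration, overlap):
    """Cut a (n_channels, n_times) array into overlapping windows.

    Returns an array of shape (n_epochs, n_channels, n_samples), keeping
    only windows that lie entirely inside the recording.
    """
    n_samples = int(round(duration * sfreq))
    step = int(round((duration - overlap) * sfreq))
    if step <= 0:
        raise ValueError("overlap must be smaller than duration")
    starts = list(range(0, data.shape[1] - n_samples + 1, step))
    if not starts:  # recording shorter than one window
        return np.empty((0, data.shape[0], n_samples))
    return np.stack([data[:, s:s + n_samples] for s in starts])


# 19 channels, 120 s of data at an assumed 128 Hz sampling rate
recording = np.zeros((19, 128 * 120))
print(fixed_length_epochs(recording, 128., 30., 15.).shape)  # (7, 19, 3840)
print(fixed_length_epochs(recording, 128., 5., 2.).shape)  # (39, 19, 640)
```

With 30 s windows and 15 s overlap, consecutive epochs of the same patient share half of their samples.
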

A brief snapshot of my code is as follows:

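from mne import Epochs, make_fixed_length_events, pick_types

# one list of per-patient epoch arrays for each frequency band
epochs_all_freq_ranges = [[] for _ in freq_ranges]

idd = 1
# for each frequency band
for freq, (fmin, fmax) in enumerate(freq_ranges):
    # for each instance/patient
    for raw in trainrawarr:
        picks = pick_types(raw.info, meg=False, eeg=True, stim=False,
                           eog=False, exclude='bads')
        raw_filter = raw.copy().filter(fmin, fmax, n_jobs=-1,
                                       fir_design="firwin")
        events = make_fixed_length_events(raw_filter, id=idd,
                                          duration=30., overlap=15.)
        # tmin/tmax must be set explicitly: the Epochs defaults
        # (tmin=-0.2, tmax=0.5) would keep only 0.7 s around each event
        epochs = Epochs(raw_filter, events, tmin=0.,
                        tmax=30. - 1. / raw.info['sfreq'],
                        picks=picks, baseline=None, preload=True)
        # epochs.get_data() is a 3D array: (number of epochs, 19, number of
        # time points); all epochs of this instance share one stage label
        # (early = 0, final = 1)
        epochs_all_freq_ranges[freq].append(epochs.get_data())
        idd = idd + 1  # not sure if using unique ids is helpful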

We repeat the above for testrawarr as well, storing the results in
test_epochs_all_freq_ranges.

I use scikit-learn's GridSearchCV with K-fold cross-validation (K = 2, 5,
10) to tune the parameters of several classifiers: LDA, QDA, decision tree,
SVC and k-NN. The training ROC AUC is either very poor or very good, while
the test ROC AUC is below 50%. I use np.vstack() to concatenate all the 3D
epoch arrays of the training set (and likewise of the test set).
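
The concatenation step can be sketched as follows, together with one way to expand a single stage label per patient into one label per epoch (which is presumably the role of `epoch_labels` in the code below). `stack_patients` is a hypothetical helper written for illustration.

```python
import numpy as np


def stack_patients(per_patient_epochs, per_patient_labels):
    """Concatenate per-patient epoch arrays and expand patient labels.

    per_patient_epochs: list of arrays, each (n_epochs_i, n_channels, n_times)
    per_patient_labels: one stage label per patient (0 = early, 1 = final)
    Returns X with shape (sum of n_epochs_i, n_channels, n_times) and y with
    one label per epoch, in the same order as X.
    """
    X = np.vstack(per_patient_epochs)  # concatenates along the epoch axis
    counts = [len(e) for e in per_patient_epochs]
    y = np.repeat(per_patient_labels, counts)
    return X, y


early_patient = np.zeros((3, 19, 100))  # 3 epochs, label 0
final_patient = np.ones((2, 19, 100))   # 2 epochs, label 1
X, y = stack_patients([early_patient, final_patient], [0, 1])
print(X.shape)  # (5, 19, 100)
print(y)  # [0 0 0 1 1]
```
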

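import numpy as np
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

csp = CSP(reg=None, log=True, norm_trace=False, cov_est="epoch")
lda = LinearDiscriminantAnalysis()
# with 19 EEG channels CSP has at most 19 filters, so n_components
# values of 20 and 30 cannot add components beyond 19
param_grid_lda = [{'lda__solver': ['lsqr'],
                   'lda__shrinkage': [0.0001, 0.001, 0.010, 0.1, 1],
                   'csp__n_components': [5, 10, 20, 30],
                   'csp__reg': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1]}]
for freq in range(8):  # there are 8 frequency bands
    for split in [5, 10]:
        print("Evaluating cv split:", split, "in freq range:",
              freq_ranges[freq])
        clf = Pipeline([('csp', csp), ('lda', lda)])
        search = GridSearchCV(clf, cv=StratifiedKFold(split),
                              param_grid=param_grid_lda, scoring='roc_auc',
                              n_jobs=-1)
        model = search.fit(np.vstack(epochs_all_freq_ranges[freq]),
                           epoch_labels[freq])
        testpred = search.predict(
            np.vstack(test_epochs_all_freq_ranges[freq]))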
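
Because the grid search uses scoring='roc_auc', it may help to recall what that number measures: the probability that a randomly chosen positive epoch gets a higher score than a randomly chosen negative one, with ties counting one half. A value below 0.5 therefore means the scores rank the two stages in the wrong order more often than not. The measure is defined on continuous scores (in scikit-learn, e.g. `decision_function` or `predict_proba`), whereas `predict` returns hard labels. The pure-Python `roc_auc` below is a hypothetical O(n_pos × n_neg) illustration of that definition, not scikit-learn's `roc_auc_score`.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a positive > score of a negative); ties = 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(roc_auc([0, 0, 1, 1], [0.9, 0.6, 0.65, 0.2]))  # 0.25 (reversed ranking)
print(roc_auc([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.5 (hard labels, many ties)
```
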

Can someone please explain what is wrong with this code or approach? It
would be really helpful. I noticed similar questions, especially this one:
https://mail.nmr.mgh.harvard.edu/pipermail/mne_analysis/2019-February/005630.html

but some sample code would help.