[Mne_analysis] CSP with rest state EEG data for predicting stage of disease

Wed Aug 28 21:05:51 EDT 2019

Greetings. I posted this question earlier, but I needed to register with the
list first, so I am posting it again.

I am trying to predict the early and final stages of a brain disease. I
thought I could use CSP for this task, as I found some recent papers that
apply CSP to resting-state EEG data. Each instance (one patient's EEG
recording) is a (19 x number of time samples) matrix, where 19 is the number
of channels. I do a 70%-30% train-test split; eventually I would like to
learn from a 50%-50% split. For each of the train and test sets, a band-pass
filter is applied for each frequency range, and events are created with
make_fixed_length_events(). We tried 30 s events with 15 s overlap, and
shorter ones; I also tried 5 s events with 2 s overlap, and so on. I
understand that resting-state data contain no real events, but I did it this
way in order to use the CSP API. All epochs for a given instance share a
single binary label (early/final).
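
The fixed-length windowing described above can be sketched with plain NumPy,
independent of MNE. This is only an illustration under assumptions: the
sampling rate `sfreq` (250 Hz here), the helper name `sliding_windows`, and
the random stand-in data are hypothetical and not taken from the original
code. It shows how one (19, n_times) recording becomes an
(n_epochs, 19, n_window_samples) array for 30 s windows with 15 s overlap.

```python
import numpy as np


def sliding_windows(data, sfreq, duration, overlap):
    """Cut a (n_channels, n_times) array into overlapping windows.

    Returns an array of shape (n_windows, n_channels, n_window_samples).
    Hypothetical helper for illustration; not part of MNE.
    """
    win = int(round(duration * sfreq))               # samples per window
    step = int(round((duration - overlap) * sfreq))  # hop between window starts
    if step <= 0:
        raise ValueError("overlap must be smaller than duration")
    n_times = data.shape[1]
    starts = range(0, n_times - win + 1, step)       # only full windows are kept
    return np.stack([data[:, s:s + win] for s in starts])


# Stand-in for one patient's recording: 19 channels, 120 s at an assumed 250 Hz
sfreq = 250.0
rng = np.random.default_rng(0)
recording = rng.standard_normal((19, int(120 * sfreq)))

windows = sliding_windows(recording, sfreq, duration=30.0, overlap=15.0)
print(windows.shape)  # (7, 19, 7500)
```

With 5 s windows and 2 s overlap, the same 120 s stand-in recording yields 39
windows of 1250 samples each, so shorter windows give many more (and more
strongly overlapping) epochs per patient.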

A brief snapshot of my code is as follows:

from mne import Epochs, make_fixed_length_events, pick_types

# for each frequency band
for freq, (fmin, fmax) in enumerate(freq_ranges):
    # for each instance/patient
    for raw in trainrawarr:
        picks = pick_types(raw.info, meg=False, eeg=True, stim=False,
                           eog=False, exclude='bads')
        raw_filter = raw.copy().filter(fmin, fmax, n_jobs=-1,
                                       fir_design="firwin")
        events = make_fixed_length_events(raw_filter, id=1, duration=30.,
                                          overlap=15.)
        epochs = Epochs(raw_filter, events, picks=picks)
        # epochs contains a 3d array,
        # (number of epochs, 19, number of time points)
        # all epochs in this instance are 1 or 0 indicating stage of disease.

We repeat the above for testrawarr as well.

I use sklearn's GridSearchCV with K-fold cross-validation (K = 2, 5, 10) and
parameter tuning for several classifiers: LDA, QDA, decision tree, SVC, and
k-NN. The training roc_auc is either very poor or very good, while the test
ROC AUC is less than 50%. I use np.vstack() to concatenate the 3-dimensional
epoch arrays within the train set and within the test set.
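
The stacking step can be sketched as follows, again with NumPy only. The
variable names (`per_patient_epochs`, `per_patient_label`, `groups`) and the
random stand-in arrays are hypothetical. The sketch also records which patient
each epoch came from; that bookkeeping is an addition for illustration, not
something the original code does. Such an array could be passed as the groups
argument of a grouped splitter like scikit-learn's GroupKFold, so that epochs
from one patient are not spread across folds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: three patients with different numbers of epochs,
# each array shaped (n_epochs, 19 channels, n_times).
per_patient_epochs = [
    rng.standard_normal((4, 19, 100)),
    rng.standard_normal((6, 19, 100)),
    rng.standard_normal((5, 19, 100)),
]
per_patient_label = [0, 1, 0]  # one stage label (early=0 / final=1) per patient

# np.vstack concatenates along the first axis, so epochs from all
# patients end up in one (n_total_epochs, 19, n_times) array.
X = np.vstack(per_patient_epochs)

# Repeat each patient's label and index once per epoch of that patient.
n_epochs = [len(e) for e in per_patient_epochs]
y = np.repeat(per_patient_label, n_epochs)
groups = np.repeat(np.arange(len(per_patient_epochs)), n_epochs)

print(X.shape, y.shape, groups.shape)  # (15, 19, 100) (15,) (15,)
```

After stacking, the array itself no longer carries patient boundaries, which
is why a separate `groups` vector is needed if folds are to be formed per
patient rather than per epoch.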

import numpy as np
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# assumed definition; lda is not defined in the original snippet
lda = LinearDiscriminantAnalysis()
csp = CSP(reg=None, log=True, norm_trace=False, cov_est="epoch")
param_grid_lda = [{'lda__solver': ['lsqr'],
                   'lda__shrinkage': [0.0001, 0.001, 0.010, 0.1, 1],
                   'csp__n_components': [5, 10, 20, 30],
                   'csp__reg': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1]}]
for freq in range(8):  # there are 8 frequency bands
    for split in [5, 10]:
        print("Evaluating cv split: ", split, " in freq range: ",
              freq_ranges[freq])
        clf = Pipeline([('csp', csp), ('lda', lda)])
        search = GridSearchCV(clf, cv=StratifiedKFold(split),
                              param_grid=param_grid_lda, scoring='roc_auc',
                              n_jobs=-1)
        model = search.fit(np.vstack(epochs_all_freq_ranges[freq]),
                           epoch_labels[freq])
        testpred = search.predict(
            np.vstack(test_epochs_all_freq_ranges[freq]))

Can someone please explain what is wrong with this code or approach? That
would be really helpful. I noticed similar questions, especially this one:
https://mail.nmr.mgh.harvard.edu/pipermail/mne_analysis/2019-February/005630.html
but some sample code would be helpful.

Thanks