[Mne_analysis] Source Space Decoding Classification Timecourse

Mon Aug 7 18:28:42 EDT 2017

Hi Cody,

Scikit-learn's 'roc_auc' metric requires the y values to be in {0, 1};
that's probably the issue:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
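
In case it helps, here is a minimal sketch of remapping two event codes onto
0 and 1 with NumPy. The codes 1 and 3 below are made up for illustration; in
your script the array would come from wherever you build y.

```python
import numpy as np

# Hypothetical event codes standing in for the real label array.
event_ids = np.array([1, 3, 3, 1, 3, 1, 1, 3])

# np.unique sorts the distinct codes; return_inverse gives, for each trial,
# the index of its code in that sorted list, i.e. 0 or 1 for two classes.
classes, y = np.unique(event_ids, return_inverse=True)
assert len(classes) == 2, "roc_auc needs exactly two classes"

print(classes)  # the original codes, in sorted order
print(y)        # the same trials relabelled as 0 / 1
```

The larger code ends up as the positive class (1), so swap the mapping if you
want the other condition to count as positive.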

I updated and simplified the source decoding example; the PR is still under
review here:
https://github.com/mne-tools/mne-python/pull/4465

Comments and complaints are more than welcome!

HTH
JR

On 7 August 2017 at 23:38, Cushing, Cody <CCUSHING1 at mgh.harvard.edu> wrote:

> Hi all,
>
> Thanks for all the helpful suggestions.  Everyone brought up imbalanced
> datasets as a possible source of the problem, but trial counts are
> equalized both in the example and in my personal dataset, where the problem
> is a bit worse (baseline accuracies are averaging around 65%, even with the
> code changes JR suggested and with more samples and fewer features).  I also
> know I shouldn't jump to any conclusions without doing actual stats, and
> indeed I don't really expect these baseline periods to show up as
> significantly above chance. However, I figured a reviewer would nail me if
> I tried to report classification timecourses with such a high baseline
> accuracy, even if it was statistically meaningless.
>
> JR, thanks for those bits of code, that definitely cleans it up a lot.  At
> Alex's and your suggestion, I'm trying to use the 'roc_auc' scoring method
> to see if it calms the baseline down a bit, but I'm getting some
> inconsistent behavior out of cross_val_multiscore when trying to use that
> metric.  Attached is the same source space decoding tutorial from my
> original message, now modified to run using the functions from the
> master branch that JR suggested.  The plot sensors decoding tutorial you
> linked runs just fine for me, but when I try to run this on the modified
> source space tutorial (attached), I get the following error:
>
> ValueError: roc_auc scoring can only be computed for two-class problems
>
> It doesn't seem to like the label variable y.  Strangely enough, if I
> define y as such:
>
> y = epochs.events[:, 2], as it is defined in the sensor space tutorial
>
> the cross_val_multiscore function does not raise the error (the scores
> are obviously bad since the labeling is wrong though).  In both cases y is
> just a simple numpy array with identical shape (112,) and the same number
> of unique values, just in different orders.  So, I'm not really sure what's
> happening there, but hopefully others can replicate the problem.
>
> Cheers,
> Cody
>
> ________________________________
> From: mne_analysis-bounces at nmr.mgh.harvard.edu [
> mne_analysis-bounces at nmr.mgh.harvard.edu] on behalf of
> alexandre.barachant at gmail.com [alexandre.barachant at gmail.com]
> Sent: Saturday, August 05, 2017 4:08 PM
> To: Discussion and support forum for the users of MNE Software
> Subject: Re: [Mne_analysis] Source Space Decoding Classification Timecourse
>
> Hi Cody,
>
> Depending on your number of trials, the number of features and the
> cross-validation procedure, you can get fairly high decoding results just
> by chance.
> You should never interpret a result without running a statistical test.
> One good way to get the chance level of your classification pipeline is to
> run a permutation test:
> http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.permutation_test_score.html
> The idea is to shuffle the labels and retrain the model to see what
> score you get 'by chance'. It is sometimes surprising how high you can get.
>
> If you have an unbalanced number of trials per class, I would also suggest
> using the AUC as a metric instead of the accuracy.
>
> Alex
>
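
A minimal sketch of the permutation test Alex describes, using scikit-learn's
permutation_test_score on synthetic data. Everything about the data is an
assumption for illustration: X is pure noise, 112 trials matches the count
mentioned in this thread, and 50 features and 100 permutations are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

rng = np.random.RandomState(0)
n_trials, n_features = 112, 50
X = rng.randn(n_trials, n_features)    # noise only: no real class information
y = np.repeat([0, 1], n_trials // 2)   # balanced binary labels

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Score the pipeline once on the true labels, then again on 100 shuffled
# copies of the labels to build an empirical chance distribution.
score, perm_scores, pvalue = permutation_test_score(
    LogisticRegression(C=1), X, y, scoring='roc_auc', cv=cv,
    n_permutations=100, random_state=0)

print('observed AUC: %.2f' % score)
print('permuted AUC range: %.2f to %.2f'
      % (perm_scores.min(), perm_scores.max()))
print('p-value: %.3f' % pvalue)
```

The spread of perm_scores shows how far from 0.5 a score can land by chance
alone for this number of trials and features.
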
> On Sat, Aug 5, 2017 at 9:11 AM JR KING <jeanremi.king at gmail.com> wrote:
> Hi Cody,
>
> Overall, your baseline doesn't look too bad - you would need to do a
> statistical test to check whether it is just noise variation or
> above-chance decoding scores.
>
> Still, there could be multiple reasons behind a significant accuracy before
> t0 here:
> - accuracy is biased for imbalanced datasets. You can either use
> epochs.equalize_event_counts before your cross-validation or, better, use
> the 'roc_auc' scoring metric
> - filtering the data can spread information over time. Try changing your
> filtering parameters
> - IIRC, the 'sample' protocol is actually not randomized, and it is
> possible to predict the stimulus category in advance.
>
> If you're using the MNE master branch, then I would recommend simply
> using this instead of your big loop (see
> https://martinos.org/mne/dev/auto_tutorials/plot_sensors_decoding.html#temporal-decoding
> for more details):
>
> clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=500),
>                     SVC(kernel='linear'))
> time_decod = SlidingEstimator(clf, scoring='roc_auc')
> # pass the sliding estimator, not the bare pipeline, to the scorer
> scores = cross_val_multiscore(time_decod, X, y, cv=5)
> plt.plot(times, scores.mean(0))
>
> (Note that I would personally recommend clf =
> make_pipeline(StandardScaler(), LogisticRegression(C=1)) which should be
> better)
>
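
For anyone without the master branch, the same per-time-point idea can be
sketched with scikit-learn alone. This is not how MNE's SlidingEstimator is
implemented; it is a plain loop that fits one classifier per time sample. The
data are synthetic, and the array sizes and the injected effect are arbitrary
assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n_trials, n_features, n_times = 112, 20, 10
X = rng.randn(n_trials, n_features, n_times)   # trials x features x time
y = np.repeat([0, 1], n_trials // 2)           # labels already in {0, 1}
X[y == 1, :, 5:] += 0.5   # class difference only in the second half ("post-stim")

clf = make_pipeline(StandardScaler(), LogisticRegression(C=1))

# One cross-validated AUC per time sample.
scores = np.array([
    cross_val_score(clf, X[:, :, t], y, scoring='roc_auc', cv=5).mean()
    for t in range(n_times)
])
print(np.round(scores, 2))
```

The first five samples play the role of a baseline and should hover around
0.5, while the later samples carry the injected effect.
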
> Otherwise, I believe we will be releasing the next version of MNE this
> month, so you'll just have to update MNE.
>
> Hope that helps,
>
> Jean-Rémi
>
> On 4 August 2017 at 17:19, Ghuman, Avniel <ghumana at upmc.edu> wrote:
> Hi Cody,
>
> Do you have the same number of trials in each condition after any trial
> rejection you do? If not, then the issue might be that 50% is not the
> correct chance level to think about, rather the correct chance level is the
> proportion of trials that is in your more frequent condition (eyeballing,
> maybe like 55%?). There are unbiased classifiers you can use, but I am not
> sure if they are built into MNE python...
>
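
Avniel's point about the chance level can be made concrete with a small
helper. This is only an illustrative sketch: the function name
majority_chance_level and the 55/45 split are made up, not taken from
anyone's data.

```python
from collections import Counter


def majority_chance_level(labels):
    """Accuracy of always guessing the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical trial labels after rejection: 55 of one class, 45 of the other.
labels = ['a'] * 55 + ['b'] * 45
print(majority_chance_level(labels))
```

With balanced classes the helper returns 0.5, which is the usual chance level
for two-class accuracy.
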
> Best wishes,
> Avniel
>
> ________________________________
> From: mne_analysis-bounces at nmr.mgh.harvard.edu [
> mne_analysis-bounces at nmr.mgh.harvard.edu] on behalf of
> Cushing, Cody [CCUSHING1 at mgh.harvard.edu]
> Sent: Friday, August 04, 2017 5:11 PM
> To: mne_analysis at nmr.mgh.harvard.edu
> Subject: [Mne_analysis] Source Space Decoding Classification Timecourse
>
> Hi,
>
> I've been trying to modify the following example:
>
> http://martinos.org/mne/dev/auto_examples/decoding/plot_decoding_spatio_temporal_source.html
>
> to yield a time-resolved classification accuracy.  I'm new to decoding so
> I've done it in a fairly brute-force way (just iterating this script over
> every time point), which yields a fairly convincing classification accuracy
> timecourse.  However, I'm a bit concerned at how high the accuracy is
> during the baseline, pre-stim period.  See attached for the modified script
> using the sample data and an example of the output.  I'm new to decoding,
> but the best answer I've been able to find for abnormally high pre-stim
> accuracy is failing to cross-validate, but that shouldn't be the case as
> cross-validation is being performed (but perhaps I'm doing it wrong).  Is
> there something improper about my strategy here?  Thanks for any input.
>
> Cheers,
> Cody
>
> _______________________________________________
> Mne_analysis mailing list
> Mne_analysis at nmr.mgh.harvard.edu
> https://mail.nmr.mgh.harvard.edu/mailman/listinfo/mne_analysis
>
>
> The information in this e-mail is intended only for the person to whom it
> is addressed. If you believe this e-mail was sent to you in error and the
> e-mail contains patient information, please contact the Partners Compliance
> HelpLine at http://www.partners.org/complianceline . If the e-mail was sent
> to you in error but does not contain patient information, please contact
> the sender and properly dispose of the e-mail.