[Mne_analysis] Source Space Decoding Classification Timecourse

Mon Sep 11 10:15:17 EDT 2017

Hi Cody,

Is your problem solved now? If not, could you open an issue on GitHub so that
we can replicate the error?

Thanks

JR

On 8 August 2017 at 10:18, Cushing, Cody <CCUSHING1 at mgh.harvard.edu> wrote:

> Hey JR,
>
> That was what I was thinking at first, but I actually first got the error
> using y values in [0,1].  Plus, the sensor tutorial
> (https://martinos.org/mne/dev/auto_tutorials/plot_sensors_decoding.html#temporal-decoding),
> which runs fine, has the y array filled with 1's and 3's.  Also, the updated
> tutorial you linked (thanks for doing that) also defines y as
>
> y = epochs.events[:, 2]
>
> before using the cross_val_multiscore function with 'roc_auc' scoring, so
> those values aren't in the range [0,1] either.
>
> Even if I change the definition of y in the example with the sample data I
> attached to my last message to:
>
> y = np.repeat([0, 1], len(X) // 2)   # first half class 0, second half class 1
>
> I still get the error.  But that all presumes others are able to
> replicate the error and it's not just my system being weird.
>
> Cheers,
> Cody
>
> ------------------------------
> *From:* mne_analysis-bounces at nmr.mgh.harvard.edu
> [mne_analysis-bounces at nmr.mgh.harvard.edu] on behalf of JR KING
> [jeanremi.king at gmail.com]
> *Sent:* Monday, August 07, 2017 6:28 PM
> *To:* Discussion and support forum for the users of MNE Software
> *Subject:* Re: [Mne_analysis] Source Space Decoding Classification
> Timecourse
>
> Hi Cody,
>
> Scikit-learn's 'roc_auc' metric requires y values in [0, 1]; that's
> probably the issue:
> http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
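
One way to rule the label values in or out as the cause is to map the two classes onto 0 and 1 before scoring. The following is only a minimal sketch under assumptions: the event codes (1 and 3) and the classifier scores are invented for illustration, and LabelEncoder is used here as one possible way to do the remapping, not as what MNE or the tutorial does.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import LabelEncoder

# Hypothetical event codes, e.g. of the kind found in epochs.events[:, 2]
y_events = np.array([1, 1, 3, 3])
# Hypothetical classifier outputs (higher = more likely the second class)
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Map the two event codes onto 0 and 1 (sorted order: 1 -> 0, 3 -> 1)
y_binary = LabelEncoder().fit_transform(y_events)

# Area under the ROC curve computed on the remapped labels
auc = roc_auc_score(y_binary, y_score)
print(y_binary, auc)
```

If the error persists with labels that are verifiably 0 and 1, the label values themselves are unlikely to be the explanation.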
>
> I updated and simplified the source decoding example: the PR is still
> under review here:
> https://github.com/mne-tools/mne-python/pull/4465
>
> Comments and complaints are more than welcome!
>
> HTH
> JR
>
>
> On 7 August 2017 at 23:38, Cushing, Cody <CCUSHING1 at mgh.harvard.edu>
> wrote:
>
>> Hi all,
>>
>> Thanks for all the helpful suggestions.  Everyone brought up imbalanced
>> datasets as a possible source of the problem, but trial counts are
>> equalized both in the example and in my personal dataset, where the problem
>> is a bit worse (baseline accuracies are averaging around 65%, even with the
>> code changes JR suggested and with more samples and fewer features).  I also
>> know I shouldn't jump to any conclusions without doing actual stats, and
>> indeed I don't really expect these baseline periods to show up as
>> significantly above chance. However, I figured a reviewer would nail me if
>> I tried to report classification timecourses with that high a baseline
>> accuracy, even if it was statistically meaningless.
>>
>> JR, thanks for those bits of code; that definitely cleans it up a lot.
>> At Alex's and your suggestion, I'm trying to use the 'roc_auc' scoring
>> method to see if it calms the baseline down a bit, but I'm getting some
>> inconsistent behavior out of cross_val_multiscore when trying to use that
>> metric.  Attached is the same source space decoding tutorial from my
>> original message, now modified to use the functions from the master
>> branch that JR suggested.  The plot sensors decoding tutorial you
>> linked runs just fine for me, but when I run the modified
>> source space tutorial (attached), I get the following error:
>>
>> ValueError: roc_auc scoring can only be computed for two-class problems
>>
>> It doesn't seem to like the label variable y.  Strangely enough, if I
>> define y as it is defined in the sensor space tutorial:
>>
>> y = epochs.events[:, 2]
>>
>> the cross_val_multiscore function does not raise the error (the scores
>> are obviously bad since the labeling is wrong, though).  In both cases y is
>> just a simple numpy array with identical shape (112,) and the same number
>> of unique values, just in a different order.  So, I'm not really sure what's
>> happening there, but hopefully others can replicate the problem.
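
When two label vectors look alike but behave differently, it can help to compare them field by field rather than by eye. The sketch below is an illustration only: describe_labels is a hypothetical helper, and the two arrays are stand-ins built to match the shape (112,) mentioned above, not the actual data from the attached script.

```python
import numpy as np

def describe_labels(y):
    """Summarise a label vector: shape, dtype, and per-class counts."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    return {
        "shape": y.shape,
        "dtype": str(y.dtype),
        "classes": classes.tolist(),
        "counts": counts.tolist(),
    }

# Two hypothetical label vectors with 112 trials each
y_manual = np.repeat([0, 1], 56)   # built by hand: 56 zeros, then 56 ones
y_events = np.tile([1, 3], 56)     # stand-in for epochs.events[:, 2]

print(describe_labels(y_manual))
print(describe_labels(y_events))
```

Any difference in shape (for example (112, 1) instead of (112,)), dtype, or number of classes would be a candidate explanation worth including in a bug report.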
>>
>> Cheers,
>> Cody
>>
>>
>>
>>
>>
>> ________________________________
>> From: mne_analysis-bounces at nmr.mgh.harvard.edu
>> [mne_analysis-bounces at nmr.mgh.harvard.edu] on behalf of
>> alexandre.barachant at gmail.com [alexandre.barachant at gmail.com]
>> Sent: Saturday, August 05, 2017 4:08 PM
>> To: Discussion and support forum for the users of MNE Software
>> Subject: Re: [Mne_analysis] Source Space Decoding Classification
>> Timecourse
>>
>> Hi Cody,
>>
>> Depending on your number of trials, the number of features and the
>> cross-validation procedure, you can get fairly high decoding results just
>> by chance.
>> You should never interpret a result without running a statistical test.
>> One good way to get the chance level of your classification pipeline is to
>> run a permutation test:
>> http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.permutation_test_score.html
>> The idea is to shuffle the labels and retrain the model to see what score
>> you get 'by chance'. It is sometimes surprising how high it can get.
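
The permutation test described above can be sketched as follows. This is a minimal example on synthetic data, not the thread's dataset: the trial and feature counts, the classifier, and the number of permutations are all assumptions chosen to keep it fast.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

rng = np.random.RandomState(0)
n_trials, n_features = 40, 200        # few trials, many features
X = rng.randn(n_trials, n_features)   # pure noise: no real class information
y = np.repeat([0, 1], n_trials // 2)  # balanced labels

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Score the real labels, then refit and rescore on 100 label shufflings
score, perm_scores, p_value = permutation_test_score(
    LogisticRegression(), X, y, cv=cv, scoring="accuracy",
    n_permutations=100, random_state=0)

print("observed accuracy: %.2f" % score)
print("permuted accuracy: mean %.2f, max %.2f"
      % (perm_scores.mean(), perm_scores.max()))
print("p-value: %.3f" % p_value)
```

The spread of the permuted scores gives an empirical chance distribution for this particular pipeline and sample size, which is the quantity to compare a baseline accuracy against.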
>>
>> If you have an unbalanced number of trials per class, I would also suggest
>> using the AUC as a metric instead of the accuracy.
>>
>> Alex
>>
>> On Sat, Aug 5, 2017 at 9:11 AM JR KING <jeanremi.king at gmail.com> wrote:
>> Hi Cody,
>>
>> Overall, your baseline doesn't look too bad - you would need to do a
>> statistical test to check whether it is just noise variation or
>> above-chance decoding scores.
>>
>> Still, there could be multiple reasons behind a significant accuracy
>> before t0 here:
>> - accuracy is biased for imbalanced datasets. You can either use
>> epochs.equalize_event_counts before your cross-validation or, better, use
>> the 'roc_auc' scoring metric
>> - filtering the data can spread information over time. Try changing your
>> filtering parameters
>> - IIRC, the 'sample' protocol is actually not randomized, and it is
>> possible to predict the stimulus category in advance.
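
The filtering point in the list above can be illustrated with a toy signal. This is a sketch under assumptions: the sampling rate, the 5 Hz cutoff and the Butterworth filter with forward-backward (zero-phase) application are chosen for the demonstration and are not the filter settings used in the sample script.

```python
import numpy as np
from scipy.signal import butter, filtfilt

sfreq = 200.0                              # hypothetical sampling rate (Hz)
times = np.arange(-0.5, 1.0, 1.0 / sfreq)  # -500 ms to 1 s around t0
signal = (times >= 0).astype(float)        # "response" that starts at t0

# 4th-order Butterworth low-pass at 5 Hz, applied forward and backward
b, a = butter(4, 5.0 / (sfreq / 2.0), btype="low")
filtered = filtfilt(b, a, signal)

pre_stim = times < 0
print("max |raw| before t0:      %.3f" % np.abs(signal[pre_stim]).max())
print("max |filtered| before t0: %.3f" % np.abs(filtered[pre_stim]).max())
```

The raw signal is exactly zero before t0, whereas the filtered one is not, so a classifier trained on filtered data can pick up post-stimulus information at pre-stimulus time points.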
>>
>> If you're using the MNE master branch, then I would recommend simply
>> using this instead of your big loop (see
>> https://martinos.org/mne/dev/auto_tutorials/plot_sensors_decoding.html#temporal-decoding
>> for more details):
>>
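>> clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=500),
>>                     SVC(kernel='linear'))
>> # wrap the pipeline so that it is fit and scored at each time point
>> time_decod = SlidingEstimator(clf, scoring='roc_auc')
>> # pass the sliding estimator (not the bare pipeline) to the cross-validation
>> scores = cross_val_multiscore(time_decod, X, y, cv=5)
>> # average across folds to get one score per time point
>> plt.plot(times, scores.mean(0))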
>>
>> (Note that I would personally recommend clf =
>> make_pipeline(StandardScaler(), LogisticRegression(C=1)), which should be
>> better.)
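
For readers without the MNE master branch, the same idea (one cross-validated AUC per time point) can be sketched with scikit-learn alone. This is a plain stand-in for the sliding estimator, not MNE's implementation, and it runs on synthetic data: the array sizes, the injected effect and the fold settings are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
n_trials, n_features, n_times = 60, 20, 10
y = np.repeat([0, 1], n_trials // 2)

# Synthetic data, shape (n_trials, n_features, n_times): noise everywhere,
# plus a class difference on the first feature from time index 5 onward.
X = rng.randn(n_trials, n_features, n_times)
X[y == 1, 0, 5:] += 2.0

clf = make_pipeline(StandardScaler(), LogisticRegression(C=1))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Fit and score each time slice separately, averaging the AUC across folds
scores = np.array([
    cross_val_score(clf, X[:, :, t], y, cv=cv, scoring="roc_auc").mean()
    for t in range(n_times)
])
print(np.round(scores, 2))
```

Time indices 0 to 4 play the role of the baseline here: they contain no class information, so their scores show how much the AUC fluctuates around 0.5 by chance at this sample size.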
>>
>> Otherwise, I believe we will be releasing the next version of MNE this
>> month, so you'll just have to update MNE.
>>
>> Hope that helps,
>>
>> Jean-Rémi
>>
>>
>>
>>
>>
>> On 4 August 2017 at 17:19, Ghuman, Avniel <ghumana at upmc.edu> wrote:
>> Hi Cody,
>>
>> Do you have the same number of trials in each condition after any trial
>> rejection you do? If not, then the issue might be that 50% is not the
>> correct chance level to think about, rather the correct chance level is the
>> proportion of trials that is in your more frequent condition (eyeballing,
>> maybe like 55%?). There are unbiased classifiers you can use, but I am not
>> sure if they are built into MNE python...
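
The chance-level argument above can be made concrete with a classifier that ignores the data entirely. The 55/45 split below is a hypothetical example echoing the "maybe like 55%" estimate, not a count from the actual dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical imbalanced labels: 55 trials of class 1, 45 of class 0
y_true = np.array([1] * 55 + [0] * 45)

# A "classifier" that ignores the data and always predicts the frequent class
y_pred = np.ones_like(y_true)
# Its decision value is constant, i.e. it carries no information
y_score = np.full(len(y_true), 0.5)

acc = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)
print("accuracy:", acc)
print("AUC:", auc)
```

Accuracy rewards this uninformative predictor with the majority-class proportion, whereas the AUC stays at 0.5, which is why the AUC is suggested elsewhere in this thread for imbalanced designs.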
>>
>> Best wishes,
>> Avniel
>>
>> ________________________________
>> From: mne_analysis-bounces at nmr.mgh.harvard.edu
>> [mne_analysis-bounces at nmr.mgh.harvard.edu] on behalf of Cushing, Cody
>> [CCUSHING1 at mgh.harvard.edu]
>> Sent: Friday, August 04, 2017 5:11 PM
>> To: mne_analysis at nmr.mgh.harvard.edu
>> Subject: [Mne_analysis] Source Space Decoding Classification Timecourse
>>
>> Hi,
>>
>> I've been trying to modify the following example:
>>
>> http://martinos.org/mne/dev/auto_examples/decoding/plot_decoding_spatio_temporal_source.html
>>
>> to yield a time-resolved classification accuracy.  I'm new to decoding, so
>> I've done it in a fairly brute-force way (just iterating this script over
>> every time point), which yields a fairly convincing classification accuracy
>> timecourse.  However, I'm a bit concerned at how high the accuracy is
>> during the baseline, pre-stim period.  See attached for the modified script
>> using the sample data and an example of the output.  The best explanation
>> I've been able to find for abnormally high pre-stim accuracy is a failure
>> to cross-validate, but that shouldn't be the case here, as cross-validation
>> is being performed (though perhaps I'm doing it wrong).  Is there something
>> improper about my strategy here?  Thanks for any input.
>>
>> Cheers,
>> Cody
>>
>> _______________________________________________
>> Mne_analysis mailing list
>> Mne_analysis at nmr.mgh.harvard.edu
>> https://mail.nmr.mgh.harvard.edu/mailman/listinfo/mne_analysis
>>
>>
>> The information in this e-mail is intended only for the person to whom it
>> is addressed. If you believe this e-mail was sent to you in error and the
>> e-mail contains patient information, please contact the Partners
>> Compliance HelpLine at http://www.partners.org/complianceline . If the
>> e-mail was sent to you in error but does not contain patient information,
>> please contact the sender and properly dispose of the e-mail.
>>