[Mne_analysis] GeneralizingEstimator with incremental learning / .partial_fit

Giulia Gennari giulia.gennari1991 at gmail.com
Thu Aug 20 07:19:34 EDT 2020

        External Email - Use Caution        

Dear Jean-Rémi,

Thank you for the suggestion and, above all, thank you so much for your
help and assistance!
My scripts have been working just fine, and I would never have been able
to implement my current analysis without your guidance.

All the best,

Giulia

On Fri, Aug 7, 2020 at 3:10 PM Jean-Rémi KING <jeanremi.king at gmail.com>
wrote:

>
> Hi Giulia,
>
> In the long run, for batch optimization of parallel tasks (here, each
> time sample of the sliding window), I would encourage you to have a look
> at PyTorch; scikit-learn is not really optimal for this because it can't
> make use of a GPU.
>
>
> In the meantime, here is a solution to your problem: simply put your new
> class in a separate script e.g.
>
> # in mymodel.py
> import numpy as np
> from sklearn.linear_model import SGDClassifier
>
> class MyModel(SGDClassifier):
>     def fit(self, X, y):
>         # partial_fit needs the full set of classes on the first call
>         if not hasattr(self, 'classes_'):
>             self.classes_ = np.unique(y)
>         super().partial_fit(X, y, self.classes_)
>         return self
>
> # main script
> import numpy as np
> from mne.decoding import SlidingEstimator
> from mymodel import MyModel
> model = MyModel()
> slider = SlidingEstimator(model, scoring='roc_auc', n_jobs=2)
>
> X = np.random.randn(100, 10, 3)
> y = np.random.randint(0, 2, 100)
> slider.fit(X, y)
> slider.score(X, y)
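For the incremental-learning use case that started this thread, the same idea extends to feeding the model one batch at a time: each call to fit performs a partial_fit step, so earlier batches are not forgotten. A minimal sketch with plain scikit-learn at a single time point (the batch sizes and random data here are illustrative, not from the original analysis):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

class MyModel(SGDClassifier):
    """SGDClassifier whose fit() performs an incremental partial_fit() step."""
    def fit(self, X, y):
        if not hasattr(self, 'classes_'):
            # partial_fit needs the full set of classes on the first call
            self.classes_ = np.unique(y)
        super().partial_fit(X, y, classes=self.classes_)
        return self

rng = np.random.RandomState(0)
model = MyModel()
for _ in range(5):                  # five successive training batches
    X_batch = rng.randn(20, 10)     # 20 trials x 10 channels
    y_batch = np.arange(20) % 2     # both classes present in every batch
    model.fit(X_batch, y_batch)     # accumulates; does not restart training
```

Note that the first batch must contain every class, otherwise classes_ is inferred incompletely and later batches with unseen labels will fail.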
>
> hope that helps
>
> JR
>
>
> On Fri, 7 Aug 2020 at 14:34, Giulia Gennari <giulia.gennari1991 at gmail.com>
> wrote:
>
>>
>> Dear Jean-Rémi and dear Alex,
>>
>> *Thank you!*
>>
>> A solution based on this:
>> class MyModel(SGDClassifier):
>>     def fit(self, X, y):
>>         super().partial_fit(X, y)
>>         return self
>>
>> ...works fine!
>> Except for the crucial fact that parallel processing (n_jobs > 1) does
>> not seem feasible.
>> This is what I get when I try to score the slider (apologies for the
>> ugliness; I copy-paste everything since it might help to catch what is
>> wrong):
>>
>> ---------------------------------------------------------------------------
>> _RemoteTraceback                          Traceback (most recent call
>> last)
>> _RemoteTraceback:
>> """
>> Traceback (most recent call last):
>>   File
>> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/queues.py",
>> line 150, in _feed
>>     obj_ = dumps(obj, reducers=reducers)
>>   File
>> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py",
>> line 243, in dumps
>>     dump(obj, buf, reducers=reducers, protocol=protocol)
>>   File
>> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py",
>> line 236, in dump
>>     _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
>>   File
>> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/cloudpickle/cloudpickle.py",
>> line 267, in dump
>>     return Pickler.dump(self, obj)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 437, in dump
>>     self.save(obj)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
>>     self.save_reduce(obj=obj, *rv)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in
>> save_reduce
>>     save(state)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj) # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in
>> save_dict
>>     self._batch_setitems(obj.items())
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 885, in
>> _batch_setitems
>>     save(v)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
>>     self.save_reduce(obj=obj, *rv)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in
>> save_reduce
>>     save(state)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj) # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in
>> save_dict
>>     self._batch_setitems(obj.items())
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 890, in
>> _batch_setitems
>>     save(v)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
>>     self.save_reduce(obj=obj, *rv)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in
>> save_reduce
>>     save(state)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj) # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in
>> save_dict
>>     self._batch_setitems(obj.items())
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 885, in
>> _batch_setitems
>>     save(v)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj) # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 819, in
>> save_list
>>     self._batch_appends(obj)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 846, in
>> _batch_appends
>>     save(tmp[0])
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj) # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 774, in
>> save_tuple
>>     save(element)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj) # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 789, in
>> save_tuple
>>     save(element)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 510, in save
>>     rv = reduce(obj)
>>   File
>> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/_memmapping_reducer.py",
>> line 361, in __call__
>>     return (loads, (dumps(a, protocol=HIGHEST_PROTOCOL),))
>> _pickle.PicklingError: Can't pickle <class '__main__.MyModel'>: it's not
>> the same object as __main__.MyModel
>> """
>>
>> The above exception was the direct cause of the following exception:
>>
>> PicklingError                             Traceback (most recent call
>> last)
>> /neurospin/grip/protocols/EEG/Giulia_NUM_MUSIK/NUM_MUSIK_DECODING_incremental_learning_test_on_VISUAL_DRAFT.py
>> in <module>
>>     278         y_test = test_epochs.events[:,2]
>>     279
>> --> 280         scores = time_gen.score(X_test, y_test)
>>     281         all_scores_D.append(scores)
>>     282
>>
>> <decorator-gen-375> in score(self, X, y)
>>
>> ~/.local/lib/python3.7/site-packages/mne/decoding/search_light.py in
>> score(self, X, y)
>>     583                              for pb_idx, x in array_split_idx(
>>     584                                  X, n_jobs, axis=-1,
>> --> 585
>>  n_per_split=len(self.estimators_)))
>>     586
>>     587         score = np.concatenate(score, axis=1)
>>
>> ~/.local/lib/python3.7/site-packages/mne/parallel.py in run(*args,
>> **kwargs)
>>     126     def run(*args, **kwargs):
>>     127         try:
>> --> 128             return fun(*args, **kwargs)
>>     129         except RuntimeError as err:
>>     130             msg = str(err.args[0]) if err.args else ''
>>
>> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in
>> __call__(self, iterable)
>>     932
>>     933             with self._backend.retrieval_context():
>> --> 934                 self.retrieve()
>>     935             # Make sure that we get a last message telling us we
>> are done
>>     936             elapsed_time = time.time() - self._start_time
>>
>> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in
>> retrieve(self)
>>     831             try:
>>     832                 if getattr(self._backend, 'supports_timeout',
>> False):
>> --> 833
>> self._output.extend(job.get(timeout=self.timeout))
>>     834                 else:
>>     835                     self._output.extend(job.get())
>>
>> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py
>> in wrap_future_result(future, timeout)
>>     519         AsyncResults.get from multiprocessing."""
>>     520         try:
>> --> 521             return future.result(timeout=timeout)
>>     522         except LokyTimeoutError:
>>     523             raise TimeoutError()
>>
>> /usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in
>> result(self, timeout)
>>     433                 raise CancelledError()
>>     434             elif self._state == FINISHED:
>> --> 435                 return self.__get_result()
>>     436             else:
>>     437                 raise TimeoutError()
>>
>> /usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in
>> __get_result(self)
>>     382     def __get_result(self):
>>     383         if self._exception:
>> --> 384             raise self._exception
>>     385         else:
>>     386             return self._result
>>
>> PicklingError: Could not pickle the task to send it to the workers.
>>
>> Would you know how to solve it?
>> Without parallel processing I don't think I can get to the end of the
>> analysis before Christmas 😌
>>
>> Thank you very much again!!!!
>>
>> Giulia
>>
>> On Thu, Aug 6, 2020 at 4:12 PM Jean-Rémi KING <jeanremi.king at gmail.com>
>> wrote:
>>
>>>
>>> Hi Giulia,
>>>
>>> good catch, I had forgotten that we're cloning the estimator for each
>>> time sample; you'll thus need to do this:
>>>
>>> class MyModel(SGDClassifier):
>>>     def fit(self, X, y):
>>>         super().partial_fit(X, y)
>>>         return self
>>>
>>> model = MyModel(loss='log', class_weight='balanced')
>>> slider = SlidingEstimator(model, scoring='roc_auc')
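Why cloning defeats the earlier monkey-patching approach (model.fit = model.partial_fit) can be checked directly: scikit-learn's clone() rebuilds the estimator from its constructor parameters, so an attribute patched onto the instance is silently dropped, whereas a subclass's overridden method survives cloning. A small sketch:

```python
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()
model.fit = model.partial_fit   # monkey-patch the instance
cloned = clone(model)           # SlidingEstimator clones per time sample

print(model.fit.__name__)       # 'partial_fit' -- the patch is in place
print(cloned.fit.__name__)      # 'fit' -- the patch is lost on the clone
```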
>>>
>>> Hope that helps
>>>
>>> JR
>>>
>>>
>>> On Thu, 6 Aug 2020 at 15:56, Giulia Gennari <
>>> giulia.gennari1991 at gmail.com> wrote:
>>>
>>>>
>>>> Dear Jean-Rémi,
>>>>
>>>> Thank you for the nice suggestion!
>>>>
>>>> Just to make sure that this is working (I apologize for my ignorance):
>>>>
>>>> When I run:
>>>> model = SGDClassifier(loss='log', class_weight='balanced')
>>>> model.fit = model.partial_fit
>>>> slider1 = SlidingEstimator(model, scoring='roc_auc')
>>>> slider1.fit(X_train, y_train)
>>>>
>>>> or
>>>>
>>>> clf = make_pipeline(Vectorizer(), StandardScaler(), model)
>>>> slider2 = SlidingEstimator(clf, scoring='roc_auc')
>>>> slider2.fit(X_train, y_train)
>>>>
>>>> I do not get any error, while I would expect:
>>>>
>>>> ValueError: class_weight 'balanced' is not supported for partial_fit. In order to use 'balanced' weights, use compute_class_weight('balanced', classes, y). Pass the resulting weights as the class_weight parameter.
>>>>
>>>>
>>>> Since this is what I get with:
>>>> model.fit(X_train[:,:,single_time_point], y_train)
>>>>
>>>> Is there a good reason for that? E.g. are class weights computed
>>>> internally beforehand by SlidingEstimator?
>>>>
>>>> Thank you again!
>>>>
>>>> Giulia
>>>>
>>>> On Wed, Aug 5, 2020 at 7:18 PM Jean-Rémi KING <jeanremi.king at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Giulia,
>>>>>
>>>>> I think you should be able to change the method:
>>>>>
>>>>> model = sklearn.linear_model.SGDClassifier()
>>>>> model.fit = model.partial_fit
>>>>> slider = mne.decoding.SlidingEstimator(model)
>>>>> for X, y in train_batches:
>>>>>     slider.fit(X, y)
>>>>>
>>>>> Best
>>>>>
>>>>> JR
>>>>>
>>>>> On Wed, 5 Aug 2020 at 18:40, Giulia Gennari <
>>>>> giulia.gennari1991 at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I would need to try decoding with incremental learning (EEG data).
>>>>>> I was planning to use logistic regression by means of the SGDClassifier
>>>>>> (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html).
>>>>>> I would then need to call .partial_fit to make my estimator learn on
>>>>>> each of my training sets.
>>>>>> However:
>>>>>>
>>>>>> 'GeneralizingEstimator' object has no attribute 'partial_fit'
>>>>>>
>>>>>> Same issue for SlidingEstimator.
>>>>>> Is there a way to work around this limitation?
>>>>>>
>>>>>> Thank you so so much in advance!
>>>>>>
>>>>>> Giulia Gennari
>>>>>> _______________________________________________
>>>>>> Mne_analysis mailing list
>>>>>> Mne_analysis at nmr.mgh.harvard.edu
>>>>>> https://mail.nmr.mgh.harvard.edu/mailman/listinfo/mne_analysis
>>>>>
>>>>
>>>
>>
>