[Mne_analysis] GeneralizingEstimator with incremental learning / .partial_fit

Jean-Rémi KING jeanremi.king at gmail.com
Fri Aug 7 08:57:03 EDT 2020

Hi Giulia,

In the long run, for batch optimization of parallel tasks (here, one
classifier per sliding time sample), I would encourage you to have a look at
pytorch; sklearn is not really optimal for this because it cannot make use of
GPUs.
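
To give a rough idea, here is a minimal, untested sketch (the shapes and
names are only illustrative, this is not an MNE or sklearn API): all time
samples are fit jointly as one batched logistic regression, which a GPU can
do in a single pass:

import torch

n_epochs, n_channels, n_times = 100, 10, 3
X = torch.randn(n_epochs, n_channels, n_times)
y = torch.randint(0, 2, (n_epochs,)).float()

device = 'cuda' if torch.cuda.is_available() else 'cpu'
X, y = X.to(device), y.to(device)

# one weight vector and one bias per time sample, trained all at once
W = torch.zeros(n_times, n_channels, device=device, requires_grad=True)
b = torch.zeros(n_times, device=device, requires_grad=True)
optimizer = torch.optim.SGD([W, b], lr=0.1)
loss_fn = torch.nn.BCEWithLogitsLoss()

for _ in range(100):
    optimizer.zero_grad()
    # logits has shape (n_epochs, n_times): one prediction per epoch and time
    logits = torch.einsum('ect,tc->et', X, W) + b
    loss = loss_fn(logits, y[:, None].expand_as(logits))
    loss.backward()
    optimizer.step()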


In the meantime, here is a solution to your problem. The pickling error comes
from the fact that joblib pickles your estimator by reference: a class that
only lives in __main__ (in particular one that has been redefined in an
interactive session) cannot be resolved on the worker side, hence the "not
the same object as __main__.MyModel" message. So simply put your new class in
a separate module, e.g.

# in mymodel.py
import numpy as np
from sklearn.linear_model import SGDClassifier

class MyModel(SGDClassifier):
    """SGDClassifier whose fit() delegates to partial_fit(), for incremental learning."""

    def fit(self, X, y):
        # partial_fit requires the full set of classes on its first call,
        # since any single batch may contain only a subset of them
        if not hasattr(self, 'classes_'):
            self.classes_ = np.unique(y)
        super().partial_fit(X, y, classes=self.classes_)
        return self

# main script
import numpy as np
from mne.decoding import SlidingEstimator
from mymodel import MyModel  # imported from a module, not defined in __main__

model = MyModel()
slider = SlidingEstimator(model, scoring='roc_auc', n_jobs=2)

# dummy data: 100 epochs, 10 channels, 3 time samples
X = np.random.randn(100, 10, 3)
y = np.random.randint(0, 2, 100)
slider.fit(X, y)
slider.score(X, y)  # one ROC AUC per time sample
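
To check up front that the pickling problem is gone before launching the full
analysis (just a quick sanity check on the model instance from above):

import pickle
pickle.loads(pickle.dumps(model))  # raises PicklingError if MyModel cannot be pickled by reference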

Hope that helps

JR


On Fri, 7 Aug 2020 at 14:34, Giulia Gennari <giulia.gennari1991 at gmail.com>
wrote:

>
> Dear Jean-Rémi and dear Alex,
>
> *Thank you!*
>
> A solution based on this:
> class MyModel(SGDClassifier):
>     def fit(self, X, y):
>         super().partial_fit(X, y)
>         return self
>
> ...works fine!
> Except for the crucial fact that parallel processing (n_jobs > 1) does not
> seem to be feasible.
> This is what I get when I try to score the slider (apologies for the
> ugliness, I copy-paste everything since it might help to catch what is
> wrong):
> ---------------------------------------------------------------------------
> _RemoteTraceback                          Traceback (most recent call last)
> _RemoteTraceback:
> """
> Traceback (most recent call last):
>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/queues.py", line 150, in _feed
>     obj_ = dumps(obj, reducers=reducers)
>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py", line 243, in dumps
>     dump(obj, buf, reducers=reducers, protocol=protocol)
>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py", line 236, in dump
>     _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/cloudpickle/cloudpickle.py", line 267, in dump
>     return Pickler.dump(self, obj)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 437, in dump
>     self.save(obj)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
>     self.save_reduce(obj=obj, *rv)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
>     save(state)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
>     save(v)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
>     self.save_reduce(obj=obj, *rv)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
>     save(state)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 890, in _batch_setitems
>     save(v)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
>     self.save_reduce(obj=obj, *rv)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
>     save(state)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
>     save(v)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
>     self._batch_appends(obj)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
>     save(tmp[0])
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
>     save(element)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 789, in save_tuple
>     save(element)
>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 510, in save
>     rv = reduce(obj)
>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/_memmapping_reducer.py", line 361, in __call__
>     return (loads, (dumps(a, protocol=HIGHEST_PROTOCOL),))
> _pickle.PicklingError: Can't pickle <class '__main__.MyModel'>: it's not the same object as __main__.MyModel
> """
>
> The above exception was the direct cause of the following exception:
>
> PicklingError                             Traceback (most recent call last)
> /neurospin/grip/protocols/EEG/Giulia_NUM_MUSIK/NUM_MUSIK_DECODING_incremental_learning_test_on_VISUAL_DRAFT.py in <module>
>     278         y_test = test_epochs.events[:,2]
>     279
> --> 280         scores = time_gen.score(X_test, y_test)
>     281         all_scores_D.append(scores)
>     282
>
> <decorator-gen-375> in score(self, X, y)
>
> ~/.local/lib/python3.7/site-packages/mne/decoding/search_light.py in score(self, X, y)
>     583                              for pb_idx, x in array_split_idx(
>     584                                  X, n_jobs, axis=-1,
> --> 585                                  n_per_split=len(self.estimators_)))
>     586
>     587         score = np.concatenate(score, axis=1)
>
> ~/.local/lib/python3.7/site-packages/mne/parallel.py in run(*args, **kwargs)
>     126     def run(*args, **kwargs):
>     127         try:
> --> 128             return fun(*args, **kwargs)
>     129         except RuntimeError as err:
>     130             msg = str(err.args[0]) if err.args else ''
>
> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
>     932
>     933             with self._backend.retrieval_context():
> --> 934                 self.retrieve()
>     935             # Make sure that we get a last message telling us we are done
>     936             elapsed_time = time.time() - self._start_time
>
> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
>     831             try:
>     832                 if getattr(self._backend, 'supports_timeout', False):
> --> 833                     self._output.extend(job.get(timeout=self.timeout))
>     834                 else:
>     835                     self._output.extend(job.get())
>
> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
>     519         AsyncResults.get from multiprocessing."""
>     520         try:
> --> 521             return future.result(timeout=timeout)
>     522         except LokyTimeoutError:
>     523             raise TimeoutError()
>
> /usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
>     433                 raise CancelledError()
>     434             elif self._state == FINISHED:
> --> 435                 return self.__get_result()
>     436             else:
>     437                 raise TimeoutError()
>
> /usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
>     382     def __get_result(self):
>     383         if self._exception:
> --> 384             raise self._exception
>     385         else:
>     386             return self._result
>
> PicklingError: Could not pickle the task to send it to the workers.
>
> Would you know how to solve it?
> Without parallel processing I don't think I can get to the end of the
> analysis before Christmas 😌
>
> Thank you very much again!!!!
>
> Giulia
>
> On Thu, Aug 6, 2020 at 4:12 PM Jean-Rémi KING <jeanremi.king at gmail.com>
> wrote:
>
>>
>> Hi Giulia,
>>
>> Good catch! I had forgotten that we clone the estimator for each time
>> sample. Cloning rebuilds the estimator from its class and parameters, so
>> your monkey-patched fit method is silently dropped (which is also why you
>> did not get the class_weight error). You'll thus need to subclass instead:
>>
>> class MyModel(SGDClassifier):
>>     def fit(self, X, y):
>>         super().partial_fit(X, y)
>>         return self
>>
>> model = MyModel(loss='log', class_weight='balanced')
>> slider = SlidingEstimator(model, scoring='roc_auc')
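>>
>> By the way, class_weight='balanced' is not supported by partial_fit
>> itself, because the balanced weights depend on all of y while partial_fit
>> only ever sees one batch. If you need balanced weights, a possible
>> workaround (untested sketch, following the hint in sklearn's own error
>> message) is to compute them once on the full training labels and pass
>> them as a dict:
>>
>> import numpy as np
>> from sklearn.utils.class_weight import compute_class_weight
>>
>> classes = np.unique(y_train)
>> weights = compute_class_weight('balanced', classes=classes, y=y_train)
>> model = MyModel(loss='log', class_weight=dict(zip(classes, weights)))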
>>
>> Hope that helps
>>
>> JR
>>
>>
>> On Thu, 6 Aug 2020 at 15:56, Giulia Gennari <giulia.gennari1991 at gmail.com>
>> wrote:
>>
>>>
>>> Dear Jean-Rémi,
>>>
>>> Thank you for the nice suggestion!
>>>
>>> Just to make sure that this is working (I apologize for my ignorance):
>>>
>>> When I run:
>>> model = SGDClassifier(loss='log', class_weight='balanced')
>>> model.fit = model.partial_fit
>>> slider1 = SlidingEstimator(model, scoring='roc_auc')
>>> slider1.fit(X_train, y_train)
>>>
>>> or
>>>
>>> clf = make_pipeline(Vectorizer(), StandardScaler(), model)
>>> slider2 = SlidingEstimator(clf, scoring='roc_auc')
>>> slider2.fit(X_train, y_train)
>>>
>>> I do not get any error, while I would expect:
>>>
>>> ValueError: class_weight 'balanced' is not supported for partial_fit. In order to use 'balanced' weights, use compute_class_weight('balanced', classes, y). Pass the resulting weights as the class_weight parameter.
>>>
>>>
>>> Since this is what I get with:
>>> model.fit(X_train[:,:,single_time_point], y_train)
>>>
>>> Is there a good reason for that? E.g., are the class weights computed
>>> internally beforehand by SlidingEstimator?
>>>
>>> Thank you again!
>>>
>>> Giulia
>>>
>>> On Wed, Aug 5, 2020 at 7:18 PM Jean-Rémi KING <jeanremi.king at gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi Giulia,
>>>>
>>>> I think you should be able to change the method:
>>>>
>>>> model = sklearn.linear_model.SGDClassifier()
>>>> model.fit = model.partial_fit
>>>> slider = mne.decoding.SlidingEstimator(model)
>>>> for X, y in train_batches:
>>>>     slider.fit(X, y)
>>>>
>>>> Best
>>>>
>>>> JR
>>>>
>>>> On Wed, 5 Aug 2020 at 18:40, Giulia Gennari <
>>>> giulia.gennari1991 at gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi!
>>>>>
>>>>> I would need to try decoding with incremental learning (EEG data).
>>>>> I was planning to use logistic regression by means of the SGDClassifier
>>>>> <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html>.
>>>>> I would then need to call .partial_fit to make my estimator learn on
>>>>> each of my training sets.
>>>>> However:
>>>>>
>>>>> 'GeneralizingEstimator' object has no attribute 'partial_fit'
>>>>>
>>>>> Same issue for SlidingEstimator.
>>>>> Is there a way to work around this limitation?
>>>>>
>>>>> Thank you so so much in advance!
>>>>>
>>>>> Giulia Gennari