[Mne_analysis] GeneralizingEstimator with incremental learning / .partial_fit
Giulia Gennari
giulia.gennari1991 at gmail.com
Thu Aug 20 07:19:34 EDT 2020
Dear Jean-Rémi,
Thank you for the suggestion and, above all, thank you so much for your
help and assistance!
My scripts have been working just fine, and I would never have been able
to implement my current analysis without your guidance.
All the best,
Giulia
On Fri, Aug 7, 2020 at 3:10 PM Jean-Rémi KING <jeanremi.king at gmail.com>
wrote:
>
> Hi Giulia,
>
> In the long run, for batch optimization of parallel tasks (here, each
> sliding time sample), I would encourage you to have a look at PyTorch;
> sklearn is not really optimal for this because it cannot make use of the GPU.
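>
> To give a rough idea of what I mean, here is an untested sketch (the array
> shapes and the number of optimization steps are made up) that fits one
> logistic regression per time sample jointly, on the GPU when one is available:
>
> import torch
>
> # Fake data: (n_epochs, n_channels, n_times) and binary labels
> n_epochs, n_channels, n_times = 100, 10, 3
> X = torch.randn(n_epochs, n_channels, n_times)
> y = torch.randint(0, 2, (n_epochs,)).float()
>
> device = 'cuda' if torch.cuda.is_available() else 'cpu'
> X, y = X.to(device), y.to(device)
>
> # One weight vector and one bias per time sample, optimized jointly
> W = torch.zeros(n_times, n_channels, requires_grad=True, device=device)
> b = torch.zeros(n_times, requires_grad=True, device=device)
> optimizer = torch.optim.SGD([W, b], lr=0.1)
>
> for step in range(100):
>     optimizer.zero_grad()
>     # logits[i, t] = X[i, :, t] @ W[t] + b[t]
>     logits = torch.einsum('nct,tc->nt', X, W) + b
>     loss = torch.nn.functional.binary_cross_entropy_with_logits(
>         logits, y[:, None].expand_as(logits))
>     loss.backward()
>     optimizer.step()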
>
>
> In the meantime, here is a solution to your problem: simply put your new
> class in a separate module, e.g. mymodel.py, so that joblib can pickle it
> when dispatching work to the parallel workers (a class defined in the main
> script lives in __main__ and cannot be pickled reliably):
>
> # in mymodel.py
> import numpy as np
> from sklearn.linear_model import SGDClassifier
>
> class MyModel(SGDClassifier):
>     def fit(self, X, y):
>         # Remember the full set of classes on the first call, since
>         # partial_fit needs them before it has seen all the labels.
>         if not hasattr(self, 'classes_'):
>             self.classes_ = np.unique(y)
>         super().partial_fit(X, y, classes=self.classes_)
>         return self
>
> # main script
> import numpy as np
> from mne.decoding import SlidingEstimator
> from mymodel import MyModel
>
> model = MyModel()
> slider = SlidingEstimator(model, scoring='roc_auc', n_jobs=2)
>
> X = np.random.randn(100, 10, 3)   # (n_epochs, n_channels, n_times)
> y = np.random.randint(0, 2, 100)  # binary labels
> slider.fit(X, y)
> slider.score(X, y)
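>
> If you want a quick, optional sanity check that the model can now be
> pickled for the workers (this is just my suggestion, not strictly needed):
>
> import pickle
> pickle.dumps(MyModel())  # should succeed now that MyModel lives in an importable module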
>
> hope that helps
>
> JR
>
>
> On Fri, 7 Aug 2020 at 14:34, Giulia Gennari <giulia.gennari1991 at gmail.com>
> wrote:
>
>>
>> Dear Jean-Rémi and dear Alex,
>>
>> *Thank you!*
>>
>> A solution based on this:
>>
>> class MyModel(SGDClassifier):
>>     def fit(self, X, y):
>>         super().partial_fit(X, y)
>>         return self
>>
>> ...works fine!
>> Except for the crucial fact that parallel processing (n_jobs > 1) does not
>> seem to be feasible.
>> This is what I get when I try to score the slider (apologies for the
>> ugliness; I copy-paste everything since it might help to spot what is
>> wrong):
>>
>> ---------------------------------------------------------------------------
>> _RemoteTraceback                          Traceback (most recent call last)
>> _RemoteTraceback:
>> """
>> Traceback (most recent call last):
>>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/queues.py", line 150, in _feed
>>     obj_ = dumps(obj, reducers=reducers)
>>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py", line 243, in dumps
>>     dump(obj, buf, reducers=reducers, protocol=protocol)
>>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py", line 236, in dump
>>     _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
>>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/cloudpickle/cloudpickle.py", line 267, in dump
>>     return Pickler.dump(self, obj)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 437, in dump
>>     self.save(obj)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
>>     self.save_reduce(obj=obj, *rv)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
>>     save(state)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj)  # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
>>     self._batch_setitems(obj.items())
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
>>     save(v)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
>>     self.save_reduce(obj=obj, *rv)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
>>     save(state)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj)  # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
>>     self._batch_setitems(obj.items())
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 890, in _batch_setitems
>>     save(v)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
>>     self.save_reduce(obj=obj, *rv)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
>>     save(state)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj)  # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
>>     self._batch_setitems(obj.items())
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
>>     save(v)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj)  # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
>>     self._batch_appends(obj)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
>>     save(tmp[0])
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj)  # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
>>     save(element)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
>>     f(self, obj)  # Call unbound method with explicit self
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 789, in save_tuple
>>     save(element)
>>   File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 510, in save
>>     rv = reduce(obj)
>>   File "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/_memmapping_reducer.py", line 361, in __call__
>>     return (loads, (dumps(a, protocol=HIGHEST_PROTOCOL),))
>> _pickle.PicklingError: Can't pickle <class '__main__.MyModel'>: it's not the same object as __main__.MyModel
>> """
>>
>> The above exception was the direct cause of the following exception:
>>
>> PicklingError                             Traceback (most recent call last)
>> /neurospin/grip/protocols/EEG/Giulia_NUM_MUSIK/NUM_MUSIK_DECODING_incremental_learning_test_on_VISUAL_DRAFT.py in <module>
>>     278 y_test = test_epochs.events[:,2]
>>     279
>> --> 280 scores = time_gen.score(X_test, y_test)
>>     281 all_scores_D.append(scores)
>>     282
>>
>> <decorator-gen-375> in score(self, X, y)
>>
>> ~/.local/lib/python3.7/site-packages/mne/decoding/search_light.py in score(self, X, y)
>>     583             for pb_idx, x in array_split_idx(
>>     584                 X, n_jobs, axis=-1,
>> --> 585                 n_per_split=len(self.estimators_)))
>>     586
>>     587         score = np.concatenate(score, axis=1)
>>
>> ~/.local/lib/python3.7/site-packages/mne/parallel.py in run(*args, **kwargs)
>>     126     def run(*args, **kwargs):
>>     127         try:
>> --> 128             return fun(*args, **kwargs)
>>     129         except RuntimeError as err:
>>     130             msg = str(err.args[0]) if err.args else ''
>>
>> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
>>     932
>>     933             with self._backend.retrieval_context():
>> --> 934                 self.retrieve()
>>     935             # Make sure that we get a last message telling us we are done
>>     936             elapsed_time = time.time() - self._start_time
>>
>> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
>>     831             try:
>>     832                 if getattr(self._backend, 'supports_timeout', False):
>> --> 833                     self._output.extend(job.get(timeout=self.timeout))
>>     834                 else:
>>     835                     self._output.extend(job.get())
>>
>> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
>>     519             AsyncResults.get from multiprocessing."""
>>     520         try:
>> --> 521             return future.result(timeout=timeout)
>>     522         except LokyTimeoutError:
>>     523             raise TimeoutError()
>>
>> /usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
>>     433                 raise CancelledError()
>>     434             elif self._state == FINISHED:
>> --> 435                 return self.__get_result()
>>     436             else:
>>     437                 raise TimeoutError()
>>
>> /usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
>>     382     def __get_result(self):
>>     383         if self._exception:
>> --> 384             raise self._exception
>>     385         else:
>>     386             return self._result
>>
>> PicklingError: Could not pickle the task to send it to the workers.
>>
>> Would you know how to solve it?
>> Without parallel processing I don't think I can get to the end of the
>> analysis before Christmas 😌
>>
>> Thank you very much again!!!!
>>
>> Giulia
>>
>> On Thu, Aug 6, 2020 at 4:12 PM Jean-Rémi KING <jeanremi.king at gmail.com>
>> wrote:
>>
>>>
>>> Hi Giulia,
>>>
>>> Good catch, I had forgotten that we clone the estimator for each
>>> time sample; you'll thus need to do this instead:
>>>
>>> class MyModel(SGDClassifier):
>>>     def fit(self, X, y):
>>>         super().partial_fit(X, y)
>>>         return self
>>>
>>> model = MyModel(loss='log', class_weight='balanced')
>>> slider = SlidingEstimator(model, scoring='roc_auc')
>>>
>>> Hope that helps
>>>
>>> JR
>>>
>>>
>>> On Thu, 6 Aug 2020 at 15:56, Giulia Gennari <
>>> giulia.gennari1991 at gmail.com> wrote:
>>>
>>>>
>>>> Dear Jean-Rémi,
>>>>
>>>> Thank you for the nice suggestion!
>>>>
>>>> Just to make sure that this is working (I apologize for my ignorance):
>>>>
>>>> When I run:
>>>> model = SGDClassifier(loss='log', class_weight='balanced')
>>>> model.fit = model.partial_fit
>>>> slider1 = SlidingEstimator(model, scoring='roc_auc')
>>>> slider1.fit(X_train, y_train)
>>>>
>>>> or
>>>>
>>>> clf = make_pipeline(Vectorizer(), StandardScaler(), model)
>>>> slider2 = SlidingEstimator(clf, scoring='roc_auc')
>>>> slider2.fit(X_train, y_train)
>>>>
>>>> I do not get any error, whereas I would expect:
>>>>
>>>> ValueError: class_weight 'balanced' is not supported for partial_fit. In order to use 'balanced' weights, use compute_class_weight('balanced', classes, y). Pass the resulting weights as the class_weight parameter.
>>>>
>>>>
>>>> Since this is what I get with:
>>>> model.fit(X_train[:,:,single_time_point], y_train)
>>>>
>>>> Is there a good reason for that? E.g., are the class weights computed
>>>> internally beforehand by SlidingEstimator?
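>>>>
>>>> For reference, the workaround I would otherwise try, following the error
>>>> message, is roughly the sketch below (y_train being my training labels):
>>>>
>>>> import numpy as np
>>>> from sklearn.utils.class_weight import compute_class_weight
>>>>
>>>> # compute the 'balanced' weights once, then pass them explicitly
>>>> classes = np.unique(y_train)
>>>> weights = compute_class_weight('balanced', classes=classes, y=y_train)
>>>> model = SGDClassifier(loss='log', class_weight=dict(zip(classes, weights)))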
>>>>
>>>> Thank you again!
>>>>
>>>> Giulia
>>>>
>>>> On Wed, Aug 5, 2020 at 7:18 PM Jean-Rémi KING <jeanremi.king at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Giulia,
>>>>>
>>>>> I think you should be able to change the method:
>>>>>
>>>>> model = sklearn.linear_model.SGDClassifier()
>>>>> model.fit = model.partial_fit
>>>>> slider = mne.decoding.SlidingEstimator(model)
>>>>> for X, y in train_batches:
>>>>>     slider.fit(X, y)
>>>>>
>>>>> Best
>>>>>
>>>>> JR
>>>>>
>>>>> On Wed, 5 Aug 2020 at 18:40, Giulia Gennari <
>>>>> giulia.gennari1991 at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I would need to try decoding with incremental learning (EEG data).
>>>>>> I was planning to use logistic regression by means of the SGDClassifier
>>>>>> <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html>.
>>>>>> I would then need to call .partial_fit to make my estimator learn on
>>>>>> each of my training sets.
>>>>>> However:
>>>>>>
>>>>>> 'GeneralizingEstimator' object has no attribute 'partial_fit'
>>>>>>
>>>>>> Same issue for SlidingEstimator.
>>>>>> Is there a way to work around this limitation?
>>>>>>
>>>>>> Thank you so so much in advance!
>>>>>>
>>>>>> Giulia Gennari