[Mne_analysis] GeneralizingEstimator with incremental learning / .partial_fit
Jean-Rémi KING
jeanremi.king at gmail.com
Fri Aug 7 08:57:03 EDT 2020
External Email - Use Caution
Hi Giulia,
In the long run, for batch optimization of parallel tasks (here, each sliding
time sample), I would encourage you to have a look at PyTorch; scikit-learn is
not really optimal for this because it cannot make use of a GPU.
In the meantime, here is a solution to your problem: simply put your new
class in a separate script, e.g.
# in mymodel.py
import numpy as np
from sklearn.linear_model import SGDClassifier

class MyModel(SGDClassifier):
    def fit(self, X, y):
        # partial_fit needs the full set of classes on the first call
        if not hasattr(self, 'classes_'):
            self.classes_ = np.unique(y)
        super().partial_fit(X, y, self.classes_)
        return self
# main script
import numpy as np
from mne.decoding import SlidingEstimator
from mymodel import MyModel
model = MyModel()
slider = SlidingEstimator(model, scoring='roc_auc', n_jobs=2)
X = np.random.randn(100, 10, 3)
y = np.random.randint(0, 2, 100)
slider.fit(X, y)
slider.score(X, y)
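For the incremental part itself, plain scikit-learn is enough to see how
partial_fit accumulates updates across successive batches; here is a minimal
sketch with synthetic data (batch sizes and feature counts are illustrative):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
classes = np.array([0, 1])
clf = SGDClassifier(random_state=0)

# Feed successive batches; the full set of classes must be
# passed on the first partial_fit call.
for _ in range(5):
    X_batch = rng.randn(20, 10)      # 20 trials x 10 features
    y_batch = rng.randint(0, 2, 20)
    clf.partial_fit(X_batch, y_batch, classes=classes)

print(clf.predict(X_batch[:3]))  # predictions in {0, 1}
```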
Hope that helps,
JR
On Fri, 7 Aug 2020 at 14:34, Giulia Gennari <giulia.gennari1991 at gmail.com>
wrote:
>
> Dear Jean-Rémi and dear Alex,
>
> *Thank you!*
>
> A solution based on this:
> class MyModel(SGDClassifier):
>     def fit(self, X, y):
>         super().partial_fit(X, y)
>         return self
>
> ...works fine!
> Except for the crucial fact that parallel processing (n_jobs > 1) does not
> seem feasible.
> This is what I get when I try to score the slider (apologies for the
> ugliness, I copy-paste everything since it might help to spot what is
> wrong):
> ---------------------------------------------------------------------------
> _RemoteTraceback Traceback (most recent call last)
> _RemoteTraceback:
> """
> Traceback (most recent call last):
> File
> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/queues.py",
> line 150, in _feed
> obj_ = dumps(obj, reducers=reducers)
> File
> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py",
> line 243, in dumps
> dump(obj, buf, reducers=reducers, protocol=protocol)
> File
> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py",
> line 236, in dump
> _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
> File
> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/externals/cloudpickle/cloudpickle.py",
> line 267, in dump
> return Pickler.dump(self, obj)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 437, in dump
> self.save(obj)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
> self.save_reduce(obj=obj, *rv)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in
> save_reduce
> save(state)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in
> save_dict
> self._batch_setitems(obj.items())
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 885, in
> _batch_setitems
> save(v)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
> self.save_reduce(obj=obj, *rv)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in
> save_reduce
> save(state)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in
> save_dict
> self._batch_setitems(obj.items())
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 890, in
> _batch_setitems
> save(v)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 549, in save
> self.save_reduce(obj=obj, *rv)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 662, in
> save_reduce
> save(state)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 859, in
> save_dict
> self._batch_setitems(obj.items())
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 885, in
> _batch_setitems
> save(v)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 819, in
> save_list
> self._batch_appends(obj)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 846, in
> _batch_appends
> save(tmp[0])
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 774, in
> save_tuple
> save(element)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 504, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 789, in
> save_tuple
> save(element)
> File "/usr/local/anaconda3/lib/python3.7/pickle.py", line 510, in save
> rv = reduce(obj)
> File
> "/usr/local/anaconda3/lib/python3.7/site-packages/joblib/_memmapping_reducer.py",
> line 361, in __call__
> return (loads, (dumps(a, protocol=HIGHEST_PROTOCOL),))
> _pickle.PicklingError: Can't pickle <class '__main__.MyModel'>: it's not
> the same object as __main__.MyModel
> """
>
> The above exception was the direct cause of the following exception:
>
> PicklingError Traceback (most recent call last)
> /neurospin/grip/protocols/EEG/Giulia_NUM_MUSIK/NUM_MUSIK_DECODING_incremental_learning_test_on_VISUAL_DRAFT.py
> in <module>
> 278 y_test = test_epochs.events[:,2]
> 279
> --> 280 scores = time_gen.score(X_test, y_test)
> 281 all_scores_D.append(scores)
> 282
>
> <decorator-gen-375> in score(self, X, y)
>
> ~/.local/lib/python3.7/site-packages/mne/decoding/search_light.py in
> score(self, X, y)
> 583 for pb_idx, x in array_split_idx(
> 584 X, n_jobs, axis=-1,
> --> 585
> n_per_split=len(self.estimators_)))
> 586
> 587 score = np.concatenate(score, axis=1)
>
> ~/.local/lib/python3.7/site-packages/mne/parallel.py in run(*args,
> **kwargs)
> 126 def run(*args, **kwargs):
> 127 try:
> --> 128 return fun(*args, **kwargs)
> 129 except RuntimeError as err:
> 130 msg = str(err.args[0]) if err.args else ''
>
> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in
> __call__(self, iterable)
> 932
> 933 with self._backend.retrieval_context():
> --> 934 self.retrieve()
> 935 # Make sure that we get a last message telling us we
> are done
> 936 elapsed_time = time.time() - self._start_time
>
> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in
> retrieve(self)
> 831 try:
> 832 if getattr(self._backend, 'supports_timeout',
> False):
> --> 833
> self._output.extend(job.get(timeout=self.timeout))
> 834 else:
> 835 self._output.extend(job.get())
>
> /usr/local/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py
> in wrap_future_result(future, timeout)
> 519 AsyncResults.get from multiprocessing."""
> 520 try:
> --> 521 return future.result(timeout=timeout)
> 522 except LokyTimeoutError:
> 523 raise TimeoutError()
>
> /usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in
> result(self, timeout)
> 433 raise CancelledError()
> 434 elif self._state == FINISHED:
> --> 435 return self.__get_result()
> 436 else:
> 437 raise TimeoutError()
>
> /usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in
> __get_result(self)
> 382 def __get_result(self):
> 383 if self._exception:
> --> 384 raise self._exception
> 385 else:
> 386 return self._result
>
> PicklingError: Could not pickle the task to send it to the workers.
>
> Would you know how to solve it?
> Without parallel processing I don't think I can get to the end of the
> analysis before Christmas 😌
>
> Thank you very much again!!!!
>
> Giulia
>
> On Thu, Aug 6, 2020 at 4:12 PM Jean-Rémi KING <jeanremi.king at gmail.com>
> wrote:
>
>>
>> Hi Giulia,
>>
>> good catch, I had forgotten that we're cloning the estimator for each
>> time sample; you'll thus need to do this:
>>
>> class MyModel(SGDClassifier):
>>     def fit(self, X, y):
>>         super().partial_fit(X, y)
>>         return self
>>
>> model = MyModel(loss='log', class_weight='balanced')
>> slider = SlidingEstimator(model, scoring='roc_auc')
>>
>> Hope that helps
>>
>> JR
>>
>>
>> On Thu, 6 Aug 2020 at 15:56, Giulia Gennari <giulia.gennari1991 at gmail.com>
>> wrote:
>>
>>>
>>> Dear Jean-Rémi,
>>>
>>> Thank you for the nice suggestion!
>>>
>>> Just to make sure that this is working (I apologize for my ignorance):
>>>
>>> When I run:
>>> model = SGDClassifier(loss='log', class_weight='balanced')
>>> model.fit = model.partial_fit
>>> slider1 = SlidingEstimator(model, scoring='roc_auc')
>>> slider1.fit(X_train, y_train)
>>>
>>> or
>>>
>>> clf = make_pipeline(Vectorizer(), StandardScaler(), model)
>>> slider2 = SlidingEstimator(clf, scoring='roc_auc')
>>> slider2.fit(X_train, y_train)
>>>
>>> I do not get any error, while I would expect:
>>>
>>> ValueError: class_weight 'balanced' is not supported for partial_fit. In order to use 'balanced' weights, use compute_class_weight('balanced', classes, y). Pass the resulting weights as the class_weight parameter.
>>>
>>>
>>> Since this is what I get with:
>>> model.fit(X_train[:,:,single_time_point], y_train)
>>>
>>> Is there a good reason for that? E.g., are class weights computed
>>> internally beforehand by SlidingEstimator?
>>>
>>> Thank you again!
>>>
>>> Giulia
>>>
>>> On Wed, Aug 5, 2020 at 7:18 PM Jean-Rémi KING <jeanremi.king at gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi Giulia,
>>>>
>>>> I think you should be able to change the method:
>>>>
>>>> model = sklearn.linear_model.SGDClassifier()
>>>> model.fit = model.partial_fit
>>>> slider = mne.decoding.SlidingEstimator(model)
>>>> for X, y in train_batches:
>>>>     slider.fit(X, y)
>>>>
>>>> Best
>>>>
>>>> JR
>>>>
>>>> On Wed, 5 Aug 2020 at 18:40, Giulia Gennari <
>>>> giulia.gennari1991 at gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi!
>>>>>
>>>>> I would need to try decoding with incremental learning (EEG data).
>>>>> I was planning to use logistic regression by means of the
>>>>> SGDClassifier
>>>>> <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html>
>>>>> .
>>>>> I would then need to call .partial_fit to make my estimator learn on
>>>>> each of my training sets.
>>>>> However:
>>>>>
>>>>> 'GeneralizingEstimator' object has no attribute 'partial_fit'
>>>>>
>>>>> Same issue for SlidingEstimator.
>>>>> Is there a way to work around this limitation?
>>>>>
>>>>> Thank you so so much in advance!
>>>>>
>>>>> Giulia Gennari
>>>>> _______________________________________________
>>>>> Mne_analysis mailing list
>>>>> Mne_analysis at nmr.mgh.harvard.edu
>>>>> https://mail.nmr.mgh.harvard.edu/mailman/listinfo/mne_analysis
>>>>