[Mne_analysis] capping number of trials per condition in -gave.fif using mne_process_raw

Mon Jul 23 14:11:46 EDT 2012

Hi Eric,

Thanks for the additional details. The code I have in place handles 
rejected trials by reading off the 'omit' markers from the log file and 
preventing those trials from being included in the 'capped' event files, 
like you suggest.

If anyone else is looking to implement something similar, I've attached 
a sample m file, although I can't make any promises about efficiency or 
readability.

Thanks,

Matt

On 7/23/12 1:15 PM, Eric Larson wrote:
> Since you're setting up the infrastructure anyway, what we do in
> practice is also run mne_process_raw using all the events first, then
> parse the resulting log files to determine which trials were omitted
> (e.g., due to excessive channel activations or flat channels). You
> could also in theory do this by pulling the raw data into MATLAB and
> checking for trials that will be omitted from there. Then we use that
> information to pare down the list of possible events, and equalize
> those trial counts. We do this because if you just equalize original
> trial counts, different numbers of trials from each condition could be
> disqualified, resulting in final trial count differences. This is
> probably excessive for now, but you could to consider it (or a
> similar/better approach) for dealing with discarded trials at some
> point in the future.
>
> Cheers,
> Eric
>
>
> On Mon, Jul 23, 2012 at 9:25 AM, Matthew Panichello
> <panichem at nmr.mgh.harvard.edu>  wrote:
>> Hi Eric,
>>
>> Thanks for the information. I am interested in capping the trial count
>> because I had heard someone mention that it can bias the estimates, so it's
>> good to hear from another source that this is worth spending some time on.
>> Sampling evenly across the recording session also makes sense and shouldn't
>> be too hard to do.
>>
>> I was hoping that there might be a convenience function in MNE that could be
>> used to cap the trial count, but if selecting the events using matlab is the
>> best option it should be possible to work around the empty event files, even
>> if it's not elegant...
>>
>> Thanks again,
>>
>> Matt
>>
>>
>>
>>
>> On 7/23/12 11:50 AM, Eric Larson wrote:
>>> Hey Matt,
>>>
>>> Here's what we do when dealing with trial counts. First, we equalize
>>> trial counts across conditions to avoid bias. We do this because we
>>> noticed that when using the absolute value of MNE/dSPM estimates to
>>> perform paired comparisons (condition A versus condition B) across
>>> vertices and time points, using the absolute value produces a bias
>>> toward more activation in the lower-trial-count condition. For
>>> example, imagine if noise were normally distributed for the MNE or
>>> dSPM score (with mean equal to the true activation level), then the
>>> magnitude of those estimates will yield from a folded normal
>>> distribution, where the mean (which we want to think of as just the
>>> magnitude of the real activation) is now influenced by the standard
>>> deviation of the original noise. This is a problem because that
>>> effective standard deviation depends in part on the trial count used
>>> to calculate the mean.
>>>
>>> In any case, to equalize the trial counts, we subsample our data and
>>> underpower our analysis by taking trials that are equally spaced
>>> throughout the runs. This is done because in our data, the
>>> lower-trial-count conditions typically are spread throughout the
>>> recording session, as well. By undersampling evenly across the
>>> recording session, we're trying to maximize how well we match the
>>> noise characteristics under all conditions. This becomes a tad bit
>>> more difficult to do correctly because MNE rejecting trials under
>>> certain criteria also removes trials from analysis, but it can be
>>> done.
>>>
>>> But, either way it's done---with our spread undersampling or your
>>> using the first N trials---you're eventually going to hit the problem
>>> you're dealing with, where there are no trials to average over for
>>> particular runs. The solution we use is to check to see if a run is
>>> used, and if it's not, remove its parameters from the call to
>>> mne_process_raw. Since you're coding MATLAB anyway, hopefully this
>>> isn't too big a pain.
>>>
>>> Cheers,
>>> Eric
>>>
>>>
>>> On Mon, Jul 23, 2012 at 7:49 AM, Matti Hamalainen
>>> <msh at nmr.mgh.harvard.edu>   wrote:
>>>> On Jul 23, 2012, at 5:05 PM, Matthew Panichello wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm using mne_process_raw to create grandaveraged data for each
>>>> individual
>>>> subject. Each subject's data were collected over five runs and so are
>>>> saved
>>>> in 5 raw fiff files.
>>>>
>>>> The task included 5 different trial types presented in a mixed-event
>>>> design.
>>>> I'd like to set a cap on the number of trials per condition, such that
>>>> even
>>>> if a subject completed 90 trials of Condition #1 over the 5 runs, for
>>>> example, only the first 75 will be saved in his grandaverage file.
>>>>
>>>> I tried to implement this by writing a matlab script that reads the event
>>>> files for each run and removes events from each condition until the cap
>>>> is
>>>> met. But, for some subjects, this results in a completely empty event
>>>> file
>>>> for run 5 (the script begins with the last run), causing mne_process_raw
>>>> to
>>>> exit when I actually try to produce the average.
>>>>
>>>> Is anyone aware of a simpler way to control the number of trials used to
>>>> compute the grandaverage using mne_process_raw?
>>>>
>>>> Also, while I was working on this, I noticed the following window output
>>>> by
>>>> process_raw:
>>>> "filter: .5 .... 120 Hz  bins : 6 .... 1636 of 4097 hpw : 3 lpw : 34
>>>> Highpass filter will work as specified
>>>> filter: 0 .... 40 Hz  bins : 0 .... 545 of 4097 hpw : 3 lpw : 34
>>>> Highpass filter will work as specified"
>>>>
>>>> I specify a 120 Hz lp filter and a .5 Hz hp filter in my input to
>>>> process_raw, so I was surprised to see what seems to be an additional 40
>>>> Hz
>>>> lp filter being applied. Can anyone comment on why this might be?
>>>>
>>>>
>>>> Hi Matt,
>>>>
>>>> You are using the nightly build which has a separate filter setting for
>>>> EOG.
>>>> mne_process_raw --help reveals:
>>>>
>>>>           --eoghighpass val/Hz      EOG highpass corner (default =    0.0
>>>> Hz)
>>>>           --eoglowpass  val/Hz      EOG lowpass  corner (default =   40.0
>>>> Hz)
>>>>           --eoghighpassw val/Hz     EOG highpass transition width (default
>>>> =
>>>> 0.0 Hz)
>>>>           --eoglowpassw val/Hz      EOG lowpass transition width (default
>>>> =
>>>> 5.0 Hz)
>>>>
>>>> In your particular example, the MEG/EEG data are filtered from 0.5 to 120
>>>> Hz
>>>> according to your specification but the default lowpass (40 Hz) is used
>>>> for
>>>> EOG.
>>>>
>>>> - Matti
>>>>
>>>>
>>>>
>>>>
>>>> ---------
>>>>
>>>> Matti Hamalainen, Ph.D.
>>>> Athinoula A. Martinos Center for Biomedical Imaging
>>>> Massachusetts General Hospital
>>>>
>>>> msh at nmr.mgh.harvard.edu
>>>> mhamalainen at partners.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Mne_analysis mailing list
>>>> Mne_analysis at nmr.mgh.harvard.edu
>>>> https://mail.nmr.mgh.harvard.edu/mailman/listinfo/mne_analysis
>>>>
>>>>
>>>> The information in this e-mail is intended only for the person to whom it
>>>> is
>>>> addressed. If you believe this e-mail was sent to you in error and the
>>>> e-mail
>>>> contains patient information, please contact the Partners Compliance
>>>> HelpLine at
>>>> http://www.partners.org/complianceline . If the e-mail was sent to you in
>>>> error
>>>> but does not contain patient information, please contact the sender and
>>>> properly
>>>> dispose of the e-mail.
>>>>
>>
>> --
>> Matthew Panichello
>> Research Coordinator, Bar Group
>> Massachusetts General Hospital
>> Phone: 617-726-9034
>>
>

-- 
Matthew Panichello
Research Coordinator, Bar Group
Massachusetts General Hospital
Phone: 617-726-9034

-------------- next part --------------
%Matt Panichello 7/22/12
%
%This script takes as inputes a .eve file and the associated .log file
%created post (preliminary) averaging. The script will grab a max of a certain number of events
%(the cap) from the original .eve file, so long as they are not marked as
%bad trials in the .log file and outputs a new set of .eve files for averaging. 
%
%
% 
clear;
%%%%% Set these priors %%%%%
eve_root    = 'pm_ROI';                    %identifier for event files
log_end     = 'clean_lp_pm_ROI-ave.log';   %identifier for log files
output_root = 'pm_ROIcap74';               %tag for output event files

cap = 74;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

root = 'path/to/MEG/data';
cd(root)

toto = dir('S*'); %common root of all subjects e.g. S* for S001,S002.. toto is a junk variable
subjects = {toto};

for i_subject = 1:length(subjects);
    subject = subjects{i_subject};

    toto = dir([subject filesep eve_root '_PM0*.eve']);
    event_files = {toto.name}; %grab the names of the event files corresponding to each run

    toto = dir([subject filesep 'PM0*_' log_end]);
    log_files = {toto.name}; %grab the names of the -ave.log files corresponding to each run

    allruns_data =[];
    event_data = [];
    runname ={};
    for i_run = 1:length(event_files)
        %choose the event and log files for one run
        event_file = event_files{i_run};
        log_file   = log_files{i_run};

        %create the corresponding outfile
        toto = findstr(event_file,'PM0');
        runname{i_run}  = event_file(toto:toto+3);

        %read in the event file
        event_data{i_run} = readtext([subject filesep event_file],'\t');
        %read in the log file
        [sample seconds weight code descrip toto toto toto toto toto toto toto toto omit] = textread([subject filesep log_file],'%d %f %d %d %s %s %s %s %s %s %s %s %s %s');

        %make sure there are the same number of events in the log and event
        %file
        if length(omit) ~= length(event_data{i_run})
            error(['Error: event file ' event_file ' does not have the same number of events as ' log_file]);
        end

        %remove trials from the event file that will be rejected during
        %averaging
        bad_trials = strcmpi(omit,'[omit]');
        good_trials = ~bad_trials;
        event_data{i_run} = event_data{i_run}(good_trials,:);

        allruns_data = [allruns_data; event_data{i_run}];

    end
    events = [allruns_data{:,4}]; %put all of the events codes across all runs into a vector
    trial_tallies = hist(events,5); %tally events by condition
    trial_nums = [1 101 201 301 401];%list out your unique event codes

    for i_trialtype = 1:5; %cycle through each trial type, removing events from the event_data cluster until your cap is met
        trial_tally = trial_tallies(i_trialtype);
        if trial_tally < cap;
            continue
        end
        trials_to_remove = trial_tally-cap;
        for i_run = 5:-1:1;
            if trials_to_remove ==0;
                break; %both this and the break below are needed!
            end
            for row = size(event_data{i_run},1):-1:1
                if event_data{i_run}{row,4} == trial_nums(i_trialtype);
                    if row == size(event_data{i_run},1);
                        event_data{i_run} = event_data{i_run}([1:row-1],:);
                    else
                        event_data{i_run} = event_data{i_run}([1:row-1 row+1:end],:);
                    end
                    trials_to_remove = trials_to_remove -1;
                    if trials_to_remove ==0;
                        break;
                    end
                end
            end
        end
    end
    %write it out
    for i_run = 1:length(event_files)
        output_filename = [output_root '_' runname{i_run} '.eve'];
        fid = fopen([subject filesep output_filename],'w');

        for row = 1:size(event_data{i_run},1);
            datatoprint = [event_data{i_run}{row,:}];
            fprintf(fid,'%d\t%.3f\t%d\t%d\n',datatoprint);
        end
        fclose(fid);
    end

end