Accessing training status through public database

I would like to grab the trial data from each BWM mouse’s last 3 sessions of trainingCW, before it was introduced to the biased blocks (basically, the sessions where it was declared trained).
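The selection described here can be sketched in plain Python. This sketch makes several assumptions. Each subject's training history is taken to be already reduced to `(date, training_status)` pairs, for example from the `date` and `training_status` columns of `_ibl_subjectTraining.table` (the `training_status` column name is an assumption). Any status starting with `"trained"` is taken to mean trained. The function names are hypothetical. Including the first trained day among the three sessions is a choice made here, not an IBL definition.

```python
from datetime import date

def first_trained_date(status_rows):
    """Earliest date whose status starts with 'trained' (assumed label), or None."""
    trained = [d for d, status in status_rows if status.startswith('trained')]
    return min(trained) if trained else None

def last_sessions_before_trained(session_dates, status_rows, n=3):
    """Last `n` session dates up to and including the first trained day.

    Returns None when no trained day exists or fewer than `n` sessions precede it,
    i.e. the two kinds of 'skipping ...' cases the script reports.
    """
    first = first_trained_date(status_rows)
    if first is None:
        return None
    eligible = sorted(d for d in session_dates if d <= first)
    if len(eligible) < n:
        return None
    return eligible[-n:]

# Illustrative data, not real IBL sessions
sessions = [date(2020, 1, d) for d in (6, 7, 8, 9, 10, 13)]
status = [(date(2020, 1, 6), 'in_training'),
          (date(2020, 1, 10), 'trained_1a'),
          (date(2020, 1, 13), 'trained_1b')]
print(last_sessions_before_trained(sessions, status))
# [datetime.date(2020, 1, 8), datetime.date(2020, 1, 9), datetime.date(2020, 1, 10)]
```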

When running the script 2023_choicehistory_HSSM/src/get_ibl_data.py (main branch of kiante-fernandez/2023_choicehistory_HSSM on GitHub),

I get

number of subjects: 
137
['CSHL034' 'CSHL045' 'CSHL046' 'CSHL047' 'CSHL049' 'CSHL051' 'CSHL052'
 'CSHL053' 'CSHL054' 'CSHL055' 'CSHL056' 'CSHL057' 'CSHL058' 'CSHL059'
 'CSHL060' 'CSHL_011' 'CSHL_012' 'CSHL_013' 'CSHL_014' 'CSHL_015'
 'CSH_ZAD_001' 'CSH_ZAD_002' 'CSH_ZAD_003' 'CSH_ZAD_004' 'CSH_ZAD_005'
 'CSH_ZAD_006' 'CSH_ZAD_007' 'CSH_ZAD_009' 'CSH_ZAD_010' 'CSH_ZAD_015'
 'CSH_ZAD_016' 'CSH_ZAD_017' 'CSH_ZAD_018' 'CSH_ZAD_019' 'CSH_ZAD_021'
 'CSH_ZAD_022' 'CSH_ZAD_023' 'CSH_ZAD_024' 'DY_005' 'DY_006' 'DY_007'
 'DY_008' 'DY_009' 'DY_010' 'DY_011' 'DY_012' 'DY_013' 'DY_014' 'DY_015'
 'IBL-T1' 'IBL-T2' 'IBL-T3' 'IBL_001' 'NYU-04' 'NYU-06' 'NYU-07' 'NYU-09'
 'NYU-11' 'NYU-12' 'NYU-14' 'NYU-20' 'NYU-21' 'NYU-23' 'NYU-24' 'NYU-25'
 'NYU-26' 'SWC_001' 'SWC_002' 'SWC_003' 'SWC_004' 'SWC_006' 'SWC_007'
 'SWC_010' 'SWC_011' 'SWC_013' 'SWC_014' 'SWC_015' 'SWC_016' 'SWC_017'
 'SWC_018' 'SWC_019' 'SWC_020' 'SWC_021' 'SWC_022' 'SWC_023' 'SWC_027'
 'SWC_028' 'SWC_029' 'SWC_030' 'SWC_032' 'SWC_033' 'SWC_034' 'SWC_035'
 'SWC_036' 'SWC_038' 'SWC_039' 'SWC_040' 'SWC_041' 'SWC_042' 'SWC_043'
 'ZM_1367' 'ZM_1369' 'ZM_1371' 'ZM_1372' 'ZM_1743' 'ZM_1745' 'ZM_1897'
 'ZM_1898' 'ZM_1928' 'ZM_2106' 'ZM_2107' 'ZM_2240' 'ZM_2241' 'ZM_2245'
 'ZM_3001' 'ZM_3002' 'ZM_3003' 'ZM_3004' 'ZM_3005' 'ZM_3006'
 'ibl_witten_05' 'ibl_witten_06' 'ibl_witten_07' 'ibl_witten_11'
 'ibl_witten_12' 'ibl_witten_13' 'ibl_witten_14' 'ibl_witten_15'
 'ibl_witten_16' 'ibl_witten_17' 'ibl_witten_18' 'ibl_witten_19'
 'ibl_witten_20' 'ibl_witten_21' 'ibl_witten_22' 'ibl_witten_23'
 'ibl_witten_24']

But many of them do not have the _ibl_subjectTraining.table available to load.

Loading it raises the error "The ALF object was not found. This may occur if the object or namespace or incorrectly formatted e.g. the object _ibl_trials.intervals.npy would be found with the filters object="trials", namespace="ibl"", so the script skips those subjects:

skipping CSHL034, could not load subjectTraining.table
skipping CSHL046, could not load subjectTraining.table
skipping CSHL056, could not load subjectTraining.table
skipping CSHL057, could not load subjectTraining.table
skipping CSHL_011, could not load subjectTraining.table
skipping CSHL_012, could not load subjectTraining.table
skipping CSHL_013, could not load subjectTraining.table
skipping CSH_ZAD_002, could not load subjectTraining.table
skipping CSH_ZAD_003, could not load subjectTraining.table
skipping CSH_ZAD_004, could not load subjectTraining.table
skipping CSH_ZAD_005, could not load subjectTraining.table
skipping CSH_ZAD_006, could not load subjectTraining.table
skipping CSH_ZAD_007, could not load subjectTraining.table
skipping CSH_ZAD_009, could not load subjectTraining.table
skipping CSH_ZAD_010, could not load subjectTraining.table
skipping CSH_ZAD_015, could not load subjectTraining.table
skipping CSH_ZAD_016, could not load subjectTraining.table
skipping CSH_ZAD_018, could not load subjectTraining.table
skipping CSH_ZAD_019, could not identify first day trained
skipping CSH_ZAD_021, could not load subjectTraining.table
skipping CSH_ZAD_023, could not load subjectTraining.table
skipping CSH_ZAD_024, could not identify first day trained
skipping DY_005, could not load subjectTraining.table
skipping DY_006, could not load subjectTraining.table
skipping DY_007, could not load subjectTraining.table
skipping DY_012, could not load subjectTraining.table
skipping DY_015, could not load subjectTraining.table
skipping IBL-T1, could not load subjectTraining.table
skipping IBL-T2, could not load subjectTraining.table
skipping IBL-T3, could not load subjectTraining.table
skipping IBL_001, could not load subjectTraining.table
skipping NYU-04, could not load subjectTraining.table
skipping NYU-06, could not identify first day trained
skipping NYU-07, could not load subjectTraining.table
skipping NYU-09, could not load subjectTraining.table
skipping NYU-12, did not find 3 sessions before trained
skipping NYU-14, could not load subjectTraining.table
skipping NYU-20, could not load subjectTraining.table
skipping NYU-21, could not identify first day trained
skipping NYU-23, could not load subjectTraining.table
skipping NYU-24, could not load subjectTraining.table
skipping NYU-25, could not load subjectTraining.table
skipping NYU-26, could not load subjectTraining.table
skipping SWC_001, could not load subjectTraining.table
skipping SWC_002, could not load subjectTraining.table
skipping SWC_003, could not load subjectTraining.table
skipping SWC_004, could not load subjectTraining.table
skipping SWC_006, could not load subjectTraining.table
skipping SWC_007, could not load subjectTraining.table
skipping SWC_010, could not load subjectTraining.table
skipping SWC_011, could not load subjectTraining.table
skipping SWC_013, could not load subjectTraining.table
skipping SWC_014, could not load subjectTraining.table
skipping SWC_015, could not load subjectTraining.table
skipping SWC_016, could not load subjectTraining.table
skipping SWC_017, could not load subjectTraining.table
skipping SWC_018, could not load subjectTraining.table
skipping SWC_019, could not load subjectTraining.table
skipping SWC_020, could not load subjectTraining.table
skipping SWC_027, could not load subjectTraining.table
skipping SWC_028, could not load subjectTraining.table
skipping SWC_029, could not load subjectTraining.table
skipping SWC_030, could not load subjectTraining.table
skipping SWC_032, could not load subjectTraining.table
skipping SWC_033, could not load subjectTraining.table
skipping SWC_034, could not load subjectTraining.table
skipping SWC_035, could not load subjectTraining.table
skipping SWC_036, could not load subjectTraining.table
skipping SWC_040, could not load subjectTraining.table
skipping SWC_041, could not load subjectTraining.table
skipping SWC_043, could not identify first day trained
skipping ZM_1367, could not load subjectTraining.table
skipping ZM_1369, could not load subjectTraining.table
skipping ZM_1371, could not load subjectTraining.table
skipping ZM_1372, could not load subjectTraining.table
skipping ZM_1743, could not load subjectTraining.table
skipping ZM_1745, could not load subjectTraining.table
skipping ZM_1928, could not load subjectTraining.table
skipping ZM_2106, could not load subjectTraining.table
skipping ZM_2107, could not load subjectTraining.table
skipping ZM_3001, could not load subjectTraining.table
skipping ZM_3002, could not load subjectTraining.table
skipping ZM_3004, could not load subjectTraining.table
skipping ZM_3005, could not load subjectTraining.table
skipping ZM_3006, could not load subjectTraining.table
skipping ibl_witten_05, could not load subjectTraining.table
skipping ibl_witten_06, could not load subjectTraining.table
skipping ibl_witten_07, could not load subjectTraining.table
skipping ibl_witten_11, could not load subjectTraining.table
skipping ibl_witten_12, could not load subjectTraining.table
skipping ibl_witten_15, could not load subjectTraining.table
skipping ibl_witten_21, could not load subjectTraining.table
skipping ibl_witten_22, could not load subjectTraining.table
skipping ibl_witten_23, could not load subjectTraining.table
skipping ibl_witten_24, could not load subjectTraining.table
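Messages like these come from catching a failed load for each subject. The following is a generic sketch of that pattern. It keeps the error text for each skipped subject, so missing tables can be told apart from other failures. `load_tables` and its arguments are hypothetical names. The commented ONE call assumes `ALFObjectNotFound` can be imported from `one.alf.exceptions`.

```python
def load_tables(subjects, load, missing=(LookupError,)):
    """Call `load(subject)` for each subject.

    Returns (tables, skipped): `tables` maps subject -> loaded object and
    `skipped` maps subject -> error text for loads that raised one of `missing`.
    Any other exception propagates, so unexpected failures are not hidden.
    """
    tables, skipped = {}, {}
    for subject in subjects:
        try:
            tables[subject] = load(subject)
        except missing as err:
            skipped[subject] = str(err)
    return tables, skipped

# Offline demo: a dict stands in for the database (KeyError is a LookupError)
fake_db = {'CSHL045': 'training table'}
tables, skipped = load_tables(['CSHL045', 'CSHL034'], lambda s: fake_db[s])
print(sorted(tables), sorted(skipped))  # ['CSHL045'] ['CSHL034']

# With ONE, the call might look like this (assumption, not run here):
# from one.alf.exceptions import ALFObjectNotFound
# tables, skipped = load_tables(
#     subjects,
#     lambda s: one.load_aggregate('subjects', s, '_ibl_subjectTraining.table'),
#     missing=(ALFObjectNotFound,))
```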

Would there be a way to request that these BWM subjects get a subjectTraining table computed and released on the public database? The eLife behavior paper already had ~100 mice, so it would be great to have the training tables available for those as well. Or am I missing something?

Hi Anne,

Currently we have only released the subjectTraining tables that were used in the analysis for Bruijns et al. The subjects with these tables are:

subjects = ['CSHL045', 'CSHL047', 'CSHL049', 'CSHL051', 'CSHL052', 'CSHL053', 'CSHL054', 'CSHL055', 'CSHL058',
    'CSHL059', 'CSHL060', 'CSHL_007', 'CSHL_014', 'CSHL_015', 'CSHL_020', 'CSH_ZAD_001', 'CSH_ZAD_011',
    'CSH_ZAD_017', 'CSH_ZAD_019', 'CSH_ZAD_022', 'CSH_ZAD_024', 'CSH_ZAD_025', 'CSH_ZAD_026', 'CSH_ZAD_029',
    'DY_008', 'DY_009', 'DY_010', 'DY_011', 'DY_013', 'DY_014', 'DY_016', 'DY_018', 'DY_020', 'KS014', 'KS015',
    'KS016', 'KS017', 'KS019', 'KS021', 'KS022', 'KS023', 'KS042', 'KS043', 'KS044', 'KS045', 'KS046', 'KS051',
    'KS052', 'KS055', 'KS084', 'KS086', 'KS091', 'KS094', 'KS096', 'MFD_05', 'MFD_06', 'MFD_07', 'MFD_08',
    'MFD_09', 'NR_0017', 'NR_0019', 'NR_0020', 'NR_0021', 'NR_0024', 'NR_0027', 'NR_0028', 'NR_0029',
    'NR_0031', 'NYU-06', 'NYU-11', 'NYU-12', 'NYU-21', 'NYU-27', 'NYU-30', 'NYU-37', 'NYU-39', 'NYU-40',
    'NYU-45', 'NYU-46', 'NYU-47', 'NYU-48', 'NYU-65', 'PL015', 'PL016', 'PL017', 'PL024', 'PL030', 'PL031',
    'PL033', 'PL034', 'PL035', 'PL037', 'PL050', 'SWC_021', 'SWC_022', 'SWC_023', 'SWC_038', 'SWC_039',
    'SWC_042', 'SWC_043', 'SWC_052', 'SWC_053', 'SWC_054', 'SWC_058', 'SWC_060', 'SWC_061', 'SWC_065',
    'SWC_066', 'UCLA005', 'UCLA006', 'UCLA011', 'UCLA012', 'UCLA014', 'UCLA015', 'UCLA017', 'UCLA030',
    'UCLA033', 'UCLA034', 'UCLA035', 'UCLA036', 'UCLA037', 'UCLA044', 'UCLA048', 'UCLA049', 'UCLA052',
    'ZFM-01576', 'ZFM-01577', 'ZFM-01592', 'ZFM-01935', 'ZFM-01936', 'ZFM-01937', 'ZFM-02368', 'ZFM-02369',
    'ZFM-02370', 'ZFM-02372', 'ZFM-02373', 'ZFM-04308', 'ZFM-05236', 'ZM_1897', 'ZM_1898', 'ZM_2240',
    'ZM_2241', 'ZM_2245', 'ZM_3003', 'ibl_witten_13', 'ibl_witten_14', 'ibl_witten_16', 'ibl_witten_17',
    'ibl_witten_18', 'ibl_witten_19', 'ibl_witten_20', 'ibl_witten_25', 'ibl_witten_26', 'ibl_witten_27',
    'ibl_witten_29', 'ibl_witten_32'
]

For the other mice used in the behavior paper, we have released the trials table for each session but, as you say, not the full subjectTrials table. I'll add it to our backlog so we can discuss whether we can release these tables too.
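Given the released list above, a set-membership split shows in advance which subjects from the original 137 have a released training table. In the sketch below, `released` stands for the `subjects` list above, and `split_by_release` is a hypothetical helper. Being on the list does not guarantee a usable selection. NYU-06, NYU-12, NYU-21, CSH_ZAD_019, CSH_ZAD_024 and SWC_043 are all listed, yet the output above still skipped them for other reasons.

```python
def split_by_release(my_subjects, released):
    """Split `my_subjects` into (with_table, without_table), both sorted."""
    released = set(released)  # O(1) membership checks
    with_table = sorted(s for s in my_subjects if s in released)
    without_table = sorted(s for s in my_subjects if s not in released)
    return with_table, without_table

# Small excerpts of the two lists in this thread
mine = ['CSHL034', 'CSHL045', 'NYU-12', 'SWC_001']
released = ['CSHL045', 'KS014', 'NYU-12']
print(split_by_release(mine, released))
# (['CSHL045', 'NYU-12'], ['CSHL034', 'SWC_001'])
```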

Thanks Mayo, that makes a lot of sense and should be enough for now.

One related question: is it expected that, for the same subjects, the query

    eids, info = one.search(subject=subject, 
                            dataset=['trials.table'], 
                            details=True)

returns different sessions than

trials_agg = one.load_aggregate('subjects', subject, '_ibl_subjectTrials.table')

I’ve attached a figure with the dates of all sessions in the Bruijns release, taken from the trials.table (blue), subjectTrials.table (orange), and subjectTraining.table (green). I had naively assumed the first two would be identical, but that doesn’t seem to be the case.

This code reproduces the figure:


import os
import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from one.api import ONE
from tqdm import tqdm

# connect to the public IBL database (OpenAlyx)
ONE.setup(base_url='https://openalyx.internationalbrainlab.org', silent=True)
one = ONE(password='international')
fig_folder_path = '.'  # hypothetical output folder; set to your own figure directory

#%% FIRST, FIND ALL SUBJECTS IN THE BRUIJNS ET AL. RELEASE
regexp = re.compile(r'Subjects/\w*/((\w|-)+)/_ibl')
datasets = one.alyx.rest('datasets', 'list', tag='2023_Q4_Bruijns_et_al')
# extract subject names from the dataset paths, reduced to a sorted array of unique names
subjects = np.unique([regexp.search(ds['file_records'][0]['relative_path']).group(1)
                      for ds in datasets])

print('number of subjects: ')
print(len(subjects))
print(subjects)

#%% SECOND, COLLECT THE SESSION DATES FROM EACH SOURCE
def date_entry(subject, source, dates):
    """One row per unique date, labelled with the subject and the table it came from."""
    dates = pd.Series(dates).unique()
    return {'subject': np.repeat(subject, len(dates)),
            'source': np.repeat(source, len(dates)),
            'date': dates}

date_check = []
for subject in tqdm(subjects):
    # sessions that have a trials.table dataset
    eids, info = one.search(subject=subject, dataset=['trials.table'], details=True)
    if len(info) == 0:  # nothing returned for this subject
        continue
    df_info = pd.DataFrame(info)
    # parse everything as UTC so all three sources share one datetime type
    date_check.append(date_entry(subject, 'trials_table',
                                 pd.to_datetime(df_info['date'], utc=True)))

    # sessions in the aggregated subjectTrials table
    trials_agg = one.load_aggregate('subjects', subject, '_ibl_subjectTrials.table')
    date_check.append(date_entry(subject, 'subjecttrials_table',
                                 pd.to_datetime(trials_agg['session_start_time'], utc=True)))

    # what was the training status at each session?
    training_status = one.load_aggregate('subjects', subject,
                                         '_ibl_subjectTraining.table').reset_index()
    date_check.append(date_entry(subject, 'subjecttraining_table',
                                 pd.to_datetime(training_status['date'], utc=True)))

date_df = pd.concat([pd.DataFrame(d) for d in date_check])
fig = sns.FacetGrid(date_df, col='subject', col_wrap=8, sharey=False, hue='source')
fig.map(sns.swarmplot, 'source', 'date').add_legend()
plt.savefig(os.path.join(fig_folder_path, 'ibl_date_check.png'))
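Set differences list the mismatches for each subject directly, instead of reading them off the figure. This sketch assumes both sources have already been reduced to calendar days. For instance, `.dt.date` on a parsed `session_start_time` column would do this, assuming that column has a datetime dtype. `compare_sources` is a hypothetical helper. At day resolution, several sessions recorded on the same day collapse into one entry.

```python
from datetime import date, datetime

def compare_sources(dates_a, dates_b):
    """Return (only_a, only_b): sorted calendar days found in one source but not the other."""
    a, b = set(dates_a), set(dates_b)
    return sorted(a - b), sorted(b - a)

# Illustrative values: trials.table gives dates, subjectTrials gives start times
trials_table_dates = [date(2021, 3, 1), date(2021, 3, 2)]
start_times = [datetime(2021, 3, 2, 10, 15), datetime(2021, 3, 5, 9, 40)]
only_trials, only_agg = compare_sources(trials_table_dates,
                                        [t.date() for t in start_times])
print(only_trials, only_agg)
# [datetime.date(2021, 3, 1)] [datetime.date(2021, 3, 5)]
```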

Yes, this is expected. It depends on which projects the data for these subjects have been released for.

For example, I reckon CSH_ZAD_017 has only had the trials data used in the behavior paper released, while NR_0027 is part of the BWM paper and so has only had its ephys sessions released.

Makes sense, thanks!