I am trying to download several LFP files from a brain region. In the tutorial they show CA1 as an example, so I used it here. There are supposedly 173 sessions' worth of data and 173 insertions that have a channel in CA1. But when I go to loop through those to pull each LFP, many of the .bin files appear to be missing and I get an error. I think I am doing something wrong.
My code:
from one.api import ONE
import spikeglx
from brainbox.io.one import load_channel_locations
one = ONE(password='international')
#Searching for datasets
brain_acronym = 'CA1'
# query sessions endpoint
sessions, sess_details = one.search(atlas_acronym=brain_acronym, query_type='remote', details=True)
print(f'No. of detected sessions: {len(sessions)}')
# query insertions endpoint
insertions = one.search_insertions(atlas_acronym=brain_acronym)
print(f'No. of detected insertions: {len(insertions)}')
Returns:
No. of detected sessions: 173
No. of detected insertions: 173
But then I get an error when I try to load the LFP data for one of these sessions:
session_list = [x for x in sessions]
# probe id and experiment id
eid = session_list[0]
pid, probename = one.eid2pid(eid)
band = 'lf' # either 'ap','lf'
# Find the relevant datasets and download them
dsets = one.list_datasets(eid, collection=f'raw_ephys_data/{probename}', filename='*.lf.*')
data_files, _ = one.load_datasets(eid, dsets, download_only=False)
bin_file = next(df for df in data_files if df.suffix == '.cbin')
# Use spikeglx reader to read in the whole raw data
sr = spikeglx.Reader(bin_file)
Returns an error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[34], line 10
      8 dsets = one.list_datasets(eid, collection=f'raw_ephys_data/{probename}', filename='*.lf.*')
      9 data_files, _ = one.load_datasets(eid, dsets, download_only=False)
---> 10 bin_file = next(df for df in data_files if df.suffix == '.cbin')
     12 # Use spikeglx reader to read in the whole raw data
     13 sr = spikeglx.Reader(bin_file)

TypeError: 'NoneType' object is not iterable
Dear Angus,
The issue here is that you are converting an EID (session) to a PID (insertion); however, there can be multiple probe insertions within a single session.
This is what happens here: the variables pid and probename each contain 2 insertions (len(probename) == 2).
As a result, the query you subsequently make on the datasets is invalid, because probename is now a list of length 2 (and not a string):
# Find the relevant datasets and download them
dsets = one.list_datasets(eid, collection=f'raw_ephys_data/{probename}', filename='*.lf.*')
As a result, dsets is an empty list [] and the rest of the code cannot run.
You can directly use the insertions variable you created in the example code above to query for the datasets. Let us know if you run into issues this way.
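For instance, a minimal sketch along those lines (untested as written; the guard is there because some insertions may not have raw LFP files released):

for pid in insertions:
    # each PID maps to exactly one probe, so pid2eid returns a single probe label
    eid, probename = one.pid2eid(pid)
    dsets = one.list_datasets(eid, collection=f'raw_ephys_data/{probename}', filename='*.lf.*')
    if not dsets:
        continue  # skip insertions without raw LFP datasets
    data_files, _ = one.load_datasets(eid, dsets, download_only=False)
    bin_file = next(df for df in data_files if df.suffix == '.cbin')
    sr = spikeglx.Reader(bin_file)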
Cheers
I’m working on a shared server and I don’t want to eat up too much memory. So one last thing that would be useful to know is whether my strategy for wiping the data after I use it is OK. Since I’m looping through all the LFP data (and eventually the rest of the data), I am going to delete whatever gets downloaded after I’ve run an analysis on the session and saved the outputs to .csvs and .pngs.
I’m planning on making bash calls inside my Python script that find the data for that mouse and wipe it out at the end of every loop or thread. It will take the path from the session info dictionaries, build a path to the data, and then wipe the folder containing everything: the LFP, spiking, and task data. Will a simple bash subprocess call running rm -r cortexlab/Subjects/KS020/2020-02-07, for instance, mess with any of the cache management scripts? Are there specific functions in the ONE API I could use to do this that would play nicely with the rest of the code?
@owinter So I have tried to remove the files, but some process is still using them; I can’t quite identify it since I deleted all the variables related to the data.
import subprocess

# note: session_id is the eid
# dont_wipe_these_sessions is a debugging list to avoid repeatedly re-downloading,
# or for sessions I want to keep around for some other reason
if session_id not in dont_wipe_these_sessions:
    session_path = str(one.eid2path(session_id))
    # build the full path to the directory I want to delete
    dir_to_delete = f"{session_path}/raw_ephys_data"
    print(dir_to_delete)
    # call a bash command to remove the files so the directory can also be removed
    remove_from_path_command = f"find {dir_to_delete} -type d -exec rm -rf {{}} +"
    subprocess.run(remove_from_path_command, shell=True)
    # then remove the raw_ephys_data directory itself
    remove_dir_command = f"find {session_path} -type d -name 'raw_ephys_data' -exec rm -r {{}} +"
    subprocess.run(remove_dir_command, shell=True)
But this results in this error:
rm: cannot remove ‘/space/scratch/IBL_data_cache/churchlandlab_ucla/Subjects/UCLA033/2022-02-15/001/raw_ephys_data/probe00/.nfs00000000000cdb1800000286’: Device or resource busy
rm: cannot remove ‘/space/scratch/IBL_data_cache/churchlandlab_ucla/Subjects/UCLA033/2022-02-15/001/raw_ephys_data/probe01/.nfs00000000000cdb2400000287’: Device or resource busy
rm: cannot remove ‘/space/scratch/IBL_data_cache/churchlandlab_ucla/Subjects/UCLA033/2022-02-15/001/raw_ephys_data/probe00/.nfs00000000000cdb1800000286’: Device or resource busy
rm: cannot remove ‘/space/scratch/IBL_data_cache/churchlandlab_ucla/Subjects/UCLA033/2022-02-15/001/raw_ephys_data/probe01/.nfs00000000000cdb2400000287’: Device or resource busy
rm: cannot remove ‘/space/scratch/IBL_data_cache/churchlandlab_ucla/Subjects/UCLA033/2022-02-15/001/raw_ephys_data/probe00/.nfs00000000000cdb1800000286’: Device or resource busy
rm: cannot remove ‘/space/scratch/IBL_data_cache/churchlandlab_ucla/Subjects/UCLA033/2022-02-15/001/raw_ephys_data/probe01/.nfs00000000000cdb2400000287’: Device or resource busy
I ended up having to delete the one variable and re-instantiate it on every loop. I am a bit concerned that when I try to implement multiprocessing or multithreading, having another ONE process running could interfere with another process's or thread's ability to clear the memory.
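For reference, this is roughly what I’m doing now to make sure nothing still holds the memory-mapped file before the delete. A sketch of what I mean (it assumes spikeglx.Reader exposes close(), which it appears to in the version I have):

import gc
import shutil

sr.close()          # release the file handle / memory map held by the reader
del sr, data_files  # drop any remaining references to the mapped file
gc.collect()        # force collection so the OS can actually free the handle

# with no open handles left, a pure-Python delete avoids shelling out to rm
shutil.rmtree(one.eid2path(eid) / 'raw_ephys_data', ignore_errors=True)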
I guess I can just wipe all the sessions at the end, but I’d prefer to never really be eating up too much shared space. The DANDI CLI has this functionality to just call what you need and then wipe it, and I think it’s a nice feature; I jury-rigged it into the Allen Brain SDK script I wrote.
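Putting the pieces together, the call-then-wipe pattern I’m aiming for looks roughly like this sketch (analyse_lfp and the CSV name are hypothetical placeholders for my own analysis code, and it assumes spikeglx.Reader works as a context manager):

import shutil
import spikeglx

for pid in insertions:
    eid, probename = one.pid2eid(pid)
    dsets = one.list_datasets(eid, collection=f'raw_ephys_data/{probename}', filename='*.lf.*')
    if not dsets:
        continue
    data_files, _ = one.load_datasets(eid, dsets, download_only=False)
    bin_file = next(df for df in data_files if df.suffix == '.cbin')
    with spikeglx.Reader(bin_file) as sr:  # reader is closed when the block exits
        results = analyse_lfp(sr)          # hypothetical analysis step
    results.to_csv(f'lfp_metrics_{pid}.csv')
    # wipe this session's raw data once the outputs are saved
    shutil.rmtree(one.eid2path(eid) / 'raw_ephys_data', ignore_errors=True)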
Will try running this in a multiprocessing function to see what happens.