I am using the ephysData.raw.timestamps object to compute a time array for the LFP series so I can relate events back to the behavioural and spiking data. The documentation describes it as: "2 columns file containing time synchronisation information for the AP binary file: sample index in the first column and session time in the second column. Note that sample indices may not be integers." But I have found it stored in two different ways. This caused a bug in the part of my preprocessing script that builds the times array. I solved it, but now I am concerned that I am using the wrong data to interpolate a time array.
Documentation:
However, I discovered that the object was saved in an inconsistent manner. For instance, for eid = '572a95d1-39ca-42e1-8424-5c9ffcb2df87'
it is saved as described in the documentation — sort of. It is a list object with two equally long 1-D numpy arrays, presumably the first being the sample indices in the AP band and the second being the reference times.
However, most of the eids I used had the object saved in a different manner. For this eid, eid = '0c828385-6dd6-4842-a702-c5075f5f5e81',
the object is also a two-entry list, but the first entry is a 1-D numpy array of shape (1392,) and the other is a 2-D array of shape (229, 2). I have been using the second array as the two-column array for computing times.
These are the experiments I’ve run my pipeline on:
testing_list = ['0c828385-6dd6-4842-a702-c5075f5f5e81','111c1762-7908-47e0-9f40-2f2ee55b6505','8a3a0197-b40a-449f-be55-c00b23253bbf','1a507308-c63a-4e02-8f32-3239a07dc578','1a507308-c63a-4e02-8f32-3239a07dc578','73918ae1-e4fd-4c18-b132-00cb555b1ad2',
'73918ae1-e4fd-4c18-b132-00cb555b1ad2','09b2c4d1-058d-4c84-9fd4-97530f85baf6','5339812f-8b91-40ba-9d8f-a559563cc46b','034e726f-b35f-41e0-8d6c-a22cc32391fb','83e77b4b-dfa0-4af9-968b-7ea0c7a0c7e4','83e77b4b-dfa0-4af9-968b-7ea0c7a0c7e4','931a70ae-90ee-448e-bedb-9d41f3eda647',
'd2832a38-27f6-452d-91d6-af72d794136c','dda5fc59-f09a-4256-9fb5-66c67667a466','e2b845a1-e313-4a08-bc61-a5f662ed295e','a4a74102-2af5-45dc-9e41-ef7f5aed88be','572a95d1-39ca-42e1-8424-5c9ffcb2df87','781b35fd-e1f0-4d14-b2bb-95b7263082bb',
'b01df337-2d31-4bcc-a1fe-7112afd50c50','e535fb62-e245-4a48-b119-88ce62a6fe67','614e1937-4b24-4ad3-9055-c8253d089919','7f6b86f9-879a-4ea2-8531-294a221af5d0','824cf03d-4012-4ab1-b499-c83a92c5589e','4b00df29-3769-43be-bb40-128b1cba6d35','ff96bfe1-d925-4553-94b5-bf8297adf259']
Here is some example code you can use to verify what I mean:
#IBL SWR detector
import os
import subprocess
import numpy as np
import pandas as pd
from scipy import io, signal, stats
from scipy.signal import lfilter
#from fitter import Fitter, get_common_distributions, get_distributions
import scipy.ndimage
from scipy.ndimage import gaussian_filter
from scipy.ndimage import gaussian_filter1d
from scipy import interpolate
import matplotlib.pyplot as plt
# for ripple detection
import ripple_detection
from ripple_detection import filter_ripple_band
import ripple_detection.simulate as ripsim # for making our time vectors
import piso #can be difficult to install, https://piso.readthedocs.io/en/latest/
from tqdm import tqdm
from iblatlas.atlas import AllenAtlas
from iblatlas.regions import BrainRegions
from one.api import ONE
import spikeglx
from brainbox.io.one import load_channel_locations
from brainbox.io.spikeglx import Streamer
from neurodsp.voltage import destripe_lfp
#THIS CODE WORKS THIS CODE LOOPS THROUGH THE SESSIONS AND DOWNLOADS THE DATA, WE NEED TO ADD THE RIPPLE DETECTION CODE TO REMOVE THE DATA AFTER
from neurodsp.voltage import destripe_lfp
from ibllib.plots import Density
import time # for debugging
import traceback
import logging
from multiprocessing import Pool, Process, Queue
from one.api import ONE
# Connect to the IBL public database (OpenAlyx) and create the ONE client
# used for all dataset queries below.
ONE.setup(base_url='https://openalyx.internationalbrainlab.org', silent=True)
one = ONE(password='international')
# Parameters (input/output file paths and thresholds)
# Session eids the pipeline is run on.
# BUG FIX: the raw list contained three duplicated eids ('1a507308…',
# '73918ae1…', '83e77b4b…'), which would download and process those sessions
# twice. dict.fromkeys de-duplicates while preserving the original order.
testing_list = list(dict.fromkeys([
    '0c828385-6dd6-4842-a702-c5075f5f5e81',
    '111c1762-7908-47e0-9f40-2f2ee55b6505',
    '8a3a0197-b40a-449f-be55-c00b23253bbf',
    '1a507308-c63a-4e02-8f32-3239a07dc578',
    '1a507308-c63a-4e02-8f32-3239a07dc578',
    '73918ae1-e4fd-4c18-b132-00cb555b1ad2',
    '73918ae1-e4fd-4c18-b132-00cb555b1ad2',
    '09b2c4d1-058d-4c84-9fd4-97530f85baf6',
    '5339812f-8b91-40ba-9d8f-a559563cc46b',
    '034e726f-b35f-41e0-8d6c-a22cc32391fb',
    '83e77b4b-dfa0-4af9-968b-7ea0c7a0c7e4',
    '83e77b4b-dfa0-4af9-968b-7ea0c7a0c7e4',
    '931a70ae-90ee-448e-bedb-9d41f3eda647',
    'd2832a38-27f6-452d-91d6-af72d794136c',
    'dda5fc59-f09a-4256-9fb5-66c67667a466',
    'e2b845a1-e313-4a08-bc61-a5f662ed295e',
    'a4a74102-2af5-45dc-9e41-ef7f5aed88be',
    '572a95d1-39ca-42e1-8424-5c9ffcb2df87',
    '781b35fd-e1f0-4d14-b2bb-95b7263082bb',
    'b01df337-2d31-4bcc-a1fe-7112afd50c50',
    'e535fb62-e245-4a48-b119-88ce62a6fe67',
    '614e1937-4b24-4ad3-9055-c8253d089919',
    '7f6b86f9-879a-4ea2-8531-294a221af5d0',
    '824cf03d-4012-4ab1-b499-c83a92c5589e',
    '4b00df29-3769-43be-bb40-128b1cba6d35',
    'ff96bfe1-d925-4553-94b5-bf8297adf259',
]))
from one.api import ONE
# NOTE(review): ONE.setup()/ONE() were already run above (duplicated from a
# notebook cell); re-running is redundant but harmless, kept so this section
# also works as a stand-alone cell.
ONE.setup(base_url='https://openalyx.internationalbrainlab.org', silent=True)
one = ONE(password='international')

# Sessions whose ephysData.raw.timestamps object was observed in the
# unexpected layout (1-D array + (N, 2) array) and the probe ids they were
# observed on.
eids_with_timestamps_error = ['572a95d1-39ca-42e1-8424-5c9ffcb2df87', '0c828385-6dd6-4842-a702-c5075f5f5e81']
corresponding_probe_with_error = ['f9d8aacd-b2a0-49f2-bd71-c2f5aadcfdd1', '0851db85-2889-4070-ac18-a40e8ebd96ba']
# NOTE(review): this eid also appears in eids_with_timestamps_error above, so
# one of the two lists is mislabelled — confirm which session is actually the
# error-free example before relying on it.
no_error_eid = ['0c828385-6dd6-4842-a702-c5075f5f5e81']
# Pick one session to inspect.
# BUG FIX: `eid` was never assigned (both candidate assignments were commented
# out), so the eid2pid call below raised NameError. Default to the first
# problematic session; switch to no_error_eid[0] to inspect the other layout.
eid = eids_with_timestamps_error[0]
#eid = no_error_eid[0]
session_id = eid  # alias used by some later code
probelist, probenames = one.eid2pid(eid)  # probe_id is pid in the IBL tutorials
print(f'Probe IDs: {probelist}, Probe names: {probenames}')
band = 'lf'  # which binary band to work with: either 'ap' or 'lf'

# BUG FIX: `br` was used below but never created; BrainRegions is imported at
# the top of the file.
br = BrainRegions()

# NOTE(review): the original loop body had lost its indentation (invalid
# Python as pasted); reconstructed here. Iterating names and ids together
# with zip replaces the index-based range(len(...)) loop.
#for probe_id in pid:
for probe_name, probe_id in zip(probenames, probelist):
    print(probe_id)
    # First check whether this probe even has CA1 channels on it — no need to
    # process it if not.
    print("getting channels data")
    collectionname = f'alf/{probe_name}/pykilosort'  # ensures channels are all from this probe
    channels = one.load_object(eid, 'channels', collection=collectionname)
    channels.allen2017_25um_acronym = br.id2acronym(channels['brainLocationIds_ccf_2017'])

    # Get the timestamps for this LFP recording.
    # Documentation: 2-column file containing time synchronisation information
    # for the AP binary file: sample index in the first column and session
    # time in the second column. Note that sample indices may not be integers.
    lfp_time_timestamps_path = one.list_datasets(eid, collection=f'raw_ephys_data/{probe_name}',
                                                 filename='*timestamps.npy')
    lfp_time_timestamps, _ = one.load_datasets(eid, lfp_time_timestamps_path, download_only=False)
    # NOTE(review): load_datasets returns ONE array PER MATCHED DATASET. If
    # the '*timestamps.npy' wildcard matches more than one file in this
    # collection, the two list entries are two DIFFERENT files — not the two
    # columns of one file. Print the resolved paths to see what was loaded.
    print(lfp_time_timestamps_path)
    # Now verify with .shape
    print(lfp_time_timestamps[0].shape)
    print(lfp_time_timestamps[1].shape)
This inconsistency was causing a bug in my preprocessing script. My workaround was as follows:
# --- Workaround used in the preprocessing script --------------------------
# Get the timestamps for this LFP recording.
# Documentation: 2-column file containing time synchronisation information for
# the AP binary file: sample index in the first column and session time in the
# second column. Note that sample indices may not be integers.
lfp_time_timestamps_path = one.list_datasets(eid, collection=f'raw_ephys_data/{probe_name}',
                                             filename='*timestamps.npy')
lfp_time_timestamps, _ = one.load_datasets(eid, lfp_time_timestamps_path, download_only=False)

# Some sessions arrive as a list whose second entry is the full (N, 2)
# [sample, time] array; others as two equal-length 1-D arrays (samples,
# times) as documented.
# NOTE(review): load_datasets returns one array per matched dataset, so when
# the wildcard matches two files the 1-D entry and the (N, 2) entry may be
# two DIFFERENT files (e.g. different dataset types), not two views of the
# same sync table — verify lfp_time_timestamps_path before trusting the
# shape-based dispatch below.
if lfp_time_timestamps[1].ndim == 2:
    # Second entry already holds both columns:
    # [:, 0] = AP-band sample index, [:, 1] = session time (seconds).
    sync_table = lfp_time_timestamps[1]
    adjusted_samples = sync_table[:, 0] / 12  # AP -> LF sample index (LF rate is 1/12 of AP)
    adjusted_times = sync_table[:, 1]
else:
    # Documented layout: two equal-length 1-D arrays (samples, times).
    adjusted_samples = lfp_time_timestamps[0] / 12  # AP -> LF sample index (LF rate is 1/12 of AP)
    adjusted_times = lfp_time_timestamps[1]

# Sample numbers for every column of the destriped LFP signal.
# NOTE(review): `destriped` must already exist at this point (output of
# destripe_lfp); it is not defined in this snippet.
sample_numbers = np.arange(destriped.shape[1])
Which version is the correct way for the data to have been saved, and have I computed my times array correctly?