Seeking Feedback: Out-of-Core Multimodal Alignment Pipeline (Neuropixels + BIDS fMRI + Video)

Category: Software / Pipeline

Hello Everyone,

I am a Bachelor's student in Computer Science with research interests in high-performance computing and neuroinformatics. I have been investigating the memory constraints involved in loading and registering huge multi-gigabyte datasets (for example, Allen Neuropixels recordings) together with behavioral video and BIDS-compliant fMRI.

To overcome typical RAM constraints in multimodal synchronization, I have created and released an open-source architectural prototype: NeuroAlign.

The fundamental methodology employs:

Operating-System-Level Memory Mapping: replacing traditional eager allocation with zero-copy mmap for binary ephys files and nibabel proxy objects for NIfTI files.
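As a minimal sketch of the memory-mapping idea (file names and shapes here are illustrative, not NeuroAlign's actual layout; the NIfTI side works analogously through nibabel's deferred `dataobj` proxy):

```python
import os
import tempfile
import numpy as np

# Simulated binary ephys file: int16 samples, 4 channels, written to disk first.
path = os.path.join(tempfile.mkdtemp(), "ephys.bin")
rng = np.random.default_rng(0)
raw = rng.integers(-2**15, 2**15, size=(30_000, 4)).astype(np.int16)
raw.tofile(path)

# Zero-copy view: np.memmap maps the file into virtual memory, so the OS pages
# data in on demand and nothing is read eagerly regardless of file size.
mm = np.memmap(path, dtype=np.int16, mode="r").reshape(-1, 4)
segment = np.array(mm[10_000:10_100])  # touching a slice faults in only those pages
```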

Scheduled Temporal Coordination: an object-oriented design (via Python Abstract Base Classes) for mathematically aligning streams with different sampling rates (30 kHz ephys vs. 60 fps video) onto a single event-oriented timeline.
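A minimal sketch of how such an ABC-based interface can map a shared event timeline onto per-modality sample indices (class and method names are hypothetical, not the package's actual API):

```python
from abc import ABC, abstractmethod
import numpy as np

class ModalityStream(ABC):
    """One recorded modality, addressable on a shared timeline in seconds."""

    @property
    @abstractmethod
    def sampling_rate(self) -> float:
        """Samples (or frames) per second."""

    def to_indices(self, event_times_s):
        # Nearest-sample rounding keeps every modality on the same event clock.
        return np.round(np.asarray(event_times_s) * self.sampling_rate).astype(np.int64)

class EphysStream(ModalityStream):
    sampling_rate = 30_000.0  # Hz

class VideoStream(ModalityStream):
    sampling_rate = 60.0      # frames per second

events = [0.5, 1.0, 2.25]                      # shared event timeline, seconds
ephys_idx = EphysStream().to_indices(events)   # -> [15000, 30000, 67500]
video_idx = VideoStream().to_indices(events)   # -> [30, 60, 135]
```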

Data Persistence: writing the synchronized data segments out as HDF5 for subsequent machine-learning processing.
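The persistence step looks roughly like this (the group layout, attribute names, and chunk sizes are illustrative assumptions, not the package's actual schema):

```python
import os
import tempfile
import h5py
import numpy as np

out_path = os.path.join(tempfile.mkdtemp(), "session01_aligned.h5")

# One synchronized segment per group; chunking plus compression keep later
# ML-style random access and storage cost reasonable.
with h5py.File(out_path, "w") as f:
    trial = f.create_group("trial_000")
    trial.create_dataset(
        "ephys",
        data=np.zeros((30_000, 64), dtype=np.int16),
        chunks=(3_000, 64),
        compression="gzip",
    )
    trial.create_dataset("video_frames", data=np.arange(60, dtype=np.int64))
    trial.attrs["t_start_s"] = 12.5  # segment onset on the shared timeline
```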

The package includes a CLI for testing and is fully BIDS-aware for TR extraction.

Repository: https://github.com/BitForge95/High-Performance-Neuro-Data-Pipeline — a high-speed Python bridge for Experanto designed to align massive neural recordings (like Neuropixels) with behavioral data. Using OOP and memory mapping, it handles datasets larger than RAM, automating multimodal synchronization and complex filtering. Built for scalable neuro-AI research.

I created it, in part, to explore the design constraints of the Experanto project's ecosystem, but I want to make it as useful as possible for general use.

Any input from developers currently working on out-of-core data pipelines would be very much appreciated. In particular:

Are there temporal edge cases in BIDS metadata beyond the standard RepetitionTime (TR) that the loader should be aware of?

How is the community handling floating-point precision drift when time-aligning 30 kHz data over long recordings?
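For context on why I am asking, here is a synthetic sketch of the drift (not the pipeline's code): cumulatively adding a float32 sample interval at 30 kHz drifts measurably within one minute, whereas deriving timestamps from int64 sample indices in float64 stays exact:

```python
import numpy as np

fs = 30_000.0
n = 30_000 * 60  # one minute of 30 kHz samples

# Naive approach: accumulate a float32 sample interval; rounding error in each
# addition compounds as the running sum grows.
drifting = np.cumsum(np.full(n, 1.0 / fs, dtype=np.float32))

# Safer approach: keep integer sample indices and divide in float64 at the end.
exact = np.arange(1, n + 1, dtype=np.int64) / np.float64(fs)

drift_s = abs(float(drifting[-1]) - float(exact[-1]))  # grows with recording length
```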

Thank you for your time and any insights you can provide.

RepetitionTime may not always be defined, for example, in sparse acquisition paradigms. I’m not sure if there’s any value to those data in your use case; you may only need to be aware of them enough to raise an informative error when you encounter them. (See the Magnetic Resonance Imaging section of the BIDS 1.11.1 specification for the different cases.)

Hi @effigies,

Thank you so much for taking the time to reply and for pointing me toward the 1.11.1 spec.

Because the synchronizer relies on a continuous event timeline, a missing RepetitionTime would have definitely caused a math failure downstream. I have just pushed an update to the repository that explicitly checks for this edge case during the BIDS JSON parsing. It now safely halts execution and raises an informative ValueError alerting the user that sparse acquisitions are not yet supported for continuous alignment.

Thank you for the guidance.

My pipeline currently exports to HDF5. But for massive datasets like the Brain Wide Map, are most teams still sticking with local HDF5 and NWB files, or moving toward cloud-native formats like Zarr?

I would love to know what the community prefers for production right now.