GSoC 2021 project idea 12.2: Conversion of public neurophysiology datasets to NeuroData Without Borders format

More and more of the experimental datasets behind publications in neuroscience are being publicly released, increasing transparency of the scientific process and allowing reuse of the data for new investigations. However, these datasets, which can include electrophysiology, 2D/3D imaging data and behavioural recordings, are not always in an accessible format, and it may take some effort for researchers to access and analyse the data before deciding to use it in their own research. The Neurodata Without Borders (NWB, https://www.nwb.org) initiative is developing a format for sharing data from neurophysiology experiments which, together with APIs for handling files in the format, promises to greatly facilitate the sharing and reuse of data in neuroscience.

This project will involve converting a number of publicly available datasets to NWB format, adding structured metadata to ensure maximal understandability and reusability of the data. The converted datasets will be made available through the new NWB Explorer on the Open Source Brain repository (http://nwbexplorer.opensourcebrain.org) which allows visualisation of the data as well as interactive analysis through an inbuilt Jupyter notebook.

Skills required: Python; open source development; neuroscience (experimental or computational) background; data analysis.

Aims:

  1. Select a number of publicly available datasets which require conversion to NWB format (see here for examples).

  2. Read and understand the original publications related to the data; convert the datasets to NWB format, adding annotations and metadata to facilitate interpretation and reuse of the data by others. Document the process to aid others.

  3. Make data available via the NWB Explorer on the Open Source Brain repository

Mentors: Padraig Gleeson @pgleeson (lead), Ankur Sinha @sanjayankur31

Tags: OSB, NWB, Python, HDF5, data analysis, open access.

Hi everyone, I’m a PhD student in neuroscience at the Sorbonne in Paris, working on motor learning in zebrafish using calcium imaging. I have experience with neuroscience data, computational modelling, and high-throughput behavioural experiments on rodents. As potential targets, I found zebrafish calcium imaging datasets from the Ahrens Lab, as well as voltage imaging data in mice (a potential interest). Could you please tell me how I could get started on converting these datasets?

Thank you,
Sharbat

Hi @sharbatc, many thanks for reaching out. Please start by reading up on the topic using the links and resources already shared in the project idea. You can then follow up here with specific questions so that the mentors can give you pointed answers.

I am tagging the mentors of the project so that they may reply to you. @pgleeson , @sanjayankur31

Hi @sharbatc,
Thanks for your interest in the project. I’ll add more info shortly after INCF gets accepted as an official organisation, but it will be similar advice to last year, see here: GSOC 2020 project idea 31: OSB: Conversion of public neurophysiology datasets to NeuroData Without Borders format - #4 by pgleeson.
Regards,
Padraig

Advice for OSB/NWB GSoC applicants

Background reading

Read the Open Source Brain paper as well as the recent Neurodata Without Borders paper. Note the OSB paper only briefly discusses extensions for NWB; OSB is undergoing a major expansion to allow sharing of data as well as models in neuroscience. The beta site for sharing NWB files on OSB is here: http://nwbexplorer.opensourcebrain.org .

Suggested activities prior to application

Sign up to GitHub if you’re not already there.

Create an OSB user account & link your GitHub account to it.

Have a look at the example converted data sets which have been put online here: http://nwbexplorer.opensourcebrain.org .

There are scripts for converting different data formats (e.g. Matlab, IgorPro) to NWB format here.

Install pynwb and get some of the above scripts/notebooks working locally.

Make a minor update to the existing scripts (or just README) to improve these existing examples.

There is also a list of potentially interesting datasets which could be converted to NWB here: https://github.com/OpenSourceBrain/NWBShowcase/issues.

Some datasets which were converted during last year’s GSoC project were: Ferguson et al. 2015 and Lantyer et al. 2018.

Find some other public datasets (e.g. single cell electrophysiology recordings, population (calcium) imaging, behavioural studies) which you think would be appropriate for conversion to NWB format, to list with your application. Focus on datasets that are well described/structured/annotated, but in a non-NWB format (to minimise need to involve original data producers)! Also open issues as outlined above with links to the data.

Note: Please share the draft of your application early to allow feedback before the application deadline!

Essential information to include in your application:

  1. The list of potential datasets to convert as discussed above
  2. Details on the course currently being followed and a link to the course webpage.
  3. What are your time commitments during the coding period? Please be specific about this, work/exam commitments etc. Are you planning any vacations this summer? How many classes are you taking this summer?
  4. How many hours per week will you be able to spend on this project?
  5. If you have any evidence of your coding abilities (e.g. contributions to open-source projects) and/or background in neuroscience, please let us know about it. Send links to specific public repositories showing commits by you.
  6. Details of any previous experience in data analysis or computational modelling.

Hi all, I’m a PhD candidate in Neuroscience at Emory University studying memory in health and disease. I’m really excited about this project idea and I have experience collecting and analyzing extracellular electrophysiology data during behavior in rodents. I’ve been going through the advice for applicants, and I have a few questions as I begin working on my application.

  • Is there an expected number of datasets that can be converted over the course of the project? The explorer seems pretty flexible in accepting NWB files. Is most of the project focused on converting the data, or are there extra steps after that stage to make it suitable as an example for the explorer?
  • In the process of trying to convert my own lab’s data, I’ve been finding repositories from several labs that are in the process of having their in-house data format converted to the NWB format. Is this something to keep in mind when picking datasets? Or does it matter, since many of the public datasets are more curated versions of lab data?

Hi @stephprince,
Thanks for your interest in the project.

There are certainly more NWB datasets around compared to last year, but there will always be more public datasets available in whatever original format the experimenters used, which they may not have the time or inclination to translate themselves. Translating at least one of these from original format to NWB would be a good first step. A second and possibly third dataset would be much quicker to convert than the first.

That said, testing NWBE with existing datasets and ensuring the data & metadata are presented well in the application will be important and some of the project will involve testing/debugging/updating the interface itself. This would be done in conjunction with the NWBE developers. An application would be strengthened if you could show an understanding of how this might happen (a simple PR to the NWBE repo could help), or had ideas on how to improve the overall experience (e.g. “I think feature X could be added to NWBE to improve accessibility of NWB datasets in general”).

The student application period closes next Tues, so anyone interested in applying for this project should create a draft proposal on the GSoC website ASAP, where specific feedback can be provided.

Please don’t leave applications until the last minute!

Regards,
Padraig