GSOC 2020 project idea 31: OSB: Conversion of public neurophysiology datasets to NeuroData Without Borders format

More and more of the experimental datasets behind publications in neuroscience are being publicly released, increasing transparency of the scientific process and allowing reuse of the data for new investigations. However, these datasets, which can include electrophysiology, 2D/3D imaging data and behavioural recordings, are not always in an accessible format, and it may take some effort for researchers to access and analyse the data before deciding to use it in their own research. The Neurodata Without Borders (NWB, https://www.nwb.org) initiative is developing a format for sharing data from neurophysiology experiments which, together with APIs for handling files in the format, promises to greatly facilitate the sharing and reuse of data in neuroscience.

This project will involve converting a number of publicly available datasets to NWB format, adding structured metadata to ensure maximal understandability and reusability of the data. The converted datasets will be made available through the new NWB Explorer on the Open Source Brain repository (http://nwbexplorer.opensourcebrain.org) which allows visualisation of the data as well as interactive analysis through an inbuilt Jupyter notebook.

Skills: Required: Python; open source development; science background. Desirable: neuroscience (experimental or computational) background; data analysis; MATLAB; JavaScript.

Mentors: @pgleeson Padraig Gleeson (p.gleeson@ucl.ac.uk), @Matteo_Cantarelli Matteo Cantarelli (matteo.cantarelli@gmail.com)

Hello @pgleeson @Matteo_Cantarelli, I’m Shuo (Nino), a senior student at Sun Yat-sen University, PRC. I have been following Neuro Stars GSoC 2020 for a long time and find your project very attractive. I have looked at the skills required and believe I am qualified for this project. I have done data analysis work such as NBA Salaries Prediction, Northern American Opioid Crisis Analysis, etc. I think my strengths are my active attitude, mastery of Python programming, and experience in data analysis. However, my past data analysis projects were all built on clean datasets, so I lack experience in format conversion. I consider this project a good opportunity to learn new things, and I am keen to contribute to this work.

Are there any tests I should complete before joining this project? And what could I do to prepare, other than continuing to read the materials given (if you think I’m qualified)?

I’m looking forward to your reply, and feel free to contact me if you think I’m competent! Thank you!

Advice for OSB/NWB GSoC applicants

Background reading

Read the Open Source Brain paper as well as the Neurodata Without Borders paper. Note the OSB paper only briefly discusses extensions for NWB; OSB is undergoing a major expansion to allow sharing of data as well as models in neuroscience. The beta site for sharing NWB files on OSB is here: http://nwbexplorer.opensourcebrain.org.

Suggested activities prior to application

Sign up to GitHub if you’re not already there.

Create an OSB user account & link your GitHub account to it.

Have a look at the example converted data sets which have been put online here: http://nwbexplorer.opensourcebrain.org.

There are scripts for converting different data formats (e.g. MATLAB, IgorPro) to NWB format here: https://github.com/OpenSourceBrain/NWBShowcase.

Install pynwb and get some of the above scripts/notebooks working locally.
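
As a starting point for this step, here is a minimal sketch (not taken from the NWBShowcase scripts) of what a conversion boils down to with pynwb: take raw samples, convert them to SI units, wrap them in a TimeSeries, and write them to an NWB file. The function names, the identifier, the session metadata and the "membrane_potential" name are all made up for illustration, and pynwb is assumed to be installed (e.g. via pip install pynwb).

```python
from datetime import datetime, timezone


def millivolts_to_volts(samples_mv):
    """Convert a sequence of samples from millivolts to volts."""
    return [s / 1000.0 for s in samples_mv]


def write_trace_to_nwb(samples_v, rate_hz, path):
    """Write one voltage trace (in volts) to a new NWB file at `path`."""
    # Imported here so the unit helper above works without pynwb installed.
    from pynwb import NWBFile, NWBHDF5IO, TimeSeries

    # All metadata values below are placeholders for illustration only.
    nwbfile = NWBFile(
        session_description="Toy example: one synthetic voltage trace",
        identifier="toy-session-001",
        session_start_time=datetime(2020, 1, 1, tzinfo=timezone.utc),
    )
    trace = TimeSeries(
        name="membrane_potential",
        data=samples_v,
        unit="volts",
        starting_time=0.0,  # seconds relative to session start
        rate=rate_hz,       # sampling rate in Hz
    )
    nwbfile.add_acquisition(trace)

    with NWBHDF5IO(path, mode="w") as io:
        io.write(nwbfile)


def read_trace_from_nwb(path):
    """Read the trace back and return (samples, sampling rate in Hz)."""
    from pynwb import NWBHDF5IO

    with NWBHDF5IO(path, mode="r") as io:
        nwbfile = io.read()
        trace = nwbfile.acquisition["membrane_potential"]
        # Copy the data out before the file is closed.
        return list(trace.data[:]), trace.rate
```

For example, write_trace_to_nwb(millivolts_to_volts([-65.0, -64.8, -64.9]), 10000.0, "toy.nwb") followed by read_trace_from_nwb("toy.nwb") should round-trip the trace. A real conversion would add much more metadata (subject, device, electrodes, experimenter) and would normally use a more specific data type than a plain TimeSeries.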

Make a minor update to the existing scripts (or just README) to improve these existing examples.

There is also a list of potentially interesting datasets which could be converted to NWB here: https://github.com/OpenSourceBrain/NWBShowcase/issues.

Find some other public datasets (e.g. single cell electrophysiology recordings, population (calcium) imaging, behavioural studies) which you think would be appropriate for conversion to NWB format, to list with your application. Also open issues as outlined above with links to the data.

Note: Please share the draft of your application early to allow feedback before the application deadline!

Essential information to include in your application:

  1. The list of potential datasets to convert as discussed above
  2. Details on the course currently being followed and a link to the course webpage.
  3. What are your time commitments during the coding period? Please be specific about this, work/exam commitments etc. Are you planning any vacations this summer? How many classes are you taking this summer?
  4. How many hours per week will you be able to spend on this project?
  5. If you have any evidence of your coding abilities (e.g. contributions to open-source projects) and/or background in neuroscience, please let us know about it. Send links to specific public repositories showing commits by you.
  6. Details of any previous experience in data analysis or computational modelling.

Hi @LovelyBuggies,
Thanks for your interest in the project. Please have a look at the general advice above for some pointers on getting acquainted with what’s required for the project.
Padraig

Hello @pgleeson @Matteo_Cantarelli, I am Tanishka Gupta, currently pursuing Computer Engineering. I have gone through the documents mentioned above and find this project very interesting. I have experience in Python, JavaScript and MATLAB as well. I have also done many data analysis projects in Jupyter notebooks. Kindly let me know if the project is still open and what the criteria are to be considered for GSoC’20.

Also, can I please have the link to the repository? Or where can I start contributing? Thank you.

Hi @tanishka2000,
Yes, the project is still open. All student proposals for this will be evaluated equally after submission, and hopefully the INCF will get enough slots for a student for this project.
Please see the Advice for OSB/NWB GSoC applicants above for suggested first steps and links to the relevant repositories.

Hi @pgleeson, I’m Abhineet. I’ve been following this conversation for a while and have spent a decent amount of time reading the papers you suggested. I went through the GitHub repository you listed and I feel it is an interesting opportunity. Right now, I’m running some of the Python notebooks and trying to improve them, and I will be submitting PRs for these.
I have a question about the part on suggesting new datasets. I consider myself decent at Python and analysing data, as I have some experience with deep learning, but I’m fairly new to neuroscience. How do you suggest going about exploring datasets? Is there a specific newsletter/journal I should look at? Currently, I’m planning to look at the proceedings of some of the venues I found in the references of the two papers you suggested for reading.

I forgot to introduce myself: I’m Abhineet, currently in my pre-final year at the Indian Institute of Technology Ropar. My major is computer science, and I have experience with Python, C++ and deep learning. I’ve also worked in the field of 3D point cloud registration. My GitHub username: abhineet99

Hi Abhineet,
Thanks for your interest. I’ve seen your draft proposal on the GSoC site and will add comments there.
Regarding understanding the contents of the experimental datasets, some previous experience in neuroscience would be beneficial, to ensure the data can be presented in an accessible way for reuse by other neuroscientists. Have a read of the NWB v1 paper; it should point to papers/reviews about the types of data the format can contain, e.g. electrophysiology, calcium imaging, behavioural measurements.
Regards,
Padraig

Hi Padraig, thanks for the comments on the draft. I will read the papers/reviews on the different data types and let you know if I have any questions.

Currently I’m looking at https://github.com/rutishauserlab/recogmem-release-NWB. As the conversion code (to NWB) for this dataset is also available, I think it should give deeper insight into the format.

Thanks @abhineet99, that looks like a good dataset.

I’m doing the analysis in this GitHub repo: https://github.com/abhineet99/NWB-data/. Comments and suggestions are welcome.

Hi @pgleeson, I did some analysis of the RutishauserLab data and was able to understand the structure of the files. The accompanying paper was very useful. Could you give any feedback on the work? It can be found at https://github.com/abhineet99/NWB-data/

Thanks for that @abhineet99! Good that the data can be analysed in a Python notebook.