GSoC 2023 Project Idea 21.2 Open Source Brain - Conversion of public neurophysiology datasets

arnab1896 · February 24, 2023, 11:45am

Title: Conversion of public neurophysiology datasets to Neurodata Without Borders (NWB) format

Description:
More and more of the experimental datasets behind publications in neuroscience are being publicly released, increasing transparency of the scientific process and allowing reuse of the data for new investigations. However, these datasets, which can include electrophysiology, 2D/3D imaging data and behavioural recordings, are not always in an accessible format, and it may take some effort for researchers to access and analyse the data before deciding to use it in their own research. The Neurodata Without Borders (NWB, https://www.nwb.org) initiative is developing a format for sharing data from neurophysiology experiments which, together with APIs for handling files in the format, promises to greatly facilitate the sharing and reuse of data in neuroscience.

This project will involve converting a number of publicly available datasets to NWB format, adding structured metadata to ensure maximal understandability and reusability of the data. The converted datasets will be made available through the NWB Explorer on the Open Source Brain repository (https://v2.opensourcebrain.org), which allows visualisation of the data as well as interactive analysis through an inbuilt Jupyter notebook.

Aims/objectives:

Select a number of publicly available datasets which require conversion to NWB format (see here for examples).
Read and understand the original publications related to the data; convert the datasets to NWB format, adding annotations and metadata to facilitate interpretation and reuse of the data by others; document the process to aid others.
Make data available via the NWB Explorer on the Open Source Brain repository

Skill level: Junior+/mid

Required skills: Python; open source development; neuroscience (experimental or computational) background; data analysis.

Time commitment: Flexible (175/350 h)

Lead mentor: Padraig Gleeson (@pgleeson on GitHub)

Project website:
https://v2.opensourcebrain.org
https://nwb.org

Backup mentors: Ankur Sinha (@sanjayankur31 on GitHub)

Tech keywords: Python, HDF5, data analysis, open access, NWB

yash786 · February 27, 2023, 7:12pm

Dear Mentors @pgleeson @sanjayankur31 ,
I am a 3rd-year engineering student at the School of Engineering, Jawaharlal Nehru University, pursuing a Bachelor of Technology in Electronics and Communication Engineering.
I’m new to the open source community, but as a quick learner I adapt to new environments easily!
I find this project interesting, and completing it would also help neuroscience researchers in their work.
I would love to apply for this project in GSoC 2023. Please advise me on the next steps I should take to start contributing to this project.
Looking forward to the opportunity to work with the INCF community.

Thanks & Regards,
Yash Kumar

sanjayankur31 · February 28, 2023, 9:17am

Hi @yash786, that’s great to hear! I think the first thing to do would be to familiarise yourself with the NWB ecosystem. So maybe start with a few tutorials from the documentation:

https://nwb-overview.readthedocs.io/en/latest/

and

https://www.nwb.org/how-to-use/

We tend to use Python for most of the work here, so it’ll be good to brush up on your Python skills too:

using virtual environments
general Python usage, and using modules, packages, classes, and so on. (search the Python documentation for more information on these)
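
As a minimal sketch of the first item, a virtual environment can be created from Python itself with the standard library's `venv` module (the more common route is running `python -m venv <dir>` on the command line). The directory name `nwb-env` and the helper name `make_env` are illustrative assumptions, not something prescribed by the project.

```python
import tempfile
import venv
from pathlib import Path


def make_env(env_dir: Path) -> Path:
    """Create a virtual environment in env_dir and return the path to its config file."""
    # with_pip=False keeps this sketch quick; use with_pip=True for an
    # environment into which packages can be installed with pip.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    # "nwb-env" is a hypothetical name; a temporary directory avoids clutter.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = make_env(Path(tmp) / "nwb-env")
        print("environment created:", cfg.exists())
```

An environment created with pip available can then be activated (for example `source nwb-env/bin/activate` on Linux/macOS) and used to install the packages needed for the project.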

Finally, tinker with Open Source Brain v2, and the apps there. See what you think—if you run into any issues, file them on GitHub and so on.

How does that sound for a start?

yash786 · February 28, 2023, 9:36am

Hello Sir @sanjayankur31 ,
That sounds good for a start. I will update you as soon as possible on my learning progress, as well as on any issues I face!

Thank you for responding!

pgleeson · February 28, 2023, 5:56pm

Hi @yash786. Thanks for your interest. We’ll be adding more info here soon, but in the meantime, please see advice for applicants from a similar project last year: GSoC 2022 Project Idea 9.2: Conversion of public neurophysiology datasets to Neurodata Without Borders (NWB) format (175/350 h) - #5 by pgleeson

DakshRathore · March 1, 2023, 10:48am

Dear Mentors @sanjayankur31 @pgleeson ,

My name is Daksh Rathore. I’m a pre-final-year undergraduate student with a deep interest in data science and research, and some experience in analytics and deep learning. I am very interested in contributing to “Open Source Brain - Conversion of public neurophysiology datasets” under INCF for Google Summer of Code 2023.

Gist about me:

• Currently, I’m in my 3rd year, pursuing a B.Tech in Electronics and Communication at Thapar Institute of Engineering, Patiala (CGPA: 9.02). I have received a total scholarship of 1.2 Lakhs on account of being one of the top performers of our college.

• In my second year, I worked on a research project as an intern, where I combined multiple sets of PCA-scaled data from a custom dataset containing sensor values of diabetes patients.

• I am currently pursuing research in PUF modelling attacks based on ML derivation under the guidance of Dr. Gaganpreet Kaur.

• I’ve taken part in various competitions, including IIT-Bombay’s Techfest Weldright Machine Learning Competition, where I qualified for the finals from over 300 teams. The theme involved designing an accurate failsafe classification model that prevents the machines at the Godrej facility from running into defects.

• I have worked on extensive data visualization using Python and won coding contests in C++ organized by my college.

Goals :

I’m passionate about data science, analytics and open-source projects. I want to work with INCF to advance my knowledge in this field and contribute to new and exciting projects along the way.

What makes me suitable?

I have extensive experience in data visualization, data structures, algorithms, C++ and Python. I am confident that I can quickly grasp the more advanced topics needed to contribute effectively to the project.
I have experience working with teams and mentors from my research work.
Along with my strong problem-solving skills and the passion to learn and explore new tools and techniques, I believe I can be a great fit for the team.

Furthermore, I would like to ask if there are any specific areas in the project that require immediate attention or enhancements that I could work on. I am more than willing to work on any assigned task and contribute as much as possible to your project. I am looking forward to your response and guidance.

Thank you for considering my request to join your GSoC project as a contributor.

Best regards,

Daksh Rathore

sanjayankur31 · March 1, 2023, 11:22am

Hi @DakshRathore. That’s great to hear. I think for the moment, please start by going through the link that pgleeson has noted in the post above to familiarise yourself with NWB and its ecosystem.

DakshRathore · March 1, 2023, 12:40pm

Thank you for your quick reply.
I appreciate your prompt response and will start working on this right away.

yash786 · March 1, 2023, 3:32pm

Sure sir, I will look into the advice. It would be great if you could point me to some good first issues to start with: there are currently 26 open issues, and I’m a little confused about which one to work on first to gain a better understanding and insight.

sanjayankur31 · March 2, 2023, 9:17am

@yash786 : are you well versed with the NWB ecosystem yet? If not, I’d first get to grips with that, and then try to work on issues. These projects are very domain specific, so you must know the domain basics before you can work on tasks.

yash786 · March 2, 2023, 4:06pm

Sure sir. I have covered the basics of the NWB ecosystem and still have to do the coding part, which is actually converting data into NWB format and analyzing those files. I was drawn to the issues after viewing some of them, as this project is very interesting and has much more to explore. Thank you for guiding me once again; I will update you on my learning progress as soon as possible.
I hope you will keep guiding me like this so that I can contribute my best to this project!

pgleeson · March 21, 2023, 12:57pm

Advice for 2023 OSB/NWB GSoC applicants

Background reading

Read the Open Source Brain paper as well as the recent Neurodata Without Borders ecosystem paper. Note the OSB paper only briefly discusses extensions for NWB; OSB is undergoing a major expansion (v2.0) to allow sharing of data as well as models in neuroscience. The beta site for sharing NWB files on OSB is here: http://v2.opensourcebrain.org and a standalone instance of the NWB Explorer (accessible without logging in) can be found here: http://nwbexplorer.opensourcebrain.org

Suggested activities prior to application

Sign up to GitHub if you’re not already there.

Create an OSB v2 user account & link your GitHub account to it.

Have a look at the example converted data sets which have been put online here: http://nwbexplorer.opensourcebrain.org

There are scripts for converting different data formats (e.g. Matlab, IgorPro) to NWB format here.

Install pynwb and get some of the above scripts/notebooks working locally.

Make a minor update to the existing scripts (or just README) to improve these existing examples.

There is also a list of potentially interesting datasets which could be converted to NWB here: OpenSourceBrain/NWBShowcase.

Some datasets which were converted during previous years’ GSoC projects were:

2020: Ferguson et al. 2015 and Lantyer et al. 2018.
2021: GSoC_2021_OSB_NWB, WormsenseLab_ASH.

Find some other public datasets (e.g. single cell electrophysiology recordings, population (calcium) imaging, behavioural studies) which you think would be appropriate for conversion to NWB format, to list with your application. Focus on datasets that are well described/structured/annotated, but in a non-NWB format (to minimise need to involve original data producers)! Also open issues as outlined above with links to the data.

Note 1: There are an increasing number of NWB compatible datasets available on the DANDI Archive. For this reason, there is a pressing need to test and ensure these are compatible with our NWB Explorer, rather than make new datasets which will be compatible with it from the start. Applicants who would be prepared to work to test the NWBE interface and make updates for compatibility with other independently developed datasets (e.g. as last year’s applicant did) would be very welcome!

Note 2: Please share the draft of your application early to allow feedback before the application deadline!

Essential information to include in your application:

The list of potential datasets to convert as discussed above
Details on the course currently being followed and a link to the course webpage.
What are your time commitments during the coding period? Please be specific about this, work/exam commitments etc. Are you planning any vacations this summer? How many classes are you taking this summer?
How many hours per week will you be able to spend on this project?
If you have any evidence of your coding abilities (e.g. contributions to open-source projects) and/or background in neuroscience, please let us know about it. Send links to specific public repositories showing commits by you.
Details of any previous experience in data analysis or computational modelling.

pgleeson · March 23, 2023, 6:38pm

Hi all. Please submit your proposals well before the deadline on the GSoC site, where you can upload a draft pdf, which we can see and comment on.

It is very important, though, to demonstrate in your proposal a clear knowledge of the current state of data sharing in neuroscience, to have identified a list of (recent) public datasets which would benefit from conversion into NWB, and to outline how you would go about doing that. Example code in a GitHub repo that you have created, making a start on converting a dataset to NWB, would significantly benefit your application.

sanjayankur31 · March 27, 2023, 1:39pm

Thanks for submitting your draft proposals everyone. We’ll be going through them in due time to provide feedback.

In the meantime we did want to note that we’re likely going to be able to accept only one candidate here (if we receive any slots after the ranking and so on), so we encourage all candidates to please also apply to other projects that they are interested in.

yash786 · March 27, 2023, 6:19pm

Hello mentors @pgleeson , @sanjayankur31 ,
I recently recovered from dengue fever, and I guess it’s late to say all this, but I really want to contribute to this project. I have tried some of the sample datasets mentioned by @pgleeson. I’m a little confused about what I should include in my proposal and how to structure it in a clear and understandable manner. If you could share some sample draft proposals, that would be great!
Also, where should I submit the draft proposal? The GSoC site only shows the option of submitting the proposal.

pgleeson · March 28, 2023, 3:06pm

Sorry to hear you’ve been unwell @yash786. It may be too late at this stage to submit a strong proposal, but go ahead if you still want to. You can upload a draft pdf on the GSoC site which we can see and give feedback on. I believe there are guidelines on the INCF site for how to structure your proposal. See also the recommended info to include above.

A crucial part of your application would be pointing to some public code on GitHub where you have started to convert some data to NWB (even a dummy data set), which can be viewed by our NWB Explorer.

DakshRathore · March 28, 2023, 8:48pm

Dear Mentors @pgleeson @sanjayankur31 ,

I hope this message finds you well. I wanted to update you on my progress regarding the proposal that I have been working on. I have started working on a dataset (referred from here) that I would like to use in my proposal, and I am making good progress. However, I would like to complete the dataset before I submit my proposal.

In this regard, I would appreciate your assistance in reviewing the dataset and providing feedback on any areas that may need improvement. I have created a GitHub repository (at an initial stage) for the dataset. I have also learned a lot about the .nwb file format and pynwb, and with your help, I am confident that I can master these tools.

Additionally, I would like to mention that I am aware of my inexperience in working on datasets, and I apologize in advance for any mistakes that I may have made. I am committed to learning and improving, and I am grateful for the opportunity to work with experienced mentors like you.

I am excited to continue working on the proposal and would appreciate any guidance you can offer relating to the dataset that I have been working on. Thank you for your continued support.

Best regards,
Daksh Rathore

yash786 · March 29, 2023, 4:26am

I will try to submit a proposal as soon as possible. It would be great if you could suggest one potential dataset that a beginner could convert quickly, as there are many listed on OpenSourceBrain/NWBShowcase/issues.
Thank you for the quick reply!

pgleeson · March 29, 2023, 4:18pm

Thanks @DakshRathore. Notebook looks good. My only suggestion would be to add some plots of the data before conversion and then generated from the NWB version to show you understand what type of data is contained in the dataset.
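
One way to act on this suggestion is sketched below, assuming the original recording and the values read back from the NWB file are both available as 1D NumPy arrays of equal length, sampled at the same rate. The function name `plot_before_after`, the synthetic sine trace and the output filename are hypothetical; in a real notebook the second array would be loaded from the NWB file rather than copied.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_before_after(original, converted, rate_hz, out_path):
    """Plot the source trace and the trace read back from NWB on stacked axes."""
    t = np.arange(len(original)) / rate_hz  # time axis in seconds
    fig, (ax_src, ax_nwb) = plt.subplots(2, 1, sharex=True, figsize=(6, 4))
    ax_src.plot(t, original)
    ax_src.set_title("Original data")
    ax_nwb.plot(t, converted)
    ax_nwb.set_title("Data read back from NWB file")
    ax_nwb.set_xlabel("Time (s)")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    # Also report whether the two traces are numerically the same
    return bool(np.allclose(original, converted))


if __name__ == "__main__":
    rate_hz = 1000.0
    original = np.sin(2 * np.pi * 5 * np.arange(1000) / rate_hz)  # synthetic, not real data
    converted = original.copy()  # placeholder for data loaded from the NWB file
    print("traces match:", plot_before_after(original, converted, rate_hz, "before_after.png"))
```

Returning the numerical comparison alongside the figure gives a quick check that nothing was altered during conversion, in addition to the visual one.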

pgleeson · March 29, 2023, 4:25pm

@yash786 To be honest, part of the selection process is letting applicants find datasets, evaluate them, assess the difficulty of converting them from the original format, and decide if they are good candidates for conversion to NWB. That is why we don’t want to say “convert this 1 dataset”. Note “neuroscience (experimental or computational) background” is a required skill, and that should be reflected in the application.
At this stage it may be best to use pynwb to create a “toy” NWB dataset, which can be viewed on http://nwbexplorer.opensourcebrain.org.