GSoC 2025 Project #31 UCSD Projects :: Development of data standards interchange via LinkML (350h)

arnab1896 · March 8, 2025, 8:14pm

Mentors: Tom Gillespie <tgbugs@gmail.com> and Jeff Grethe <jeffrey.grethe@gmail.com>

Skill level: Intermediate or greater

Required skills: Python

Time commitment: Full time (350 hours)

About: There are a number of existing data standards that are actively in use in the neuroscience community. A long-standing goal is to enable conversion between different standard formats to increase the visibility of datasets across platforms and to make it possible to leverage existing tooling that expects an alternate format. Examples of these standards are Brain Imaging Data Structure (BIDS), SPARC Dataset Structure (SDS), openMINDS, DANDI, etc.

Aims: In this project you will learn about the tools for specifying data standards (such as LinkML) and use them to create mappings between SDS and BIDS. At the same time, you will learn about building data pipelines for converting from one standard to another using Python and the mappings specified in LinkML. As the project progresses, additional standards can be added to the converter and more complete mappings from one data standard to another can be pursued.
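
To make the idea of a mapping-driven converter concrete, here is a minimal sketch in plain Python. It is not the project's actual design: the slot names, the `bids:` prefix, and the helper functions `build_field_map` and `convert` are all hypothetical, and the dictionary only imitates the shape of slot definitions that declare `exact_mappings`. A real pipeline would load the mappings from the LinkML schema files rather than hard-coding them.

```python
# Hypothetical slot definitions, loosely shaped like the "slots" section of a
# schema in which each slot may declare exact_mappings. The slot names and
# the "bids:" targets are illustrative assumptions, not taken from the real
# SDS or BIDS schemas.
SDS_SLOTS = {
    "title": {"exact_mappings": ["bids:Name"]},
    "contributor_names": {"exact_mappings": ["bids:Authors"]},
    "funding": {"exact_mappings": ["bids:Funding"]},
    "award_number": {},  # no counterpart declared in this sketch
}


def build_field_map(slots, prefix="bids:"):
    """Return {source_slot: target_field} for slots mapped under `prefix`."""
    field_map = {}
    for slot_name, slot in slots.items():
        for target in slot.get("exact_mappings", []):
            if target.startswith(prefix):
                # Keep the first matching target and strip the prefix.
                field_map[slot_name] = target[len(prefix):]
                break
    return field_map


def convert(record, field_map):
    """Split a source record into converted fields and unmapped leftovers."""
    converted, unmapped = {}, {}
    for key, value in record.items():
        if key in field_map:
            converted[field_map[key]] = value
        else:
            unmapped[key] = value
    return converted, unmapped


# Made-up example record in the source layout.
sds_record = {
    "title": "Example dataset",
    "contributor_names": ["A. Researcher", "B. Scientist"],
    "award_number": "XYZ-000",
}

bids_like, leftovers = convert(sds_record, build_field_map(SDS_SLOTS))
print(bids_like)
print(leftovers)
```

Returning the unmapped fields separately, instead of silently dropping them, makes it easy to see where the mapping is still incomplete as more of each standard is covered.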

Websites:

Information about the SPARC Dataset Structure (SDS): https://www.incf.org/sparc-data-structure
Information about the Brain Imaging Data Structure (BIDS): https://www.incf.org/sbp/brain-imaging-data-structure-bids
Initial LinkML model for the SPARC Dataset Structure: sparc-curation/resources/linkml at master · SciCrunch/sparc-curation · GitHub
YAML schema specification for the Brain Imaging Data Structure: bids-specification/src/schema at master · bids-standard/bids-specification · GitHub

Tech keywords: Python, YAML, LinkML, data standards, BIDS, SPARC, SDS, JSON Schema

vrun · March 15, 2025, 8:28am

Dr. Gillespie and Dr. Grethe,

I am interested in contributing to the GSoC 2025 project on LinkML-based data standards interchange. As an undergraduate Data Science student at IIT Madras with experience in Python, data standards, and open-source contributions (GSoC 2023 with INCF), I am eager to apply my skills to this project.

I would appreciate any guidance on how to get started. Looking forward to your response.

Best regards,
Vrushali

v29khare · March 20, 2025, 5:52am

Hi Tom Gillespie <tgbugs@gmail.com> and Jeff Grethe <jeffrey.grethe@gmail.com>

I am Varnika Khare, a PhD scholar in Cognitive Science at IIT Hyderabad, India, working on motor memory consolidation.

While my research primarily involves behavioural and neural data analysis, I have experience with Python (especially for data visualization) and reinforcement learning, and I currently use MATLAB for EEG signal processing.

I am interested in applying for this GSoC project, “Development of data standards interchange via LinkML”.

I am currently going through the websites listed in the project description. I would love to know the next steps and would appreciate some guidance on preparing the proposal.

Best Regards

Sayan_Mandal1 · March 24, 2025, 3:30pm

Dear Tom and @jgrethe,

I hope this message finds you well.

My name is Sayan Mandal, and I am a final-year Information Technology undergraduate at KIIT, Bhubaneswar. I have a strong background in Python and data pipeline development, with hands-on experience working on projects involving data standardization, machine learning, and AI.

During my research internships at IIEST Shibpur and Jadavpur University, I worked extensively with data transformation, model development, and implementing explainable AI techniques. I am particularly interested in contributing to the “Data Standard Conversion in Neuroscience” project for GSoC 2025. The opportunity to leverage Python, LinkML, and YAML to build data pipelines that facilitate interoperability between standards like SDS and BIDS excites me.

To better prepare for contributing to this project, I have started exploring:

The LinkML model for the SPARC Dataset Structure.
The YAML schema specification for the Brain Imaging Data Structure.
Python-based data pipeline development techniques for data standard mapping.

I would love to gain more insight into how I can align my proposal with the project’s objectives. Would you recommend any specific areas I should focus on or any particular aspects of the SDS-BIDS mapping that require additional attention?

I’m also eager to explore opportunities to extend the project by incorporating additional standards in the later stages. Please let me know if there are any open issues or related discussions that I should look into to get started.

Thank you for your time and guidance. I’m excited about the possibility of contributing to this important work and look forward to hearing from you.

Best regards,
Sayan Mandal
sayanjones77@gmail.com

sonia · April 8, 2025, 8:51am

Dear Dr. Gillespie and Dr. Grethe,

I’m Yong-Shin Jiang, a senior at National Taiwan University majoring in Medical Data Science and Veterinary Medicine. I’ve worked on several hands-on projects that connect real-world biomedical data with machine learning, including building data pipelines, working with SPARC-like structured datasets, and using JSON schemas to classify and standardize video data for clinical use.

I’m really excited about Project #31: Development of data standards interchange via LinkML, especially the idea of making biomedical data more interoperable across platforms. I believe having clean, standard-converted data can really speed up both research and clinical applications.

To prepare for this project, I have started studying the LinkML model for SPARC and the YAML schema for BIDS, and I plan to replicate and extend the SDS-BIDS mapping locally over the next week. I’m especially interested in:

Exploring mappings that preserve semantic integrity in physiological data fields,
Identifying edge cases in SPARC-BIDS conversion and proposing new schema modules if needed,
Extending the tool to support emerging standards such as openMINDS.

I would be deeply grateful for your feedback on how best to shape a proposal that’s meaningful to the INCF community. Are there any GitHub issues or conceptual challenges within the SDS-BIDS mapping process where I could start contributing or learning more?

Looking forward to contributing to this awesome project!

Best,
Yong-Shin Jiang
sonia.y.jiang@gmail.com