Neurobagel is a federated data ecosystem that allows researchers and other data users to find and use research data that must remain at its original institution for data governance reasons. To make this possible, Neurobagel provides tools that make annotating, integrating, and searching data easier, and maintains a common data model that allows data queries to be federated. However, the range of query options can be daunting for users, and obtaining the desired results often requires several iterations. Making the search process more accessible and conversational could encourage more people to use, and contribute back to, Neurobagel’s federated harmonization ecosystem, ultimately benefiting all users.
The Neurobagel cohort query workflow allows users to search for cohorts of individual research participants across federated data nodes hosted at each participating institute. Each Neurobagel node consists of a graph database for data storage and an API that exposes specific query parameters and controls which results a user can see. Currently, Neurobagel provides a graphical web query interface that communicates with the node APIs on behalf of the user, making complex queries easier to formulate. We hope to further improve the query experience by providing an LLM-powered, chatbot-style interface that populates queries and elaborates on search results.
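To make the request side of this workflow a bit more concrete, here is a minimal Python sketch of how a client (the web query tool today, or a chatbot in the future) might turn a set of query parameters into HTTP GET requests, sending the same query to several nodes. The node URLs, the `/query` path, and the parameter names are assumptions for illustration only; the actual Neurobagel node API routes and parameter formats may differ. No network request is made.

```python
from urllib.parse import urlencode

# Hypothetical node API endpoints; real Neurobagel nodes may use other routes.
NODE_API_URLS = [
    "https://node-a.example.org/query",
    "https://node-b.example.org/query",
]


def build_cohort_query_url(base_url: str, **params) -> str:
    """Return a GET URL for a cohort query, dropping unset parameters.

    Parameter names (min_age, max_age, sex) are assumptions modelled on the
    query options mentioned in the text, not the documented API schema.
    """
    # Only send parameters the user actually specified.
    filled = {key: value for key, value in params.items() if value is not None}
    return f"{base_url}?{urlencode(filled)}" if filled else base_url


# Federating a query: the same parameters are sent to every node.
urls = [
    build_cohort_query_url(url, min_age=20, max_age=40, sex=None)
    for url in NODE_API_URLS
]
for url in urls:
    print(url)
```

Each node would then answer independently, and the client would merge the per-node results before showing them to the user.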
Building on the existing Neurobagel cohort query workflow, this project aims to create a chatbot that uses existing large language models (LLMs) to parse user-provided text into accurate queries and to reliably summarize the results for the user. At a high level, the chatbot should be able to receive and understand user prompts in natural language, initiate the corresponding API calls using predefined Neurobagel query parameters (minimum age, maximum age, sex, etc.), interpret the results, and convey that information to the user. Ideally, the project will use open tools and models to allow flexible hosting options.
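One possible way to make the parsing step reliable is to ask the model to reply with JSON only, and then to validate that reply against the predefined query parameters before any API call is made. The minimal Python sketch below assumes that design. The parameter whitelist and the `parse_llm_query` helper are hypothetical, and no specific LLM library is used: the model's reply is a hard-coded string standing in for real model output.

```python
import json

# Hypothetical whitelist of query parameters the chatbot may fill in,
# modelled on the examples in the text (minimum age, maximum age, sex).
ALLOWED_PARAMS = {"min_age": float, "max_age": float, "sex": str}


def parse_llm_query(llm_output: str) -> dict:
    """Parse an LLM's JSON reply into validated query parameters.

    Unknown keys are dropped and values are coerced to the expected type,
    so a hallucinated or malformed field never reaches the API call.
    """
    try:
        raw = json.loads(llm_output)
    except json.JSONDecodeError:
        return {}  # the model did not return valid JSON
    if not isinstance(raw, dict):
        return {}

    params = {}
    for key, expected_type in ALLOWED_PARAMS.items():
        value = raw.get(key)
        if value is None:
            continue
        try:
            params[key] = expected_type(value)
        except (TypeError, ValueError):
            continue  # skip values that cannot be coerced

    # Discard an inconsistent age range rather than send a nonsensical query.
    if "min_age" in params and "max_age" in params:
        if params["min_age"] > params["max_age"]:
            del params["min_age"], params["max_age"]
    return params


# Stand-in for a model reply to "Find women aged 20 to 40 with MRI scans".
reply = '{"min_age": 20, "max_age": "40", "sex": "female", "modality": "MRI"}'
print(parse_llm_query(reply))
```

In this sketch the unsupported `modality` field is silently dropped; a real chatbot would likely tell the user which parts of their request could not be turned into query parameters, so they can rephrase.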
The tasks involved in this project include:
- Getting familiar with the codebase of existing tools, including the API and cohort query tool
- Exploring LLMs and relevant libraries, such as LangChain and Ollama
- Identifying a model and sequence of prompts that can generate accurate API calls for the project
- Developing a simple user interface for the agent. Given the flexible time commitment, this task would only be part of the project for a contributor who would like to spend the full 350 h with us
What can I do before GSoC?
Check out Neurobagel’s website and GitHub organization to familiarize yourself with the relevant tools and codebases. Feel free to reach out to one of the mentors (Brent or Arman) by email with any questions or discussion points you have about the project.
Skill level: Beginner / Intermediate
Required skills: Python or JavaScript/TypeScript
Helpful skills: Basic understanding of Linux command line, Git, Docker, network requests / API calls via HTTP
Time commitment: Flexible (175/350 h)
Lead mentors:
- Brent McPherson (@bcmcpher)
- Arman Jahanpour (@Arman)
- Sebastian Urchs (@surchs)
- Alyssa Dai (@alyssadai)
Project website: https://neurobagel.org/
Backup mentors: Members of the Neurobagel team and the Origami Laboratory at McGill
Tech keywords: Python, JavaScript, TypeScript, React, Large Language Models, Artificial Intelligence, Knowledge Graph
IMPORTANT
What to do if you want to work on this project / how to apply
First: thanks a lot for your interest in our project! We’re excited to talk with you, discuss the project, and answer any questions you have. Our project is open to everyone, and we want to make sure you feel welcome here! So don’t hesitate to reach out even if you are coming from a different field, are new to this space, or have questions you would like answered first.
Here are some concrete next steps:
- Get to know us and get your questions answered! If something is unclear or you have a question, ask it here directly in the forum so everyone can benefit from the answer. Please don’t contact us directly via email at this point; just ask your questions here in the forum.
- If you have a more technical question or want to see how we work, meet us in our GitHub organization, where we do and discuss most of our work: Neurobagel · GitHub. Feel free to comment on issues or even open a new one for a specific question, feature, or problem. Our contributor guide has some pointers on how we contribute to the projects: How to contribute - Neurobagel
- Discuss your idea for the project with us so we can help you refine your proposal before you submit it. You can send an email or direct message to @Arman or @bcmcpher for this if you prefer.
- Finally, make sure to look closely at the GSoC rules (Google Summer of Code), guides (What is Google Summer of Code? | Google Summer of Code Guides), timeline (Google Summer of Code 2024 Timeline | Google for Developers), and advice for applicants (Advice for People Applying for GSoC | Google Summer of Code | Google for Developers) so you have a good idea of how the process works.
Please note that we do not expect you to contribute any work to our repositories before you are selected for the project through GSoC. If you still want to contribute in your own time to our open-source project, you are very welcome to do so! But please understand that this is not a requirement for your application to be selected.
Once you are ready to submit your proposal for this project, please go through the GSoC website (https://summerofcode.withgoogle.com/) and follow the instructions there. We will make an effort to review and respond to your submissions quickly.