Preparing for GSoC 2026 with INCF – guidance on getting started

Hi everyone,

I’m Apurva, a student beginning early preparation for GSoC 2026 and exploring opportunities to contribute to INCF-associated projects. I’d appreciate some guidance on how new contributors usually get started, which INCF repositories are currently active, what types of beginner-friendly issues make good first contributions, and any general advice you might have for someone starting out.

My background is in Electronics Engineering with a minor in AI/ML, and I primarily work with Python. I’m keen to learn through hands-on open-source contributions and gradually become involved with the community.
Looking forward to learning from the community.
Thank you.

Hello everyone, and thank you for initiating this discussion.

I’m Pragati, a Computer Science undergraduate with experience in Python and machine learning. I’m currently exploring the NWBWidgets repository and setting up the development environment to better understand the widget architecture and how it integrates with PyNWB.

I’ve previously contributed a documentation PR to the CSA repository and am looking forward to contributing more actively to the NWB ecosystem as I prepare for GSoC 2026.

I would appreciate any suggestions on beginner-friendly issues or areas where contributions are most needed.

Excited to learn and collaborate!

Best,
Pragati


Hello everyone,

Thank you for this helpful discussion. My name is Harshitha Arava, and I am a pre-final year student preparing to apply for GSoC 2026. I have experience in Python and working with large language models (LLMs).

I’m particularly interested in contributing to INCF-related projects where LLMs or NLP techniques could support neuroscience research workflows, such as improving documentation, metadata validation, data standard compliance, or developer tooling.

I’m currently exploring INCF-affiliated GitHub repositories and would appreciate guidance on active projects, beginner-friendly issues, or areas where LLM-based or Python-driven contributions could be useful.

Looking forward to learning from the community and contributing.
Thank you!

Update: I’ve made my first code contribution to NWBWidgets!

I fixed a FutureWarning in the infer_categorical_columns function by converting lists to numpy arrays before calling pd.unique(). This eliminates distracting warnings during test execution and ensures compatibility with future pandas versions.
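A minimal sketch of the pattern described above, assuming a hypothetical helper named `unique_values` (this is not the code from the PR): recent pandas releases emit a FutureWarning when `pd.unique()` receives a plain Python list, so the input is converted to a NumPy array first.

```python
import warnings

import numpy as np
import pandas as pd


def unique_values(values):
    """Return the unique entries of `values` in order of first appearance.

    Hypothetical helper illustrating the fix: the input is converted to a
    NumPy array first, so pd.unique() never receives a plain list.
    """
    # dtype=object keeps mixed values (e.g. strings and numbers) from being
    # coerced to a common type such as a fixed-width string dtype.
    arr = np.asarray(values, dtype=object)
    return pd.unique(arr)


if __name__ == "__main__":
    with warnings.catch_warnings():
        # Turn any FutureWarning into an error to show that none is raised.
        warnings.simplefilter("error", FutureWarning)
        print(unique_values(["a", "b", "a", "c"]))
```

Using `dtype=object` trades a compact numeric dtype for predictable behavior on mixed categorical columns, which is usually what a category-inference helper wants.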

PR: Fix FutureWarning in infer_categorical_columns by converting list to numpy array (NeurodataWithoutBorders/nwbwidgets, Pull Request #323 by pragati-0208)

I’d appreciate any feedback or review from mentors and the community. I’m continuing to explore the codebase and looking for more opportunities to contribute.

Best,
Pragati

Hello everyone,

I’m Priti Gupta, a final-year Computer Science student from Amrita Vishwa Vidyapeetham. My background is in full-stack engineering and AI-driven systems, and I’m preparing seriously to contribute to INCF projects for GSoC 2026.

I was previously involved with amFOSS, where I learned how open-source communities function, how to work with maintainers, and how to ship clean, reviewable contributions rather than one-time patches. That experience is what motivated me to participate more deeply in research-oriented open ecosystems.

What excites me about INCF is the impact of building standards, reproducible infrastructure, and interoperable tools that directly support neuroscience research at a global scale.

I’m currently exploring the repositories and setting up development environments to understand the architecture and contribution flow. I would love recommendations on good first issues or modules where contributors are actively needed. I’m happy to start small, build familiarity, and contribute consistently.

Looking forward to learning from everyone here and becoming a long-term contributor.

Thank you!

Hi everyone,

My name is Fares Shaaban, and I am interested in applying for Project 13.2 (Implementation of SWC to NeuroML converter).

To make sure I understood the core challenge, I spent this week studying the argparse module and the SWC file format, and wrote a standalone Python script that parses SWC files and generates basic NeuroML.

You can view my practice code here: https://github.com/fares-shaabanxx/swc-neuroml-prototype
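For readers unfamiliar with the format, here is a minimal sketch of the parsing step (not the code in the linked repository). It assumes the common seven-column SWC layout: point index, structure type, x, y, z, radius, and parent index, with -1 marking a root and `#` starting a comment. The names `SWCPoint`, `parse_swc`, `build_arg_parser`, and `main` are hypothetical.

```python
import argparse
from dataclasses import dataclass


@dataclass
class SWCPoint:
    """One sample point from an SWC file (seven whitespace-separated columns)."""
    index: int
    type_id: int   # structure type, e.g. 1 = soma, 2 = axon, 3 = basal dendrite
    x: float
    y: float
    z: float
    radius: float
    parent: int    # -1 marks a root point


def parse_swc(text):
    """Parse SWC text into a dict mapping point index -> SWCPoint."""
    points = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 7:
            raise ValueError(f"line {line_no}: expected 7 columns, got {len(fields)}")
        idx, type_id, parent = int(fields[0]), int(fields[1]), int(fields[6])
        x, y, z, radius = map(float, fields[2:6])
        points[idx] = SWCPoint(idx, type_id, x, y, z, radius, parent)
    # Check parent references only after reading everything, so the order
    # of lines in the file does not matter.
    missing = [p.index for p in points.values()
               if p.parent != -1 and p.parent not in points]
    if missing:
        raise ValueError(f"points with undefined parents: {missing}")
    return points


def build_arg_parser():
    """Command-line interface for the sketch: one positional input file."""
    parser = argparse.ArgumentParser(description="Parse an SWC morphology file.")
    parser.add_argument("swc_file", help="path to the input .swc file")
    return parser


def main(argv=None):
    """Parse arguments (sys.argv when argv is None) and summarize the file."""
    args = build_arg_parser().parse_args(argv)
    with open(args.swc_file, encoding="utf-8") as fh:
        points = parse_swc(fh.read())
    roots = [p for p in points.values() if p.parent == -1]
    print(f"{args.swc_file}: {len(points)} points, {len(roots)} root(s)")
    return points
```

From a script, `main()` would typically be called under an `if __name__ == "__main__":` guard so that argparse reads `sys.argv`. Turning the parsed points into NeuroML segments is a separate step that this sketch leaves out.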

My next step: I realize the actual implementation within PyNeuroML will need to be much more robust, so I am currently reading through the existing codebase to understand the project structure.

Could anyone point me to the specific modules or files in PyNeuroML that handle current format conversions? I want to make sure I am studying the right part of the architecture.

Best, Fares

Hi everyone,
I’m Ritesh Thakur, a Computer Science student with a strong interest in systems programming and backend development. I have solid foundations in Data Structures and Algorithms, and I’m proficient in C++, C, Python, and JavaScript. I’ve worked on real-world projects using Node.js, React, MongoDB, REST APIs, Docker, and Git. I’m excited to contribute, collaborate, and learn from this community while working on meaningful open-source improvements.

Hi everyone,

I’m Alex. I am a recent graduate with a Bachelor’s degree in Computer Science and Engineering. During my studies, I focused on Python development, frontend technologies, and UI/UX design for software applications. I have also worked with a startup where I contributed to building web-based solutions and developing user-centric technologies. I’m interested in contributing to INCF projects and would love to get involved with the community.

Thank you!

Hi everyone!
My name is Mubashar, and I am highly motivated to contribute to INCF for GSoC 2026. I have been following INCF’s mission to make neuroscience data FAIR (Findable, Accessible, Interoperable, and Reusable), and I would love to apply my technical skills to support this mission.
I have a diverse technical background that I believe aligns well with many INCF projects:
  • Web Technologies: Expert in Next.js, React, and Express.js. I am very comfortable building modern, responsive frontends and dashboards.
  • Backend & Databases: Strong experience with Django and SQL for managing complex data structures.
  • Data Science & Analysis: Proficient in Python, NumPy, Pandas, and Scikit-learn. I enjoy working on data visualization and processing tasks.
  • High-Performance Backend: I also have experience with Rust (Actix-web) and a basic understanding of Tauri for desktop applications.
Since the 2026 Ideas List is expected soon, I wanted to reach out to the mentors for guidance. Based on my skillset—specifically the combination of Next.js for visualization and Python/NumPy for data handling—which upcoming projects would you recommend for me?
I am looking for a project where I can provide the most impact while learning more about neuroinformatics. I am ready to start exploring the codebase and contributing to ‘good first issues’ as soon as I have a direction.
My GitHub: Muhammad-Mubashar516 (Muhammad Mubashar)
Thank you for your time and guidance!
Best regards,
Mubashar Ameen

Hello everyone,

I’m Afthal Ahamad, an IT undergraduate. I’m currently exploring Project 7.2 – Image Feature and Classification Database as part of my preparation for GSoC 2026. I’ve forked and cloned the repository and am working on setting up the project locally to better understand its structure, database usage, and UI flow.

I’m especially interested in learning more about the database design improvements and UI changes involved in the project, as well as fixing existing issues and bugs.

I’m interested in contributing to INCF projects and would love to get involved with the community. I’d appreciate any suggestions on good starting points for contributing.

Thank you,
Afthal Ahamad

Hi everyone!

I’m Reem, a 3rd-year CS student, and I’m really excited about the possibility of contributing to INCF for GSoC 2026.

I’ve been looking through the project ideas, and after conversations with past mentors, I noticed that several of the listed project ideas for this cycle aren’t actually participating this year.
Personally, I’m really interested in projects involving Generative AI, computer vision, or data modeling. For context, I’ve recently been working as a Research Assistant at VCU, where I co-authored a published paper using Generative AI to model clinical weight trajectories. Additionally, I have experience building AI-assisted tools for healthcare, like a cognitive recognition app for dementia support.

I’m keen to dive in and start my journey early, so I wanted to ask when the finalized list of projects for 2026 will be out, as I believe the current one is undergoing some changes.

I’d also love to know what specific steps I should take to demonstrate my fit for a project and earn a mentor’s ‘green light’ to begin drafting a formal proposal.

In the meantime, if there’s a specific repo, or better yet a confirmed project idea, that would benefit from someone with a background in Python, R, and ML modeling, please let me know. I’d love to get a head start!

Thank you so much for your time! I’d really appreciate some help.

Best,
Reem

I’m Naitik, a 2nd-year Computer Science student from India, currently focusing on Machine Learning and Deep Learning. I’ve completed coursework and hands-on projects involving ML, neural networks, transformers, fine-tuning, and LLM applications using tools like LangChain, FAISS, and Redis.

Recently, I’ve been exploring neuroscience-related ML applications and reading through several INCF GSoC project ideas to understand how computational methods can support neuroscience workflows and data analysis. I’m particularly interested in projects that involve data pipelines, model integration, or improving research tooling for reproducible science.

I’m currently going through the project repositories, setting up environments, and looking for ways to start contributing — whether through documentation improvements, small fixes, or discussions.

Looking forward to learning from the community and collaborating with everyone here!


Hi everyone! I’m Abhranshu, a CS student preparing for GSoC 2026.

I have a strong background in C and Java. What brings me to INCF specifically is my research work on “Visual Cognitive Load and Attention Estimation Using Eye and Facial Behavioral Signals,” particularly its applications in ADHD research.

I am looking for projects where I can apply my experience with behavioral signals and signal processing while contributing to open-source neuroscience tools. I’m currently exploring the 2026 project list and am excited to get started with some initial contributions!

Hello everyone,
I am Rohan Sardar, a third-year B.Tech CSE-AIML student.

I came here after seeing the INCF GSoC 2026 Ideas List. I want to contribute in:

No.8: Open source Community Sustainability LLM
No.40: Origami Lab, McGill University - Semantic Search for Neuroimaging Datasets

These problem statements align with my interests and prior experience. I have experience with Python and hands-on experience developing NLP and semantic search applications and RAG systems, both online using LLM APIs and offline on-device using local models through Ollama and Hugging Face. Besides, I have experience developing API endpoints with FastAPI and dockerizing the whole system for easy deployment.

I have used ChromaDB and FAISS as vector databases, and local embedding models like all-MiniLM-L6-v2, which are required for project No. 40.

Recently, I also built a fully asynchronous RAG pipeline without any framework, using only Python, FAISS, asyncio, pypdf, Google Gemini 2.5 Flash, and Gemini Embeddings.
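The retrieval step of a framework-free async pipeline like this can be sketched with asyncio and NumPy. Assumptions: `embed` is a hypothetical stand-in that uses letter counts instead of a real embedding model such as Gemini Embeddings, and a brute-force NumPy inner product stands in for a FAISS index (FAISS's flat inner-product index performs the same exact search on normalized vectors).

```python
import asyncio

import numpy as np


async def embed(text):
    """Hypothetical stand-in for an embedding API call.

    Returns a unit-length letter-frequency vector so the sketch runs offline;
    a real pipeline would await an embedding service here instead.
    """
    await asyncio.sleep(0)  # yield control, as a real network call would
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


async def build_index(chunks):
    """Embed all chunks concurrently and stack them into one matrix."""
    vectors = await asyncio.gather(*(embed(c) for c in chunks))
    return np.vstack(vectors)


async def retrieve(query, chunks, index, k=2):
    """Return the k chunks with the highest cosine similarity to the query."""
    q = await embed(query)
    scores = index @ q  # inner product equals cosine on unit-length vectors
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]


async def demo():
    chunks = ["spiking neuron model", "pandas dataframe tips", "neuron morphology file"]
    index = await build_index(chunks)
    return await retrieve("neuron", chunks, index, k=2)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

`asyncio.gather` is what lets many embedding requests be in flight at once; with a real remote embedding service that concurrency is where most of the speedup of an async pipeline would come from.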

Is there any way to connect with the concerned mentor for these two projects? It would be helpful for guidance.

Looking forward to contributing.
Thank you

Hi everyone,

My name is Dhanush, a Computer Science student interested in AI systems, ML/DL, and LLM-based tools for research workflows. I’m planning to apply to GSoC 2026 with INCF and wanted to introduce myself while starting discussions around a couple of project ideas that caught my interest.

I’ve recently worked on projects involving RAG pipelines, semantic search, embeddings, and agentic AI workflows, where LLMs are used for multi-step reasoning and structured outputs.

Two projects I’m particularly interested in are:

33 - AStats: an agentic-AI approach to applied statistical practitioner workflows
(Mentors: Jonathan Morris, Yohai-Eliel Berreby, Suresh Krishna)
The idea of building an agentic system for dataset exploration and statistical workflows from scratch sounds very interesting to me, especially exploring how LLM-based agents can assist practitioners in exploratory and confirmatory analysis.

40 - Semantic Search for Neuroimaging Datasets (Neurobagel)
(Mentors: Alyssa Dai, Arman Jahanpour, Brent McPherson, Sebastian Urchs)
I’m also very interested in this project since it involves local embeddings and semantic search over dataset metadata, which aligns with some of the retrieval and embedding systems I’ve worked with.

I’ve started exploring the repositories and have already opened a few PRs (currently waiting for review/merge). In the meantime, I’m continuing to explore the codebases and draft my proposal.

If mentors or contributors have suggestions on areas worth exploring early or issues that would be good starting points, I’d really appreciate the guidance.

Looking forward to learning from and contributing to the community.

Thanks!
Dhanush
GitHub | LinkedIn


Hello everyone,

I’m Abiraj Kangotra, a computer science student preparing for GSoC 2026 and exploring INCF projects.

I recently started contributing to the CSA repository and opened my first pull request fixing the inconsistent mask color issue in the show() function:

I’m continuing to explore the codebase and would appreciate any suggestions for other beginner-friendly issues or areas where new contributors can help.

Looking forward to contributing more and learning from the community.

Hi everyone,

I’m Satvik Saluja, a second-year undergraduate Biotechnology student from India. I’m interested in machine learning, scientific computing, and applying computational methods to neuroscience and biological systems. I’m planning to apply to GSoC 2026 with INCF and wanted to introduce myself while starting to explore the projects and repositories.

I have experience with Python, Git, competitive programming, and machine learning.

Recently I started contributing to HNN-Core, where I’ve been exploring the codebase and submitting a few PRs related to the testing infrastructure.

My contributions so far include:

  • Refactoring the test suite to replace os.path usage with pathlib.Path
  • Adding and improving tests for Poisson drive creation
  • Small refactoring and cleanup in the tests section

These contributions address issues #960, #1118, and #1144 related to improving the testing infrastructure.
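A small illustration of the kind of `os.path` to `pathlib.Path` refactor mentioned above; the helper names are hypothetical and not taken from the HNN-Core test suite.

```python
import os.path
from pathlib import Path


def data_path_ospath(base_dir, name):
    """Old style: build the path to a test-data file with os.path."""
    return os.path.join(base_dir, "data", name)


def data_path_pathlib(base_dir, name):
    """New style: the same path built with pathlib's / operator."""
    return Path(base_dir) / "data" / name
```

In a test module, `Path(__file__).resolve().parent` plays the role of `os.path.dirname(os.path.abspath(__file__))`, and the returned `Path` objects can be passed directly to `open()` and most file-handling functions.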

I’m particularly interested in the project:

Mentors: @asoplata, @ntolley

I’m continuing to explore the codebase and test suite while preparing my proposal. If mentors or contributors have suggestions for good issues to work on next or areas that need attention, I would greatly appreciate the guidance.

Looking forward to learning from and contributing to the community.

Thanks!
Satvik Saluja

Hello everyone,

My name is Sanchit Sehgal, and I am an undergraduate student studying Data Science and AI at Thapar Institute of Engineering and Technology. I’m preparing to apply for GSoC 2026 and am particularly interested in the Brian simulator project, specifically the idea focused on improving the documentation infrastructure.

I have experience working with Python, machine learning projects, and GitHub-based development workflows. Recently, I started exploring the Brian repositories and the current documentation setup, including the examples, tutorials, and how they are generated and maintained.

As I continue exploring the codebase and documentation structure, I would really appreciate guidance on where a new contributor should begin. In particular, I would be interested in understanding which parts of the documentation workflow or tooling (for example Sphinx, sphinx-gallery, or CI pipelines) could most benefit from improvements or contributions.
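As a point of reference, a generic sphinx-gallery setup in a Sphinx `conf.py` looks roughly like the fragment below. The directory names are placeholders, and whether Brian's documentation uses sphinx-gallery at all is an assumption to check in the repository.

```python
# conf.py fragment (generic sphinx-gallery setup; paths are placeholders)
extensions = [
    "sphinx.ext.autodoc",          # API docs from docstrings
    "sphinx_gallery.gen_gallery",  # builds a gallery page from example scripts
]

sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # where the example .py scripts live
    "gallery_dirs": "auto_examples",  # where the generated pages are written
    # Only scripts whose path matches this regex are executed during the build.
    "filename_pattern": r"/plot_",
}
```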

If there are specific issues, areas of the repository, or preparation steps that you would recommend for someone interested in contributing to this project, I would be very grateful for your suggestions.

Looking forward to learning more about the project and hopefully contributing in a meaningful way.

Thank you!

Hi everyone,

I am Soumyaranjan Sahoo, a pre-final year student beginning my preparation for GSoC 2026.

My primary focus is on applying machine learning to physiological data and medical imaging. I work predominantly in Python and have recently developed a hybrid CNN-RNN architecture for detecting anomalies in complex time-series ECG signals, as well as an automated CNN pipeline for classifying thoracic conditions from X-rays.

I am incredibly interested in the neuroinformatics tools and ML pipelines being built here at INCF. Could anyone guide me on which active INCF repositories or beginner-friendly issues would be the best starting point for someone with a background in Python, time-series data analysis, and medical ML?

Looking forward to learning from this community! Thank you.