Preparing for GSoC 2026 with INCF – guidance on getting started

SanchitSehgal · March 17, 2026, 1:36pm

Hello everyone,

My name is Sanchit Sehgal, and I am an undergraduate student studying Data Science and AI at Thapar Institute of Engineering and Technology. I’m preparing to apply for GSoC 2026 and am particularly interested in the Brian simulator project, specifically the idea focused on improving the documentation infrastructure.

I have experience working with Python, machine learning projects, and GitHub-based development workflows. Recently, I started exploring the Brian repositories and the current documentation setup, including the examples, tutorials, and how they are generated and maintained.

As I continue exploring the codebase and documentation structure, I would really appreciate guidance on where a new contributor should begin. In particular, I would be interested in understanding which parts of the documentation workflow or tooling (for example Sphinx, sphinx-gallery, or CI pipelines) could most benefit from improvements or contributions.

If there are specific issues, areas of the repository, or preparation steps that you would recommend for someone interested in contributing to this project, I would be very grateful for your suggestions.

Looking forward to learning more about the project and hopefully contributing in a meaningful way.

Thank you!

suresh.krishna · March 17, 2026, 10:32pm

Please look at the topic page dedicated to that project.

Pragati0208 · March 18, 2026, 2:40pm

Thanks for sharing the PR — I went through the DataProfiler implementation to better understand its role in the AStats workflow.

What the profiler outputs:
It generates a structured JSON profile of the dataset, including column metadata, missing values, descriptive statistics (mean, median, std, etc.), normality test results, outlier detection, and variance homogeneity. It also provides agent_hints, such as which columns are normal vs. non-normal, and test routing suggestions.
What it does NOT handle:
User queries, decision-making workflows, or execution of statistical tests. It also does not include any LLM-based reasoning or agent orchestration.
Its purpose:
To automatically analyze and summarize datasets and provide structured statistical insights that can guide a higher-level agent in selecting appropriate statistical workflows.
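
To make the shape of that output concrete, here is a minimal sketch of what such a profile could look like. It is not the actual DataProfiler code: the function names (`profile_column`, `profile_dataset`), the JSON field names, and the skewness threshold are all assumptions for illustration. A simple skewness check stands in for a real normality test, and variance homogeneity is left out.

```python
import json
import statistics


def profile_column(values, skew_threshold=0.5):
    """Summarize one numeric column; None marks a missing value.

    Hypothetical helper, not part of AStats. Needs at least two
    non-missing values.
    """
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    std = statistics.stdev(present)

    # Outlier detection with the 1.5 * IQR rule.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    outliers = [v for v in present
                if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

    # Stand-in for a real normality test: flag columns by sample skewness.
    if std > 0:
        skew = sum((v - mean) ** 3 for v in present) / (len(present) * std ** 3)
    else:
        skew = 0.0

    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": mean,
        "median": statistics.median(present),
        "std": std,
        "skewness": skew,
        "looks_normal": abs(skew) < skew_threshold,
        "outliers": outliers,
    }


def profile_dataset(columns):
    """Build a JSON-serializable profile with assumed agent_hints fields."""
    profiles = {name: profile_column(vals) for name, vals in columns.items()}
    normal = [n for n, p in profiles.items() if p["looks_normal"]]
    non_normal = [n for n, p in profiles.items() if not p["looks_normal"]]
    return {
        "columns": profiles,
        "agent_hints": {
            "normal_columns": normal,
            "non_normal_columns": non_normal,
            # Routing labels are placeholders for whatever the agent expects.
            "test_routing": {
                **{n: "parametric" for n in normal},
                **{n: "non-parametric" for n in non_normal},
            },
        },
    }


if __name__ == "__main__":
    example = {
        "a": [1, 2, 3, 4, 5, None],
        "c": [10, 11, 12, 11, 10, 12, 11, 100],
    }
    print(json.dumps(profile_dataset(example), indent=2))
```

The point of the sketch is only that the profiler stops at producing hints; choosing and running a test based on `test_routing` would be the job of a separate layer.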

From my experience working on Knowledge Space Agent (INCF), where we used structured signals to guide retrieval and reasoning, this feels like a similar foundational layer for decision-making.

As a follow-up: should the next step be to build an agent layer that uses these agent_hints to dynamically select and execute statistical tests, or is the focus more on designing a guided/interactive workflow for practitioners?

suresh.krishna · March 18, 2026, 2:48pm

I don't know which PR you are referring to, but the answer to your question is that both workflows are possible, depending on user preference and agentic skill.

Once again, please use the topic page for AStats.

Pragati0208 · March 18, 2026, 2:54pm

Thanks for the clarification! I was referring to the DataProfiler PR for AStats (data-discovery layer). I’ll move the discussion to the AStats topic page to keep things organized.

CIumsy · March 25, 2026, 6:01am

Hi, I’m Krishnanshu Mittal from Delhi, India.

I’m an embedded systems developer at Upside Down Labs, where I build EEG- and EMG-based brain-computer interfaces and wearable bio-potential signal devices. I’m applying for GSoC 2026 Project #20 - BreathState, a phone-based HRV biofeedback app, which connects directly to my background in real-time bio-potential signal processing and physiological data visualization.

I’ve introduced myself on the BreathState thread, forked BreathState2_2025, and am currently exploring the codebase to identify areas for contribution.

GitHub
Portfolio