Hello everyone,
I’m Hemant Machiwar, a B.Tech student at NIT Hamirpur, Himachal Pradesh, India, and I’m very interested in contributing to INCF projects as part of my preparation for GSoC 2026.
I’ve gone through the INCF ideas list and contribution guidelines, and I’ve also created an account on Neurostars. I would now like to start contributing through small, beginner-friendly issues to understand the relevant codebases and workflows better.
I have experience with JavaScript (MERN/Next.js) and a working knowledge of Python, C++, and Java, and I’m comfortable setting up projects locally and contributing via GitHub.
I would really appreciate guidance on:
- Which repositories or projects would be best suited for a beginner contributor
- Where to start in order to understand the codebase effectively
- Any recommended issues or tasks that would be valuable to work on initially
I’m eager to contribute consistently and learn in the process. Thank you for your time and for maintaining such impactful open-source projects.
Best regards,
Hemant Machiwar
Welcome to the community, @Hemant_Machiwar!
Just a heads-up on the architecture: This repository (knowledge-space-agent) is the pure Python/LangGraph backend for the AI logic.
Since your strength is Next.js/React, you might find the main knowledge-space repository (the core platform) much more relevant to your skills. The frontend work usually happens there, while we focus on the search algorithms and vector pipelines here.
I’d recommend checking that repo’s issue tracker for UI/UX tasks; we definitely need strong frontend contributors there!
Best of luck!
@Ravi_Ranjan It would be great if you could point me to the INCF repos where contributions are currently happening.
Hi everyone,
Given that knowledge-space-agent focuses on the Python/LangGraph backend and vector pipelines, this aligns directly with the proof-of-concept I recently built, Neuro-RAG-Assistant.
To summarize my relevant work for this backend:
- Local RAG Implementation: I built an offline RAG pipeline using Phi-3 (quantized via llama.cpp) and FAISS for vector storage.
- Semantic Search: Implemented indexing for unstructured neuroscience textbooks to enable context-aware retrieval.
- Tech Stack: My project uses pure Python, consistent with the requirements for the agent backend.
My Proof of Concept: https://github.com/Saivinay24/neuro-rag-assistant
I will focus my attention on the knowledge-space-agent repository. I am specifically looking for issues related to the search algorithms or LangGraph logic to start contributing.
Best, Sai Vinay
@Sai_Vinay
I have a doubt: are you using a schema-based vector database to process that textbook? If so, are you rebuilding the schema every time you start a new prediction run?
@Ayush_kumar_rai
In my local Neuro-RAG-Assistant, I’m just using FAISS. I generated the embeddings once and saved the index as index.faiss. So whenever I run the script, it just loads the saved file instantly.
The actual knowledge-space-agent repo uses Elasticsearch and Vertex AI, though, so the persistence layer there is much more robust than my local setup.
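For anyone curious, the embed-once / load-many pattern looks roughly like this. This is a simplified, stdlib-only sketch: in my actual project the vectors come from a real embedding model and the index is FAISS (saved with its own serialization), not a pickled list, and the toy keyword "embedding" below is just a stand-in.

```python
import math
import pickle
import tempfile
from pathlib import Path

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def build_index(docs, embed, path):
    """Embed every document ONCE and persist the result to disk."""
    index = [(doc, embed(doc)) for doc in docs]
    with open(path, "wb") as f:
        pickle.dump(index, f)

def load_index(path):
    """Load the saved index instantly -- no re-embedding needed."""
    with open(path, "rb") as f:
        return pickle.load(f)

def search(index, query, embed, k=1):
    """Return the k documents whose vectors are closest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def toy_embed(text):
    """Toy 'embedding': keyword counts (stands in for a real model)."""
    words = text.lower().split()
    vocab = ["neuron", "synapse", "cortex", "glia"]
    return [words.count(w) + 1e-6 for w in vocab]

if __name__ == "__main__":
    docs = ["the neuron fires across the synapse",
            "the cortex contains many glia cells"]
    path = Path(tempfile.gettempdir()) / "index.pkl"
    build_index(docs, toy_embed, path)   # done once, offline
    index = load_index(path)             # every later run starts here
    print(search(index, "neuron and synapse", toy_embed))
```

The point is just the split: the expensive embedding step runs once, and every subsequent script launch only pays for a file load.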
Cool… using saved embeddings saves a lot of effort. But yes, the knowledge-space repo is more optimised. Doesn’t Streamlit limit the UI, though, unless you scale it?
@Ayush_kumar_rai Yeah, agreed. Streamlit is good for quick testing, but it definitely hits a wall if you try to scale it or customize the UI too much. It’s mostly just a dev tool in this context.
Dear @visakh and Tom,
My name is Charan and I’m a second-year CS major at Case Western Reserve University. I’m writing to express my interest in Project #20.
I have professional experience building RAG chatbots for clinical data (at IQVIA) using Vector Search, but I see Project #20 specifically requires Neo4j and ElasticSearch.
I’m currently going through the KnowledgeSpace GitHub and have two specific questions:
- Does the current Neo4j instance already have the “NIFSTD” ontology fully mapped for the RAG agent to query, or should my proposal include a phase for refining the graph schema?
- For the proposal, should I assume we will use Gemini/PaLM models through Vertex AI, or open-source LLMs such as Llama wrapped in a Vertex container?
I am committed to this project and am already working on my Cypher queries to ensure I can contribute as much as possible.
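To make my first question concrete, here is the shape of lookup I have been drafting. This is a hedged sketch: the label `Term`, the `SUBCLASS_OF` relationship, and the `source: 'NIFSTD'` property are my assumptions about the graph schema, not anything confirmed against the live instance.

```python
# Hypothetical Cypher for a NIFSTD term lookup. All labels, relationship
# types, and property names here are guesses at the schema, not confirmed.
NIFSTD_LOOKUP = """
MATCH (t:Term {source: 'NIFSTD'})
WHERE toLower(t.label) CONTAINS toLower($query)
OPTIONAL MATCH (t)-[:SUBCLASS_OF]->(parent:Term)
RETURN t.label AS term, t.curie AS curie, parent.label AS parent
LIMIT $limit
"""

def lookup_params(query: str, limit: int = 10) -> dict:
    """Bundle user input as driver parameters (never string-format Cypher)."""
    return {"query": query, "limit": limit}

# With the official neo4j Python driver this would run as something like:
#   with driver.session() as session:
#       records = session.run(NIFSTD_LOOKUP, lookup_params("hippocampus"))
```

Passing `$query`/`$limit` as parameters rather than interpolating strings is deliberate: it lets Neo4j cache the query plan and avoids injection issues.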
Regards,
Charan
Hi everyone! I’m Harshdip Saha, currently pursuing Computer Science and Engineering with a specialisation in Artificial Intelligence at Netaji Subhas University of Technology (NSUT), Delhi. I am in my 3rd year and secured global rank 3 in the brain tumor progression challenge at the MICCAI 2025 conference held in South Korea.

I am highly interested in machine learning and artificial intelligence, and I’m always eager to see how that theory can be applied in the real world. I am currently working on issues 16 and 17: I have submitted a PR for issue 17 and have a draft PR ready for issue 16. Mentors, please share your suggestions and guidance. GitHub username: HARSHDIPSAHA
Update: I’ve submitted the PR for the Neo4j retrieval prototype (Link to pull request)
I focused on the Graph side first and included unit tests to verify its logic. Looking forward to your feedback!
Hi @visakh and everyone,
Just a quick update as I finalize my proposal for the Agent (focusing on the Offline Testing architecture, PR #19).
While exploring the wider ecosystem this week to ensure my design fits well, I spotted and fixed a couple of compatibility issues in other repos:
csa (PR #28): Fixed the build system so it finally installs correctly on Windows and modern Python.
artem-is (PR #157): Patched the download script to silence those recurring URL warnings and clean up the logs.
My goal is to bring this same level of cross-platform robustness to the Agent project so it works smoothly for everyone. I’ll share my proposal draft for feedback soon!