Hello everyone,
I’m Hemant Machiwar, a B.Tech student at NIT Hamirpur, Himachal Pradesh, India, and I’m very interested in contributing to INCF projects as part of my preparation for GSoC 2026.
I’ve gone through the INCF ideas list and contribution guidelines, and I’ve also created an account on Neurostars. I would now like to start contributing through small, beginner-friendly issues to understand the relevant codebases and workflows better.
I have experience with JavaScript (MERN/Next.js) and a working knowledge of Python, C++, and Java, and I’m comfortable setting up projects locally and contributing via GitHub.
I would really appreciate guidance on:
- Which repositories or projects would be best suited for a beginner contributor
- Where to start in order to understand the codebase effectively
- Any recommended issues or tasks that would be valuable to work on initially
I’m eager to contribute consistently and learn in the process. Thank you for your time and for maintaining such impactful open-source projects.
Best regards,
Hemant Machiwar
Welcome to the community, @Hemant_Machiwar!
Just a heads-up on the architecture: This repository (knowledge-space-agent) is the pure Python/LangGraph backend for the AI logic.
Since your strength is Next.js/React, you might find the main knowledge-space repository (the core platform) much more relevant to your skills. The frontend work usually happens there, while we focus on the search algorithms and vector pipelines here.
I’d recommend checking that repo’s issue tracker for UI/UX tasks; we definitely need strong frontend contributors there!
Best of luck!
@Ravi_Ranjan It would be great if you could point me to the INCF repos where contributions are currently happening.
Hi everyone,
Given that knowledge-space-agent focuses on the Python/LangGraph backend and vector pipelines, this aligns directly with the proof-of-concept I recently built, Neuro-RAG-Assistant.
To summarize my relevant work for this backend:
- Local RAG Implementation: I built an offline RAG pipeline using Phi-3 (quantized via llama.cpp) and FAISS for vector storage.
- Semantic Search: Implemented indexing for unstructured neuroscience textbooks to enable context-aware retrieval.
- Tech Stack: My project uses pure Python, consistent with the requirements for the agent backend.
My Proof of Concept: https://github.com/Saivinay24/neuro-rag-assistant
I will focus my attention on the knowledge-space-agent repository. I am specifically looking for issues related to the search algorithms or LangGraph logic to start contributing.
Best, Sai Vinay
@Sai_Vinay
I have a doubt: are you using a schema-based vector database to process that textbook? If so, are you rebuilding the schema every time you start a new prediction run?
@Ayush_kumar_rai
In my local Neuro-RAG-Assistant, I’m just using FAISS. I generated the embeddings once and saved the index as index.faiss. So whenever I run the script, it just loads the saved file instantly.
The actual knowledge-space-agent repo uses Elasticsearch and Vertex AI, though, so the persistence layer there is much more robust than my local setup.
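For anyone curious, the embed-once / load-many pattern looks roughly like this. This is a simplified, stdlib-only sketch: in my actual project the vectors come from a real embedding model and the index is FAISS (saved with its own serialization), not a pickled list, and the toy keyword "embedding" below is just a stand-in.

```python
import math
import pickle
import tempfile
from pathlib import Path

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def build_index(docs, embed, path):
    """Embed every document ONCE and persist the result to disk."""
    index = [(doc, embed(doc)) for doc in docs]
    with open(path, "wb") as f:
        pickle.dump(index, f)

def load_index(path):
    """Load the saved index instantly -- no re-embedding needed."""
    with open(path, "rb") as f:
        return pickle.load(f)

def search(index, query, embed, k=1):
    """Return the k documents whose vectors are closest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def toy_embed(text):
    """Toy 'embedding': keyword counts (stands in for a real model)."""
    words = text.lower().split()
    vocab = ["neuron", "synapse", "cortex", "glia"]
    return [words.count(w) + 1e-6 for w in vocab]

if __name__ == "__main__":
    docs = ["the neuron fires across the synapse",
            "the cortex contains many glia cells"]
    path = Path(tempfile.gettempdir()) / "index.pkl"
    build_index(docs, toy_embed, path)   # done once, offline
    index = load_index(path)             # every later run starts here
    print(search(index, "neuron and synapse", toy_embed))
```

The point is just the split: the expensive embedding step runs once, and every subsequent script launch only pays for a file load.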
Cool… using saved embeddings saves a lot of effort. But yes, the knowledge-space repo is more optimised. Doesn’t Streamlit limit the UI, though, unless you scale it?
@Ayush_kumar_rai Yeah, agreed. Streamlit is good for quick testing, but it definitely hits a wall if you try to scale it or customize the UI too much. It’s mostly just a dev tool in this context.
Dear @visakh and Tom,
My name is Charan and I’m a second-year CS major at Case Western Reserve University. I’m writing to express my interest in Project #20.
I have professional experience building RAG chatbots for clinical data (at IQVIA) using Vector Search, but I see Project #20 specifically requires Neo4j and ElasticSearch.
I’m currently going through the KnowledgeSpace GitHub and have two specific questions:
- Does the current Neo4j instance already have the “NIFSTD” ontology fully mapped for the RAG agent to query, or should my proposal include a phase for refining the graph schema?
- For the proposal, should I assume we will use Gemini/PaLM models through Vertex AI, or open-source LLMs such as Llama wrapped in a Vertex container?
I am committed to this project and am already working on my Cypher queries to ensure I can contribute as much as possible.
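To make my first question concrete, here is the shape of lookup I have been drafting. This is a hedged sketch: the label `Term`, the `SUBCLASS_OF` relationship, and the `source: 'NIFSTD'` property are my assumptions about the graph schema, not anything confirmed against the live instance.

```python
# Hypothetical Cypher for a NIFSTD term lookup. All labels, relationship
# types, and property names here are guesses at the schema, not confirmed.
NIFSTD_LOOKUP = """
MATCH (t:Term {source: 'NIFSTD'})
WHERE toLower(t.label) CONTAINS toLower($query)
OPTIONAL MATCH (t)-[:SUBCLASS_OF]->(parent:Term)
RETURN t.label AS term, t.curie AS curie, parent.label AS parent
LIMIT $limit
"""

def lookup_params(query: str, limit: int = 10) -> dict:
    """Bundle user input as driver parameters (never string-format Cypher)."""
    return {"query": query, "limit": limit}

# With the official neo4j Python driver this would run as something like:
#   with driver.session() as session:
#       records = session.run(NIFSTD_LOOKUP, lookup_params("hippocampus"))
```

Passing `$query`/`$limit` as parameters rather than interpolating strings is deliberate: it lets Neo4j cache the query plan and avoids injection issues.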
Regards,
Charan
Hi everyone! I’m Harshdip Saha, currently pursuing Computer Science and Engineering with a specialisation in Artificial Intelligence at Netaji Subhas University of Technology (NSUT), Delhi. I am in my 3rd year and secured global rank 3 in the brain tumor progression challenge at the MICCAI 2025 conference held in South Korea.

I am highly interested in machine learning and artificial intelligence, and I’m always eager to see how that theory can be applied in the real world. I am currently working on issues 16 and 17: I have submitted a PR for issue 17 and have a draft PR ready for issue 16. Mentors, please share your suggestions and guidance. GitHub username: HARSHDIPSAHA
Update: I’ve submitted the PR for the Neo4j retrieval prototype (Link to pull request)
I focused on the Graph side first and included unit tests to verify its logic. Looking forward to your feedback!
Hi @visakh and everyone,
Just a quick update as I finalize my proposal for the Agent (focusing on the Offline Testing architecture, PR #19).
While exploring the wider ecosystem this week to ensure my design fits well, I spotted and fixed a couple of compatibility issues in other repos:
csa (PR #28): Fixed the build system so it finally installs correctly on Windows and modern Python.
artem-is (PR #157): Patched the download script to silence those recurring URL warnings and clean up the logs.
My goal is to bring this same level of cross-platform robustness to the Agent project so it works smoothly for everyone. I’ll share my proposal draft for feedback soon!