GSoC 2025 Project #20 INCF Secretariat :: Build an AI Agent for KnowledgeSpace using RAG (350h)

Mentors: Visakh Muraleedharan <visakh@incf.org> and Tom Gillespie <tom.h.gillespie@gmail.com>

Skill level: Intermediate

Required skills: Python, AI/ML, NLP, Neo4j, ElasticSearch, React, Node.js, GCP, GitLab CI/CD.

Time commitment: Full time (350 hours)

Forum for discussion

About: KnowledgeSpace is a community-driven online resource for the neuroscience community, facilitating open and accessible sharing of data, knowledge, and tools. Neuroscience research generates vast amounts of complex data and literature, making it challenging for users to locate specific, relevant information. By implementing an AI agent powered by RAG, the project aims to address this challenge by creating a tool that facilitates quick and reliable access to the right data and knowledge. This will empower neuroscientists, educators, and the broader community to leverage KnowledgeSpace more effectively.

Aims: To enhance the platform’s usability, this project aims to develop an AI-powered agent that uses Retrieval-Augmented Generation (RAG) to give precise, contextually relevant, human-like answers to user queries. The agent should improve the user experience by providing concise, context-aware, and scientifically accurate information about neuroscience concepts and datasets.

Scope:

  • Integration with existing KnowledgeSpace metadata
  • Indexing and data retrieval based on text and vector search
  • Neuroscience domain context adaptation using standards (NIFSTD)
  • Model deployment and integration in Vertex AI
  • User Interface development

Websites: https://knowledge-space.org/ and the INCF/knowledge-space repository on GitHub. KnowledgeSpace (KS) is a data-driven encyclopedia and search engine for the neuroscience community.

Tech keywords: Python, AI/ML, NLP, Neo4j, ElasticSearch

Greetings @visakh, Tom Gillespie,
I am Kinshuk Trivedi, a third-year undergraduate student at VJTI, Mumbai, India, pursuing a B.Tech in Electrical Engineering. I have a strong enthusiasm for LLMs, NLP, and machine learning! I am proficient in Python and C++, with hands-on experience in RAG, NLP, Computer Vision, and Python libraries such as NumPy, Pandas, TensorFlow, PyTorch, Streamlit. My skill set also includes Web Development (MERN stack), MySQL, APIs, Linux CLI, and version control with Git/GitHub. I have extensively worked with LLMs, including LangChain, RAG, and the Transformer architecture, applying them to develop a news research tool and a cold email generator using OpenAI and LLaMA 3.1 models.

I found the project “Build an AI Agent for KnowledgeSpace using RAG” incredibly fascinating. KnowledgeSpace serves as a vital hub for the neuroscience community, providing open and accessible sharing of research data, knowledge, and tools. The idea of integrating an AI-powered agent that leverages Retrieval-Augmented Generation (RAG) to enhance search capabilities and deliver precise, context-aware responses is both innovative and impactful. Given my experience with LLMs and RAG, I believe I can make meaningful contributions to this project. With my proficiency in web development, I can also build and improve the user interface. Future improvements could include multilingual support and personalization features to refine search relevance based on user preferences.

I feel that enhancing the way neuroscience data is structured, explored, and interacted with can make a real difference in how knowledge is shared and discoveries are made. This project has the potential to bridge gaps, making it easier for researchers, educators, and enthusiasts to find the right information when they need it, making it an honor to contribute to this project.

I am committed to dedicating the required 350 hours to this project. Over the years, I have developed a strong ability to grasp new technologies quickly, and I am eager to learn and adapt to contribute meaningfully to this project.

I’ve been exploring the ElasticSearch APIs, graph databases such as Neo4j, ArrayDB, and the KnowledgeSpace platform itself by running it locally on my system, experimenting with improvements and potential integrations to enhance its functionality and user experience. Please point me to the resources I should study to start working on this project.

Looking forward to hearing from you and contributing under your mentorship.

Do share any updates regarding the tasks to be performed for GSoC’25.

Email: kinshuktrivedi03@gmail.com
My github repo: GitHub
My resume: Resume

Best Regards,
Kinshuk Trivedi

Hello @visakh @TomGillespie
I am Mahi S. Palimkar, a Computer Engineering sophomore at Veermata Jijabai Technological Institute (VJTI), Mumbai. I am a machine learning enthusiast and have also worked extensively with natural language processing, computer vision, image processing, and web development. I have a good command of Python, C/C++, and version control using Git/GitHub.

I’m fascinated by the challenge of making complex neuroscience data accessible, as this directly accelerates scientific discovery. RAG is the ideal solution here as it combines the conversational abilities of LLMs with fact-based retrieval from authoritative sources—ensuring both scientific accuracy and intuitive access to specialized knowledge.

In the last few days, I have gained a deeper understanding of how RAGs work. This is a blog I wrote about it. I also tried building a basic RAG pipeline, and later on a RAG powered chatbot that answers Indian legal queries.
You can see it here: RAG-mini-project

I am currently learning Neo4j, as it is relevant to implementing the ontology-driven aspects of this RAG system.

I would love to contribute to issues and take on tasks to get more familiar with the project. It would be really helpful if you could guide me along these lines.

I am also attaching my resume here.

Thank you!

Thank you all for your interest!
Besides the RAG part, which we have already started building, we are looking for someone who can build a chat interface with conversational memory.

Greetings @visakh,
Thanks for your update. Yes, we can also build a chat interface with conversational memory. There are two ways to implement it:

1. Using RAG (AI agent): store past conversations as vector embeddings, query the vector database to fetch the relevant past conversations, concatenate the retrieved memory with the user query, and pass the result to the LLM, which generates the response. This approach can also use more than one database (for different knowledge domains), with the agent using LLM reasoning to decide which database to retrieve from.

2. Using LLM fine-tuning (best for domain-specific memory retention): if the chatbot should retain knowledge and respond more naturally, we can fine-tune an open-source model (e.g., LLaMA, Mistral, Falcon) on custom conversation datasets, training on past user interactions and responses so that the memory is stored in the model weights.
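
To make the first option concrete, here is a minimal sketch of retrieval-based conversational memory. It is only an illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `VectorMemory` is a hypothetical in-memory class standing in for a real vector database. It shows the three steps described above: store past turns as vectors, recall the most similar ones, and concatenate them with the new query.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: a real system would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorMemory:
    """Hypothetical in-memory store standing in for a vector database."""

    def __init__(self):
        self.turns = []  # list of (text, vector) pairs

    def add(self, text):
        self.turns.append((text, embed(text)))

    def recall(self, query, k=2):
        """Return the k stored turns most similar to the query."""
        q = embed(query)
        ranked = sorted(self.turns, key=lambda turn: cosine(q, turn[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(memory, query, k=2):
    """Concatenate recalled memory with the new user query for the LLM."""
    recalled = "\n".join(f"- {text}" for text in memory.recall(query, k))
    return f"Relevant past conversation:\n{recalled}\n\nUser: {query}"
```

The string returned by `build_prompt` is what would be passed to the LLM; swapping the toy `embed` for a real model and `VectorMemory` for a vector database would not change the overall flow.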

We can also combine both methods into a hybrid. The UI can be built using React/Flask or any other tech stack suitable for integration. I hope this conveys the idea behind both approaches.

Regards,
Kinshuk Trivedi

Thank you for your response @visakh!
I will look into the best ways to do so and update you at the earliest.

Hello @visakh

I hope you’re doing well. My name is Uzair Sayyed, and I’m currently a second-year Computer Science Engineering student with a strong passion for LLMs and agentic AI workflows. I’m excited about the opportunity to contribute to our current system by building a Retrieval Augmented Generation (RAG) layer on top of our existing Elasticsearch data.

The approach focuses on leveraging the power of Elasticsearch combined with advanced language models. The idea is to convert our existing text data into dense, numerical representations that capture the semantic meaning of the text. These dense vectors will be stored alongside the original content within Elasticsearch. Storing the dense vectors with the original text is crucial: the vectors enable efficient semantic search using techniques like kNN, while the original text provides the full context and readability for users once relevant documents are retrieved.
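
As a rough sketch of what storing vectors alongside the text could look like, the snippet below builds an index mapping with an Elasticsearch `dense_vector` field next to ordinary text fields. The field names (`title`, `description`, `embedding`) and the dimension of 384 are my assumptions for illustration and are not taken from the actual KnowledgeSpace index; the dimension must match whichever embedding model is chosen.

```python
EMBEDDING_DIMS = 384  # assumption: depends on the embedding model that is chosen


def index_mapping(dims=EMBEDDING_DIMS):
    """Build an index mapping that keeps the text and its vector side by side."""
    return {
        "mappings": {
            "properties": {
                # Hypothetical text fields; the real index may use other names.
                "title": {"type": "text"},
                "description": {"type": "text"},
                # Dense vector field, indexed so it can be used for kNN search.
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def make_document(title, description, vector, dims=EMBEDDING_DIMS):
    """Build one document body, checking that the vector has the mapped size."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(vector)}")
    return {"title": title, "description": description, "embedding": list(vector)}
```

The mapping would be sent when creating the index, and each document body when indexing a record, so the readable text and its vector always travel together.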

When a user submits a query, we will convert that query into an embedding using the same model. Elasticsearch’s k-Nearest Neighbors (kNN) search will then be used to retrieve the most semantically similar documents from our dataset. These documents, rich in context, are subsequently fed into a language model along with the query to generate an informed and context-aware response.
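
A minimal sketch of this query step, under the same kind of assumptions: `knn_search_body` builds the body of an Elasticsearch kNN search against a hypothetical `embedding` field, and `build_rag_prompt` turns the returned hits (each carrying a `_source` with hypothetical `title` and `description` fields) into a prompt for the language model. The query vector itself would come from the embedding model, which is left out here.

```python
def knn_search_body(query_vector, k=5, num_candidates=50):
    """Build the body of an Elasticsearch kNN search request."""
    return {
        "knn": {
            "field": "embedding",  # hypothetical vector field name
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
        # Only return the readable fields, not the vectors themselves.
        "_source": ["title", "description"],
    }


def build_rag_prompt(question, hits, max_docs=3):
    """Combine retrieved documents and the user question into one prompt."""
    passages = []
    for i, hit in enumerate(hits[:max_docs], start=1):
        source = hit["_source"]
        passages.append(f"[{i}] {source['title']}: {source['description']}")
    context = "\n".join(passages)
    return (
        "Answer the question using only the passages below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the passages makes it possible to ask the model to cite which retrieved record supports each statement, which may help with scientific accuracy.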

Additionally, to further enhance the user experience, I propose building a conversational chat interface using LangChain. This interface will not only handle the retrieval and generation process but also incorporate conversational memory, ensuring that the dialogue remains coherent and engaging across multiple interactions.
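
To illustrate the memory part, here is a plain-Python sketch of a sliding-window conversation buffer. It deliberately does not use LangChain's API; `ConversationBuffer` is a hypothetical class that only shows the idea of keeping the last few turns and prepending them to each new question.

```python
from collections import deque


class ConversationBuffer:
    """Keep only the most recent turns so the prompt stays within a size budget."""

    def __init__(self, max_turns=3):
        # deque with maxlen silently drops the oldest turn when full
        self.turns = deque(maxlen=max_turns)

    def add_turn(self, user, assistant):
        self.turns.append((user, assistant))

    def as_prompt(self, new_question):
        """Render the remembered turns followed by the new question."""
        lines = []
        for user, assistant in self.turns:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        lines.append(f"User: {new_question}")
        lines.append("Assistant:")
        return "\n".join(lines)
```

A fixed window is the simplest form of memory; summarising older turns or retrieving them by similarity are possible refinements once the basic interface works.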

Overall, this approach allows us to seamlessly integrate our existing data into a robust RAG system, combining semantic search with generative AI to deliver more accurate, context-rich responses. I’m very willing to contribute to this project and collaborate with you all to refine the details.

Looking forward to your thoughts and feedback.

mail: uzairsayyed010@gmail.com

Hey @visakh ,
I am Satvik, a third-year student at Amrita Vishwa Vidyapeetham and a member of amFOSS.
I sent you an email asking how to obtain the metadata so that I could explore this project further. I hope you can respond to it soon (a reply here would work too!). I’d love to be part of building the RAG system even though you have already started working on it: it is a genuine interest of mine, and it would be an incredibly enlightening experience regardless of whether it counts as part of GSoC. I believe that building the RAG system is far more challenging than developing a chat interface with conversational memory (which I would also be interested in contributing to), but my main interest lies in joining the team currently working on it. I recently implemented RAPTOR RAG from scratch as a personal project, so this would be a good continuation of that work.
I hope you see this and respond as soon as possible, as I would love to be part of this project.

Regards,
Satvik Mishra
mail: satvmishi@gmail.com