Hi Visakh,
Would you prefer using something like FastAPI or Flask to create a backend for the chatbot? Or would you be interested in bundling the entire functionality with the Streamlit or Gradio code? I have experience building an end-to-end RAG-based chatbot with LangChain, and I think I can contribute to building the chatbot. Thanks in advance!
I'm Abhishek, currently pursuing my Master's in Computer Engineering with a focus on Machine Learning at Virginia Tech. Before starting grad school, I worked as a Software Engineer at an ML-focused firm for two years. My experience includes building APIs and developing user interfaces, particularly for chatbot applications.
I came across the Build an AI Agent for KnowledgeSpace using RAG project and found it closely aligned with my interests and experience. I'm currently involved in a research project focused on developing a reasoning model using Retrieval-Augmented Generation (RAG), and I believe I could meaningfully contribute to this initiative.
I've attached my resume and portfolio for your reference.
My name is Mohamed Awad, a fourth-year undergraduate student at CUNY Queens College studying computer science in New York City. I have a strong interest in LLMs, machine learning, and NLP. I am proficient in Python, with hands-on experience in ML, RAG, and libraries such as NumPy, pandas, and PyTorch. Recently, I did ML research in which I analyzed thousands of images to classify signs of skin cancer. My skill set also includes web development with tools like JS, React, Node.js, Git, APIs, AWS, and Firebase.
I came across the project "Build an AI Agent for KnowledgeSpace using RAG" and found it interesting that, in a world of LLMs, the models themselves need to be accurate, fast, and up-to-date with the relevant context. With my knowledge of RAG, NLP, and LLMs, I believe I can make meaningful contributions to this project and help make neuroscience data structured and accessible to community members.
I am prepared to commit at least 350 hours to this project. Through my background in AI and web development, I have developed a strong ability to learn quickly and adapt to new technologies, allowing me to make contributions that benefit the entire KnowledgeSpace community. I'm wondering if you could point me to any resources to get started before getting into the nitty-gritty of this project.
I'm currently a third-year computer science PhD student at UMass Boston, researching bias mitigation and fairness enhancement in Recommender Systems (RS). In my research, I'm also experimenting with LLMs to open up the black-box nature of RS. I want to spend my summer learning and contributing meaningfully, and I found the "Build an AI Agent for KnowledgeSpace using RAG" project interesting and close to my research and interests.
I've worked with RAG and chatbots in past academic projects and also got to explore AI agents through hackathons at the MIT/Harvard club. Through these hackathons, I came across a tool called Maestro by AI21 Labs. I believe the problem of creating an AI agent for KnowledgeSpace would need a similar approach, where the agent dynamically plans the task based on the user query and builds an execution plan to retrieve the correct data effectively.
Additionally, I read that you are looking for someone to build the chat interface with conversational memory. I have past web development experience and can create a chat interface similar to ChatGPT, DeepSeek, etc. Also, as part of the academic project where I worked on a RAG-based chatbot, we built a conversational pipeline to save the conversation history and tailor the LLM response to it.
I would love to get this conversation going and am looking forward to your response.
PS: attaching my resume for you to get a better understanding of my skills. Resume
I'm Rishika, a third-year CSE undergrad with a strong interest in AI and backend development. I recently built a WhatsApp chatbot for the 2025 Indian elections as part of a government-backed project, handling large-scale queries on AWS with NLP-based retrieval and optimization techniques. Through that, I worked on efficient retrieval methods, prompt engineering for factual accuracy, and scalable caching strategies, which directly align with the challenges in this project.
I know I'm joining the discussion a bit late, but the work you're doing with RAG and conversational memory in KnowledgeSpace is incredibly exciting! I had a quick question regarding the retrieval pipeline:
Are we looking at a hybrid memory approach, combining vector search (FAISS/Qdrant) with structured storage for long-term context, or are we optimizing more for short-term recall with token-window-based solutions? Also, given the neuroscience domain, what trade-offs are we considering in retrieval: speed, storage constraints, or something else?
Really looking forward to your insights and excited to contribute.
Datasets are metadata from large neuroscience data repos together with curated ontologies in NIFSTD.
The agent can provide concept definitions and return results on relevant datasets. The metadata schema for each dataset/source is not standardised, but an LLM can be quite useful for retrieving relevant information there.
I hope you're doing well. I'm writing to express my strong interest in the GSoC 2025 project: "Build an AI Agent for KnowledgeSpace using RAG."
Thank you for the recent update clarifying that the RAG backend is already underway. I'm genuinely excited to hear that! Based on your guidance, I've begun exploring the development of a chat interface with conversational memory, which will integrate with the existing RAG system and enhance KnowledgeSpace's usability.
Over the past few days, I've:
Researched and experimented with frontend chat UIs using React
Explored how conversational memory can be built using LangChain concepts (e.g., buffer memory and context windows)
Studied how vector stores and graph databases like Neo4j might support long-term memory or structured metadata retrieval
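To make the buffer-memory idea above concrete, here is a minimal sketch in plain Python rather than LangChain's own classes. Everything in it is an assumption for illustration: the `WindowBufferMemory` name, the whitespace-based token estimate, and the default budget are hypothetical and not part of KnowledgeSpace or LangChain.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Turn:
    role: str      # "user" or "assistant"
    content: str


class WindowBufferMemory:
    """Hypothetical buffer that keeps the newest turns within a rough token budget."""

    def __init__(self, max_tokens=50):
        self.max_tokens = max_tokens
        self.turns = deque()

    @staticmethod
    def estimate_tokens(text):
        # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
        return len(text.split())

    def total_tokens(self):
        return sum(self.estimate_tokens(t.content) for t in self.turns)

    def add(self, role, content):
        self.turns.append(Turn(role, content))
        # Drop the oldest turns until the buffer fits the budget again,
        # always keeping at least the newest turn.
        while len(self.turns) > 1 and self.total_tokens() > self.max_tokens:
            self.turns.popleft()

    def as_prompt(self, question):
        # Prepend the surviving history to the new question.
        lines = [f"{t.role}: {t.content}" for t in self.turns]
        lines.append(f"user: {question}")
        return "\n".join(lines)
```

A real deployment would likely swap the word count for the model's tokenizer and might summarize dropped turns instead of discarding them; this sketch only shows the sliding-window behaviour.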
I understand that metadata across datasets is not standardized, and I see the huge potential of using LLMs to bridge this gap by providing accurate, contextual answers through a conversational experience. I'm also beginning to experiment with model deployment on Vertex AI, as mentioned in the project scope.
Attached is a mockup that showcases my early thoughts and interface concept. I'd love your feedback and would be happy to iterate or prototype further based on your suggestions.
Understanding Metadata Structure in KnowledgeSpace
To effectively enable contextual responses and relevant dataset retrieval in the AI agent, I studied the existing metadata structuring pipeline. The diagram below outlines how dataset provenance, measurement device data, and results are formalized using OWL ontologies and stored in an ontology repository. My proposed conversational UI will interface with this layer to provide accurate, ontology-powered search and interaction.
I'm currently drafting my full GSoC proposal and would be honored to collaborate on this project under your mentorship.
I am Tao He, currently an MSCS Align student at Northeastern University (Boston). I'm incredibly excited about the KnowledgeSpace RAG project and would love to contribute as a GSoC contributor...
What draws me to this project is its rare intersection of AI, knowledge retrieval, and neuroscience, areas I'm now deeply passionate about. I've worked on multiple end-to-end NLP and ML projects, including:
A comparative time-series modeling study (ARIMA vs. SVR) published with Taylor & Francis
A document-level sentiment classification project using an improved BiLSTM with attention (accepted at ICDSE 2024)
A solo-built full-stack prompt-sharing platform Promptllery using React + Supabase, demonstrating my frontend/backend + user-centric design skills
UI/UX for querying and interacting with neuroscience data
Fine-tuning LLM outputs for clarity and domain relevance in the RAG pipeline
As someone transitioning from economics and public policy into CS and AI, I bring not only technical curiosity but also a deep respect for the complexity of domain knowledge, especially in fields like neuroscience.
I'd love the opportunity to dive deeper into the problem scope and propose a concrete implementation plan if selected. Thank you for considering my interest!