Subject: Exploring Local LLMs for Neuroscience Knowledge Retrieval (RAG) - Seeking Feedback
Hi everyone,
My name is Sai Vinay, and I am a Data Science undergraduate interested in the intersection of LLMs and Neuroinformatics.
In preparation for the upcoming GSoC 2026 cycle (aiming for the Knowledge Space/Agent projects), I built a proof-of-concept tool called Neuro-RAG-Assistant.
What it does: It lets researchers run semantic searches over local neuroscience textbooks (PDFs). A FAISS vector index retrieves the most relevant passages, and a quantized Phi-3 model generates answers from them. Everything runs locally via llama.cpp, so documents never leave the machine and there are no API costs.
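To make the retrieval step concrete, here is a simplified, standard-library-only sketch of how such a pipeline fits together. It is illustrative, not the project's actual code. A toy hashed bag-of-words `embed` function stands in for a real embedding model. A brute-force cosine-similarity scan stands in for FAISS, which is roughly what an exact (flat) inner-product index does over normalized vectors. PDF text extraction and chunking are omitted. All names (`Chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`) and the `source`/`page` metadata fields are hypothetical.

```python
import hashlib
import math
import re
from dataclasses import dataclass

DIM = 4096  # toy embedding size (assumption; real embedding models have their own dimension)


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical metadata: PDF file the chunk came from
    page: int    # hypothetical metadata: page number within that PDF


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # sha256 keeps the token-to-slot mapping stable across runs (unlike hash())
        slot = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[slot] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(chunks: list[Chunk]) -> list[list[float]]:
    """Embed every chunk once; in the real tool a FAISS index would hold these vectors."""
    return [embed(c.text) for c in chunks]


def retrieve(query: str, chunks: list[Chunk], index: list[list[float]], k: int = 2) -> list[Chunk]:
    """Brute-force top-k search by inner product, i.e. cosine similarity on unit vectors."""
    q = embed(query)
    scores = [sum(a * b for a, b in zip(q, v)) for v in index]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:k]]


def build_prompt(query: str, hits: list[Chunk]) -> str:
    """Pack the retrieved passages, tagged with their source, into a prompt for the local LLM."""
    context = "\n".join(f"[{h.source}, p.{h.page}] {h.text}" for h in hits)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    # Hypothetical passages standing in for chunks extracted from a textbook PDF
    chunks = [
        Chunk("Action potentials propagate along the axon via voltage-gated sodium channels.",
              "neuro_textbook.pdf", 42),
        Chunk("The hippocampus is central to the formation of episodic memories.",
              "neuro_textbook.pdf", 310),
        Chunk("Synaptic plasticity underlies learning through long-term potentiation.",
              "neuro_textbook.pdf", 455),
    ]
    index = build_index(chunks)
    query = "Which channels drive action potentials along the axon?"
    hits = retrieve(query, chunks, index, k=1)
    # In the real tool, this prompt would be sent to the quantized Phi-3 model under llama.cpp
    print(build_prompt(query, hits))
```

Replacing the toy `embed` with a real embedding model and the list scan with a FAISS index changes retrieval quality and speed, not the overall shape of the pipeline. Keeping `source` and `page` on each chunk lets an answer point back to the passage it was drawn from.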
My Questions for Mentors: As I refine this, I would like to align it with INCF standards.
- Are there specific metadata schemas (like BIDS derivatives) that are preferred for indexing “unstructured” text data in the Knowledge Space?
- Would integration with the NWB (Neurodata Without Borders) documentation be a valuable test case for this RAG system?
Any feedback on the code or approach would be greatly appreciated!
Best, Sai Vinay