GSoC 2026 Project #8: Open Source Community Sustainability LLM

As demonstrated by many organizations, open-source communities can do great things. But this is only true if the contributor community can maintain public goods such as the software codebase and institutional knowledge over time, despite contributor turnover. Moreover, as the demand for open-source software continues to grow, so do the challenges related to community management, collaboration, and sustainability. During GSoC 2024, one approach to addressing this was the creation of LLAMOSC (LLM-Powered Agent-Based Model for Open Source Communities), a comprehensive framework designed to simulate and enhance the sustainability of open-source communities using Large Language Models (LLMs) (GSoC/Open Source Sustainibility using LLMs at main · OREL-group/GSoC · GitHub).

In 2024, the main work was to simulate GitHub, i.e. a CodeSpace environment within this framework, complete with issues of varying difficulty, contributor and maintainer agents with different coding abilities and experience levels, a discussion space where agents can discuss approaches to a particular task, an automated pull request lifecycle, and multiple decision-making algorithms for allocating tasks to contributors, along with corresponding simulation metrics. For this project, the need is to maintain and develop the LLAMOSC framework. Additional work includes improving the underlying models (Add Collaboration Algorithm for Multiple Agents on a Single Issue · Issue #64 · OREL-group/GSoC · GitHub) and adding a ConversationSpace (Add ConversationSpace (to Simulate Slack) · Issue #60 · OREL-group/GSoC · GitHub) to simulate an IRC (Internet Relay Chat) / Slack / Discord model, an essential part of many open-source communities. A possible approach is RAG (Retrieval Augmented Generation) (Integrate RAG within ConversationSpace and GithubDiscussion · Issue #62 · OREL-group/GSoC · GitHub), but other approaches backed by research are also welcome. Our goal is to develop one or more maintainers of the platform who are also capable of research software engineering (https://www.hpcwire.com/off-the-wire/ncsa-innovators-bridging-research-and-software-engineering/).
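To make the RAG idea concrete, here is a minimal sketch of the retrieval step that a ConversationSpace might perform before prompting an LLM. This is purely illustrative: the `ConversationSpace` class, its methods, and the bag-of-words similarity are assumptions for the sketch, not LLAMOSC's actual API, and a real implementation would use embeddings rather than word counts.

```python
# Minimal sketch of the retrieval half of RAG over chat history.
# All names here are illustrative, not part of LLAMOSC.
import math
from collections import Counter

def _vec(text):
    """Bag-of-words vector; a real system would use embeddings."""
    return Counter(text.lower().split())

def _cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

class ConversationSpace:
    def __init__(self):
        self.messages = []  # (author, text) pairs

    def post(self, author, text):
        self.messages.append((author, text))

    def retrieve(self, query, k=2):
        """Return the k past messages most similar to the query."""
        q = _vec(query)
        ranked = sorted(self.messages,
                        key=lambda m: _cosine(q, _vec(m[1])),
                        reverse=True)
        return ranked[:k]

space = ConversationSpace()
space.post("maintainer", "The CI pipeline fails on Python 3.11")
space.post("contributor", "I will pick up the docs issue")
hits = space.retrieve("why does CI fail on Python 3.11", k=1)
print(hits[0][0])  # prints "maintainer"
```

The retrieved messages would then be prepended to the agent's prompt, grounding its reply in prior community discussion.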

What can I do before GSoC? You can join the Orthogonal Lab, as well as attend our Saturday Morning NeuroSim meetings. You will work with our Ethics, Society, and Technology group, and interactions with your colleagues are key. You will also want to become familiar with our various Open Source Sustainibility Models (GitHub - OREL-group/GSoC: A place to investigate potential Google Summer of Code opportunities · GitHub) developed in previous years, as well as go through the installation steps (GSoC/Open Source Sustainibility using LLMs at main · OREL-group/GSoC · GitHub) and the various open issues related to LLAMOSC on GitHub.

Orthogonal Research and Education Lab: https://orthogonal-research.weebly.com/

Skill level: Intermediate

Required skills: The following languages and frameworks will be used extensively throughout the project: Python, PyQT and Ollama. This project will also involve working with Large Language Models, computational and agent-based models, UI design, and open-source community-building, so experience in these areas is helpful but not required. Knowledge of open-source development practices and an interest in interdisciplinary research are a must.

Time commitment: Full-time (350 h)

Lead mentor: Sarrah Bastawala (sarrahbastaw@gmail.com)

Project website: https://orthogonal-research.weebly.com/

Backup mentors: Bradly Alicea (bradly.alicea@outlook.com), Jesse Parent (jesse@jopro.org)

Tech keywords: Open Source Communities, Large Language Models (LLM), Agent-based Models, Python, PyQT, Ollama

Hi everyone! I am Kalpana Shanmugam, a B.Tech (AI & DS) pre-final-year student from India, interested in applying for Project #8 (Open Source Community Sustainability LLM) for GSoC 2026 with INCF/OREL.

Background: I recently won the IBM Watsonx Agentic AI Hackathon with NexusGuardAI, a multi-agent SOC copilot built using RAG pipelines and tool-calling, which is directly relevant to the ConversationSpace and RAG work planned for this project.

What I’ve done so far:
→ Installed LLAMOSC locally on Windows 11 with Python 3.11 and llama3 via Ollama
→ Successfully ran the simulation through agent discussion and bidding phases
→ Identified and fixed a KeyError crash in rating_and_bidding.py when LLM-generated issue descriptions contain dict-like strings → PR submitted: Fix: KeyError crash when issue description contains dict-like strings in format templates by kalpana-Shan · Pull Request #134 · OREL-group/GSoC · GitHub
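For readers unfamiliar with the failure mode: when an LLM-generated description containing literal braces (e.g. a dict-like string) is concatenated into a template and passed through `str.format`, Python treats the braces as replacement fields and raises `KeyError`. The sketch below reproduces that pattern and one common fix (escaping braces); the function names are illustrative and not the actual `rating_and_bidding.py` code.

```python
# Illustrative reproduction of the KeyError, not the actual LLAMOSC code.

def render_unsafe(description, **fields):
    # Embedding the description in the template before format() means
    # braces inside the description are parsed as replacement fields.
    return ("From {author}: " + description).format(**fields)

def render_safe(description, **fields):
    # Escape stray braces in the LLM output so format() leaves them alone.
    escaped = description.replace("{", "{{").replace("}", "}}")
    return ("From {author}: " + escaped).format(**fields)

desc = "set config to {'retries': 3}"
try:
    render_unsafe(desc, author="agent_1")
except KeyError:
    print("unsafe render crashed")

print(render_safe(desc, author="agent_1"))
# prints: From agent_1: set config to {'retries': 3}
```

An alternative fix is to pass the description as a format argument itself (`"From {author}: {desc}".format(desc=description, ...)`), since `format` never re-parses substituted values.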

My technical question (@SarrahBastawala):
For Issue #60 (ConversationSpace), the current agent interaction model is task-driven and synchronous. When designing the IRC/Slack simulation, should ConversationSpace run as a parallel async loop alongside the existing CodeSpace, or trigger contextually only when an agent flags a task as blocked? I’m prototyping the async approach in my fork but want to align with your architectural vision first.
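To make the first option concrete, here is a toy sketch of the "parallel async loop" design I'm prototyping, where a ConversationSpace task consumes messages from a queue while the CodeSpace loop runs. Everything here (loop names, queue-based messaging) is my own assumption for illustration, not LLAMOSC's existing architecture.

```python
# Toy sketch of ConversationSpace running alongside CodeSpace via asyncio.
# All names are illustrative, not LLAMOSC's actual API.
import asyncio

async def codespace_loop(queue, steps=3):
    """Stand-in for the existing task-driven simulation loop."""
    for step in range(steps):
        await queue.put(f"status update at step {step}")
        await asyncio.sleep(0)  # yield control, stands in for real work

async def conversation_loop(queue, log):
    """Parallel chat loop: drains messages as agents post them."""
    while True:
        msg = await queue.get()
        log.append(msg)
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    log = []
    chat = asyncio.create_task(conversation_loop(queue, log))
    await codespace_loop(queue)
    await queue.join()   # wait until every message has been consumed
    chat.cancel()
    return log

log = asyncio.run(main())
print(len(log))  # prints 3
```

The contextual-trigger alternative would instead call a `ConversationSpace` method synchronously only when an agent flags a task as blocked, which is simpler but can't model ambient chatter.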

Looking forward to contributing to OREL!

Kalpana Shanmugam,
GitHub: kalpana-Shan
Fork: GitHub - kalpana-Shan/GSoC: Fork of OREL - GSoC 2026 Project #8 — LLAMOSC · GitHub

Hey, I’m Sandeep — interested in Project #8.

I went through the LLAMOSC codebase and noticed the RAG retriever had FAISS hardcoded with no way to swap backends. Based on the requirements in Issue #62, I:

  1. Created Issue #133 proposing a vector store abstraction layer
  2. Submitted PR #135 implementing FAISS and Chroma backends

Now you can switch backends with a single parameter:
retriever = RAGRetriever(backend="chroma")
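For context, the shape of such an abstraction layer might look like the sketch below. This is my own hypothetical rendering, not the code in PR #135: the class names, the in-memory stand-in backend, and the registry dict are all illustrative, with real FAISS/Chroma wrappers slotting in where the stub is.

```python
# Hypothetical sketch of a pluggable vector-store layer; not the PR's code.
from abc import ABC, abstractmethod

class VectorStore(ABC):
    @abstractmethod
    def add(self, doc_id, embedding): ...

    @abstractmethod
    def search(self, embedding, k=1): ...

class InMemoryStore(VectorStore):
    """Brute-force L2 search; a stand-in for real FAISS/Chroma wrappers."""
    def __init__(self):
        self.items = {}

    def add(self, doc_id, embedding):
        self.items[doc_id] = embedding

    def search(self, embedding, k=1):
        def dist(doc_id):
            return sum((a - b) ** 2
                       for a, b in zip(self.items[doc_id], embedding))
        return sorted(self.items, key=dist)[:k]

# In the real layer these would map to distinct FAISS/Chroma wrappers.
_BACKENDS = {"faiss": InMemoryStore, "chroma": InMemoryStore}

class RAGRetriever:
    def __init__(self, backend="faiss"):
        if backend not in _BACKENDS:
            raise ValueError(f"unknown backend: {backend}")
        self.store = _BACKENDS[backend]()

r = RAGRetriever(backend="chroma")
r.store.add("issue-60", [0.9, 0.1])
r.store.add("issue-62", [0.1, 0.9])
print(r.store.search([0.0, 1.0]))  # prints ['issue-62']
```

Keeping backend selection behind a registry like this means benchmarking (or a third backend) only touches one dict.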

Looking forward to feedback! Happy to add benchmarking next if this direction works.

GitHub: SandeepChauhan00 (Sandeep Chauhan) · GitHub

Hey Kalpana, nice work on the bug fix!

I’m working on the RAG side (#62) — just submitted PR #135 adding pluggable vector store backends. Once ConversationSpace is ready, the RAG layer can index those conversations.

Maybe we can coordinate on the integration later?

Thanks Sandeep! Really glad the fix landed cleanly :blush:
That's awesome! Pluggable vector store backends are exactly the kind of abstraction ConversationSpace will need. I'm planning to look into the ConversationSpace + RAG integration side (#60/#62) next, so your PR is super relevant to what I want to work on.
Would love to coordinate once I dig into the integration layer. I've worked with RAG pipelines in a multi-agent setup before, so I have some ideas on how the indexing flow could work. Will ping you when I have something concrete!