GSoC 2026 Project #8: Open Source Community Sustainability LLM

As demonstrated by many organizations, open-source communities can do great things. But this is only true if the contributor community can maintain public goods such as the software codebase and institutional knowledge over time, despite contributor turnover. Moreover, as demand for open-source software continues to grow, so do the challenges of community management, collaboration, and sustainability. During GSoC 2024, one approach to addressing this was the creation of LLAMOSC (LLM-Powered Agent-Based Model for Open Source Communities), a comprehensive framework designed to simulate and enhance the sustainability of open-source communities using Large Language Models (LLMs) (GSoC/Open Source Sustainibility using LLMs at main · OREL-group/GSoC · GitHub).

In 2024, the main work was simulating GitHub, i.e., a CodeSpace environment within this framework, complete with issues of varying difficulty; contributor and maintainer agents with different coding abilities and experience levels; a discussion space where agents debate approaches to a given task; an automated pull request lifecycle; and multiple decision-making algorithms for allocating tasks to contributors, with corresponding simulation metrics. For this project, the need is to maintain and develop the LLAMOSC framework. Additional features include improving the underlying models (Add Collaboration Algorithm for Multiple Agents on a Single Issue · Issue #64 · OREL-group/GSoC · GitHub) and adding a ConversationSpace (Add ConversationSpace (to Simulate Slack) · Issue #60 · OREL-group/GSoC · GitHub) to simulate an IRC (Internet Relay Chat) / Slack / Discord model, an essential part of many open-source communities. A possible approach is using RAG (Retrieval-Augmented Generation) (Integrate RAG within ConversationSpace and GithubDiscussion · Issue #62 · OREL-group/GSoC · GitHub), but other research-backed approaches are also welcome. Our goal is to develop one or more maintainers of the platform who are also capable of research software engineering (https://www.hpcwire.com/off-the-wire/ncsa-innovators-bridging-research-and-software-engineering/).

What can I do before GSoC? You can join the Orthogonal Lab and attend our Saturday Morning NeuroSim meetings. You will work with our Ethics, Society, and Technology group, and interaction with your colleagues is key. You will also want to become familiar with our various Open Source Sustainability Models (GitHub - OREL-group/GSoC: A place to investigate potential Google Summer of Code opportunities. · GitHub) developed in previous years, as well as go through the installation steps (GSoC/Open Source Sustainibility using LLMs at main · OREL-group/GSoC · GitHub) and the various open issues related to LLAMOSC on GitHub.

Orthogonal Research and Education Lab: https://orthogonal-research.weebly.com/

Skill level: Intermediate

Required skills: The following languages and frameworks will be used extensively throughout the project: Python, PyQT and Ollama. This project will also involve working with Large Language Models, computational and agent-based models, UI design, and open-source community-building, so experience in these areas is helpful but not required. Knowledge of open-source development practices and an interest in interdisciplinary research are a must.

Time commitment: Full-time (350 h)

Lead mentor: Sarrah Bastawala (sarrahbastaw@gmail.com)

Project website: https://orthogonal-research.weebly.com/

Backup mentors: Bradly Alicea (bradly.alicea@outlook.com), Jesse Parent (jesse@jopro.org)

Tech keywords: Open Source Communities, Large Language Models (LLM), Agent-based Models, Python, PyQT, Ollama


Hi everyone! I am Kalpana Shanmugam, a pre-final-year B.Tech (AI & DS) student from India, interested in applying for Project #8 (Open Source Community Sustainability LLM) for GSoC 2026 with INCF/OREL.

Background: I recently won the IBM Watsonx Agentic AI Hackathon with NexusGuardAI, a multi-agent SOC copilot built using RAG pipelines and tool-calling, which is directly relevant to the ConversationSpace and RAG work planned for this project.

What I’ve done so far:
→ Installed LLAMOSC locally on Windows 11 with Python 3.11 and llama3 via Ollama
→ Successfully ran the simulation through agent discussion and bidding phases
→ Identified and fixed a KeyError crash in rating_and_bidding.py when LLM-generated issue descriptions contain dict-like strings → PR submitted: Fix: KeyError crash when issue description contains dict-like strings in format templates by kalpana-Shan · Pull Request #134 · OREL-group/GSoC · GitHub
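For context, the class of bug is: when an LLM-generated description contains literal braces (a dict-like string) and ends up inside a format template, `str.format` treats the braces as placeholders and raises `KeyError`. A minimal sketch of the defensive pattern (simplified for illustration, not the exact code from my PR):

```python
def escape_braces(text: str) -> str:
    """Escape literal braces so str.format won't treat them as placeholders."""
    return text.replace("{", "{{").replace("}", "}}")

# An LLM-generated issue description containing a dict-like string:
description = '{"difficulty": 3, "tag": "bug"}'

# Unescaped, embedding this in a template and calling .format() raises KeyError;
# escaped, the braces survive formatting as literals.
template = "Issue summary: " + escape_braces(description) + " assigned to {agent}"
print(template.format(agent="alice"))
# Issue summary: {"difficulty": 3, "tag": "bug"} assigned to alice
```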

My technical question (@SarrahBastawala):
For Issue #60 (ConversationSpace), the current agent interaction model is task-driven and synchronous. When designing the IRC/Slack simulation, should ConversationSpace run as a parallel async loop alongside the existing CodeSpace, or trigger contextually only when an agent flags a task as blocked? I’m prototyping the async approach in my fork but want to align with your architectural vision first.
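To make the parallel-async option concrete, here is the shape of what I'm prototyping in my fork: the ConversationSpace runs as its own coroutine beside the CodeSpace, connected by a queue. All names here are hypothetical sketches from my fork, not existing LLAMOSC APIs:

```python
import asyncio

async def codespace_loop(outbox: asyncio.Queue, steps: int = 3):
    """Stand-in for the existing CodeSpace: emits events (e.g. a PR opened)."""
    for step in range(steps):
        await outbox.put(f"codespace event {step}")
        await asyncio.sleep(0)  # yield control so the other loop can run

async def conversation_loop(inbox: asyncio.Queue, log: list, steps: int = 3):
    """Sketch of a ConversationSpace: agents react to CodeSpace events."""
    for _ in range(steps):
        event = await inbox.get()
        log.append(f"discussed: {event}")  # here agents would chat via the LLM

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    log: list = []
    # Both loops run concurrently, like parallel CodeSpace/ConversationSpace.
    await asyncio.gather(codespace_loop(queue), conversation_loop(queue, log))
    return log

log = asyncio.run(main())
```

The contextual-trigger alternative would instead call `conversation_loop` only when an agent flags a task as blocked, which is simpler but loses ambient discussion.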

Looking forward to contributing to OREL!

Kalpana Shanmugam,
GitHub: kalpana-Shan
Fork: GitHub - kalpana-Shan/GSoC: Fork of OREL - GSoC 2026 Project #8 — LLAMOSC · GitHub

Hey, I’m Sandeep — interested in Project #8.

I went through the LLAMOSC codebase and noticed the RAG retriever had FAISS hardcoded with no way to swap backends. Based on the requirements in Issue #62, I:

  1. Created Issue #133 proposing a vector store abstraction layer
  2. Submitted PR #135 implementing FAISS and Chroma backends

Now you can switch backends with a single parameter:
retriever = RAGRetriever(backend="chroma")
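For anyone following along, here is a minimal, self-contained sketch of the abstraction idea. The names and the toy in-memory backend are illustrative only, not the actual API from PR #135:

```python
from abc import ABC, abstractmethod

class VectorStore(ABC):
    """Backend-agnostic interface; FAISS/Chroma adapters would implement it."""
    @abstractmethod
    def add(self, doc_id: str, embedding: list) -> None: ...
    @abstractmethod
    def search(self, query: list, k: int = 5) -> list: ...

class InMemoryStore(VectorStore):
    """Toy backend: nearest neighbours by squared Euclidean distance."""
    def __init__(self):
        self._docs = {}
    def add(self, doc_id, embedding):
        self._docs[doc_id] = embedding
    def search(self, query, k=5):
        def dist(e):
            return sum((a - b) ** 2 for a, b in zip(query, e))
        return sorted(self._docs, key=lambda d: dist(self._docs[d]))[:k]

BACKENDS = {"memory": InMemoryStore}  # real registry would map "faiss", "chroma"

def make_retriever(backend: str = "memory") -> VectorStore:
    return BACKENDS[backend]()  # swap backends with a single parameter

store = make_retriever("memory")
store.add("issue-60", [1.0, 0.0])
store.add("issue-62", [0.0, 1.0])
print(store.search([0.9, 0.1], k=1))  # ['issue-60']
```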

Looking forward to feedback! Happy to add benchmarking next if this direction works.

GitHub: SandeepChauhan00 (Sandeep Chauhan) · GitHub

Hey Kalpana, nice work on the bug fix!

I’m working on the RAG side (#62) — just submitted PR #135
adding pluggable vector store backends. Once ConversationSpace
is ready, the RAG layer can index those conversations.

Maybe we can coordinate on the integration later?

Thanks Sandeep! Really glad the fix landed cleanly :blush:
That’s awesome. Pluggable vector store backends are exactly the kind of abstraction ConversationSpace will need. I’m planning to look into the ConversationSpace + RAG integration side (#60/#62) next, so your PR is super relevant to what I want to work on.
Would love to coordinate once I dig into the integration layer. I’ve worked with RAG pipelines in a multi-agent setup before, so I have some ideas on how the indexing flow could work. Will ping you when I have something concrete!

Hi everyone! I’m Neeraja P, a third year B.Tech Computer Science student from Amrita Vishwa Vidyapeetham, India.

I have been working with Python, Flask, and machine learning, and I’m genuinely interested in contributing to Project #8 this GSoC.

Two of my projects feel relevant here: MindTune, a mood classification system using simulated EEG signals that maps Alpha/Beta wave patterns to playlist recommendations, and DreamBalance, a dream journaling platform that uses NLP concepts for emotion tagging and weekly mental health trend visualisation. Both are on my GitHub: JANE7J (JANEJ7) · GitHub

I also had the opportunity to work as an AI intern at IBM SkillsBuild on applied AI projects around mental health analysis, and more recently as a project coordinator for an AI/ML intern team at UpToSkills; both experiences gave me a better understanding of how AI systems are built collaboratively.

I’ve cloned the LLAMOSC repository, installed all dependencies, pulled llama3 via Ollama, and spent time going through the codebase. I noticed that collaborative_team.py currently forms teams by ranking contributors on bid score and experience, but the agents don’t actually discuss or deliberate together on an issue, and the README lists this as future work under Issue #64.

I’d like to work on this. My initial thinking is a round-robin discussion loop where agents each propose an approach, respond to each other, and the Lead agent uses the LLM to synthesise a final decision. I’m still exploring the codebase and would love feedback on whether this fits the architectural vision.
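As a rough sketch of the control flow I have in mind (the LLM call is stubbed here, and none of these names are existing LLAMOSC functions; in practice `ask_llm` would go through Ollama):

```python
def ask_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call, so the loop is runnable."""
    return f"[response to: {prompt[:40]}...]"

def deliberate(issue: str, agents: list, rounds: int = 2) -> str:
    """Round-robin deliberation: each agent proposes/responds in turn,
    then a Lead agent synthesises a final decision from the transcript."""
    transcript = []
    for _ in range(rounds):
        for agent in agents:
            recent = "\n".join(transcript[-len(agents):])  # last full round
            reply = ask_llm(f"{agent}, on {issue}, given:\n{recent}\nrespond")
            transcript.append(f"{agent}: {reply}")
    return ask_llm("Lead: synthesise a decision from:\n" + "\n".join(transcript))

decision = deliberate("issue #64", ["alice", "bob", "lead"])
```

The open design question is exactly where this loop should live, which is what I ask below.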

A question for the mentors: should this collaboration layer live inside the ConversationSpace being developed by Kalpana, or operate as a separate deliberation layer within collaborative_team.py itself?

Thank you for putting together such a well-documented project, looking forward to learning from and contributing to OREL!

Hi everyone, I’m Anusha Manjunath. I just finished my MSc in Data Science at Middlesex University London in January 2026. My dissertation was on synthetic data generation and RAG-assisted labelling for multimodal product recognition, and it was selected among the top 5 best projects at my university.

I’ve been reading through this thread and thinking about the RAG implementation for ConversationSpace. I want to raise something worth discussing before we commit to a vector-based approach.

The fundamental problem with vector similarity search for community data is that similarity doesn’t always mean relevance. Two contributors can have identical behavioural patterns but completely different language. A vector search finds what’s closest in embedding space, not what’s actually correct for the reasoning task. For LLAMOSC’s growing community history this gets worse over time, not better.

I’ve been looking at a very recent paper from MIT CSAIL, Recursive Language Models by Zhang, Kraska and Khattab (December 2025, arxiv 2512.24601), that takes a completely different approach. Instead of chunking and embedding, the LLM treats the document as an external environment, writes code to peek into it, decomposes it recursively, and calls itself on subsections. No vector database, no embeddings, no chunking. They show RLMs handle inputs 100x beyond normal context windows without accuracy degradation, while standard LLMs degrade badly at long contexts.
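To make the mechanism concrete, here is a toy sketch of the recursive pattern as I understand it (my simplification, not the paper's implementation; the LLM call is stubbed):

```python
def llm_answer(question: str, text: str) -> str:
    """Stub for a real LLM call on text that fits in context."""
    return f"answer({question!r}, {len(text)} chars)"

def recursive_query(question: str, history: str, limit: int = 1000) -> str:
    """Recursively decompose a long history instead of embedding chunks:
    answer directly if it fits, otherwise recurse on halves and merge."""
    if len(history) <= limit:
        return llm_answer(question, history)
    mid = len(history) // 2
    left = recursive_query(question, history[:mid], limit)
    right = recursive_query(question, history[mid:], limit)
    return llm_answer(question, left + "\n" + right)  # merge sub-answers
```

The key contrast with vector RAG: no embedding step, so there is no "nearest in embedding space" failure mode; the model inspects the actual text at every level.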

For LLAMOSC this feels directly relevant. Community conversation history grows indefinitely. Vector RAG gets noisier as the knowledge base scales. An RLM-based approach would let agents reason through community history structurally rather than finding nearest embeddings, and it would actually get more useful as the history grows rather than less.

For context, in my dissertation I benchmarked my hybrid system against LLM-only, RAG-only and CLIP-only baselines. The hybrid hit 92.4% on common patterns and 81.5% on rare edge cases. That evaluation methodology is something I’d bring directly to this project, because measuring whether retrieval actually improves agent decision quality compared to the baseline is something most implementations skip.

@sarrah.bastawala @arnab1896 — would love your thoughts
Happy to discuss whether RLMs make sense here or whether a hybrid of both approaches would work better

Hi mentors and team,

I am Shourya, a CSE student specializing in Python and local LLM architectures, focusing my GSoC 2026 application on Project 8 (LLAMOSC).

While configuring the local setup and analyzing the agent mechanics, I noticed that solve_issue_without_acr in contributor.py was hardcoded to always succeed (if True:). To restore the realistic failure rates and agent fatigue necessary for valid simulation data, I have submitted a PR that replaces this with a dynamic success probability matrix (weighted by agent experience, motivation, and task difficulty).
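As a rough illustration of the idea (the weights and clamping bounds here are placeholders for the sake of example, not the exact formula from my PR):

```python
def success_probability(experience: float, motivation: float,
                        difficulty: float) -> float:
    """All inputs in [0, 1]. Success rises with experience and motivation,
    falls with difficulty, clamped so no attempt is certain either way."""
    raw = 0.5 * experience + 0.3 * motivation - 0.4 * difficulty + 0.4
    return max(0.05, min(0.95, raw))

def attempt_issue(experience, motivation, difficulty, rng) -> bool:
    """Replaces the hardcoded `if True:` with a stochastic outcome."""
    return rng.random() < success_probability(experience, motivation, difficulty)
```

Compared to `if True:`, this reintroduces failure (and hence retry/fatigue dynamics) into the simulation data.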

You can review this in PR #153 on the main repository.

As I draft my final proposal, I am focusing on solving the latency bottlenecks of local multi-agent simulations while increasing community realism. My proposed architecture includes:

  • Heterogeneous Competency Matrices: Moving away from generic workers by assigning agents overlapping, domain-specific technical stacks to simulate realistic peer-review debates.

  • Semantic Routing Architecture: Drastically reducing compute overhead by classifying issue complexity before LLM inference, routing simple fixes to lightweight heuristics and complex architectural changes to the heavy LLM agents.
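To sketch the routing idea (the keyword heuristic below is just a placeholder for a real complexity classifier; none of these names exist in LLAMOSC today):

```python
# Signals that suggest an issue needs heavyweight LLM reasoning.
HEAVY_SIGNALS = ("refactor", "architecture", "design", "migration")

def route(issue_text: str) -> str:
    """Classify issue complexity *before* LLM inference: simple fixes go
    to a lightweight heuristic path, complex changes to the LLM agents."""
    text = issue_text.lower()
    if any(word in text for word in HEAVY_SIGNALS):
        return "llm_agent"
    return "heuristic"

print(route("Fix typo in README"))               # heuristic
print(route("Refactor the agent architecture"))  # llm_agent
```

Even this crude gate avoids spending local inference tokens on trivial issues, which is where most of the latency in a multi-agent Ollama run comes from.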

I look forward to collaborating and would appreciate any initial feedback on the PR or the proposed architecture direction!


Hi everyone! I’m Kewen Chen, an MSCS student at Georgia Tech, currently focusing on distributed systems and scalable backend infrastructure.

My background is primarily in LLM inference engineering: I previously helped build a multi-tenant LLM inference gateway using vLLM and FastAPI, handling 2.5M+ tokens daily with admission control, request queuing, and micro-batching for throughput optimization. I also have experience building full-stack applications with React and TypeScript on the frontend.

I came across Project #8 and find the core idea genuinely compelling, using LLM-powered agents to model open-source community dynamics is a creative and underexplored direction, and one that sits right at the intersection of my interests in agent systems and collaborative software engineering.

Looking forward to learning from everyone here and contributing to OREL!