As many organizations have demonstrated, open-source communities can do great things. This holds only if the contributor community can maintain public goods, such as the software codebase and institutional knowledge, over time and despite contributor turnover. Moreover, as demand for open-source software continues to grow, so do the challenges of community management, collaboration, and sustainability. During GSoC 2024, one approach to addressing these challenges was the creation of LLAMOSC (LLM-Powered Agent-Based Model for Open Source Communities), a comprehensive framework designed to simulate and enhance the sustainability of open-source communities using Large Language Models (LLMs) (see the "Open Source Sustainibility using LLMs" folder of the OREL-group/GSoC repository on GitHub).
In 2024, the main work was simulating GitHub, i.e., a CodeSpace environment, within this framework. It includes issues of varying difficulty; contributor and maintainer agents with different coding abilities and experience levels; a discussion space where agents can discuss approaches to a particular task; an automated pull request lifecycle; and multiple decision-making algorithms for allocating tasks to contributors, with corresponding metrics for evaluating the simulation. This project will maintain and further develop the LLAMOSC framework. Additional goals include improving the underlying models (OREL-group/GSoC Issue #64: "Add Collaboration Algorithm for Multiple Agents on a Single Issue") and adding a ConversationSpace (OREL-group/GSoC Issue #60: "Add ConversationSpace (to Simulate Slack)") to simulate an IRC (Internet Relay Chat) / Slack / Discord model, an essential part of many open-source communities. One possible approach is RAG (Retrieval-Augmented Generation) (OREL-group/GSoC Issue #62: "Integrate RAG within ConversationSpace and GithubDiscussion"), but other research-backed approaches are also welcome. Our goal is to develop one or more maintainers of the platform who are also capable of research software engineering (https://www.hpcwire.com/off-the-wire/ncsa-innovators-bridging-research-and-software-engineering/).
What can I do before GSoC? You can join the Orthogonal Lab and attend our Saturday Morning NeuroSim meetings. You will work with our Ethics, Society, and Technology group, and interactions with your colleagues are key. You will also want to become familiar with the various Open Source Sustainability models developed in previous years (OREL-group/GSoC on GitHub: "A place to investigate potential Google Summer of Code opportunities"), go through the installation steps (see the "Open Source Sustainibility using LLMs" folder of OREL-group/GSoC), and review the open issues related to LLAMOSC on GitHub.
Orthogonal Research and Education Lab: https://orthogonal-research.weebly.com/
Skill level: Intermediate
Required skills: The following languages and tools will be used extensively throughout the project: Python, PyQt, and Ollama. The project will also involve working with Large Language Models, computational and agent-based models, UI design, and open-source community building; experience in these areas is helpful but not required. Knowledge of open-source development practices and an interest in interdisciplinary research are a must.
Time commitment: Full-time (350 h)
Lead mentor: Sarrah Bastawala (sarrahbastaw@gmail.com)
Project website: https://orthogonal-research.weebly.com/
Backup mentors: Bradly Alicea (bradly.alicea@outlook.com), Jesse Parent (jesse@jopro.org)
Tech keywords: Open Source Communities, Large Language Models (LLM), Agent-based Models, Python, PyQt, Ollama