GSoC 2025 Project Idea #5 Orthogonal Research :: Open source Community Sustainability (350h)

arnab1896 · February 27, 2025, 6:50pm

Mentors: Bradly Alicea @b.alicea <bradly.alicea@outlook.com>, Sarrah Bastawala @sarrah_basta <sarrahbastaw@google.com>, Jesse Parent @jparent <jesse@jopro.org>

Skill level: Advanced

Required skills: Expertise or the ability to integrate multiple development environments is an important baseline skill. The ability to extract model representations from complex systems is helpful. Knowledge of open-source development practices and an interest in interdisciplinary research are a must.

Time commitment: Fulltime (350 hours)

Forum for discussion

About: Open-source communities are only as powerful as their ability to collectively complete tasks and projects. One way to enable the functional capacity of such a community is to model the collective behavioral and cognitive aspects of day-to-day project engagement. Your current involvement will involve the maintenance, development, and further implementation of two models from past years: a Reinforcement Learning model, and a hybrid Agent-based/Large Language Model.

The candidate will build an analytical model that incorporates features such as general feedback loops (recurrent relationships) and causal loops (reciprocal causality). This might be in the form of a traditional boxes and arrows (input-output) model, or something more exotic such as Reinforcement Learning.

Aims: In 2024, Github activity was simulated with a CodeSpace Environment. This included generating issues of varying difficulty for both contributor and maintainer agents. Implementing CodeSpace resulted in the following capabilities: different coding ability and experience levels, a discussion space to discuss various approaches for a particular task among agents, an automated pull request lifecycle as well as multiple decision-making algorithms to choose task allocation for contributors and corresponding metrics for its simulation. For 2025, you might help to improve upon the underlying models (Add Collaboration Algorithm for Multiple Agents on a Single Issue · Issue #64 · OREL-group/GSoC · GitHub) or add a ConversationSpace (Add ConversationSpace (to Simulate Slack) · Issue #60 · OREL-group/GSoC · GitHub) within this framework to simulate a IRC (Internet Relay Chat) / Slack / Discord model, an essential part of many open source communities. Last year’s project utilized RAG (Retrieval Augmented Generation) (Integrate RAG within ConversationSpace and GithubDiscussion · Issue #62 · OREL-group/GSoC · GitHub), but other approaches backed by research are also welcome. Our goal is to develop one or more maintainers of the platform who are also capable of research software engineering (https://www.hpcwire.com/off-the-wire/ncsa-innovators-bridging-research-and-software-engineering/).

What can I do before GSoC?

You can join the Orthogonal Lab Slack and Github, as well as attend our Saturday

Morning NeuroSim meetings. You might also become familiar with the existing codebase:

LLAMOSC (Agent-based and Large Language Hybrid Model): GSoC/Open Source Sustainibility using LLMs at main · OREL-group/GSoC · GitHub
MARLSOC (Multi-agent Reinforcement Learning): https://github.com/OREL-

group/GSoC/tree/main/Open%20Source%20Sustainability%20using%20RL

Project website: https://orthogonal-research.weebly.com

Tech keywords: Computational Modeling, Reinforcement Learning, Language Models

Vidhi_Rohira · March 2, 2025, 5:51am

Hi, I’m Vidhi Rohira, a sophomore at Veermata Jijabai Technological Institute (VJTI), Mumbai, India. I have a keen interest in open-source contributions, reinforcement learning, and AI research.

During Summer 2024, I worked on Super MaRLo Bros, a project where I explored reinforcement learning algorithms to train an AI agent to play a custom game inspired by Super Mario. This involved experimenting with Q-learning, Deep Q-Networks (DQN), and Proximal Policy Optimization (PPO), ultimately concluding that PPO provided better results.

Additionally, I have participated in several hackathons and competitions across colleges, which have strengthened my understanding of LLMs, AI-driven systems, and community-driven research. These experiences have given me hands-on exposure to working with AI models and developing solutions that align with real-world applications.

I am excited about the opportunity to contribute to GSoC 2025 Project Idea #5: Open Source Community Sustainability, as my interests in AI, research, and open-source collaboration align with the project’s goals. I look forward to leveraging my experience and learning from the community to make meaningful contributions.

I’d love to hear any guidance or suggestions on how I can further prepare for GSoC and improve my contributions. Are there specific areas I should focus on to strengthen my application and make a greater impact?

Vidhi-Rohira-Resume

sarrah_basta · March 15, 2025, 3:53pm

Hi Vidhi,

Thank you for sharing your background and interests—your work and experience with both large language models and reinforcement learning is impressive and aligns well with the goals of the project.

As you move forward, one of the key decisions you’ll need to make is whether you’d like to build your proposal around the LLAMOSC (LLM-based Agent Modeling for Open Source Communities) model or the MARLOSC (RL-based simulation for Open Source Communities) model. Both are great directions with existing work, and your experience could add meaningful value to either—or even a combination of the two.

There are currently several open issues under the LLAMOSC repository that highlight future scope and enhancement ideas, such as adding RAG or having multiple agents collaborate. If you plan to center your proposal around LLAMOSC, I recommend:

Going through the existing README and documentation to understand the model as well as metrics and important concepts regarding open-source sustainibility,
Setting it up locally to understand how the current pipeline works, and
Exploring the open issues to see what areas align with your interests.

Feel free to DM me anytime if you’d like to discuss ideas or need help navigating the LLAMOSC model.

Also, if you’re considering combining elements of LLAMOSC and MARLOSC, or proposing a new approach altogether, that’s absolutely encouraged! Just try to back it with some initial research or experimentation, and feel free to share your thoughts in the weekly meetings.

Looking forward to seeing your contributions!

Sayan_Mandal1 · March 23, 2025, 5:56pm

Dear @sarrah_basta , @jparent , @b.alicea

I hope you are doing well. My name is Sayan and I am an AI Researcher at the CMATER Lab(Jadavpur University) and an ex-Research Intern at IIEST. I’m passionate about open-source contributions and research in AI, particularly in reinforcement learning (RL) and language models.

I’m really excited about contributing to Open Source Community Sustainability in GSoC 2025. After reviewing the project details and previous discussions, I’m particularly interested in exploring the integration of LLAMOSC (LLM-based Agent Modeling for Open Source Communities) and MARLOSC (RL-based simulation for Open Source Communities).

Given my background in developing AI models for complex systems, such as:

Super MaRLo Bros: Training AI agents using RL algorithms like Q-learning, DQN, and PPO.
Extensive work with LLMs and RAG-based systems during my internship and academic projects.
Experience in building hybrid models, including reinforcement learning and explainable AI (LIME and SHAP), which aligns with the goals of this project.

I’m eager to deep dive into the existing codebases of LLAMOSC and MARLOSC to understand the models, pipelines, and contribution scopes. I’ve already gone through the project repositories and reviewed the open issues, including:

Adding Collaboration Algorithm for Multiple Agents on a Single Issue
Integrating RAG within ConversationSpace and GitHubDiscussion

My initial thought is to focus on enhancing collaboration between agents through improved decision-making or proposing a hybrid solution that leverages features from both LLAMOSC and MARLOSC. I’m also open to exploring a new approach, as encouraged in the previous conversation.

Could you please guide me on:

Any specific areas or issues where you’d recommend focusing to strengthen my proposal?
Suggestions for initial research or experimentation that could add value to the project?

I’d love to discuss my ideas and get your insights in the upcoming weekly meetings. Thank you so much for your time and guidance, I’m looking forward to contributing meaningfully to the project!

Best regards,
Sayan Hasan Mandal
sayanjones77@gmail.com

LauraWay · March 24, 2025, 9:44pm

Hi! I’m Laura, and I’m very interested in your MARLOSC project, particularly the potential it holds for modeling open-source communities. I noticed the title highlights the Multi-Agent Reinforcement Learning model, and while I’m especially drawn to the LLM and ConversationSpace integration, I’m intrigued by the project’s overall approach.

I understand that the RL aspect forms a core part of MARLOSC, and I’m keen to learn more about the research behind it. While the README provided a high-level overview, I’m curious about the specific methodologies and challenges you’ve encountered in implementing the RL model. I believe that understanding the RL foundation is crucial for effectively contributing to the LLM and ConversationSpace components, as they’ll likely interact within the simulated environment.

My background is in full-stack development, and I have solid experience with GitHub and Git, which I understand are essential for this project. I’m also actively expanding my knowledge in AI, particularly LLMs. I’m particularly excited about the possibility of contributing to the ConversationSpace and RAG integration, as I see great potential in using LLMs to simulate and improve the collaborative dynamics of open-source communities.

I’m eager to delve into the MARLOSC codebase, and taking some peeks at the LLAMOSC codebase in more detail.

I’m also interested in attending the Saturday Morning NeuroSim meetings to gain a deeper understanding of the project’s goals and progress. I have a few specific questions about these meetings and the project’s current state—would it be alright if I DM you to discuss them further before submitting a proposal?

Thank you for considering my application. I’m eager to contribute to this project and learn from your team.

Cheers,
Laura Way

sarrah_basta · March 28, 2025, 4:10pm

Welcome @Sayan_Mandal1

Both the issues you have pointed out are interesting points in the future of the open source sustainability project. As you have also mentioned, you can either choose one if the models (LLAMOSC or MARLOSC) to work on, or create a hybrid approach using elements from both.

Your next steps would be to finalise your approach and understand and research the exaxt requirements to tackle the issues you have proposed to solve.

Also, joining the lab’s slack channel and joining the meetings to discuss your ideas and communicate with the lab members would be great. Look forward to seeing you!

Sayan_Mandal1 · March 28, 2025, 9:12pm

Dear @sarrah_basta,

Thank you so much for the detailed guidance! I’m moving forward with this.

After reflecting on the possible directions, I’m inclined to center my proposal around LLAMOSC while exploring the possibility of incorporating elements from MARLSOC to enhance collaboration and decision-making.

I’ll start by:
Diving deeper into the README and documentation to understand the model, metrics, and open-source sustainability concepts.

Setting up the pipeline locally to grasp the current workflows and integration points.
Exploring the open issues to identify areas where I can make impactful contributions.

I’ll also join the lab’s Slack channel and participate in the weekly meetings to engage with the community and exchange ideas. I appreciate the offer to DM for any questions—I’ll definitely reach out if I need guidance while exploring the model.

Looking forward to collaborating with you all and contributing meaningfully to this exciting project!

Best regards,
Sayan

sarrah_basta · March 29, 2025, 2:33pm

Welcome Laura! Yes sure, feel free to dm as well as join the slack to communicate with all the lab members anytime. The Saturday Morning Neurosim meeting is on right now, feel free to join at SMN Room | Jitsi Meet.

ravjot07 · March 31, 2025, 1:28am

Hi @sarrah_basta , @jparent

I’m Ravjot Singh, a Computer Science undergraduate at IIIT Gwalior with a strong background in open-source development, automation, and systems-level thinking. Over the past year, I’ve contributed to high-impact projects under the Linux Foundation, including Kmesh (CNCF), where I developed and integrated advanced CI pipelines and testing strategies to enhance reliability in distributed environments. I also contributed to Veraison (Confidential Computing Consortium), where I worked on command logic, CI automation, and modular system improvements.

My work combines system-level programming with cognitive and collaborative modeling. I’ve built tools like Duplyzer Go, a concurrent file deduplication CLI using Go and Docker, and I’ve developed production-ready Figma plugins to streamline design pipelines. These experiences have sharpened my ability to navigate and integrate multiple development environments, automate agent workflows, and reason about human-computer collaboration—skills I’m excited to apply to this project.

I’m deeply interested in modeling emergent behavior in open-source communities and exploring multi-agent collaboration systems through frameworks like LLAMOSC and MARLSOC. The idea of expanding on last year’s reinforcement learning and LLM-based agent simulations—especially integrating recurrent and causal feedback loops, and possibly building a ConversationSpace—aligns strongly with my curiosity around decentralized coordination and intelligent systems.

Looking forward to contributing meaningfully to the platform and growing as both a researcher and a developer.

Warm regards,
Ravjot Singh
ravu2004@gmail.com | GitHub | Portfolio