Mentors: Bradly Alicea @b.alicea <bradly.alicea@outlook.com>, Sarrah Bastawala @sarrah_basta <sarrahbastaw@google.com>, Jesse Parent @jparent <jesse@jopro.org>
Skill level: Advanced
Required skills: Expertise in, or at least the ability to integrate, multiple development environments is an important baseline skill. The ability to extract model representations from complex systems is helpful. Knowledge of open-source development practices and an interest in interdisciplinary research are a must.
Time commitment: Full-time (350 hours)
Forum for discussion
About: Open-source communities are only as powerful as their ability to collectively complete tasks and projects. One way to enable the functional capacity of such a community is to model the collective behavioral and cognitive aspects of day-to-day project engagement. Your involvement will center on the maintenance, development, and further implementation of two models from past years: a Reinforcement Learning model and a hybrid Agent-based/Large Language Model.
The candidate will build an analytical model that incorporates features such as general feedback loops (recurrent relationships) and causal loops (reciprocal causality). This might be in the form of a traditional boxes and arrows (input-output) model, or something more exotic such as Reinforcement Learning.
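To make the idea of recurrent and causal loops concrete, here is a minimal sketch, with made-up state variables rather than anything from the existing codebase, of two quantities in a project that drive each other over time:

```python
# Toy feedback-loop model: contributor "engagement" and the project "backlog"
# influence each other each step (reciprocal causality). Purely illustrative.
import random

engagement, backlog = 0.5, 10.0
for step in range(20):
    # Forward path: contributors close more issues when engagement is high.
    closed = engagement * random.uniform(0.5, 1.5)
    # New issues keep arriving while old ones are closed.
    backlog = max(0.0, backlog - closed + random.uniform(0.0, 1.0))
    # Return path: a shrinking backlog feeds back into higher engagement,
    # a growing one depresses it -- the causal loop.
    engagement = min(1.0, max(0.0, engagement + 0.05 * (5.0 - backlog) / 5.0))
    print(f"step={step:02d} backlog={backlog:5.2f} engagement={engagement:.2f}")
```

A boxes-and-arrows version of the same model would simply make the two paths explicit as edges; an RL version would replace the fixed update rules with a learned policy.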
Aims: In 2024, GitHub activity was simulated with a CodeSpace environment. This included generating issues of varying difficulty for both contributor and maintainer agents. Implementing CodeSpace resulted in the following capabilities: agents with different coding abilities and experience levels, a discussion space where agents can compare approaches to a given task, an automated pull request lifecycle, and several decision-making algorithms for allocating tasks to contributors, along with metrics for evaluating the simulation. For 2025, you might help to improve upon the underlying models (Add Collaboration Algorithm for Multiple Agents on a Single Issue · Issue #64 · OREL-group/GSoC · GitHub) or add a ConversationSpace (Add ConversationSpace (to Simulate Slack) · Issue #60 · OREL-group/GSoC · GitHub) within this framework to simulate an IRC (Internet Relay Chat) / Slack / Discord model, an essential part of many open-source communities. Last year’s project utilized RAG (Retrieval-Augmented Generation) (Integrate RAG within ConversationSpace and GithubDiscussion · Issue #62 · OREL-group/GSoC · GitHub), but other approaches backed by research are also welcome. Our goal is to develop one or more maintainers of the platform who are also capable of research software engineering (https://www.hpcwire.com/off-the-wire/ncsa-innovators-bridging-research-and-software-engineering/).
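For a flavor of what the CodeSpace-style simulation captures, here is a small illustrative sketch (hypothetical names, not the actual project code) of issues of varying difficulty being allocated to contributors of varying experience:

```python
# Illustrative only: a toy allocation loop in the spirit of the 2024 simulation.
import random
from dataclasses import dataclass

@dataclass
class Issue:
    id: int
    difficulty: int        # 1 = easy ... 5 = hard

@dataclass
class Contributor:
    name: str
    experience: int        # 1 = newcomer ... 5 = core maintainer

def allocate(issue, contributors):
    """Pick the contributor whose experience best matches the issue difficulty."""
    return min(contributors, key=lambda c: abs(c.experience - issue.difficulty))

contributors = [Contributor("newcomer", 1), Contributor("regular", 3), Contributor("maintainer", 5)]

for issue in (Issue(i, random.randint(1, 5)) for i in range(5)):
    assignee = allocate(issue, contributors)
    # Crude stand-in for the automated pull-request lifecycle: more experienced
    # assignees are more likely to get their PR merged on the first review.
    merged = random.random() < 0.5 + 0.1 * assignee.experience
    print(f"issue #{issue.id} (difficulty {issue.difficulty}) -> {assignee.name}: "
          f"{'merged' if merged else 'changes requested'}")
```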
What can I do before GSoC?
You can join the Orthogonal Lab Slack and GitHub, as well as attend our Saturday Morning NeuroSim meetings. You might also become familiar with the existing codebase: https://github.com/OREL-group/GSoC/tree/main/Open%20Source%20Sustainability%20using%20RL
Project website: https://orthogonal-research.weebly.com
Tech keywords: Computational Modeling, Reinforcement Learning, Language Models
Hi, I’m Vidhi Rohira, a sophomore at Veermata Jijabai Technological Institute (VJTI), Mumbai, India. I have a keen interest in open-source contributions, reinforcement learning, and AI research.
During Summer 2024, I worked on Super MaRLo Bros, a project where I explored reinforcement learning algorithms to train an AI agent to play a custom game inspired by Super Mario. This involved experimenting with Q-learning, Deep Q-Networks (DQN), and Proximal Policy Optimization (PPO), ultimately concluding that PPO provided better results.
Additionally, I have participated in several hackathons and competitions across colleges, which have strengthened my understanding of LLMs, AI-driven systems, and community-driven research. These experiences have given me hands-on exposure to working with AI models and developing solutions that align with real-world applications.
I am excited about the opportunity to contribute to GSoC 2025 Project Idea #5: Open Source Community Sustainability, as my interests in AI, research, and open-source collaboration align with the project’s goals. I look forward to leveraging my experience and learning from the community to make meaningful contributions.
I’d love to hear any guidance or suggestions on how I can further prepare for GSoC and improve my contributions. Are there specific areas I should focus on to strengthen my application and make a greater impact?
Vidhi-Rohira-Resume
Hi Vidhi,
Thank you for sharing your background and interests. Your work and experience with both large language models and reinforcement learning are impressive and align well with the goals of the project.
As you move forward, one of the key decisions you’ll need to make is whether you’d like to build your proposal around the LLAMOSC (LLM-based Agent Modeling for Open Source Communities) model or the MARLOSC (RL-based simulation for Open Source Communities) model. Both are great directions with existing work, and your experience could add meaningful value to either—or even a combination of the two.
There are currently several open issues under the LLAMOSC repository that highlight future scope and enhancement ideas, such as adding RAG or having multiple agents collaborate. If you plan to center your proposal around LLAMOSC, I recommend:
- Going through the existing README and documentation to understand the model, its metrics, and the key concepts around open-source sustainability,
- Setting it up locally to understand how the current pipeline works, and
- Exploring the open issues to see what areas align with your interests.
Feel free to DM me anytime if you’d like to discuss ideas or need help navigating the LLAMOSC model.
Also, if you’re considering combining elements of LLAMOSC and MARLOSC, or proposing a new approach altogether, that’s absolutely encouraged! Just try to back it with some initial research or experimentation, and feel free to share your thoughts in the weekly meetings.
Looking forward to seeing your contributions!
Dear @sarrah_basta , @jparent , @b.alicea
I hope you are doing well. My name is Sayan and I am an AI Researcher at the CMATER Lab (Jadavpur University) and a former Research Intern at IIEST. I’m passionate about open-source contributions and research in AI, particularly in reinforcement learning (RL) and language models.
I’m really excited about contributing to Open Source Community Sustainability in GSoC 2025. After reviewing the project details and previous discussions, I’m particularly interested in exploring the integration of LLAMOSC (LLM-based Agent Modeling for Open Source Communities) and MARLOSC (RL-based simulation for Open Source Communities).
My background includes developing AI models for complex systems, for example:
- Super MaRLo Bros: Training AI agents using RL algorithms like Q-learning, DQN, and PPO.
- Extensive work with LLMs and RAG-based systems during my internship and academic projects.
- Experience in building hybrid models, including reinforcement learning and explainable AI (LIME and SHAP), which aligns with the goals of this project.
I’m eager to dive deep into the existing codebases of LLAMOSC and MARLOSC to understand the models, pipelines, and scope for contributions. I’ve already gone through the project repositories and reviewed the open issues, including:
- Adding Collaboration Algorithm for Multiple Agents on a Single Issue
- Integrating RAG within ConversationSpace and GitHubDiscussion
My initial thought is to focus on enhancing collaboration between agents through improved decision-making or proposing a hybrid solution that leverages features from both LLAMOSC and MARLOSC. I’m also open to exploring a new approach, as encouraged in the previous conversation.
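To sketch what I mean by improved decision-making for collaboration (along the lines of Issue #64), something like the following could split an issue into subtasks and match them to agents by skill. The field names here are my own assumptions, not the current LLAMOSC API:

```python
# Hypothetical collaboration step: assign an issue's subtasks to multiple agents.
from itertools import cycle

issue = {"title": "Refactor the metrics module",
         "subtasks": ["split metrics.py into smaller modules",
                      "add unit tests for the new modules",
                      "update the docs"]}
agents = [{"name": "agent_a", "skills": {"refactoring", "testing"}},
          {"name": "agent_b", "skills": {"documentation"}}]

KEYWORD_TO_SKILL = {"split": "refactoring", "test": "testing", "docs": "documentation"}

def assign_subtasks(issue, agents):
    """Greedy keyword match between each subtask and the agents' skills,
    with a round-robin fallback so every agent stays involved."""
    fallback = cycle(agents)
    plan = {}
    for task in issue["subtasks"]:
        needed = {skill for kw, skill in KEYWORD_TO_SKILL.items() if kw in task}
        capable = [a for a in agents if a["skills"] & needed]
        plan[task] = (capable[0] if capable else next(fallback))["name"]
    return plan

print(assign_subtasks(issue, agents))
```

A learned policy (MARLOSC-style) could later replace the greedy matching, which is partly why a hybrid of the two models appeals to me.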
Could you please guide me on:
- Any specific areas or issues where you’d recommend focusing to strengthen my proposal?
- Suggestions for initial research or experimentation that could add value to the project?
I’d love to discuss my ideas and get your insights in the upcoming weekly meetings. Thank you so much for your time and guidance; I’m looking forward to contributing meaningfully to the project!
Best regards,
Sayan Hasan Mandal
sayanjones77@gmail.com
Hi! I’m Laura, and I’m very interested in your MARLOSC project, particularly the potential it holds for modeling open-source communities. I noticed the title highlights the Multi-Agent Reinforcement Learning model, and while I’m especially drawn to the LLM and ConversationSpace integration, I’m intrigued by the project’s overall approach.
I understand that the RL aspect forms a core part of MARLOSC, and I’m keen to learn more about the research behind it. While the README provided a high-level overview, I’m curious about the specific methodologies and challenges you’ve encountered in implementing the RL model. I believe that understanding the RL foundation is crucial for effectively contributing to the LLM and ConversationSpace components, as they’ll likely interact within the simulated environment.
My background is in full-stack development, and I have solid experience with GitHub and Git, which I understand are essential for this project. I’m also actively expanding my knowledge in AI, particularly LLMs. I’m particularly excited about the possibility of contributing to the ConversationSpace and RAG integration, as I see great potential in using LLMs to simulate and improve the collaborative dynamics of open-source communities.
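As a rough illustration of how I imagine the retrieval step of a RAG-backed ConversationSpace working (the message store and prompt format below are my own assumptions, not the existing code):

```python
# Minimal retrieval sketch: rank past channel messages by term overlap with an
# agent's question and prepend the top hits to the LLM prompt. Illustrative only;
# a real pipeline would use embeddings and the project's actual message store.
def retrieve(query, messages, k=2):
    q_terms = set(query.lower().split())
    return sorted(messages,
                  key=lambda m: len(q_terms & set(m.lower().split())),
                  reverse=True)[:k]

channel_history = [
    "the CI fails because the lint step needs python 3.10",
    "welcome! please read CONTRIBUTING.md before opening a PR",
    "issue 42 is blocked on the metrics refactor",
]
question = "why does the CI lint step fail?"
context = retrieve(question, channel_history)
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
print(prompt)
```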
I’m eager to delve into the MARLOSC codebase and to take a closer look at the LLAMOSC codebase as well.
I’m also interested in attending the Saturday Morning NeuroSim meetings to gain a deeper understanding of the project’s goals and progress. I have a few specific questions about these meetings and the project’s current state—would it be alright if I DM you to discuss them further before submitting a proposal?
Thank you for considering my application. I’m eager to contribute to this project and learn from your team.
Cheers,
Laura Way