GSoC 2026 Project #19: ActiveVision: continued development of a data and model portal for the study of goal-directed vision

Mentors: Buxin Liao <buxin.liao@mail.mcgill.ca>, Katarzyna Jurewicz <jurewicz.ka@gmail.com>, Suresh Krishna <suresh.krishna@mcgill.ca>

Skill level: Intermediate – Advanced

Required Skills: Familiarity with open-source vision and multimodal AI models. Fluency in Python and PyTorch. Familiarity with Slurm and working with clusters preferred. Basic web-development skills or interest in learning them will be useful.

Time commitment: Full time (350 hours)

About: Salience map research in computer vision has extensively examined where human observers look in images and videos during free viewing. Although cognitive psychology has recognized the role of behavioral goals for over 50 years, integrating task dependence into quantitative models and large open datasets is a recent development. This project aims to create an open portal that consolidates existing machine learning/AI models and eye-tracking datasets related to goal-directed vision (e.g., visual search) while providing tools for model testing and validation. A key focus is on multimodal AI, particularly language-vision integration. Additionally, this platform will serve as a prototype for similar data+model initiatives on public hardware platforms.

Aims: In last year’s GSoC project (GSoC 2025 report.md · GitHub), we created a library of machine vision models and a toolbox for their application to scanpath datasets. Over the past year, there has been substantial progress in terms of better models and datasets. This year’s project aims to bring the library and toolbox up to date, and additionally create a user-facing web portal on Compute Canada that will facilitate submission and evaluation of models on scanpath datasets.

Project website: GitHub - m2b3/ActiveVisionPortal and GitHub - m2b3/SciCommons-frontend

Tech keywords: Python, PyTorch, Visual search, Saliency, Science portals, Vision AI, Vision-language models.

Hello everyone,

My name is Katerina and I’m very interested in the ActiveVision project for GSoC 2026.

I have a background in software engineering and I also hold a Master’s degree in Neuroscience. During my studies I completed a six-month research internship at a neuroscience laboratory at UCL, where I worked with research workflows and scientific datasets.

Because of this background, I’m particularly interested in projects that combine machine learning models, scientific datasets, and platforms that make them easier to explore and evaluate.

From a technical perspective, I have experience with backend development (Java / REST APIs) and data-driven applications, and I am very interested in Python-based AI workflows and model evaluation pipelines.

The idea of building a portal that integrates vision models and eye-tracking datasets for goal-directed vision research sounds very exciting, especially as a tool for organizing and benchmarking models on scientific datasets.

I would love to explore the ActiveVisionPortal repository and learn more about how contributors can get involved.

A few questions:
• Are there recommended first issues or starter tasks for new contributors?
• Which parts of the project are currently the highest priority (model library updates, evaluation pipeline, or the web portal)?
• Is there documentation describing the current architecture of the portal and toolbox?

Looking forward to learning more about the project.

Best regards,
Katerina Eleftheriadi

Thank you for your interest. Please join the community at alphatest.scicommons.org.

This project is focused more on the computer vision component than the portal, so some degree of fluency with computer vision would be ideal. Please go through last year’s report and the repo to see whether this project fits your skills and interests.

Hello everyone,

I’m Apurva Sharma, a B.Tech student and I’m interested in contributing to the ActiveVision data and model portal project for GSoC 2026.

I’ve started exploring the repository and understanding how models are registered and executed through main.py and model_registry.py.

While working with the CLI (e.g., --list_models), I noticed that the current registry eagerly imports all models at startup. This leads to dependency issues (for example, models like HAT require Detectron2 even when not being used).

To address this, I experimented with a lazy-loading approach for the model registry so that models are imported only when explicitly requested. This allowed the CLI to function without requiring all heavy dependencies upfront.
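For illustration, a lazy registry along these lines might look like the sketch below. The spec entries are stdlib placeholders so the snippet runs stand-alone; in the actual repo they would point at the model modules (e.g. HAT with its Detectron2 dependency), and the function names here are my own, not taken from model_registry.py.

```python
import importlib

# Map model names to "module_path:ClassName" strings instead of importing
# the model classes eagerly at startup. The entries below are stdlib
# placeholders so this sketch runs on its own; the real registry would
# point at the repo's model modules.
_MODEL_SPECS = {
    "decoder": "json:JSONDecoder",
    "parser": "argparse:ArgumentParser",
}

_loaded = {}

def list_models():
    """Listing names imports no model module, so `--list_models` stays cheap."""
    return sorted(_MODEL_SPECS)

def get_model(name):
    """Import a model's module only on first request, then cache the class."""
    if name not in _loaded:
        module_path, _, class_name = _MODEL_SPECS[name].partition(":")
        module = importlib.import_module(module_path)
        _loaded[name] = getattr(module, class_name)
    return _loaded[name]
```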

I’m currently continuing to explore the IRL model and the overall pipeline, and I’d appreciate any guidance on where I can contribute further or if there are beginner-friendly issues I should focus on.

Thank you!
Apurva Sharma

Thank you for your interest. Please join the community at alphatest.scicommons.org.

This project is focused on the computer vision component, so some degree of fluency with computer vision would be ideal. Please go through last year’s report and the repo, and feel free to propose extensions, new features, ideas etc.

At this late stage, we are not focusing on PRs or issues. Once GSoC starts, we can discuss that with whoever is still around (selected intern(s), volunteers etc).

Thanks for the guidance!

I’ve joined the community and will go through last year’s report and the repository in more detail.

I’ll focus on understanding the computer vision components and come back with some concrete ideas or possible improvements soon.

Thanks again!
Apurva Sharma

Hello Sir,

I’ve started going through the report and exploring the evaluation part of the repository.

One thing I noticed is that while it’s easy to run individual models, it’s not very straightforward to compare the results across models or track outputs clearly.

So, I’m thinking of adding a simple way to compare multiple models in a single run, extending the model registry with basic metadata (like description or dataset), and structuring evaluation outputs so results are easier to track and analyze.
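As a rough sketch of what the structured outputs could look like (the layout and function name are my own assumptions, not anything currently in the repo):

```python
import csv
import json
import time
from pathlib import Path

def save_run(results, out_dir="runs"):
    """Write one evaluation run to a timestamped folder: a JSON file per
    model plus a summary CSV for side-by-side comparison.
    `results` maps model name -> dict of metric name -> value."""
    run_dir = Path(out_dir) / time.strftime("%Y%m%d-%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    # Union of all metric names, so models with different metrics still align.
    metrics = sorted({m for scores in results.values() for m in scores})
    with open(run_dir / "summary.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["model", *metrics])
        for model, scores in results.items():
            (run_dir / f"{model}.json").write_text(json.dumps(scores, indent=2))
            writer.writerow([model, *(scores.get(m, "") for m in metrics)])
    return run_dir
```

Each run then lands in its own folder, and the CSV gives a quick cross-model view without digging through per-model files.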

Any suggestions or guidance on this would be really helpful.

Thanks!

Hi everyone,

I’ve been reviewing the 2025 GSoC report and the current repository, and I have a few questions to help me understand the core interests for the project. I see three high-level paths for improving the codebase, and I would love to know which aligns best with the mentors’ vision:

  1. Repository Expansion: focusing on integrating a wider variety of recent SOTA models (e.g., from CVPR or ETRA 2025/2026) to ensure the library includes the most recent solutions from the research community.

  2. Algorithmic Enhancement: improving existing models by refining data preprocessing or implementing more sophisticated feature extraction. For example, focusing on embedding extraction from multimodal data to provide richer informational layers for the models.

  3. Standardized Benchmarking: focusing on the ‘evaluation’ aspect and identifying performance gaps in current models and implementing meaningful metrics based on gaze data properties.

Additionally, are there particular recent studies (or specific architectures) you are eyeing as high-priority references for this year? I assume that knowing some references would help applicants tailor more meaningful proposals.

Thank you for the response!

Best,

Kate

@Apurva_Sharma sure, you can go ahead and do that. Please note that your proposal should have a much larger scope, and as I mentioned, smaller issues can be tackled after the proposal submission deadline.

@katemel - These are excellent lines of thought and all three are perfectly aligned with where we want to go. We do not have a list of particular studies or architectures that we want to constrain you with. However, we are generally more interested in open-weight models that we can host and run inference with ourselves on the Compute Canada slurm cluster (rather than operate with remotely-hosted paid APIs and incur charges).

We look forward to your proposal! We are happy to give comments on 1 version, if we get it via DM well in time.

All the best!

Hi everyone,

I am Divyanshu Choudhury and I am applying for this project for GSoC 2026. I have spent the last couple of days going through the 2025 GSoC report, the architecture diagram, and both repositories. It looks like last year’s work set up three really solid pipelines on the Compute Canada cluster: the evaluation framework, the CLIP embedding toolkit, and the VLM inference engine. I also dug into the SciCommons codebase a bit, both the Django backend and the Next.js frontend, since the project description mentions it as the web portal layer.

Right now the system is mostly CLI and HPC based, which makes total sense since the priority last year was getting the core pipelines running. My understanding is that the 2026 goal is to build out the researcher-facing layer, so that people without direct Compute Canada access can submit models and compare results through a browser.

Before I finalise my proposal I had a couple of questions. For the model submission flow on the portal, should researchers submit a GitHub link that gets cloned and run on the cluster, or should they be able to upload their weights and checkpoints directly? That detail changes how the job dispatching would need to work quite significantly. I was also wondering if the portal is meant to be a standalone deployment or if it makes more sense to integrate it as a dedicated community inside the existing SciCommons platform, since the existing submission and review workflows there seem to map reasonably well onto model evaluation.

And since COCO-Search18 is currently the only dataset in the system, are there any other high priority datasets you want to see added this year to make the benchmarking more meaningful?

Thanks a lot for your time, looking forward to hearing your thoughts.

SciCommons and ActiveVision are two independent projects.

As for the model submission flow, you can visualize anything reasonable for the proposal. The exact patterns for this sub-task will change when the actual coding period starts; what you can do is propose something that you think will work (well).

Part of your proposal can include proposing the inclusion of other datasets, models, etc., so you would do the literature search for that. All the best!

Hi everyone,
I’m Ansuman Patra, a sophomore at IIT Varanasi with a strong focus on computer vision and deep learning. I’m very interested in applying for the ActiveVision project for GSoC 2026. I have prior experience working with open-source computer vision models, as well as training and building models from scratch.
I’ve gone through last year’s GSoC report and explored the repository. The three-module architecture built from scratch is impressive, particularly the unified evaluation framework with the MODEL_REGISTRY system, the CLIP embedding toolkit supporting OpenCLIP/SigLIP variants, and the multi-GPU VLM inference engine. What excites me most is the vision-language integration for goal-directed attention, since I’ve worked on similar multimodal problems.

I wanted to ask some questions before I finalize my proposal:

1. Dataset/model priorities: Beyond COCO-Search18 and the 5 existing models, which direction would be most valuable: expanding dataset coverage, adding newer models, or improving evaluation metrics?
2. Portal integration: Should the submission workflow extend the existing MODEL_REGISTRY system in main.py, or would a separate API layer be better? Any preference on submission methods (Git repos vs. direct uploads)?
3. Scope priority: For the portal, should I focus on general model evaluation infrastructure first, or also plan to integrate task-specific features from vlm_parallel.py?

I’m already exploring the codebase and finalising my proposal, and just wanted to confirm: we should follow 2025’s proposal template, right?
Also, do you provide any feedback on the submitted proposals?
I extend my gratitude to the mentors for this amazing project opportunity.

Welcome.

  1. All three

  2. As you wish

  3. General first, but planning to integrate would be within scope

Follow the template here: Recommendations for GSoC contributors | INCF

Yes, we can provide comments on 1 draft.

All the best, and looking forward to your proposal.

Thank you for the reply, sir. Just two more questions:

  1. As you said you will provide comments on a 1st draft, should we submit that draft directly on the GSoC portal or email it to you to get feedback?
  2. Can we submit proposals for more than 1 project under INCF?

  1. Here, via DM.
  2. Yes, you can, but you have to mention it (and any other proposals you are submitting to other orgs) in your proposal.

Hello Sir,

After exploring the codebase, I’ve been structuring my proposal around four connected contributions:

  1. Experiment Engine - a YAML config where you define models, datasets and metrics once, and a batch runner that executes everything together and stores results in a standard format. The lazy-loading fix I implemented earlier would be the foundation of this.

  2. VLM Integration - adding open-weight models like LLaVA-1.5 and InstructBLIP that can run directly on Compute Canada without depending on paid external APIs.

  3. Web Portal - building on the existing SciCommons frontend, adding pages for experiment submission, a model leaderboard, and results viewing, with a FastAPI backend connecting to the experiment engine.

  4. Task-Conditioned Evaluation - evaluating models separately by task type (free-viewing vs visual search), and for VLMs, passing the task as a text prompt to measure whether language context improves goal-directed scanpath predictions.
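To make point 1 concrete, the engine's core loop could be as simple as the sketch below. In practice the config would live in a YAML file (parsed with `yaml.safe_load`); an inline dict keeps the sketch self-contained, the model names are placeholders, and `evaluate` is a hypothetical stand-in for the repo's actual evaluation entry point.

```python
# Sketch of the batch runner: a single config declares models, datasets,
# and metrics once; the runner crosses them and collects results in one
# flat table. The config would be loaded from YAML in the real engine.
CONFIG = {
    "datasets": ["coco_search18"],
    "models": ["hat", "irl"],       # placeholder names
    "metrics": ["auc", "nss"],
}

def run_experiments(config, evaluate):
    """`evaluate(model, dataset, metric)` is a stand-in for the repo's
    evaluation entry point; it should return a scalar score."""
    results = []
    for dataset in config["datasets"]:
        for model in config["models"]:
            for metric in config["metrics"]:
                results.append({
                    "model": model,
                    "dataset": dataset,
                    "metric": metric,
                    "score": evaluate(model, dataset, metric),
                })
    return results
```

A flat list of records like this drops straight into a CSV or a leaderboard table, which is what makes cross-model comparison cheap later on.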

One quick question: would Slurm job submission from within the portal be in scope for this year, or is it better left for later?

Thanks,
Apurva

SciCommons and ActiveVision are two independent projects.

The current project repo already depends on Slurm submission.

Hello everyone,

My name is Vijaya Durga Adithya. I am a 3rd-year undergraduate student in the AI&DS department at KL University, Hyderabad. As an AI&DS student specializing in computer vision, I have built a project using BiLSTM-Transformer architectures for Telugu alphabet recognition. Currently, I am working on a personal project involving a meta-aware semantic segmentation vision model for temple environments, aimed at future robotics.

I have a deep interest in building models at the intersection of neuroscience and ML, specifically exploring Indian knowledge systems and theories of consciousness. While I have not contributed to this codebase yet, I plan to explore it over the next 24 hours. Although I may not be able to provide “fast” contributions immediately just for selection, I am confident that I can provide high-quality contributions throughout the full project period if selected.

I will submit my proposal by the night of the 25th. @suresh.krishna sir, I would appreciate any feedback or guidance you might have before then.

Welcome. Please go through the messages above. We can provide feedback on 1 version of your proposal if given to us in time. All the best.
