GSoC 2025 Project #25 ActiveVision :: a data and model portal for the study of goal-directed vision (350h)

Mentors: Katarzyna Jurewicz <Katarzyna.jurewicz@mcgill.ca>, Buxin Liao <buxin.liao@mail.mcgill.ca>, Suresh Krishna <suresh.krishna@mcgill.ca>

Skill level: Advanced

Required Skills: Fluency in Python and PyTorch. Familiarity with open-source vision and multimodal AI models. Familiarity with Slurm and working with clusters preferred. Basic web-development skills or interest in learning them will be useful.

Time commitment: Full time (350 hours)

Forum for discussion

About: Salience map research in computer vision has extensively examined where human observers look in images and videos during free viewing. Although cognitive psychology has recognized the role of behavioral goals for over 50 years, integrating task dependence into quantitative models and large open datasets is a recent development.

Aims: This project aims to create an open portal that consolidates existing machine learning/AI models and eye-tracking datasets related to goal-directed vision (e.g., visual search) while providing tools for model testing and validation. A key focus will be on multimodal AI, particularly language-vision integration. Additionally, this platform will serve as a prototype for similar data+model initiatives on public hardware platforms.
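
To give a concrete flavor of the model-testing tools in scope, here is a minimal sketch (illustrative only; the function name and tensor shapes are ours for this example) of one standard saliency metric, Normalized Scanpath Saliency (NSS), which scores a predicted map at human fixation locations:

```python
# Minimal illustrative sketch of Normalized Scanpath Saliency (NSS):
# z-score the predicted saliency map, then average it at the pixels
# that human observers actually fixated. Higher is better.
import torch

def nss(saliency: torch.Tensor, fixations: torch.Tensor) -> torch.Tensor:
    """saliency: (H, W) float map; fixations: (H, W) binary fixation mask."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-8)  # z-score the map
    return s[fixations.bool()].mean()                           # mean at fixated pixels
```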

Project website: m2b3/ActiveVision

Tech keywords: Python, PyTorch, Visual search, Saliency, Science portals, Vision AI, Vision-language models.

Dear @suresh.krishna, Buxin Liao and Katarzyna Jurewicz,

I hope you are doing well.

My name is Ishaan Chandak, and I am a final-year Information Technology student at VJTI Mumbai. I am deeply passionate about exploring the intersection of deep learning and biology.

I’m excited about the opportunity to contribute to this project, and I’ve been exploring different approaches and research directions to better understand how multimodal systems can predict and evaluate task-driven visual attention. I have also looked into existing visual search datasets and the evaluation metrics commonly used in saliency research, which has given me a broader perspective on how saliency models are assessed. I look forward to learning from you and understanding how I can meaningfully support the ongoing work. Along the way, I have been documenting my reading and notes to structure my understanding and keep track of useful insights.
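
For example, one family of metrics I came across treats the saliency map as a classifier score separating fixated from non-fixated pixels (AUC-Judd and its variants). Here is a small sketch of the idea from my notes, approximated with scikit-learn's ROC-AUC (the exact AUC-Judd thresholding differs slightly, and the names and shapes are my own):

```python
# Sketch of the AUC-style saliency metrics from my notes: treat saliency
# values as scores for classifying fixated vs. non-fixated pixels.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_saliency(saliency: np.ndarray, fixations: np.ndarray) -> float:
    """saliency: (H, W) float map; fixations: (H, W) binary fixation mask."""
    labels = fixations.reshape(-1).astype(int)  # 1 at fixated pixels, 0 elsewhere
    scores = saliency.reshape(-1)               # saliency value as the score
    return roc_auc_score(labels, scores)
```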

I would be happy to contribute to the project and to take on any starter tasks that would help me become more familiar with it. Please do let me know if there are relevant tasks for me.

I am also attaching my resume here for your reference.

Best regards,

Ishaan Chandak

Hello @suresh.krishna, Katarzyna Jurewicz, and Buxin Liao,

I hope you’re doing well. My name is Aniruddha Roy, and I’m a machine learning enthusiast eager to learn more about this project. However, I’m unsure how to get started or contribute effectively, and I would greatly appreciate your guidance.

With hands-on experience in Python, PyTorch, and computer vision, I have worked on projects involving CNNs, U-Net architectures, and vision-language models. Additionally, I have explored LangChain, LangGraph, Hugging Face, and OCR engines like Tesseract and EasyOCR. While I haven’t worked extensively with Slurm, I am keen to learn and adapt quickly while contributing to the project.

Any advice or resources to help me better understand and engage with the project would be highly valuable. Looking forward to your insights!

Thank You,
Aniruddha Roy

Hello everybody, thank you for your interest.

We will have more information on the GitHub page by Monday next week. We will list more specific goals, example codebases that could be brought together, evaluation tasks, etc. We will also indicate the rough direction for the project, to give you some guidance on the range of ideas you can propose yourself.

Your proposal should then indicate in detail what you will do, how you will do it, why you are the right person to do it (given your background, skillset, etc.), and that you will be able to do it (feasibility).

Good luck, and thanks for your patience!

Hello @suresh.krishna, Buxin Liao, and Katarzyna Jurewicz,

I hope you’re doing well. I am Vaishvi Khandelwal, a second-year Information Technology student at VJTI Mumbai, deeply interested in AI, multimodal learning, and retrieval-based systems. I previously introduced myself via email regarding my interest in this project and potential approaches to contributing. I’m excited to now engage with the broader community.

My experience spans developing RAG pipelines with vector databases, modifying transformer architectures for vision-language tasks, and building AI-powered web applications. I have worked extensively with PyTorch and open-source vision models and am particularly intrigued by this project’s focus on goal-directed vision and saliency modeling. Understanding how human cognitive processes influence visual attention and integrating these insights into AI-driven systems is a challenge I’m eager to contribute to.

I have also been exploring different methodologies for structuring an open portal that consolidates ML models and eye-tracking datasets. I look forward to the GitHub details and any additional resources that could help me align more effectively with the project.

Thanks & Regards,
Vaishvi Khandelwal

Hello @suresh.krishna,
I would like to seek your guidance on understanding the goals of the project and would be grateful if you could suggest some resources or provide advice on how and where to get started.
I’m also following the GitHub pages related to this project.
I appreciate your advice, and thank you for your time and assistance.

Thank you,
Aniruddha

Please see the message above. There will be an update on Monday.

Dear Katarzyna, Buxin, and Suresh,

I hope you’re doing well. I am Chinmaya, a final-year undergraduate student at NITK Surathkal, specializing in AI/ML and deep learning. I came across the ActiveVision project and found it highly aligned with my experience and interests. I would love to contribute as part of GSoC 2025.

I have worked extensively with PyTorch-based deep learning models, particularly in computer vision, medical imaging, and multimodal AI. My projects have involved segmentation, classification, and knowledge distillation using foundation models. Additionally, I have experience working with large datasets, vision-language models, and real-time inference. While I am new to Slurm and cluster-based computing, I am eager to learn if required.

Before submitting my proposal, I wanted to ask if there are any prerequisite tasks or areas I should explore to better understand the project’s scope. I have also attached my resume for your reference. Looking forward to your guidance!

With regards,
Chinmaya

The ActiveVision GitHub page now has additional details. Please take a look there.

ActiveVision/README.md at main · m2b3/ActiveVision

Hello @suresh.krishna, Buxin Liao, and Katarzyna Jurewicz,

I’ve developed an approach for the project and would like to share the details. Would it be alright if I sent you a personal message to elaborate?

Thanks,
Ishaan

Please use the GSoC INCF template to write it up as a tentative proposal and then send it to me by DM for feedback. @Ishaan_Chandak

Hey @suresh.krishna

I’m Abdallah Alkholy, an AI graduate specializing in Computer Vision, Deep Learning, and Machine Learning. I’m excited to contribute to GSoC 2025, particularly to the ActiveVision project.

I have experience working on computer vision, multimodal AI, and deep learning models. Some of my past work includes vision-based perception for autonomous robots, medical imaging AI, and optimizing deep learning models like CNNs and Transformers. I find the idea of integrating eye-tracking data with AI models for goal-directed vision fascinating and would love to contribute to this project.

I’ve gone through the ActiveVision GitHub page, and I have a couple of questions:

Is there a specific benchmark or dataset the project aims to work with?

Are there any existing preliminary models or baselines for goal-directed vision in this project?

Best,

Abdallah Alkholy

Email: abdallyalkhuoly@gmail.com

GitHub: Alkholy53

LinkedIn: https://linkedin.com/in/abdallah-alkholy-9b9a36181

The answers to these questions are on the GitHub page and in the messages above…

Hello @suresh.krishna,

I’m Karim Malawany, a computer vision and deep learning enthusiast with experience building AI models using Python, PyTorch, and TensorFlow. I have well-rounded experience in deep learning with a focus on computer vision, and I have worked on multiple tasks including image classification, object detection, and segmentation. I have published a paper on car damage inspection using deep learning, and I have good knowledge of GANs and of open-source vision and multimodal AI models like CLIP.

I am excited about this project because it connects with my interest in how AI models human perception and goal-directed vision. I am especially eager to explore multimodal AI combining language and vision. This project is a great opportunity to use my deep learning skills to help build open platforms that improve AI research and its real-world applications in visual attention and task-driven vision.

resume

Welcome. Please see the GitHub page and the messages above. All the best.

Hey Suresh Krishna, Buxin Liao, and Katarzyna Jurewicz,

I’m Madhan Aturu, a second-year CS student, super excited about ActiveVision for GSoC 2025! Here’s my idea:

→ Build a portal with:

  • Python + PyTorch for vision-language models (e.g., CLIP) → Integrate datasets for goal-directed vision (visual search, salience maps).
  • React frontend for easy access → Users upload models, run tests, see results (e.g., accuracy metrics, heatmaps).
  • **Multimodal AI** → Let researchers input text (e.g., “find red ball”) → the AI predicts eye movements, compares them with real data, and reports accuracy (see the rough sketch after this list).
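
To make the multimodal step concrete, here is a rough sketch of what I have in mind (my own illustration, not an agreed design; the CLIP checkpoint, grid size, and function name are placeholders): score a grid of image crops against the text query with CLIP to get a coarse task-conditioned attention map, which could then be compared against recorded fixations.

```python
# Rough sketch (my own idea, not an agreed design): score a grid of image
# crops against a text query with CLIP to get a coarse task-conditioned
# attention map. Checkpoint and grid size are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def query_attention_map(image: Image.Image, text: str, grid: int = 7) -> torch.Tensor:
    """Return a (grid, grid) map of how well each image region matches the text."""
    w, h = image.size
    crops = [image.crop((c * w // grid, r * h // grid,
                         (c + 1) * w // grid, (r + 1) * h // grid))
             for r in range(grid) for c in range(grid)]
    inputs = processor(text=[text], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_text: (1, grid*grid) similarity of the query to each crop
    return out.logits_per_text.softmax(dim=-1).reshape(grid, grid)
```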

→ My skills:

  • ML/AI (Python, TensorFlow, scikit-learn, PyTorch, XGBoost).
  • Web dev (React, HTML/CSS/JS).
  • Slurm (basics).

How should I prioritize vision models vs. language integration? Any specific datasets you’d recommend?

Best,
Madhan Aturu

Hey Mentors,

I hope you’re all doing well. I wanted to touch base regarding our project progress. I believe that if we ramp up our activity a bit, we can move closer to completing the integration of our deep learning forecasting models into Aeon.

Could we perhaps be a bit more active? I’m eager to contribute further and ensure that we maintain our momentum.

Looking forward to your feedback.

Best regards,
Aturu Guru Madhan Reddy

Dear mentors,
I hope you’re doing well. My name is Mrunmayee, and I am a second-year Electronics and Communication Engineering student at IIIT Nagpur. I am highly interested in the ActiveVision project, and I would love to discuss how I can contribute effectively.

My Background & Relevant Experience

I have a strong foundation in Python and PyTorch, along with hands-on experience in signal processing, embedded systems, and AI-driven applications. Some of my relevant skills include:
Deep Learning & Computer Vision: Familiar with PyTorch, OpenCV, and multimodal AI models.
Time-Series & Signal Processing: As a second-year ECE student, I have taken relevant coursework in this area.
Hardware & Sensor-Based AI: Worked with Arduino, ESP32, and biomedical sensors to collect and analyze real-world data, integrating insights into machine learning models.
Web Development: Comfortable with the necessary web development skills: HTML, CSS, JS, Node, and PHP.

Relevant Projects

Smart Mirror with AI Integration

Developed a Smart Mirror that displays real-time information such as weather, time, and news updates while also integrating voice recognition and AI-powered facial recognition. This project involved Raspberry Pi, Python, OpenCV, and web technologies, demonstrating my ability to work with computer vision, IoT, and web interfaces.

Sign Language Recognition Using Computer Vision

Built a real-time sign language recognition system using OpenCV and Python, which detects and translates hand gestures into text. This project enhanced my computer vision and deep learning skills, as I trained a CNN-based model using PyTorch to recognize different hand gestures accurately.

Image Differences Detection Using Computer Vision

Developed an application that detects differences between two images using OpenCV and Python. This project focused on image processing techniques, edge detection, and feature matching, strengthening my knowledge in computer vision and AI-based image analysis.

I am particularly excited about goal-directed vision and multimodal AI, as I am keen to explore how language-vision integration enhances saliency-based modeling. The challenge of building an open platform that consolidates datasets and models aligns with my interests in AI research and open-source development. I am eager to learn more about large-scale ML model integration, dataset curation, and validation pipelines. Additionally, I am comfortable with Slurm-based cluster computing and am excited to expand my experience working with large-scale AI models in a research-driven environment.

I look forward to learning many insightful things from the mentors and my peers while working on this project (if I am selected) and to getting hands-on experience in this field. I would love to contribute and will be submitting my proposal soon. Looking forward to your feedback and any guidance you may have on the next steps.

Best regards,
Mrunmayee

You can send me a link to your proposal here via DM and I can give you feedback. Please follow the INCF GSoC template, and read the material posted above and on GitHub. Good luck.

Dear mentors,

I’m Yu Jiang, a junior majoring in Data Science at Sichuan Agricultural University. My interests lie in AI, computer vision, and multimodal learning. You can learn more about my background here: https://yujiangjulia.github.io/.

I’m very interested in Project #25: ActiveVision, especially its integration of vision-language models and real-world attention monitoring. I’ve started prototyping a simple demo to explore these ideas, which is available here:

Thank you for your time, and I look forward to contributing to this exciting project.

Best,
Yu Jiang