Mentors: Katarzyna Jurewicz <Katarzyna.jurewicz@mcgill.ca>, Buxin Liao <buxin.liao@mail.mcgill.ca>, Suresh Krishna <suresh.krishna@mcgill.ca>
Skill level: Advanced
Required Skills: Fluency in Python and PyTorch. Familiarity with open-source vision and multimodal AI models. Familiarity with Slurm and working with clusters preferred. Basic web-development skills or interest in learning them will be useful.
Time commitment: Full time (350 hours)
About: Salience map research in computer vision has extensively examined where human observers look in images and videos during free viewing. Although cognitive psychology has recognized the role of behavioral goals for over 50 years, integrating task dependence into quantitative models and large open datasets is a recent development.
Aims: This project aims to create an open portal that consolidates existing machine learning/AI models and eye-tracking datasets related to goal-directed vision (e.g., visual search), while providing tools for model testing and validation (one such validation metric is sketched below, after the keyword list). A key focus will be on multimodal AI, particularly language-vision integration. Additionally, this platform will serve as a prototype for similar data+model initiatives on public hardware platforms.
Project website: m2b3/ActiveVision
Tech keywords: Python, PyTorch, Visual search, Saliency, Science portals, Vision AI, Vision-language models.
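For concreteness, here is a minimal sketch of the kind of validation metric the portal's testing tools could expose: Normalized Scanpath Saliency (NSS), a standard measure in saliency research that scores a z-scored predicted saliency map at human fixation locations. This is not project code; the function and variable names are illustrative only.

```python
import numpy as np

def normalized_scanpath_saliency(saliency_map: np.ndarray,
                                 fixations: np.ndarray) -> float:
    """NSS: mean of the z-scored saliency map at fixated locations.

    saliency_map: 2-D array of predicted saliency values.
    fixations:    (N, 2) integer array of (row, col) fixation coordinates.
    """
    std = saliency_map.std()
    if std == 0:
        # A constant map carries no information about fixations.
        return 0.0
    # Z-score the map so scores are comparable across models.
    z = (saliency_map - saliency_map.mean()) / std
    rows, cols = fixations[:, 0], fixations[:, 1]
    return float(z[rows, cols].mean())

# Illustrative usage with random data:
rng = np.random.default_rng(0)
pred = rng.random((480, 640))
fix = np.column_stack([rng.integers(0, 480, 20),
                       rng.integers(0, 640, 20)])
print(normalized_scanpath_saliency(pred, fix))
```

Higher NSS means the model assigns more (normalized) saliency to locations people actually fixated; in a task-driven setting the same metric can be computed per search target rather than over free-viewing fixations.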
Dear @suresh.krishna, Buxin Liao and Katarzyna Jurewicz,
I hope you are doing well.
My name is Ishaan Chandak, and I am a final-year Information Technology student at VJTI Mumbai. I am deeply passionate about exploring the intersection of deep learning and biology.
I’m excited about the opportunity to contribute to this project, and I’ve been exploring different approaches and research directions to better understand how multimodal systems can predict and evaluate task-driven visual attention. I have also looked into existing visual search datasets and the evaluation metrics commonly used in saliency research, which has given me a broader perspective on how saliency models are assessed. I look forward to learning from you and understanding how I can meaningfully support the ongoing work. Along the way, I have been documenting my research and findings to structure my understanding and keep track of useful insights.
I would be happy to contribute to the project and take on any starter tasks that would help me become more familiar with it; please let me know if there are relevant tasks for me.
I am also attaching my resume here for your reference.
Best regards,
Ishaan Chandak
Hello @suresh.krishna, Katarzyna Jurewicz, and Buxin Liao,
I hope you’re doing well. My name is Aniruddha Roy, and I’m a machine learning enthusiast eager to learn more about this project. However, I’m unsure how to get started or contribute effectively, and I would greatly appreciate your guidance.
With hands-on experience in Python, PyTorch, and computer vision, I have worked on projects involving CNNs, U-Net architectures, and vision-language models. Additionally, I have explored LangChain, LangGraph, Hugging Face, and OCR engines like Tesseract and EasyOCR. While I haven’t worked extensively with Slurm, I am keen to learn and adapt quickly while contributing to the project.
Any advice or resources to help me better understand and engage with the project would be highly valuable. Looking forward to your insights!
Thank You,
Aniruddha Roy
Hello everybody, thank you for your interest.
We will have more information on the GitHub page by Monday next week. We will list more specific goals, example codebases that can be brought together, evaluation tasks, etc. We will also indicate the rough direction for the project, to give you some guidance on the range of ideas you can propose yourselves.
Your proposal should then indicate in detail what you will do, how you will do it, why you are the right person to do it (given your background, skill set, etc.), and that you will be able to do it (feasibility).
Good luck, and thanks for your patience!
Hello @suresh.krishna, Buxin Liao, and Katarzyna Jurewicz,
I hope you’re doing well. I am Vaishvi Khandelwal, a second-year Information Technology student at VJTI Mumbai, deeply interested in AI, multimodal learning, and retrieval-based systems. I previously introduced myself via email regarding my interest in this project and potential approaches to contributing. I’m excited to now engage with the broader community.
My experience spans developing RAG pipelines with vector databases, modifying transformer architectures for vision-language tasks, and building AI-powered web applications. I have worked extensively with PyTorch and open-source vision models and am particularly intrigued by this project’s focus on goal-directed vision and saliency modeling. Understanding how human cognitive processes influence visual attention and integrating these insights into AI-driven systems is a challenge I’m eager to contribute to.
I have also been exploring different methodologies for structuring an open portal that consolidates ML models and eye-tracking datasets. I look forward to the GitHub details and any additional resources that could help me align more effectively with the project.
Thanks & Regards,
Vaishvi Khandelwal
Hello @suresh.krishna,
I would like to seek your guidance on understanding the goals of the project and would be grateful if you could suggest some resources or provide advice on how and where to get started.
I’m also following the GitHub pages related to this project.
I appreciate your advice and thank you for your time and assistance.
Thank you,
Aniruddha
Please see the message above. There will be an update on Monday.
Dear Katarzyna, Buxin, and Suresh,
I hope you’re doing well. I am Chinmaya, a final-year undergraduate student at NITK Surathkal, specializing in AI/ML and deep learning. I came across the ActiveVision project and found it highly aligned with my experience and interests. I would love to contribute as part of GSoC 2025.
I have worked extensively with PyTorch-based deep learning models, particularly in computer vision, medical imaging, and multimodal AI. My projects have involved segmentation, classification, and knowledge distillation using foundation models. Additionally, I have experience working with large datasets, vision-language models, and real-time inference. While I am new to Slurm and cluster-based computing, I am eager to learn if required.
Before submitting my proposal, I wanted to ask if there are any prerequisite tasks or areas I should explore to better understand the project’s scope. I have also attached my resume for your reference. Looking forward to your guidance!
With regards,
Chinmaya
The ActiveVision GitHub page now has additional details. Please take a look there.
ActiveVision/README.md at main · m2b3/ActiveVision
Hello @suresh.krishna, Buxin Liao, and Katarzyna Jurewicz,
I’ve developed an approach for the project and would like to share the details. Would it be alright if I sent you a personal message to elaborate?
Thanks,
Ishaan
Please use the GSoC INCF template to write it up as a tentative proposal and then send it to me by DM for feedback. @Ishaan_Chandak
Hey @suresh.krishna,
I’m Abdallah Alkholy, an AI graduate specializing in Computer Vision, Deep Learning, and Machine Learning. I’m excited to contribute to GSoC 2025, particularly to the ActiveVision project.
I have experience working on computer vision, multimodal AI, and deep learning models. Some of my past work includes vision-based perception for autonomous robots, medical imaging AI, and optimizing deep learning models like CNNs and Transformers. I find the idea of integrating eye-tracking data with AI models for goal-directed vision fascinating and would love to contribute to this project.
I’ve gone through the ActiveVision GitHub page, and I had a couple of questions:
Is there a specific benchmark or dataset the project aims to work with?
Are there any existing preliminary models or baselines for goal-directed vision in this project?
Best,
Abdallah Alkholy
Email: abdallyalkhuoly@gmail.com
GitHub: Alkholy53 (Abdallah Alkholy) · GitHub
LinkedIn: https://linkedin.com/in/abdallah-alkholy-9b9a36181
The answers to these questions are on the GitHub page and in the messages above…