Mentors: Meysam Hashemi <Meysam.hashemi@univ-amu.fr>; Daniele Marinazzo <Daniele.Marinazzo@UGent.be>; Andrea Brovelli <Andrea.Brovelli@univ-amu.fr>
Skill level: In its easy version, the project requires good knowledge of Python, Git, and JAX; in its difficult version, in which the contributor performs the model inversion, knowledge of probabilistic programming in NumPyro/PyMC and of toolboxes for the analysis of Bayesian models, such as ArviZ, is needed.
Required skills: Python, JAX, and Git required. Knowledge of (computational) neuroscience, the BIDS format, Bayesian statistics, MCMC, and probabilistic programming frameworks such as NumPyro/PyMC is a plus.
Time commitment: Large (350h)
Forum for discussion
About: The contributor will create and test use cases, in the form of Jupyter notebooks, for the application of Bayesian model inversion using state-of-the-art MCMC sampling to neuroimaging and/or neurophysiological (MEG/iEEG) data. Both models and data are hosted on EBRAINS.
Aims:
- Use cases in the form of notebooks.
- Baseline goal: one use case. More advanced goals: more use cases, comparative tests of different models on the same data, and/or of the same model on different data.
- Milestones:
- reproducing existing use cases
- changing model parameters
- optimizing the model inversion pipeline for different data types (e.g., task-related versus resting-state activity, neuroimaging versus neurophysiological data)
- adapting inputs and outputs to existing workflows
Website:
Tech keywords: Python, JupyterLab, neuroimaging, MCMC sampling
Here, a preliminary task could be to code a multivariate Ornstein–Uhlenbeck process in PyMC/NumPyro, and then estimate a sparse connectivity matrix SC (first in low dimensions; the high-dimensional case needs a sparse prior, e.g., Lasso regression).
Forward model (a VAR(1), i.e., a discretized multivariate Ornstein–Uhlenbeck process):
import numpy as np

nn = 3    # number of regions, starting from 3 dimensions
nt = 500  # number of time steps
# example sparse, stable ground-truth connectivity (spectral radius < 1)
SC_true = np.array([[0.5, 0.2, 0.0],
                    [0.0, 0.5, 0.2],
                    [0.2, 0.0, 0.5]])
Sigma_true = np.eye(nn)
y_true = np.random.multivariate_normal(mean=np.zeros(nn), cov=Sigma_true, size=nt)
for i in range(1, nt):
    y_true[i, :] = y_true[i-1, :] @ SC_true + np.random.multivariate_normal(mean=np.zeros(nn), cov=Sigma_true)
and the inverse problem, estimating SC column by column with Lasso regression:
from sklearn import linear_model

SC_est = np.zeros((nn, nn))
for j in range(nn):
    X = y_true[:nt-1, :]   # regressors: activity of all regions at time t-1
    Y = y_true[1:, j]      # target: activity of region j at time t
    clf = linear_model.Lasso(alpha=0.1)
    reg = clf.fit(X, Y)
    SC_est[:, j] = reg.coef_
@mhashemi, @Daniele_Marinazzo,
Hi Dr. Hashemi,
My name is Elena Ajayi, and I’m a Master’s student in Data Science at St John’s University in NY with a background in Biomedical Sciences. I’ve worked extensively with MATLAB on image processing and feature extraction; one of those projects even led to a poster presentation at the 2022 Vision Sciences Society Conference. More recently, I have become interested in probabilistic programming with Python, and I am eager to build stronger skills in that area.
Right now, I’m finalizing my proposal for the Bayesian inference project. It’s a great fit for me, especially since it connects my interests in biomedical sciences and computational neuroscience. I’m particularly interested in how advanced Bayesian methods can be applied to fMRI data to better understand brain activity, which I believe could ultimately help improve both research outcomes and clinical care.
I’m also interested in the broader implications of Bayesian inference for AI safety. I believe having models that are well-calibrated and interpretable is essential when applying AI in healthcare.
For the preliminary task, my plan is to simulate synthetic time-series data based on a known connectivity matrix, then introduce noise. I’ll use Lasso regression to try to recover the original matrix, mostly as a sanity check for the pipeline before moving into the more complex modeling.
I had a few questions to help refine this setup:
- Are there any recommended values or levels of sparsity for the connectivity matrix that would better reflect realistic brain networks?
- What would be a reasonable number of time steps (nt) to start with, so that the simulation stays both stable and computationally manageable?
- Once I’ve validated the Lasso step, would it make sense to move straight into a full Bayesian model using something like NumPyro or PyMC?
Thank you very much for your guidance as I finalize my proposal (due April 8). I would really appreciate your input, and I look forward to your suggestions.
Elena Ajayi
elenaajayi@outlook.com
Hi,
Thanks for your interest.
Posterior predictive checks and fully Bayesian information criteria are recommended for comparing models and evaluating sparsity. The approach is Markovian, and often a minimum of 200 warmup and 200 sampling iterations is required. The Lasso is just an example; the aim is to make it probabilistic using NumPyro or PyMC with sparse priors (Cauchy/Laplace/Horseshoe).
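For readers unfamiliar with these diagnostics, here is a minimal ArviZ sketch of how they are typically computed once MCMC samples are available; the data below are synthetic placeholders purely to illustrate the API, not real inference output:

```python
import numpy as np
import arviz as az

# toy InferenceData: 2 chains x 200 draws, with pointwise log-likelihood
# and posterior-predictive draws (all synthetic, for illustration only)
rng = np.random.default_rng(0)
n_obs = 50
idata = az.from_dict(
    posterior={"theta": rng.standard_normal((2, 200))},
    log_likelihood={"obs": rng.standard_normal((2, 200, n_obs)) - 1.0},
    posterior_predictive={"obs": rng.standard_normal((2, 200, n_obs))},
    observed_data={"obs": rng.standard_normal(n_obs)},
)

loo = az.loo(idata)    # PSIS-LOO cross-validation, a fully Bayesian criterion
waic = az.waic(idata)  # WAIC, an alternative information criterion
# posterior predictive check: overlay observed data on predictive draws
# az.plot_ppc(idata)   # (plotting call; uncomment in a notebook)
```

In practice the `InferenceData` object would come directly from the NumPyro/PyMC fit (e.g., `az.from_numpyro(mcmc)`), and models with different sparse priors would be ranked by comparing their LOO/WAIC scores.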
@mhashemi, @Daniele_Marinazzo Hi Dr. Hashemi,
Thanks so much for your detailed feedback on the way I plan to approach the prelim task. I’ve updated my proposal to include posterior predictive checks and fully Bayesian information criteria for evaluating sparsity using a baseline of 200 warmup and 200 sampling iterations. My approach now moves from a Lasso-based example to a fully probabilistic framework using NumPyro/PyMC with sparse priors (Cauchy, Laplace, and Horseshoe). I am also planning to validate the methods on synthetic datasets. For example, I may use simulated time-series data generated from an Erdős-Rényi connectivity matrix. This model is based on randomly connecting nodes in a network with a fixed probability, and it provides a simple way to set a known ground truth for connectivity. An overview of the model can be found here: Erdős–Rényi model - Wikipedia.
I have also set aside time to get familiar with JAX and PyMC through tutorials and small projects. Would it be okay if I sent over my full proposal for further review before the deadline tomorrow, April 8th? I would really appreciate any additional insights to ensure it covers everything clearly.
Best,
Elena Ajayi
Thanks! A clear structure and timeline are recommended!
Dear Dr. @mhashemi ,
My name is Samkit Shah, and I’m a recent Computer Science graduate from IIT Jodhpur. I currently work at Warner Bros. Discovery as a Software and Machine Learning Engineer, where I co-lead the development of distributed ML services for intelligent work order management and vendor recommendation. I’ve worked extensively with real-time orchestration systems, Cadence-Temporal, and Bayesian time series models to improve both cost and latency across production workflows.
What excites me about this project is the opportunity to bring my background in scalable backend systems and probabilistic ML into the domain of computational neuroscience. I’m particularly interested in how methods like Bayesian inference and sparse modeling can be used to understand brain connectivity — especially when integrated with modern probabilistic frameworks like NumPyro or PyMC.
I have experience studying Markov equivalent graphs and their scoring methodologies. Moreover, I have a strong foundation in probability and statistics.
I have already completed the preliminary task and will be sharing with you the proposal for a review shortly along with the task code.
A fundamental question I wanted to ask, in order to avoid any redundancy in the proposal: what would be effective use cases for the model inversion that align with the work in progress at EBRAINS?
Best regards,
Samkit Shah
Hi,
Thanks! The structural connectivity between brain regions is estimated using tractography, which for humans is symmetric and undirected. Effective connectivity, by contrast, informs us about causality and directionality. Please see this paper.
Is it advisable to include the aforementioned use case, along with similar ones (if any), in the proposal?
@mhashemi, I sent over the proposal on GSoC. Thanks!