GSoC 2021 project idea 1.1: A Python toolbox for computing high-order information in neuroimaging

malin · January 28, 2021, 2:50pm

The functioning of complex systems (e.g. the brain, among many others) depends on the interaction between different units; crucially, the resulting dynamics differs from the sum of the dynamics of the parts. In order to deepen our understanding of these systems, we need to make sense of these interdependencies. Several tools and frameworks have been developed to look at different statistical dependencies in multivariate datasets. Among these, information theory offers a powerful and versatile framework; notably, it allows detecting high-order interactions that determine the joint informational role of a group of variables.

The goal of this project is to collect and refine existing metrics of high-order interactions currently implemented in Matlab or Java, for example

and integrate them into a unified Python-based toolbox.

The main deliverable of the project is a toolbox whose inputs are the measurements of different variables plus some parameters, and whose outputs are the measures of higher-order interdependencies. Ideally, the toolbox will interface with visualisation and processing platforms for neuroimaging data, such as MNE and fMRIPrep, and be available as a Docker container too. A parallel project would focus on the visualization of these higher-order measures.

Experience in statistics and neuroimaging data analysis is a plus, but not a requirement.

Lead mentor: @Daniele_Marinazzo, Ghent University; [GitHub]
Co-mentor: @f.rosas Fernando Rosas, Imperial College London; [GitHub]

Skills: neuroimaging, statistics, Python, Matlab, Java, MNE, fMRIPrep

Athene-ai · February 4, 2021, 8:56am

Very interesting project!

karthik · February 5, 2021, 5:58pm

@Daniele_Marinazzo , @f.rosas
Dear mentors,
This is Sama Sai Karthik. I am in the third year of an integrated Bachelor’s and Master’s program (5-year program) in Computer Science at International Institute of Technology, Bangalore. I have a general interest in computational neuroscience, through which I got to know the field of Non-Linear Dynamics. I found the subject really interesting and am currently taking a course in NLD at the institute. I also have experience developing web apps using Python and Java. I went through the description and the code in the linked GitHub repositories. I want to contribute to this project, and it would be really kind of you to help me get started. Thank you.

Daniele_Marinazzo · February 6, 2021, 6:19pm

Dear Karthik
thanks a lot for your interest!
The focus of this project is the neuroinformatics/implementation aspect, but of course the fact that you are interested in (nonlinear) dynamics and in neuroscience is a great thing.
As I see the project, the starting point would be to understand the code in
GitHub - danielemarinazzo/dynamic_O_information: code for dynamic O information and
GitHub - brincolab/High-Order-interactions: High-Order interactions
and see if you can reproduce it with Java or Python. The JIDT toolbox already contains some implementations in Java.
On the other hand, some Python estimators of information-theoretic quantities (including the Gaussian copula used in dynamic_O_information above) can be found here
GitHub - robince/partial-info-decomp: Partial Entropy Decomposition and Partial Information Decomposition with pointwise surprisal measures
GitHub - robince/pyentropy: Information Theoretic Tools for Python
GitHub - robince/gcmi: Functions for calculating mutual information and other information theoretic quantities using a parametric Gaussian copula.

@f.rosas, do you have other suggestions for the moment?

Cheers!

d.

Pranav_Mahajan · February 16, 2021, 7:09am

Dear @Daniele_Marinazzo, @f.rosas,

I am Pranav Mahajan, a final year undergraduate in electronics and communications engineering, with a background in cognitive and computational neuroscience, dynamical systems (nonlinear and chaotic too), signal processing and information theory.

A very interesting project indeed. I too am interested in contributing to this project (and collaborating), as it feels like a natural next step after having recently written a synchronization measures MATLAB toolbox (GitHub, Docs, Preprint) with a focus on time-series data from neuroscience and biological neural nets. It also had a few simpler information-theoretic measures, and I am looking forward to learning more about the measures mentioned above, specific to studying higher-order interactions. In addition, I have some familiarity with MNE-Python from working at a CogNeuro lab, and am looking forward to familiarizing myself with fMRIPrep while contributing to this project.

Thanks a lot for the GitHub repo links to get started. I’ll begin by understanding the code and trying to reproduce it in Python. Thanks!

karthik · February 16, 2021, 8:19am

@Pranav_Mahajan great to have company! I have also started to reproduce the script in Python. Can we collaborate?

f.rosas · February 22, 2021, 9:17pm

Dear Pranav and Karthik

As Pranav mentioned, these measures are the natural high-order extension of more standard synchronisation measures, and they are capable of providing a much more detailed depiction of quasi-ordered metastable configurations. Hence it is really great to hear that you share our enthusiasm about them!

Please let us know if you have questions about the papers that Daniele mentioned above, or about anything else.

Best wishes,
F

darkdebo · February 28, 2021, 10:22am

Hola @Daniele_Marinazzo @f.rosas, I am Debojyoti Chakraborty, a CS undergrad in my pre-final year. I have experience in Deep Learning and Machine Learning, and I find the first project, building a toolbox for neuroimaging, quite interesting, but I am open to any other DL-related project where I can contribute in some way.

I found this project: Eye-tracker project.

f.rosas · March 5, 2021, 2:12am

Hi Debojyoti, thanks for the interest.
I wonder if DL techniques could be used to better estimate these information-theoretic quantities in high dimensions. However, that would be far from trivial, and at this stage I have no concrete ideas on how to go about it. In any case, building this toolbox could be a good first step to gain familiarity with these tools.
F

Athene-ai · March 10, 2021, 11:43am

Dear @karthik, @f.rosas, @Daniele_Marinazzo and @darkdebo,

Sure!

I agree

Pranav_Mahajan · March 22, 2021, 11:40am

Dear @Daniele_Marinazzo, @f.rosas,

I read the paper on dynamic O-info shared by @Daniele_Marinazzo. I have a couple of doubts about it and a few general doubts regarding the proposal and project, it’d be very helpful if any of you could clarify!

In the paper, I understood O-information, but the following motivation for moving to dynamic O-information was not entirely clear -

However, in order to remove shared information due to common history and input signals, one should condition on the state vector of the target variable, thus leading to the definition of the dynamic O-information

It’d help if you could elaborate on the reason for removing shared information, because that seems to be at the heart of extending it to dynamic O-info. That is, what are the advantages of distinguishing whether the info comes from common history, common inputs, etc.? Are there any other advantages of this measure? I suppose this is more feasible than the PID measures discussed earlier in the paper.

Secondly, I am new to the concepts of “synergy” and “redundancy”; I guess these are terms specific to high-order interactions. I have a qualitative understanding of them from equation 2 in the paper: when a new variable is added to the system, if the variation in the total O-info is positive then it’s redundant, or else it’s synergistic. Is the quantification of redundancy and synergy often just in terms of d\Omega_k and -d\Omega_k, or is there more to it?

A few of my general doubts are -

The resources shared by you range from MI estimation (GCMI) and the usual info theory measures to PID measures and implementation of measures specific to high-order interactions (O-info, S-info and dO-info). Although things like GCMI are eventually used in dO-info, what all would the desired focus of the toolbox be, in terms of with and without stretch goals? As in I believe the O, S and dO-info are must-have measures and the measures like GCMI or KSG MI or transfer entropy etc would be some sort of optional helper functions? (since they are not specific to high-order interactions). And would the inclusion of PID be a stretch goal? I didn’t get a good understanding of PID measures, I’ll try to find a few more papers specific to that, or else if you could suggest a few that’d be great!
The dO-info paper uses a very specific dataset, and it’d be useful to eventually test the Python toolbox on that dataset. But since the goal is to serve other neuroscientists too (working with MNE or fMRI data etc.), I was interested in knowing whether you had any dataset or end-goal in mind to work towards, such that testing on it could be part of the documentation or make the toolbox complete?
I didn’t quite get a grasp of what kind of visualizations are usually used or are best suited for high-order interactions. Would it be similar to having some sort of edge weights on these brain networks, where each variable is a node in the network (ROI of the brain)?
To make a strong or compelling proposal, is it better to have a couple of measures already converted to Python, to show in a small repo, by the time of application? Or would the plan suffice, with the implementation starting post-selection?
Would it be fine to get a proposal draft reviewed by the mentors before submission? If yes, what would be the preferred mode of contact (you could share your email IDs perhaps)?

Apologies for the long post and thanks a lot for your time.
Sincerely,
Pranav

Daniele_Marinazzo · March 22, 2021, 1:10pm

Dear Pranav

thanks for the extensive set of questions, always stimulating, and of course my bad for not making it clear enough to a reader of the paper.

Two identical time series (A sending info to B with very strong coupling and no noise, or C sending info to A and B together at the same lag, etc) will have max mutual info and correlation. The transfer entropy will on the other hand go to zero, since we don’t need the driver to predict the target. This is why we propose to condition.
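A minimal sketch of this point, assuming a linear-Gaussian estimator, a single lag, and simulated data (none of which comes from the paper's code). The "identical" series is given a tiny amount of observation noise so that the correlations stay below 1; all function and variable names are illustrative only.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def gaussian_mi(a, b):
    """Mutual information (nats) of two scalar series, Gaussian assumption."""
    r = pearson(a, b)
    return -0.5 * math.log(1.0 - r * r)


def gaussian_cmi(a, b, c):
    """Conditional MI I(a; b | c) in nats, via the partial correlation."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    partial = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    return -0.5 * math.log(1.0 - partial * partial)


def transfer_entropy(source, target):
    """One-lag TE source -> target: I(target_t ; source_{t-1} | target_{t-1})."""
    return gaussian_cmi(target[1:], source[:-1], target[:-1])


rng = random.Random(0)
n = 5000

# Driver: an autoregressive process with its own memory.
x = [0.0]
for _ in range(n - 1):
    x.append(0.9 * x[-1] + rng.gauss(0, 1))

# Case 1: the target is (almost) an instantaneous copy of the driver.
y_copy = [v + rng.gauss(0, 0.01) for v in x]

# Case 2: the target is driven by the past of x, plus its own noise.
y_lagged = [rng.gauss(0, 1)] + [0.8 * x[t - 1] + rng.gauss(0, 1) for t in range(1, n)]

mi_copy = gaussian_mi(x, y_copy)            # very large: near-identical series
te_copy = transfer_entropy(x, y_copy)       # close to zero: own past suffices
te_lagged = transfer_entropy(x, y_lagged)   # clearly positive: driver is needed

print(f"MI(copy)={mi_copy:.2f}  TE(copy)={te_copy:.5f}  TE(lagged)={te_lagged:.2f}")
```

In the copy case the target's own past already carries everything the driver's past could offer, so conditioning on it leaves almost nothing; in the lagged case the driver's past still adds predictive information.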

Synergy and redundancy are ubiquitous yet elusive concepts. Here we go for a practical definition. The O-information indeed allows us to define them in terms of the sign of its value for the multiplet. The same applies when we condition. @f.rosas had an objection to calling them in the same way when he revised the paper, I guess this proves he had a point!

Pranav_Mahajan:

The resources shared by you range from MI estimation (GCMI) and the usual info theory measures to PID measures and implementation of measures specific to high-order interactions (O-info, S-info and dO-info). Although things like GCMI are eventually used in dO-info, what all would the desired focus of the toolbox be, in terms of with and without stretch goals? As in I believe the O, S and dO-info are must-have measures and the measures like GCMI or KSG MI or transfer entropy etc would be some sort of optional helper functions? (since they are not specific to high-order interactions). And would the inclusion of PID be a stretch goal? I didn’t get a good understanding of PID measures, I’ll try to find a few more papers specific to that, or else if you could suggest a few that’d be great!

So, here we should distinguish between the quantities and their estimators. TE is MI conditioned on the past, regardless of the estimators we use.
As far as the estimators are concerned, Gaussian copulas are much more parsimonious than other estimators, so I would maybe start with that one in this project.
The PID framework allows us to define synergy and redundancy together, and not as mutually exclusive quantities. You can read more about PID in this paper and the whole special issue it introduces, but the preprint by Mediano, @f.rosas et al. is also a straightforward way to bridge PID and (d-)O-info.
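To illustrate the quantity/estimator distinction, here is a rough sketch of a Gaussian-copula style MI estimate: each variable is rank-transformed to standard-normal scores and a plain Gaussian MI is computed on the scores. This is an assumption-laden toy written for this thread, not the API of the gcmi package; the function names and the simulated data are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def gaussian_mi(a, b):
    """Mutual information (nats) of two scalar series, Gaussian assumption."""
    r = pearson(a, b)
    return -0.5 * math.log(1.0 - r * r)


def copula_normalise(values):
    """Replace each value by the standard-normal quantile of its rank."""
    n = len(values)
    std_normal = NormalDist()
    order = sorted(range(n), key=values.__getitem__)
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def gcmi(a, b):
    """Gaussian MI computed on the copula-normalised variables."""
    return gaussian_mi(copula_normalise(a), copula_normalise(b))


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [v + rng.gauss(0, 1) for v in x]
y_warped = [math.exp(v) for v in y]  # monotonic, strongly non-Gaussian marginal

print(f"copula MI, raw y:    {gcmi(x, y):.3f}")
print(f"copula MI, warped y: {gcmi(x, y_warped):.3f}")
print(f"plain Gaussian MI, warped y: {gaussian_mi(x, y_warped):.3f}")
```

Because only ranks enter the estimate, the copula version gives the same value for `y` and `y_warped`, whereas the plain Gaussian estimate is distorted by the skewed marginal. Swapping in a different MI estimator would change the numbers but not the definition of the quantity being estimated.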

Yes indeed, the toolbox is aimed at large-scale noninvasive neuroimaging, possibly in BIDS format, so any of the datasets on openneuro.org could be suitable. The datasets need to be processed; some of those on openneuro.org are already processed, or can be seamlessly processed using BIDS-apps such as fMRIPrep (for fMRI data), even on the cloud with brainlife.io. M/(i)EEG data can be processed with MNE.

Yes (this is for the other project). Part of the challenge is to brainstorm on what we actually need. One could start from average quantities, but then a more comprehensive visualization would be welcome. Hypergraphs are a promising direction, see e.g. https://towardsdatascience.com/how-to-visualize-hypergraphs-with-python-and-networkx-the-easy-way-4fe7babdf9ae.

The main goal is collaborative science, above and beyond the GSoC program, so contributions by anyone at any time are welcome. As far as GSoC is concerned, you are not required to work on the project before being selected, nor is there any expectation for you to do so. When the time comes to choose a student for GSoC (only one per project, unlike “standard” collaborators, for whom the more the merrier), then of course, for otherwise equivalent candidates, we would probably select someone who is already in the flow, but again, there is no requirement in this direction.

Yes, you can send us emails (mine is easily googleable).

Hope this helps, please let us know if it’s clear enough. @f.rosas please pitch in and correct me if I missed/misstated something

f.rosas · March 22, 2021, 2:18pm

Hi!
Thanks a lot Pranav for all those insightful questions and Daniele for those great comments.

Let me just give my opinion about a small thing:

From my perspective, the O-info is not so strong in providing a definition of synergy and redundancy, but is more focused on enabling practical explorations. In fact, the O-info builds on the formal definition of those terms from the Partial Information Decomposition (PID) literature ([1004.2515] Nonnegative Decomposition of Multivariate Information), and on reasonable intuitions. In particular, a negative O-information is a sufficient (but not necessary) condition for synergy to exist in a system - more precisely, it is a necessary and sufficient condition for a system to be "dominated" by synergy, i.e. for synergy to be its most important form of statistical interdependency.

Redundancy and synergy are ways of characterising interdependencies between three or more variables. Intuitively, redundancy means that some information can be extracted from any of at least three variables, as if there were multiple (more than two) copies of the same thing; synergy is about relationships that are in the whole but not in the parts, like the XOR. Of course these intuitions can be operationalised in many different ways.
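These two intuitions can be checked on toy binary systems. The sketch below computes the O-information from exact entropies using the usual formula Ω = (n − 2)·H(X) + Σ_j [H(X_j) − H(X_−j)], for an XOR triplet and for three copies of one bit. The helper names are made up for illustration, and the joint states are assumed to be equally likely.

```python
import math
from collections import Counter
from itertools import product


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def o_information(rows):
    """O-information (bits) of a list of equally likely joint states (tuples)."""
    n = len(rows[0])
    omega = (n - 2) * entropy(rows)
    for j in range(n):
        single = [row[j] for row in rows]               # X_j alone
        rest = [row[:j] + row[j + 1:] for row in rows]  # all variables but X_j
        omega += entropy(single) - entropy(rest)
    return omega


# XOR: the third bit is determined only by the first two taken together.
xor_rows = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

# Copies: the same bit repeated three times.
copy_rows = [(a, a, a) for a in (0, 1)]

omega_xor = o_information(xor_rows)    # -1.0 bit: synergy-dominated
omega_copy = o_information(copy_rows)  # +1.0 bit: redundancy-dominated

print(omega_xor, omega_copy)
```

For real data the entropies would have to be estimated (for instance with a Gaussian copula), which is where the choice of estimator comes in; the sign interpretation stays the same.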

The important thing is that any PID-compatible operationalisation has to be compatible with the O-information. In that sense, the O-information doesn’t try to solve the big problem of fully defining synergy and redundancy, but provides a practical partial solution that is guaranteed to be compatible with any proper solution that may be built in the future.

I hope this helps.
Fernando

Pranav_Mahajan · March 25, 2021, 4:44am

Dear @Daniele_Marinazzo, @f.rosas,

Thanks for your detailed replies! A lot of it is much clearer to me now. I’ll go through the links and papers you suggested and get back to you in case of any doubts.

Thanks again,
Pranav