GSoC 2024 Project Idea 12.1 Writing an R package for the computation of the O-information (350 h)

Real-world systems are often characterized by higher-order interactions (HOIs) within multiplets i.e. groups of three or more units (Battiston et al., 2021). In neuroscience, most pieces of evidence we have about brain networks come from the interactions between pairs of brain regions but little is known about what type of information remains hidden in the non-pairwise interactions (Luppi et al. 2023, Luppi et al. 2024). Interestingly, recent findings suggest that HOIs might be a better neural marker of neurodegeneration than standard pairwise approaches (Herzog et al., 2022).

Several methods have been proposed to estimate HOIs, drawing on fields such as graph theory and information theory. The O-information (short for “information about Organisational structure”) is an information-theoretic quantity that characterizes statistical interdependencies within multiplets of three or more variables (Rosas et al., 2019). It not only quantifies how much information multiplets of brain regions are carrying but also tells us about the nature of that information, i.e. whether multiplets are carrying mainly redundant or synergistic information.
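
For reference, the definition from Rosas et al. (2019) can be written as follows, for n variables X = (X_1, ..., X_n), with H denoting entropy and X_{-j} denoting all variables except X_j. The symbols TC and DTC stand for the Total Correlation and Dual Total Correlation named in the task list; the notation here is a summary, not a quotation from the paper.

```latex
\begin{aligned}
\mathrm{TC}(X)  &= \sum_{j=1}^{n} H(X_j) - H(X), \\
\mathrm{DTC}(X) &= H(X) - \sum_{j=1}^{n} H(X_j \mid X_{-j}), \\
\Omega(X)       &= \mathrm{TC}(X) - \mathrm{DTC}(X)
                 = (n-2)\,H(X) + \sum_{j=1}^{n} \bigl[ H(X_j) - H(X_{-j}) \bigr].
\end{aligned}
```

A positive value of Ω indicates a redundancy-dominated multiplet, a negative value a synergy-dominated one.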

Estimating HOIs is computationally intensive. As an example, a cortical parcellation dividing the brain into 80 distinct regions involves estimating HOIs in about 82,000 triplets, 1.5 million quadruplets, 24 million quintuplets, etc. The O-information relies only on simple quantities such as entropies, which keeps its computational cost low and makes it an ideal candidate for estimating HOIs in a reasonable time. Still, there is as yet no neuroinformatics gold standard for estimating HOIs that runs in a reasonable amount of time and is accessible to network enthusiasts, experts and non-experts alike.
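
The multiplet counts quoted above are binomial coefficients ("80 choose k"). A minimal Python sketch to reproduce them (the variable names are illustrative only):

```python
from math import comb

n_regions = 80  # parcellation size used in the example above

# Number of distinct multiplets of each size k among n_regions regions
multiplet_counts = {k: comb(n_regions, k) for k in (3, 4, 5)}

for k, count in multiplet_counts.items():
    print(f"multiplets of size {k}: {count:,}")
```

The count grows quickly with the multiplet size, which is why the cost of each individual estimate matters so much.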

Currently an R implementation is missing, which limits adoption by a sizeable part of the community, in particular colleagues working with behavioral data and psychometrics.

Project aims and tasks
This project aims to build an R package, missing at the moment, for the computation of this quantity.

We divided this project into seven main tasks:

  1. Test current implementation in Matlab and Python
  2. Build R functions to compute the Total Correlation and the Dual Total Correlation
  3. Implement and test statistical validation for the multiplets
  4. Data simulation: add a function to simulate HOIs
  5. Explore plotting solutions, either in R or by preparing the output for plotting with existing packages such as XGI
  6. Explore interfaces with other R packages used in psychometrics (https://lavaan.ugent.be/, http://psychonetrics.org/, CRAN - Package psychotools)
  7. Prepare a package to be submitted to CRAN
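
As a rough illustration of task 2, here is a minimal Python sketch of the Total Correlation, the Dual Total Correlation and their difference (the O-information) under a Gaussian assumption, computed directly from a covariance matrix. All function names are hypothetical and are not taken from the existing Matlab or Python implementations; an R port would follow the same arithmetic. The two example covariances are assumptions chosen for illustration: three noisy copies of one source (redundant) and Z = X + Y + noise with independent X and Y (synergistic).

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)

def _sub(cov, idx):
    """Covariance restricted to the variables listed in `idx`."""
    return cov[np.ix_(idx, idx)]

def total_correlation(cov):
    """TC = sum of marginal entropies minus joint entropy."""
    n = cov.shape[0]
    marginals = sum(gaussian_entropy(_sub(cov, [j])) for j in range(n))
    return marginals - gaussian_entropy(cov)

def dual_total_correlation(cov):
    """DTC = sum_j H(X_-j) - (n - 1) * H(X)."""
    n = cov.shape[0]
    leave_one_out = sum(
        gaussian_entropy(_sub(cov, [i for i in range(n) if i != j]))
        for j in range(n)
    )
    return leave_one_out - (n - 1) * gaussian_entropy(cov)

def o_information(cov):
    """O-information = TC - DTC (positive: redundancy, negative: synergy)."""
    return total_correlation(cov) - dual_total_correlation(cov)

# Redundant triplet: X_j = S + noise_j, var(S) = 1, noise variance 0.25
cov_redundant = np.ones((3, 3)) + 0.25 * np.eye(3)
# Synergistic triplet: X, Y independent, Z = X + Y + noise (variance 0.25)
cov_synergistic = np.array([[1.0, 0.0, 1.0],
                            [0.0, 1.0, 1.0],
                            [1.0, 1.0, 2.25]])

o_red = o_information(cov_redundant)
o_syn = o_information(cov_synergistic)
o_ind = o_information(np.eye(3))
print(f"redundant: {o_red:+.4f}  synergistic: {o_syn:+.4f}  independent: {o_ind:+.4f}")
```

Working from covariance matrices keeps the example deterministic; with real data the covariance would be estimated from samples, and the statistical validation in task 3 would then be needed to decide whether a given value differs from zero.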

Ultimately, this project could lead to the establishment of a gold standard for going beyond pairwise interactions by measuring HOIs, accessible to R experts as well as to users with little programming knowledge.

Skill level: Intermediate/advanced

Required skills: R, some Python

Time commitment: Full-time (350 h)

Lead mentor: Daniele Marinazzo (daniele.marinazzo@gmail.com)

Project website:

Backup mentors: Fernando E. Rosas (f.rosas@imperial.ac.uk), Pedro Martinez Mediano (p.mediano@imperial.ac.uk)

Tech keywords: R, Python

Dear mentors @daniele.marinazzo@gmail.com, @f.rosas@imperial.ac.uk, @p.mediano@imperial.ac.uk. I’m Mingcong. This is a really interesting project and I want to express my great interest in contributing to it. Could you please guide me on how I might get started? Thank you very much!

Best Regards,
Mingcong

Dear Mingcong
thanks a lot for your interest.
You can have a look at the paper introducing the measure [1902.11239] Quantifying High-order Interdependencies via Multivariate Extensions of the Mutual Information, and at current implementations
GitHub - danielemarinazzo/HOI: Retrieving high-order information multiplets from data using the O-information
GitHub - brainets/hoi: Higher-Order Interactions

Dear @daniele.marinazzo@gmail.com

Thank you very much for your prompt response and for sharing these interesting resources. I have started to familiarize myself with the materials you provided. I realized I hadn’t mentioned earlier that I am currently a master's student in psychology at Boston University, and I usually use R (sometimes Python or other software) in my research. Could you please advise on the next step or on particular areas within the project that you would like me to focus on? Thank you very much!

You can explore the approaches currently used in R to compute entropy, and then look at the Gaussian entropy used here: GitHub - robince/gcmi: Functions for calculating mutual information and other information theoretic quantities using a parametric Gaussian copula. The theory is covered in Entropy of the Gaussian or in the paper linked to the repo; the Gaussian copula itself is not relevant here for now. Then see if you can implement it in R.
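
To make the Gaussian entropy suggestion concrete, here is a minimal Python sketch that estimates it from data via the sample covariance. The function name is hypothetical and this is not the gcmi code; it also omits any small-sample bias correction. The same formula, 0.5 * log((2*pi*e)^n * det(cov)), is what an R version would implement.

```python
import numpy as np

def gaussian_entropy_from_data(x):
    """Entropy in nats of data `x` (n_samples, n_variables), assuming Gaussianity."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as a single variable
    n_vars = x.shape[1]
    # Sample covariance; atleast_2d handles the single-variable case
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("sample covariance is not positive definite")
    return 0.5 * (n_vars * np.log(2.0 * np.pi * np.e) + logdet)

# Sanity check against the closed form for a univariate Gaussian with sd = 2
rng = np.random.default_rng(0)
samples = rng.normal(scale=2.0, size=(20000, 1))
h_est = gaussian_entropy_from_data(samples)
h_true = 0.5 * np.log(2.0 * np.pi * np.e * 4.0)
print(f"estimated: {h_est:.3f}  closed form: {h_true:.3f}")
```

Using the log-determinant (`slogdet`) rather than the determinant itself avoids overflow or underflow when the number of variables grows.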

Thank you very much for your guidance! I will delve into the analyses in R and will keep you updated on my progress.

Best regards,
Mingcong

Hello everyone!!! My name is Federico and I am a passionate computer science student with a fervent curiosity for scientific research and artificial intelligence. I am currently pursuing a double master’s degree in Computer Science at the University of Trento, Italy, and Eötvös Loránd University in Budapest, Hungary.

I would like to introduce why I want to actively participate in this Google Summer of Code project. First, I have always been interested in participating in open source projects, as I strongly believe in the importance of collaboration and free exchange of knowledge in the technology community. This is an excellent opportunity for me to contribute to a real project, one that has the potential to have a significant impact in the field of neuroscience and psychometrics.

The idea of going beyond pairwise interactions in the field of neural networks particularly fascinates me. It is a novel approach that can lead to a deeper understanding of how the human brain works and potentially to new treatments for neurodegenerative diseases.

In addition, this project offers me the opportunity to apply theoretical knowledge gained during my graduate studies in a practical and meaningful context. I am excited about the idea of working on computationally intensive problems and developing efficient solutions that can be used by a wide range of users, including experts and non-experts.

One of the main reasons I feel attracted to this project is the opportunity to be mentored by experts in the field. The prospect of learning from qualified professionals and working closely with them is extremely exciting for me. I am confident that this experience will allow me to grow both professionally and personally.

Here I attach a link to my CV: link.

Dear Federico

thanks a lot for your interest!
This platform is to exchange information on technical aspects of the project. Please feel free to ask questions in this direction, after looking at the material already shared.
When time comes to apply for a project, you can do so through the dedicated GSoC portal.
Kind regards

Dear Dr. Daniele,
I am interested in joining this project.
How can I join it?

Thanks

Dear Ahmed
Thanks for your interest.
You can browse the reading material and the existing repos pasted above in this thread.
If you want, you can already start working on some code implementation.
You then have until April 2 to formally submit your proposal in the Google Summer of Code portal.

Hi @Daniele_Marinazzo, @f.rosas, and @p.mediano,

I’m planning to apply for the O-information R package project (12.1). The fact that HOIs could be a better marker for neurodegeneration is exactly why I want to work on this. I currently build AI cognitive tools for dementia patients, so this clinical focus hits right at home for me.

To be completely upfront: I come from a heavier Python background, so my R is more functional than advanced right now. (I recently wrote the R code for a clinical paper on generative AI and weight trajectories, so I know the basics!). That said, I’m super motivated, I’ve successfully contributed to open-source projects before, and I’m ready to learn whatever it takes to build this properly.

I’ve already started poking around the frites and HOI Python repos to see how you’re handling Total Correlation. Quick question before I start structuring the R skeleton: given how intense the scaling is for these multiplets, are you open to using Rcpp to run the heavy combinatorial loops in C++, or do you want this to stay 100% pure R?

Also, how can I start helping right now? Would you be open to me porting over a small helper function from Python to R as a quick proof of concept?

Thanks!
Reem

Dear Reem
Thanks for your interest!
This project was proposed for the 2024 edition; we’re no longer mentoring it in this context.
Of course, if you want to contribute to the general enterprise, that would be great.
Our colleague Niels Van Santen has already put some things together. The only publicly available repo for the moment is here: GitHub - NielsVS0/Hitchiker-s-guide-to-info-measures-in-psychology: R code accompanying the manuscript "A Hitchhiker's guide to information theoretical measures in psychology", to be found here: https://osf.io/preprints/psyarxiv/vb5s6_v1.

Cheers!

Oh, thanks for letting me know. It was listed for the 2026 project ideas so I was pretty interested. But thank you!