GSoC 2023 Project Idea 5.1 Optimization of the computations of higher-order interactions and integration in the Frites python package (175 h)

Real-world systems are often characterized by higher-order interactions (HOIs) within multiplets i.e. groups of three or more units (Battiston et al., 2021). In neuroscience, most pieces of evidence we have about brain networks come from the interactions between pairs of brain regions but little is known about what type of information remains hidden in the non-pairwise interactions. Interestingly, recent findings suggest that HOIs might be a better neural marker of neurodegeneration than standard pairwise approaches (Herzog et al., 2022).

Several methods have been proposed to estimate HOIs, drawing on fields such as graph theory and information theory. The O-information (short for “information about Organisational structure”) is an information-theoretical quantity that characterizes statistical interdependencies within multiplets of three or more variables (Rosas et al., 2019). It not only quantifies how much information multiplets of brain regions carry but also tells us about the nature of that information, i.e. whether multiplets carry mainly redundant or synergistic information.
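For readers new to the measure, the O-information of n variables can be written with entropies only (Rosas et al., 2019): Ω(X) = (n - 2) H(X) + Σ_j [H(X_j) - H(X_-j)], where X_-j is the set of all variables except the j-th. Positive values indicate redundancy-dominated multiplets and negative values synergy-dominated ones. Below is a minimal sketch, assuming Gaussian variables so that each entropy comes from a covariance determinant. The function names `gaussian_entropy` and `o_information` are illustrative only and are not the Frites API or the mentors' implementation.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(cov)
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)


def o_information(x):
    """O-information of `x` with shape (n_variables, n_samples).

    Assumes Gaussian data (illustrative estimator, hypothetical name).
    """
    n = x.shape[0]
    cov = np.cov(x)  # rows are variables
    o = (n - 2) * gaussian_entropy(cov)
    for j in range(n):
        rest = [k for k in range(n) if k != j]
        o += gaussian_entropy(cov[j, j])
        o -= gaussian_entropy(cov[np.ix_(rest, rest)])
    return o


rng = np.random.default_rng(0)
n_samples = 10_000

# Redundancy: three noisy copies of one common source
source = rng.standard_normal(n_samples)
redundant = source + 0.5 * rng.standard_normal((3, n_samples))

# Synergy: the third variable is a noisy sum of two independent ones
a, b = rng.standard_normal((2, n_samples))
synergistic = np.stack([a, b, a + b + 0.5 * rng.standard_normal(n_samples)])

print("redundant triplet:", o_information(redundant))      # expected sign: positive
print("synergistic triplet:", o_information(synergistic))  # expected sign: negative
```

The sign of the result, rather than its exact value, is what distinguishes the two toy systems here.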

Estimating HOIs is computationally intensive. As an example, a cortical parcellation dividing the brain into 80 distinct regions involves estimating HOIs in about 82,000 triplets, 1.6 million quadruplets, 24 million quintuplets, etc. The O-information relies only on simple quantities such as entropies, which makes it an ideal candidate to estimate HOIs in a reasonable time. Still, there is not yet a neuroinformatic gold standard for estimating HOIs in a decent amount of time that is accessible to network enthusiasts, experts and non-experts alike.
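The multiplet counts quoted above are binomial coefficients C(80, k). A quick check using only the Python standard library (the helper name `n_multiplets` is illustrative):

```python
from math import comb


def n_multiplets(n_regions, size):
    """Number of distinct multiplets of `size` regions among `n_regions`."""
    return comb(n_regions, size)


n_regions = 80  # regions in the example parcellation
for size in (3, 4, 5):
    print(f"size {size}: {n_multiplets(n_regions, size):,} multiplets")
# size 3: 82,160 multiplets
# size 4: 1,581,580 multiplets
# size 5: 24,040,016 multiplets
```

The count keeps growing with the multiplet size up to half the number of regions, which is why an upper bound on the multiplet size matters in practice.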

Project aims and tasks

This project aims to optimize the computation of dynamic HOIs. Ultimately, we want to be able to estimate the O-information on both simulated data and real brain data, possibly with high spatiotemporal resolution. To this end, we will start from an existing implementation we developed during BrainHack 2021 and 2022. We made two implementations of the HOIs: a first version using the standard NumPy library (Harris et al., 2020), which we compared with a second implementation using a more modern library called Jax for accelerated linear algebra on both CPU and GPU. Finally, we will integrate the developments into the open-source toolbox Frites, which currently only supports pairwise interactions.

We divided this project into five main tasks:

  1. Optimize the low-level HOIs computations (70 hours): find ways to decrease computing time while keeping reasonable memory usage. Some ideas include faster entropy calculations, avoiding recomputing some quantities, parallel computations, etc.

  2. Merge implementations into a single one (15 hours): merge the NumPy and Jax implementations into a single function called “conn_hoi” and allow users to use the NumPy one without needing to install the Jax library (i.e. minimize requirements).

  3. Data simulation (20 hours): add a function “simulate_hoi” to simulate HOIs

  4. HOIs plotting (10 hours): create a function “plot_hoi” to plot the output of the “conn_hoi”. We will consider using either XGI or HyperNetX

  5. Integrate the HOIs into the Frites software (60 hours): create a pull request (PR) to integrate “conn_hoi” into Frites. The PR will have to follow Frites’ conventions, including input types and coding quality (pep8, flake8). We will also write comprehensive online documentation accessible to non-experts, and add unit tests and illustrative examples.
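As an illustration of task 2, one possible way to keep Jax optional is to import it lazily and fall back to NumPy when it is missing. This is only a sketch under that assumption: `get_backend` and `total_variance` are hypothetical names, and the toy computation stands in for the real “conn_hoi” internals.

```python
import numpy as np


def get_backend(prefer_jax=True):
    """Return (name, array module); fall back to NumPy if Jax is not installed."""
    if prefer_jax:
        try:
            import jax.numpy as jnp  # imported only on request, so Jax stays optional
            return "jax", jnp
        except ImportError:
            pass
    return "numpy", np


def total_variance(x, prefer_jax=True):
    """Toy computation written once against the shared NumPy-like API."""
    _, xp = get_backend(prefer_jax)
    x = xp.asarray(x)
    return float(xp.sum(xp.var(x, axis=-1)))


name, _ = get_backend(prefer_jax=False)
print(name)  # numpy
print(total_variance([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]], prefer_jax=False))
```

Because the import lives inside the function, a user who only installs NumPy never triggers an import error, which matches the "minimize requirements" goal of the task.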

Ultimately, this project could lead to the establishment of a gold standard for going beyond pairwise interactions by measuring HOIs, accessible both to Python experts and to users with little programming knowledge.

Skill level: Intermediate/advanced

Required skills: Python

Time commitment: Half-time (175 h)

Lead mentor: Daniele Marinazzo, Etienne Combrisson

Project website: GitHub - brainets/frites: Framework for Information Theoretical analysis of Electrophysiological data and Statistics

Backup mentors: Andrea Brovelli

Tech keywords: Python, HOI

@Daniele_Marinazzo

Hello Mentor, I am Pooja Saini from the civil engineering department, IIT Bombay. I learned that INCF is participating in GSoC this year. I would like to contribute to the project “5.1 Optimization of the computations of higher-order interactions and integration in the Frites python package” in your organization. Could you please guide me and suggest resources that would help me contribute to the project?

With regards,
Pooja Saini

Hi @Pooja_Saini, nice to hear from you 🙂
As mentioned in the project idea itself, there are quite a few links and resources shared (like the github link and other library links). Please go through them and try to come up with an idea of how “you” will implement the project. In the meantime, please give Daniele and the other mentors some time to reply.
Happy to help in case of more queries.

Also, please remember that the more pointed and specific your queries are, the better the feedback mentors will be able to give you.

Thanks

Hi @Daniele_Marinazzo @EtienneCmb ,

I’m very interested in the GSoC 2023 Project Idea 5.1 regarding the optimization of the computations of higher-order interactions and integration in the Frites python package. Can you provide more details on the current state of the project and the specific optimization techniques that will be used to improve the computations? Also, how will the project’s success be measured, and what skills or experiences are required for a successful contribution to this project? I’m eager to learn more and potentially contribute to this exciting project!

Hi @glunkad,

> Can you provide more details on the current state of the project and the specific optimization techniques that will be used to improve the computations?

Here’s the link to the GitHub repo implementing the O-info. You’ll see inside that there are two implementations: one using NumPy and a second using Jax. We are really interested in optimizing the Jax one as it should work on both CPU and GPU. We also included a few essential scientific papers describing the math behind the O-info. As a first optimization step, we removed some loops by using a tensor-based implementation of the entropy. Additional ideas for optimizing the code:

  • Go back to a simpler vector-based entropy and use jax.vmap instead?
  • Some entropies are computed several times. Could we cache them without increasing the memory requirements too much?
  • @Daniele_Marinazzo has a Matlab implementation of the O-info with additional features like selecting the most relevant multiplets using bootstrapping. This is something that is missing in the current implementation.
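To make the caching idea in the list above concrete, here is a minimal NumPy sketch, assuming Gaussian entropies computed from a covariance matrix. Entropies are memoized on the tuple of variable indices, so a pair entropy shared by several triplets is computed once. The names `make_cached_entropy` and `o_info_all` are hypothetical and this is not the code in the repository.

```python
from itertools import combinations

import numpy as np


def make_cached_entropy(cov):
    """Return a Gaussian entropy function memoized on variable indices."""
    cache = {}

    def entropy(idx):
        idx = tuple(sorted(idx))
        if idx not in cache:
            sub = cov[np.ix_(idx, idx)]
            _, logdet = np.linalg.slogdet(sub)
            cache[idx] = 0.5 * (len(idx) * np.log(2 * np.pi * np.e) + logdet)
        return cache[idx]

    return entropy, cache


def o_info_all(x, size=3):
    """O-information of every multiplet of `size` variables (Gaussian assumption)."""
    entropy, cache = make_cached_entropy(np.cov(x))
    out = {}
    for m in combinations(range(x.shape[0]), size):
        o = (size - 2) * entropy(m)
        for j in m:
            o += entropy((j,)) - entropy(tuple(k for k in m if k != j))
        out[m] = o
    return out, len(cache)


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 200))  # 5 variables, 200 samples
o_info, n_cached = o_info_all(x)
print(len(o_info), "triplets,", n_cached, "entropies actually computed")
```

With 5 variables there are 10 triplets and 7 entropy terms per triplet, i.e. 70 lookups, but only 25 distinct entropies (5 single variables, 10 pairs, 10 triplets). The trade-off raised in the list is visible here too: the cache grows with the number of distinct sub-multiplets, so an unbounded dictionary would need a size limit for large parcellations.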

> Also, how will the project’s success be measured, and what skills or experiences are required for a successful contribution to this project?

The required skills and experience depend on the task. The optimization requires advanced coding and linear algebra skills. Intermediate skills are enough to write unit tests, improve the documentation, or find ways of plotting the results. As for measuring success, that is a good question. If we end up with everything needed to include this O-info function inside the Frites package (i.e. unit tests, documentation, and a NumPy/Jax switch that does not make Jax a fixed requirement of the package), it would be a very successful project.

Thank you for your interest in the project,

Dear @Pooja_Saini

Thanks for your interest!

This answer (GSoC 2023 Project Idea 5.1 Optimization of the computations of higher-order interactions and integration in the Frites python package (175 h), #7 by EtienneCmb) is relevant to you as well.

Kind regards

Thanks @EtienneCmb and @Daniele_Marinazzo for your valuable input. I’ll apply it to my proposal draft. Do you have any other guidelines for me to keep in mind or a specific format for the proposal? I want to ensure I deliver a well-structured and thorough proposal.
Can you suggest any starter issues that I can work on to get started and eventually learn more about the codebase? I want to make sure I get off to a good start.

Hello @EtienneCmb @Daniele_Marinazzo
I am Dishie, a computer engineering student looking forward to contributing to INCF this year. I found this project idea interesting to work on and also important, as HOIs have been reported to give better results on cortical dynamics.

I am currently going through the papers given on this repo.

I would love to contribute to issues and do tasks required to get started on this project. Please let me know if there is something relevant for me.

Thank you!

Dear mentors (@EtienneCmb, @Daniele_Marinazzo),
After carefully reviewing the project description, I am confident that my skills and background align with the project’s requirements. I am excited about the opportunity to work on this project and contribute to its success.
I am eager to learn more about the project and how I can contribute to its success. Please let me know if there are any additional details or requirements that I should be aware of, or if you have any questions for me.

Hello @Daniele_Marinazzo,

I’m passionate about data science and I would like to contribute to Project Idea 5.1 Optimization of the computations of higher-order interactions and integration in the Frites python package; I find it very interesting.

Getting the chance to work in this organisation would be a great opportunity for me to work with the mentors and build a project with them. I could learn a lot from the mentors and about open-source projects, which would be very helpful for me.

Could you give me some basic information on how to get started with this project?
Please let me know how to proceed.

I hope to hear from you soon and look forward to your guidance.

Warm regards,
Vaishnavi Bhushan

Dear @glunkad, @dishie,

Many thanks to all of you for your interest in this project.

Here are the steps to start working on this project:

  1. Be familiar with the theory, especially Rosas’ paper about the O-info.
  2. The code implements equation (4) of Rosas’ paper.

(1/2)

@SohamSangole and @Vaishnavi,

  1. Take a look at the main; there’s a working example that introduces redundant and synergistic interactions within multiplets.
  2. We have two implementations, one pure NumPy-based and a second one using the more recent Jax, especially to perform the computations on GPU. You’ll see that the Jax one has several important differences. Take a look at the Jax library, it’s very cool.
  3. We identified 5 aims for this project. Let me know if you prefer to work on the low-level optimization (1), on a NumPy/jax switch (2), on a clear function to simulate data (3) or on plotting (4).

(2/2)

Best,

@glunkad,

Please find here some materials and relevant information to prepare your proposal.

Best,

@EtienneCmb
I have already installed the library and have been exploring its capabilities extensively. I am very excited about the potential that JAX offers and believe that it will be an excellent tool for our project.
I want to thank you for introducing me to this library and for all the guidance and support that you have provided thus far. I truly appreciate the opportunity to work with such a knowledgeable and supportive mentor.

Thanks @EtienneCmb for providing clear steps to start working on this project. It’s great to see that there are multiple implementations available and the Jax library is being utilized. It’s exciting to have the opportunity to choose from a variety of aims, including low-level optimization, NumPy/Jax switch, simulating data, and plotting. I appreciate your thorough guidance and look forward to getting started on this project.

Hi @EtienneCmb ,
I would prefer to work on the NumPy/Jax switch. Can you provide some steps to get started?

Hello @EtienneCmb,
Thank you for the step-by-step instructions you provided to begin with the project’s foundation. I’ve been going through the resources and papers provided.

After reading Rosas’ paper, I was looking through the code for o-info when I came across a step that I wasn’t certain I understood. In line no. 105 of the code:

if not isinstance(maxsize, int):
    maxsize = n_roi

Why do we set the length of the roi (n_roi) as the maximum size of multiplets if the maxsize is not an integer? Why not use math.floor(maxsize) instead? Wouldn’t n_roi yield a multiplet with the same size as the roi array?
I was hoping you could shed some light on this.

Hi @glunkad, very good, thanks for your interest!

Hi @dishie, I’m glad you already started reading the code. The largest multiplet you can investigate cannot be larger than the number of brain regions. For example, if you have four regions, the largest multiplet is a quadruplet. As for why we use if not isinstance(maxsize, int): maxsize = n_roi, imagine someone passes maxsize=None. Alternatively, we could test the type of maxsize and raise an error if it’s not an int.
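The stricter alternative mentioned above could look like the sketch below. It assumes that None should mean "up to the number of regions" and that any other non-integer should raise an error. The helper name `check_maxsize` is hypothetical, and the clamping to n_roi is an assumption rather than the repository's current behaviour.

```python
def check_maxsize(maxsize, n_roi):
    """Resolve the largest multiplet size to investigate.

    None means 'no limit', i.e. up to `n_roi`. Any other non-int is rejected.
    """
    if maxsize is None:
        return n_roi
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(maxsize, bool) or not isinstance(maxsize, int):
        raise TypeError(
            f"maxsize must be an int or None, got {type(maxsize).__name__}"
        )
    # a multiplet cannot contain more regions than exist
    return min(maxsize, n_roi)


print(check_maxsize(None, 4))  # 4
print(check_maxsize(10, 4))    # 4
print(check_maxsize(3, 4))     # 3
```

Compared with silently replacing any non-integer by n_roi, this makes a float such as 2.5 fail loudly instead of triggering the most expensive computation.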

Okay, got it. Thanks for clearing it up!
