Best practices for GPU-accelerated transfer entropy analysis of neuronal spike trains

I would like to share a MATLAB/CUDA implementation of GPU-accelerated transfer entropy and sorted local transfer entropy analysis for neuronal spike-train data, and to ask for feedback on it.

The code estimates directed neuron-to-neuron interactions across a range of time delays. It was used in the following study:

Kajiwara, M., Nomura, R., Goetze, F., Kawabata, M., Isomura, Y., Akutsu, T., & Shimono, M. (2021). Inhibitory neurons exhibit high controlling ability in the cortical microconnectome. PLOS Computational Biology, 17(4), e1008846.
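
To make the "range of delays" idea concrete for readers who have not worked with transfer entropy, here is a minimal CPU-only Python sketch. It is not the MATLAB/CUDA implementation and does not reproduce the sorted local transfer entropy step. It assumes binned binary spike trains, history length 1 for both neurons, and a plain plug-in (histogram) estimator with no bias correction. The function names `delayed_te` and `scan_delays` are hypothetical and chosen only for this illustration.

```python
import math
import random
from collections import Counter


def delayed_te(src, dst, delay):
    """Plug-in transfer entropy (bits) from src to dst at a single delay.

    Uses order-1 histories: TE = sum p(y_t, y_{t-1}, x_{t-delay})
    * log2[ p(y_t | y_{t-1}, x_{t-delay}) / p(y_t | y_{t-1}) ].
    """
    if delay < 1:
        raise ValueError("delay must be >= 1 bin")
    if len(src) != len(dst):
        raise ValueError("spike trains must have equal length")

    # Count joint occurrences of (target now, target past, delayed source).
    joint = Counter()
    for t in range(delay, len(dst)):
        joint[(dst[t], dst[t - 1], src[t - delay])] += 1

    n = sum(joint.values())
    if n == 0:
        return 0.0

    # Marginal counts needed for the two conditional probabilities.
    past_src = Counter()   # (y_{t-1}, x_{t-delay})
    now_past = Counter()   # (y_t, y_{t-1})
    past = Counter()       # y_{t-1}
    for (y_now, y_past, x_past), c in joint.items():
        past_src[(y_past, x_past)] += c
        now_past[(y_now, y_past)] += c
        past[y_past] += c

    te = 0.0
    for (y_now, y_past, x_past), c in joint.items():
        ratio = (c * past[y_past]) / (
            past_src[(y_past, x_past)] * now_past[(y_now, y_past)]
        )
        te += (c / n) * math.log2(ratio)
    return te


def scan_delays(src, dst, max_delay):
    """Return {delay: TE in bits} for delays 1..max_delay (in bins)."""
    return {d: delayed_te(src, dst, d) for d in range(1, max_delay + 1)}


if __name__ == "__main__":
    # Synthetic example: dst copies src with a lag of 3 bins.
    random.seed(1)
    src = [random.randint(0, 1) for _ in range(2000)]
    dst = [0, 0, 0] + src[:-3]
    for d, value in scan_delays(src, dst, 6).items():
        print(d, round(value, 4))
```

In this toy setting the delay with the largest value is taken as the putative interaction delay for the pair. Real recordings have sparse firing and finite length, so the plug-in estimate is biased upward and would normally be compared against surrogate (for example, jittered or shuffled) spike trains before being interpreted.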

Code:

Related network quantification code:

Related whole-brain cortical microconnectome study:
Matsuda, K., Shirakami, A., Nakajima, R., Akutsu, T., & Shimono, M. (2023). Whole-Brain Evaluation of Cortical Microconnectomes. eNeuro, 10(10), ENEURO.0094-23.2023.

I would be interested in feedback on the following points:

  1. What are current best practices for estimating directed interactions from large-scale neuronal spike-train data?
  2. How should transfer entropy-based methods be benchmarked against alternative connectivity estimation methods?
  3. Are there recommended public datasets for validating excitatory/inhibitory interaction estimation?
  4. What would be useful additions to make this MATLAB/CUDA implementation easier to reuse?

Any suggestions or related references would be appreciated.