GSoC 2025 Project #8 Brian Simulator :: Replace Brian's just-in-time compilation mechanism (350h)

Mentors: Marcel Stimberg <marcel.stimberg@inserm.fr>, Dan Goodman <d.goodman@imperial.ac.uk>, Benjamin Evans <B.D.Evans@sussex.ac.uk>

Skill level: Intermediate/Advanced

Required skills: Python, C++

Time commitment: Full time (350 hours)

Forum for discussion

About: Brian is a clock-driven spiking neural network simulator that is easy to learn, highly flexible, and simple to extend. Written in Python, it allows users to describe and run arbitrary neural and synaptic models without needing to write code in any other programming language. It is built on a code-generation framework that transforms model descriptions into efficient low-level code. In Brian’s “runtime mode”, the generated code can interact with Python code by accessing the same data structures in memory. To make this work seamlessly, Brian makes use of Cython. This approach comes with two major disadvantages: 1) Cython compilation is slow (it generates a lot of code for error checking, etc.). This is not a big downside for Cython itself, which is commonly used to compile libraries once, but it matters for Brian, which needs to compile dynamically generated code frequently. 2) We need to maintain a third code generation target besides Python and C++, with small but non-trivial differences to both of them.

Aims: The aims of this project are to:

  • Replace the Python data structures that are currently used from within Cython code (dynamic arrays and the “spike queue”) with C++ equivalents
  • Research solutions to call the compiled C++ code from Python and make it directly access the memory in the shared data structures storing the simulation variables. This could build upon existing just-in-time compilation technologies such as numba, or a package such as scipy-weave.
  • Implement the above solution, and refactor the current code to make use of it

Project website: GitHub - brian-team/brian2: Brian is a free, open source simulator for spiking neural networks.

Tech keywords: Python, C++, compilation, JIT

Hello!
I love this project!
I really love computational neuroscience and want to build my career in this space. I’ve been working with NeuroML and have experience with spiking neural networks, particularly in modeling and simulation. Given my background, I’d love to contribute to this project. What areas should I focus on to best prepare myself? Would a deeper dive into JIT compilation techniques (maybe Numba?) or a stronger understanding of C++ memory management for neural simulations be most beneficial? What do you suggest?

Dear @namita-ach, many thanks for your interest in the project. This project is a bit different from the others we propose, in that there will be a significant “exploratory” component where we figure out what the best approach will be. Therefore, in addition to getting an idea of how Brian works, it would be good to get an overview of existing solutions to interface C/C++ and Python (Cython, ctypes, cffi, …). Regarding the way Brian’s runtime code generation currently works with Cython, I invite you to run the code posted below. It will simulate a (very simplified and not very interesting) neuron model and record its activity over time. You can find its generated code in the directory that gets printed out. Hopefully that should give you a bit of an idea of what the generated code looks like at the moment, and how it links between Python and Cython/C++. Please include a (not too detailed) “walkthrough” of these files, and in particular of how the final C++ code accesses memory that was allocated on the Python side, in your application (in case you decide to apply, of course :wink: )

Don’t hesitate to ask any questions you will most likely have about this.

from brian2 import *
from brian2.codegen.runtime.cython_rt.extension_manager import get_cython_cache_dir

try:
    clear_cache("cython")  # Just to make sure that we start with a clean slate
except Exception:
    pass  # there might be nothing to clear

prefs.codegen.target = "cython"
prefs.codegen.runtime.cython.delete_source_files = False

# A not-so-interesting example of a neuron simulation
neurons = NeuronGroup(10, "dv/dt = -v / (10*ms) : 1", method="exact")
neurons.v = "rand()"
state_monitor = StateMonitor(neurons, "v", record=True)

run(25*ms)

print("Generated code in:", get_cython_cache_dir())

# Just to show that the simulation worked (plotting does not generate any Cython code)
plt.plot(state_monitor.t/ms, state_monitor.v.T)
plt.show()

Please also see our general guidelines for GSoC applications here: GSoC 2025 | The Brian spiking neural network simulator

Thank you, @mstimberg, for all the details!
I hope all the Brian projects get selected by Google. But if Brian gets only one slot, which project would be your top priority out of the 3?

Hello @mstimberg, I am Karthik Sathish, a junior year student from IIT Roorkee. I would like to work on this project. This would be my workflow for the next 2 days:
Understand the current just-in-time architecture that Brian follows (I want to know which part of the architecture makes it run slowly.)
Once I figure out the actual problem, I will look for the possible alternatives.
If I face any problem in the journey, I will write it down here.

Indeed, sorry about that! I edited my post to fix the link.

We don’t have a priority project as such. If we only get one slot, we’d select the participant/project combination that we estimate to have the highest chance of achieving the aims of the project.

Hi @mstimberg sir,
I’m Sagar, and I found this project exciting! I am happy to work on it and would love to add any value I can to make Brian faster and better.

I have good C++ skills and some Python experience, though I am still learning Cython. I studied some of Brian’s code and understood parts of it, like how Cython slows things down with extra checks and why switching to C++ for the heavy stuff (like neuron data) makes sense. Some parts were tricky for me, but I am starting to figure them out. I am trying to find a way to connect Python and C++ smoothly (maybe with Numba) and am looking for blocks that I can rewrite in C++ as of now. Please let me know if there is anything that I can improve.

Hi @Mavericks,
Please have a look at the example I posted earlier in this thread; working through it should make things clearer (hopefully…). I wouldn’t try to rewrite things in C++ as of now – the most important thing for this project is a simple/lightweight way of exchanging data between Python and a compiled C++ extension. For a relatively simple data structure, e.g. a dynamic array (which could be just a class wrapping a std::vector in C++), how can we declare this data structure together with a Python wrapper (which could be written in Cython) and use it both from Python and from the compiled C++ code, with both accessing the same underlying memory? Not sure whether this is clear – please let me know if not :blush:
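Not Brian’s actual mechanism, but as a pure-Python analogy for the “same underlying memory” requirement: the buffer protocol already lets two objects share one buffer without copying, which is exactly the property a Cython wrapper around a C++ std::vector would need to preserve:

```python
# Illustrative only: a numpy view created via the buffer protocol shares
# the memory of a Python array.array, much like a Cython wrapper around a
# C++ std::vector would expose the vector's buffer to Python without copying.
import array

import numpy as np

storage = array.array('d', [1.0, 2.0, 3.0])      # the "owning" buffer
view = np.frombuffer(storage, dtype=np.float64)  # zero-copy view of it

view[0] = 42.0     # write through the view...
print(storage[0])  # ...and the owner sees the change: prints 42.0
```

In the project, the roles would be reversed: the buffer would live in C++ (inside the std::vector), with both the generated C++ code and the Python wrapper operating on that one block of memory.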

Actually, we are not that much worried about the slow speed of execution in Cython (despite the additional error checks, it is not much slower than “pure” C++), but about the speed of compilation. If you look at the C++ code that is generated from the Cython code, you can see that it is huge compared to what it would look like if you wrote it in C++.

Hi @mstimberg ,

I’m Jiayi, an undergrad at UIUC studying Computer Science and Statistics. I’ve been exploring the Brian JIT compilation replacement project and really appreciate the helpful context you’ve provided in the discussion thread! :slightly_smiling_face:

I’ve started running the example and digging into how Brian currently manages runtime codegen and memory access, and I’m trying to align my early experiments with the broader project goals. I had a few higher-level questions as I prepare my proposal:

• Are there particular pain points or complexities with the current Python-based “spike queue” or dynamic arrays that we should prioritize when replacing them with C++ equivalents?

• In terms of memory access between compiled C++ code and Python, would you prefer strategies like the buffer protocol, raw pointers, or another approach to minimize copies and overhead?

• For the JIT backend, do you see libraries like Numba or LLVM-based approaches (e.g. llvmlite) as feasible directions, or would a simpler mechanism (like direct shared object loading) be preferred?

Thanks again for your guidance and for maintaining such an exciting project. I’m very excited to keep exploring this and share a small working prototype soon. :grinning:

Hi @Jiayi_Qian. Good questions! Here are some quick rough answers:

Note that we already have a C++ equivalent of both: the C++ dynamic array is used in C++ standalone mode, and the C++ spike queue is used even in Python mode, since we wrap it into a Python object via Cython. The main “pain point” in runtime mode for the spike queue is that we generate Cython/C++ code that uses the Python API to call a method on the Python object which wraps the C++ object, instead of interfacing with the C++ code directly. For the dynamic array, it is similar, but we use a pure Python implementation (which uses numpy under the hood, which of course calls out to compiled C/Fortran code itself).

I don’t know, to be honest – I was thinking of passing around raw pointers, but maybe there is a better solution.

I think numba is too limited for us, but potentially something LLVM-based could work – I do not know enough about it to judge, though. For now, I had direct shared object loading in mind (comparable to how, for example, scipy-weave worked when it was still maintained).

Hope that gives you some pointers!

Hi @mstimberg ,

I’m Mrigesh Thakur, an undergrad at NIT Hamirpur studying Mathematics and Scientific Computing.

Feel free to check out my GitHub profile if you’d like! :blush:

First off, I just wanted to say how much I’ve enjoyed diving into the Brian codebase – it’s such a well-crafted project. I’m especially drawn to ideas that explore the low-level mechanics of systems, and Brian’s architecture – right from parsing differential equations to building and executing ASTs – is fascinating. I held off on introducing myself earlier because I wanted to take the time to deeply understand the project and try out a few proof-of-concept ideas before reaching out.

Thanks again for all your answers and insights so far – they really helped me get a better grasp of the direction this project aims for.

From what I’ve understood, one of the core goals of the GSoC project is to retain Brian’s efficient memory-sharing model between Python and compiled code, while removing the overhead of Cython-based JIT compilation. So I started digging into Brian’s runtime mode and ran a few small experiments to explore that space.


How Memory Sharing Currently Works in Brian

After analyzing your example simulation, I focused on how C++ code accesses memory that’s owned and allocated by Python. The current setup is smart and elegant:

  • No data copying between Python and compiled code
  • Native-speed execution via C++
  • Clean interoperability between Python and low-level code

But, as you mentioned, the bottleneck lies in Cython compilation:

  • Each simulation generates a large amount of C++ code (my test saw ~7k chars of Cython producing ~535k chars of C++)
  • Compilation is slow, especially for large models
  • The generated code includes a lot of error-checking and boilerplate for Python C-API integration

POCs and Benchmarks Iā€™ve Tried

Here’s a quick overview of what I’ve explored so far – full benchmarks and images are available in this GitHub repo. Would love for you to take a look :smile:


Experimentation Strategy

Since this GSoC project is very experimentation- and benchmarking-heavy, I thought the best approach would be to leverage Brian’s Preferences system to toggle experimental features cleanly.

What I’ve done:

  • Used prefs to enable/disable raw-pointer mode for approaches 2 and 3

  • Created parallel implementations rather than replacing existing ones

  • This lets us benchmark side by side without affecting stability

I’d love your thoughts on whether this is a good pattern to follow during GSoC, and whether you have a preferred format for such modular experimentation.


Approach 1: Shared Function Includes to Reduce Code Duplication

  • Extracted common C functions into a common_functions.pxi include file
  • Slightly reduced the expansion ratio (from ~76.8x to ~53.2x), but still didn’t eliminate Python API usage or solve compilation time issues

Approach 2: C++ Data Structures + Python Wrappers

  • Created a C++ DynamicArray<T> class and wrapped it with a Python/Cython interface
  • This allowed me to manage memory in C++ while exposing NumPy views in Python
  • Still hit Python API calls in generated code, which slowed things down
  • Benchmarks showed mixed results (NumPy access was still slightly faster for some ops)


Approach 3: Raw Pointer Exposure from C++ to Templates

I initially thought this would be the most promising approach, but the results ended up being pretty similar to Approach 2. :no_mouth: Might’ve messed something up somewhere.

  • Modified the wrapper to expose raw C++ pointers
  • Injected those pointers directly into the generated code
  • Eliminated all Python API calls from runtime computation code
  • Benchmarks: see the images and numbers in the GitHub repo linked above


Design Directions I’m Exploring

Based on your earlier feedback and what I’ve understood from the project goal, here are some implementation paths I’ve been thinking about:


Option A: Direct Shared Object Compilation (like scipy-weave)

  • Generate C++ code with raw pointers
  • Compile to .so/.dll dynamically and load at runtime
  • Minimal changes to current architecture
  • No Cython, just pure C++ with direct memory access

Option B: LLVM-Based JIT Compilation

  • Use something like llvmlite for low-level JIT codegen
  • Avoids C++ compilation overhead
  • More complex but very flexible
  • Might be a longer-term investment due to tooling complexity

Option C: Hybrid Backend with Simplified Cython

  • Stick with Cython but:
    • Use typedefs for simplicity
    • Centralize logic in header files
    • Expose pointers to templates
  • Easier transition from existing codegen paths

Option D: Global Resource Manager

Instead of either Python or C++ owning the memory, create a global resource manager:

  • Using libraries like sharedstructures or Boost.Interprocess
  • Clear separation of concerns
  • Potential for multi-process access to shared data

:thinking: A Few Questions and Next Steps

  • Proposal Review: Is there a mailing list or a preferred way I could share my draft proposal with you for feedback?
  • Benchmarks: Are there specific benchmarks or simulation patterns you’d like us to use for testing different approaches?
  • Memory Ownership Model: Which model do you think is best suited for Brian’s runtime mode going forward?
    • C++ owns memory, Python gets views
    • Python owns memory, C++ gets views (current method)
    • Global/shared manager abstraction?
  • Codegen Strategy:
    • Add a new backend separate from ‘numpy’ and ‘cython’?
    • Extend the existing ‘cython’ target?
    • Go fully custom/minimal?

Looking forward to your thoughts! Super excited to keep iterating on this and would love to know if I’m headed in the right direction :raised_hands:

Thanks again,
Mrigesh Thakur


Hi @MRIGESH_THAKUR. Thank you for your interest in our project! I did not go through your post in detail yet, but just about this point:

You can send me a private message here with a link to your proposal, or simply to let me know that you uploaded it to the https://summerofcode.withgoogle.com/ platform (you can still modify the proposal until the deadline).

Sounds great – thanks so much! I’ll share the proposal with you as soon as it’s ready :blush:


Hi everyone. I just realized that my first comment in this thread (GSoC 2025 Project #8 Brian Simulator :: Replace Brian's just-in-time compilation mechanism (350h) - #3 by mstimberg) was directly addressed at the first student who commented, who was by then the only student who had expressed interest in working on this project. But just to be clear: the suggestions I made in that comment about what to include in the application of course apply to everyone who wants to submit an application :blush:

A few more comments on your questions @MRIGESH_THAKUR

As I noted earlier, I am not that worried about runtime performance, since that is already comparable to what we’d have with a C++ based solution. What is more important is the compilation time, or maybe, as a proxy, the size of the generated C++ files.

I think in runtime mode, we want to allocate all memory from within Python, but given that this allocation might be the creation of an object that is a wrapper around C++, the actual memory allocation might happen in C++ code. I am not 100% sure I am getting the difference between owned memory and view here; we are always dealing with pointers to the memory, aren’t we? Or do you mean this in the context of garbage collection?

My original plan for this project was to add a new backend, somewhat similar to the weave backend that we had in Python 2.x times, but without all the features around calling Python code from within C++. But I think a first step could be to change the approach in the Cython backend to interact with the spike queue and the dynamic array, where it currently calls back into Python.

Looks like a good direction so far :+1:


@mstimberg Thank you so much for the review :)

Apologies for the delayed reply – I was traveling the past two days and just got the chance to catch up.

I think in runtime mode, we want to allocate all memory from within Python, but given that this allocation might be the creation of an object that is a wrapper around C++, the actual memory allocation might happen in C++ code. I am not 100% sure I am getting the difference between owned memory and view here; we are always dealing with pointers to the memory, aren’t we? Or do you mean this in the context of garbage collection?

Totally fair – and sorry for the confusion on my part! That was a bit of an imprecise statement. I was still working through the differences myself, but I think I’ve got a clearer picture now.

The issue isn’t so much about memory ownership per se – it’s more about how the generated Cython code unnecessarily routes through Python, even when the underlying data structures are implemented in C++. That indirection adds performance overhead and bloats the generated code.


Here’s what’s happening (in the case of SpikeQueues, as far as I have figured out :sweat_smile:):
When a neuron spikes in Brian2, the information flows like this:

Generated Cython → Python Method → Cython Wrapper → C++ SpikeQueue

This creates a bunch of slowdowns because:

  1. The Cython code has to look up Python objects in a dictionary
  2. Then it calls Python methods (slow!)
  3. These methods wrap C++ functionality
  4. Only then does it reach the actual fast C++ code

For example, instead of directly accessing C++ objects, the generated code does stuff like:

# Slow dictionary lookup to get a Python object
_spike_queue = _namespace['_spike_queue']  
# Slow Python method call
_spike_queue.push(np.array(_spikes, dtype=np.int32))  

So despite having fast C++ code under the hood, we’re paying the cost of:

  • Python-to-C++ boundary crossings
  • Type conversions and runtime checks
  • Indirection via dynamic namespace lookups

All of which cancel out the benefits of using Cython/C++ in the first place – and result in unnecessarily large and slow generated code.
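To make that layering concrete, here is a toy pure-Python model of the two call paths (all class and method names are invented for illustration; they are not Brian’s actual classes):

```python
# Toy model of the indirection: FastCore stands in for the C++ SpikeQueue,
# and PythonWrapper adds the Python-level layers that the generated code
# currently has to go through.
class FastCore:
    def __init__(self):
        self.data = []

    def push(self, spikes):
        self.data.extend(spikes)


class PythonWrapper:
    def __init__(self, core):
        self._core = core

    def push(self, spikes):
        self._core.push(list(spikes))  # extra conversion/dispatch layer


core = FastCore()
namespace = {'_spike_queue': PythonWrapper(core)}

# Current path: dictionary lookup, then two Python-level method calls
_spike_queue = namespace['_spike_queue']
_spike_queue.push([0, 3, 7])

# Desired path: the generated code holds a direct handle to the core
core.push([1])
print(core.data)  # prints [0, 3, 7, 1]
```

In the real system the wrapper layers additionally cross the Python/C++ boundary on every call, which is where the conversions and runtime checks come from.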

But I think a first step could be to change the approach in the Cython backend to interact with the spike queue and the dynamic array, where it currently calls back into Python.

Absolutely – I agree this is the best place to start. Here’s what I was thinking:

  • Update the Code Generator
    In brian2/codegen/generators/cython_generator.py, teach the generator to recognize and handle SpikeQueue (and similar structures) as direct C++ objects, not just Python objects.
  • Fix the Cython Templates
    Modify the runtime templates in brian2/codegen/runtime/cython_rt/templates/ to emit native C++ level calls instead of relying on Python-wrapped methods.
  • Pass Native Objects into Namespace
    Update brian2/synapses/synapses.py (and similar files) to pass actual C++ object pointers into the namespace, instead of Python-wrapped versions.

Doing this should:

  • Shrink the size of the generated Cython files
  • Eliminate unnecessary dynamic dispatch
  • Bring us much closer to “true” Cython-level performance

So, as you said, starting by fixing the spike queue and dynamic array interaction seems like a good first step to test this approach. Once we’ve verified the performance and codegen improvements there, we can consider broader structural changes like switching to a weave-style backend or even a new intermediate representation.


Also, @mstimberg – just a quick question as I’m putting some notes together for the proposal.

I was digging into how the old weave backend used to work, and it looks like it allowed embedding raw C/C++ code directly inside Python, something like:

from scipy import weave  # Python 2 era; scipy.weave is no longer maintained

def fast_computation(a, b, n):
    code = """
    double result = 0.0;
    for (int i = 0; i < n; i++) {
        result += a[i] * b[i];
    }
    return_val = result;
    """
    # weave.inline compiled the C snippet on first use, cached the built
    # extension, and returned the value assigned to return_val
    return weave.inline(code, ['a', 'b', 'n'])

So from what I understand, what we’re aiming to do now is something similar in spirit, but much more modern and robust:


The Goal:

Build a new backend that:

  1. Generates C++ code as strings or templates
  2. JIT-compiles it without needing Cython (to avoid the overhead and boilerplate)
  3. Enables direct memory access between Python and C++ (avoiding slow wrapper calls)
  4. Keeps the simplicity and flexibility of Python, but benefits from native C++ speed

To do this, we could potentially explore modern tools like:

  • LLVM for custom JIT compilation
  • pybind11 for seamless C++/Python integration (and easier memory sharing)

Does that direction sound in line with what you had in mind for a future backend? Just wanted to confirm I’m understanding it correctly before I include it in the draft.

Thanks!

:alarm_clock: Dear students/open source beginners interested in the “Replace Brian’s just-in-time compilation mechanism” project, please don’t forget that the deadline for applications on https://summerofcode.withgoogle.com is later today at 18:00 UTC, i.e. in about 5 hours. Note that it will be impossible for us to ask Google to finance an internship for a candidate who did not submit an application, and that there will be no extension of this deadline from Google’s side. Good luck everyone, hope to see a few of you staying around (with or without a GSoC internship) :crossed_fingers:
