GSoC 2025 Project #8 Brian Simulator :: Replace Brian's just-in-time compilation mechanism (350h)

Mentors: Marcel Stimberg <marcel.stimberg@inserm.fr>, Dan Goodman <d.goodman@imperial.ac.uk>, Benjamin Evans <B.D.Evans@sussex.ac.uk>

Skill level: Intermediate/Advanced

Required skills: Python, C++

Time commitment: Full time (350 hours)

Forum for discussion

About: Brian is a clock-driven spiking neural network simulator that is easy to learn, highly flexible, and simple to extend. Written in Python, it allows users to describe and run arbitrary neural and synaptic models without needing to write code in any other programming language. It is built on a code-generation framework that transforms model descriptions into efficient low-level code. In Brian’s “runtime mode”, the generated code can interact with Python code by accessing the same data structures in memory. To make this work seamlessly, Brian makes use of Cython. This approach comes with two major disadvantages: 1) Cython compilation is slow (it generates a lot of code for error checking, etc.). This is not a big downside for Cython itself, which is commonly used to compile libraries once, but it matters for Brian, which needs to compile dynamically generated code frequently. 2) We need to maintain a third code generation target besides Python and C++, with small but non-trivial differences to both of them.

Aims: The aims of this project are to:

  • Replace the Python data structures that are currently used from within Cython code (dynamic arrays and the “spike queue”) with C++ equivalents
  • Research solutions to call the compiled C++ code from Python and make it directly access the memory in the shared data structures storing the simulation variables. This could build upon existing just-in-time compilation technologies such as numba, or a package such as scipy-weave.
  • Implement the above solution, and refactor the current code to make use of it

Project website: GitHub - brian-team/brian2: Brian is a free, open source simulator for spiking neural networks.

Tech keywords: Python, C++, compilation, JIT

Hello!
I love this project!
I really love computational neuroscience and want to build my career in this space. I’ve been working with NeuroML and have experience with spiking neural networks, particularly in modeling and simulation. Given my background, I’d love to contribute to this project. What areas should I focus on to best prepare myself? Would a deeper dive into JIT compilation techniques (maybe Numba?) or a stronger understanding of C++ memory management for neural simulations be most beneficial? What do you suggest?

Dear @namita-ach, many thanks for your interest in the project. This project is a bit different from the others we propose, in that there will be a significant “exploratory” component where we figure out what the best approach will be. Therefore, in addition to getting an idea of how Brian works, it would be good to get an overview of existing solutions to interface C/C++ and Python (Cython, ctypes, cffi, …). Regarding the way Brian’s runtime code generation currently works with Cython, I invite you to run the code posted below. It will simulate a (very simplified and not very interesting) neuron model and record its activity over time. You can find its generated code in the directory that gets printed out. Hopefully that should give you a bit of an idea of what the generated code looks like at the moment, and how it links between Python and Cython/C++. Please include a (not too detailed) “walkthrough” of these files, and in particular of how the final C++ code accesses memory that was allocated on the Python side, in your application (in case you decide to apply, of course :wink: )

Don’t hesitate to ask any questions you will most likely have about this.

from brian2 import *
from brian2.codegen.runtime.cython_rt.extension_manager import get_cython_cache_dir

try:
    clear_cache("cython")  # Just to make sure that we start with a clean slate
except Exception:
    pass  # there might be nothing to clear

prefs.codegen.target = "cython"
prefs.codegen.runtime.cython.delete_source_files = False

# A not-so-interesting example of a neuron simulation
neurons = NeuronGroup(10, "dv/dt = -v / (10*ms) : 1", method="exact")
neurons.v = "rand()"
state_monitor = StateMonitor(neurons, "v", record=True)

run(25*ms)

print("Generated code in:", get_cython_cache_dir())

# Just to show that the simulation worked (plotting does not generate any Cython code)
plt.plot(state_monitor.t/ms, state_monitor.v.T)
plt.show()

Please also see our general guidelines for GSoC applications here: GSoC 2025 | The Brian spiking neural network simulator

Thank you, @mstimberg, for all the details!
I hope all the Brian projects get selected by Google. But if Brian gets only one slot, which project would be your top priority out of the 3?

Hello @mstimberg, I am Karthik Sathish, a junior year student from IIT Roorkee. I would like to work on this project. This would be my workflow for the next 2 days:
Understand the current just-in-time architecture that Brian follows (I want to know which part of the architecture makes it run slowly.)
Once I figure out the actual problem, I will look for the possible alternatives.
If I face any problem in the journey, I will write it down here.

Indeed, sorry about that! I edited my post to fix the link.

We don’t have a priority project as such. If we only get one slot, we’d select the participant/project combination that we estimate to have the highest chance of achieving the aims of the project.

Hi @mstimberg sir,
I’m Sagar, and I found this project exciting! I am happy to work on it and would love to add any value I can to make Brian faster and better.

I have good C++ skills and some Python experience, though I am still learning Cython. I studied some of Brian’s code and understood parts of it, like how Cython slows things down with extra checks and why switching to C++ for the heavy stuff (like neuron data) makes sense. Some parts were tricky for me, but I am starting to figure them out. I am trying to find a way to connect Python and C++ smoothly (maybe with Numba) and am looking for blocks that I can rewrite in C++ as of now. Please let me know if there is anything that I can improve.

Hi @Mavericks,
Please have a look at the example I posted earlier in this thread; working through it should make things clearer (hopefully…). I wouldn’t try to rewrite things in C++ as of now – the most important thing for this project is a simple/lightweight way of exchanging data between Python and a compiled C++ extension. For a relatively simple data structure, e.g. a dynamic array (which could be just a class wrapping a std::vector in C++), how can we declare this data structure together with a Python wrapper (which could be written in Cython) and use it both from Python and from the compiled C++ code, with both accessing the same underlying memory? Not sure whether this is clear – please let me know if not :blush:
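Not Brian’s actual mechanism, but as a pure-Python analogy for the “same underlying memory” requirement: the buffer protocol already lets two objects share one buffer without copying, which is exactly the property a Cython wrapper around a C++ std::vector would need to preserve:

```python
# Illustrative only: a numpy view created via the buffer protocol shares
# the memory of a Python array.array, much like a Cython wrapper around a
# C++ std::vector would expose the vector's buffer to Python without copying.
import array

import numpy as np

storage = array.array('d', [1.0, 2.0, 3.0])      # the "owning" buffer
view = np.frombuffer(storage, dtype=np.float64)  # zero-copy view of it

view[0] = 42.0     # write through the view...
print(storage[0])  # ...and the owner sees the change: prints 42.0
```

In the project, the roles would be reversed: the buffer would live in C++ (inside the std::vector), with both the generated C++ code and the Python wrapper operating on that one block of memory.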

Actually, we are not that much worried about the slow speed of execution in Cython (despite the additional error checks, it is not much slower than “pure” C++), but about the speed of compilation. If you look at the C++ code that is generated from the Cython code, you can see that it is huge compared to what it would look like if you wrote it in C++.

Hi @mstimberg ,

I’m Jiayi, an undergrad at UIUC studying Computer Science and Statistics. I’ve been exploring the Brian JIT compilation replacement project and really appreciate the helpful context you’ve provided in the discussion thread! :slightly_smiling_face:

I’ve started running the example and digging into how Brian currently manages runtime codegen and memory access, and I’m trying to align my early experiments with the broader project goals. I had a few higher-level questions as I prepare my proposal:

• Are there particular pain points or complexities with the current Python-based “spike queue” or dynamic arrays that we should prioritize when replacing them with C++ equivalents?

• In terms of memory access between compiled C++ code and Python, would you prefer strategies like the buffer protocol, raw pointers, or another approach to minimize copies and overhead?

• For the JIT backend, do you see libraries like Numba or LLVM-based approaches (e.g. llvmlite) as feasible directions, or would a simpler mechanism (like direct shared object loading) be preferred?

Thanks again for your guidance and for maintaining such an exciting project. I’m very excited to keep exploring this and share a small working prototype soon. :grinning:

Hi @Jiayi_Qian. Good questions! Here are some quick rough answers:

Note that we already have a C++ equivalent of both: the C++ dynamic array is used in C++ standalone mode, and the C++ spike queue is used even in Python mode, since we wrap it into a Python object via Cython. The main “pain point” in runtime mode for the spike queue is that we generate Cython/C++ code that uses the Python API to call a method on the Python object which wraps the C++ object, instead of interfacing with the C++ code directly. For the dynamic array, it is similar, but we use a pure Python implementation (which uses numpy under the hood, which of course calls out to compiled C/Fortran code itself).

I don’t know, to be honest – I was thinking of passing around raw pointers, but maybe there is a better solution.

I think numba is too limited for us, but potentially something LLVM-based could work – I do not know enough about it to judge, though. For now, I had direct shared object loading in mind (comparable to how, for example, scipy-weave worked when it was still maintained).

Hope that gives you some pointers!

Hi @mstimberg ,

I’m Mrigesh Thakur, an undergrad at NIT Hamirpur studying Mathematics and Scientific Computing.

Feel free to check out my GitHub profile if you’d like! :blush:

First off, I just wanted to say how much I’ve enjoyed diving into the Brian codebase – it’s such a well-crafted project. I’m especially drawn to ideas that explore the low-level mechanics of systems, and Brian’s architecture – right from parsing differential equations to building and executing ASTs – is fascinating. I held off on introducing myself earlier because I wanted to take the time to deeply understand the project and try out a few proof-of-concept ideas before reaching out.

Thanks again for all your answers and insights so far – they really helped me get a better grasp of the direction this project aims for.

From what I’ve understood, one of the core goals of the GSoC project is to retain Brian’s efficient memory-sharing model between Python and compiled code, while removing the overhead of Cython-based JIT compilation. So I started digging into Brian’s runtime mode and ran a few small experiments to explore that space.


How Memory Sharing Currently Works in Brian

After analyzing your example simulation, I focused on how C++ code accesses memory that’s owned and allocated by Python. The current setup is smart and elegant:

  • No data copying between Python and compiled code
  • Native-speed execution via C++
  • Clean interoperability between Python and low-level code

But, as you mentioned, the bottleneck lies in Cython compilation:

  • Each simulation generates a large amount of C++ code (my test saw ~7k chars of Cython producing ~535k chars of C++)
  • Compilation is slow, especially for large models
  • The generated code includes a lot of error-checking and boilerplate for Python C-API integration

POCs and Benchmarks Iā€™ve Tried

Here’s a quick overview of what I’ve explored so far – full benchmarks and images are available in this GitHub repo. Would love for you to take a look :smile:


Experimentation Strategy

Since this GSoC project is very experimentation- and benchmarking-heavy, I thought the best approach would be to leverage Brian’s Preferences system to toggle experimental features cleanly.

What I’ve done:

  • Used prefs to enable/disable raw-pointer mode for approaches 2 and 3

  • Created parallel implementations rather than replacing existing ones

  • This lets us benchmark side by side without affecting stability

I’d love your thoughts on whether this is a good pattern to follow during GSoC, and whether you have a preferred format for such modular experimentation.


Approach 1: Shared Function Includes to Reduce Code Duplication

  • Extracted common C functions into a common_functions.pxi include file
  • Slightly reduced the expansion ratio (from ~76.8x to ~53.2x), but still didn’t eliminate Python API usage or solve compilation time issues

Approach 2: C++ Data Structures + Python Wrappers

  • Created a C++ DynamicArray<T> class and wrapped it with a Python/Cython interface
  • This allowed me to manage memory in C++ while exposing NumPy views in Python
  • Still hit Python API calls in generated code, which slowed things down
  • Benchmarks showed mixed results (NumPy access was still slightly faster for some ops)


Approach 3: Raw Pointer Exposure from C++ to Templates

I initially thought this would be the most promising approach, but the results ended up being pretty similar to Approach 2. :no_mouth: Might’ve messed something up somewhere.

  • Modified the wrapper to expose raw C++ pointers
  • Injected those pointers directly into the generated code
  • Eliminated all Python API calls from runtime computation code
  • Benchmarks: see the images and numbers in the GitHub repo linked above


Design Directions I’m Exploring

Based on your earlier feedback and what I’ve understood from the project goal, here are some implementation paths I’ve been thinking about:


Option A: Direct Shared Object Compilation (like scipy-weave)

  • Generate C++ code with raw pointers
  • Compile to .so/.dll dynamically and load at runtime
  • Minimal changes to current architecture
  • No Cython, just pure C++ with direct memory access

Option B: LLVM-Based JIT Compilation

  • Use something like llvmlite for low-level JIT codegen
  • Avoids C++ compilation overhead
  • More complex but very flexible
  • Might be a longer-term investment due to tooling complexity

Option C: Hybrid Backend with Simplified Cython

  • Stick with Cython but:
    • Use typedefs for simplicity
    • Centralize logic in header files
    • Expose pointers to templates
  • Easier transition from existing codegen paths

Option D: Global Resource Manager

Instead of either Python or C++ owning the memory, create a global resource manager:

  • Using libraries like sharedstructures or Boost.Interprocess
  • Clear separation of concerns
  • Potential for multi-process access to shared data

:thinking: A Few Questions and Next Steps

  • Proposal Review: Is there a mailing list or a preferred way I could share my draft proposal with you for feedback?
  • Benchmarks: Are there specific benchmarks or simulation patterns you’d like us to use for testing different approaches?
  • Memory Ownership Model: Which model do you think is best suited for Brian’s runtime mode going forward?
    • C++ owns memory, Python gets views
    • Python owns memory, C++ gets views (current method)
    • Global/shared manager abstraction?
  • Codegen Strategy:
    • Add a new backend separate from ‘numpy’ and ‘cython’?
    • Extend the existing ‘cython’ target?
    • Go fully custom/minimal?

Looking forward to your thoughts! Super excited to keep iterating on this and would love to know if I’m headed in the right direction :raised_hands:

Thanks again,
Mrigesh Thakur


Hi @MRIGESH_THAKUR. Thank you for your interest in our project! I did not go through your post in detail yet, but just about this point:

You can send me a private message here with a link to your proposal, or simply to let me know that you uploaded it to the https://summerofcode.withgoogle.com/ platform (you can still modify the proposal until the deadline).

Sounds great – thanks so much! I’ll share the proposal with you as soon as it’s ready :blush:


Hi everyone. I just realized that my first comment in this thread (GSoC 2025 Project #8 Brian Simulator :: Replace Brian's just-in-time compilation mechanism (350h) - #3 by mstimberg) was directly addressed at the first student who commented, who was by then the only student who had expressed interest in working on this project. But just to be clear: the suggestions I made in that comment about what to include in the application of course apply to everyone who wants to submit an application :blush:

A few more comments on your questions @MRIGESH_THAKUR

As I noted earlier, I am not that worried about runtime performance, since that is already comparable to what we’d have with a C++ based solution. What is more important is the compilation time, or maybe, as a proxy, the size of the generated C++ files.

I think in runtime mode, we want to allocate all memory from within Python, but given that this allocation might be the creation of an object that is a wrapper around C++, the actual memory allocation might happen in C++ code. I am not 100% sure I am getting the difference between owned memory and view here; we are always dealing with pointers to the memory, aren’t we? Or do you mean this in the context of garbage collection?

My original plan for this project was to add a new backend, somewhat similar to the weave backend that we had in Python 2.x times, but without all the features around calling Python code from within C++. But I think a first step could be to change the approach in the Cython backend to interact with the spike queue and the dynamic array, where it currently calls back into Python.

Looks like a good direction so far :+1:


@mstimberg Thank you so much for the review :)

Apologies for the delayed reply – I was traveling the past two days and just got the chance to catch up.

I think in runtime mode, we want to allocate all memory from within Python, but given that this allocation might be the creation of an object that is a wrapper around C++, the actual memory allocation might happen in C++ code. I am not 100% sure I am getting the difference between owned memory and view here; we are always dealing with pointers to the memory, aren’t we? Or do you mean this in the context of garbage collection?

Totally fair – and sorry for the confusion on my part! That was a bit of an imprecise statement. I was still working through the differences myself, but I think I’ve got a clearer picture now.

The issue isn’t so much about memory ownership per se – it’s more about how the generated Cython code unnecessarily routes through Python, even when the underlying data structures are implemented in C++. That indirection adds performance overhead and bloats the generated code.


Here’s what’s happening (in the case of SpikeQueues, as far as I have figured out :sweat_smile:):
When a neuron spikes in Brian2, the information flows like this:

Generated Cython → Python Method → Cython Wrapper → C++ SpikeQueue

This creates a bunch of slowdowns because:

  1. The Cython code has to look up Python objects in a dictionary
  2. Then it calls Python methods (slow!)
  3. These methods wrap C++ functionality
  4. Only then does it reach the actual fast C++ code

For example, instead of directly accessing C++ objects, the generated code does stuff like:

# Slow dictionary lookup to get a Python object
_spike_queue = _namespace['_spike_queue']  
# Slow Python method call
_spike_queue.push(np.array(_spikes, dtype=np.int32))  

So despite having fast C++ code under the hood, we’re paying the cost of:

  • Python-to-C++ boundary crossings
  • Type conversions and runtime checks
  • Indirection via dynamic namespace lookups

All of which cancel out the benefits of using Cython/C++ in the first place – and result in unnecessarily large and slow generated code.
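To make that layering concrete, here is a toy pure-Python model of the two call paths (all class and method names are invented for illustration; they are not Brian’s actual classes):

```python
# Toy model of the indirection: FastCore stands in for the C++ SpikeQueue,
# and PythonWrapper adds the Python-level layers that the generated code
# currently has to go through.
class FastCore:
    def __init__(self):
        self.data = []

    def push(self, spikes):
        self.data.extend(spikes)


class PythonWrapper:
    def __init__(self, core):
        self._core = core

    def push(self, spikes):
        self._core.push(list(spikes))  # extra conversion/dispatch layer


core = FastCore()
namespace = {'_spike_queue': PythonWrapper(core)}

# Current path: dictionary lookup, then two Python-level method calls
_spike_queue = namespace['_spike_queue']
_spike_queue.push([0, 3, 7])

# Desired path: the generated code holds a direct handle to the core
core.push([1])
print(core.data)  # prints [0, 3, 7, 1]
```

In the real system the wrapper layers additionally cross the Python/C++ boundary on every call, which is where the conversions and runtime checks come from.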

But I think a first step could be to change the approach in the Cython backend to interact with the spike queue and the dynamic array, where it currently calls back into Python.

Absolutely – I agree this is the best place to start. Here’s what I was thinking:

  • Update the Code Generator
    In brian2/codegen/generators/cython_generator.py, teach the generator to recognize and handle SpikeQueue (and similar structures) as direct C++ objects, not just Python objects.
  • Fix the Cython Templates
    Modify the runtime templates in brian2/codegen/runtime/cython_rt/templates/ to emit native C++ level calls instead of relying on Python-wrapped methods.
  • Pass Native Objects into Namespace
    Update brian2/synapses/synapses.py (and similar files) to pass actual C++ object pointers into the namespace, instead of Python-wrapped versions.

Doing this should:

  • Shrink the size of the generated Cython files
  • Eliminate unnecessary dynamic dispatch
  • Bring us much closer to “true” Cython-level performance

So, as you said, starting by fixing the spike queue and dynamic array interaction seems like a good first step to test this approach. Once we’ve verified the performance and codegen improvements there, we can consider broader structural changes like switching to a weave-style backend or even a new intermediate representation.


Also, @mstimberg – just a quick question as I’m putting some notes together for the proposal.

I was digging into how the old weave backend used to work, and it looks like it allowed embedding raw C/C++ code directly inside Python, something like:

from scipy import weave  # Python 2 era; scipy.weave is no longer maintained

def fast_computation(a, b, n):
    code = """
    double result = 0.0;
    for (int i = 0; i < n; i++) {
        result += a[i] * b[i];
    }
    return_val = result;
    """
    # weave.inline compiled the C snippet on first use, cached the built
    # extension, and returned the value assigned to return_val
    return weave.inline(code, ['a', 'b', 'n'])

So from what I understand, what we’re aiming to do now is something similar in spirit, but much more modern and robust:


The Goal:

Build a new backend that:

  1. Generates C++ code as strings or templates
  2. JIT-compiles it without needing Cython (to avoid the overhead and boilerplate)
  3. Enables direct memory access between Python and C++ (avoiding slow wrapper calls)
  4. Keeps the simplicity and flexibility of Python, but benefits from native C++ speed

To do this, we could potentially explore modern tools like:

  • LLVM for custom JIT compilation
  • pybind11 for seamless C++/Python integration (and easier memory sharing)

Does that direction sound in line with what you had in mind for a future backend? Just wanted to confirm I’m understanding it correctly before I include it in the draft.

Thanks!

:alarm_clock: Dear students/open source beginners interested in the “Replace Brian’s just-in-time compilation mechanism” project, please don’t forget that the deadline for applications on https://summerofcode.withgoogle.com is later today at 18:00 UTC, i.e. in about 5 hours. Note that it will be impossible for us to ask Google to finance an internship for a candidate who did not submit an application, and that there will be no extension of this deadline from Google’s side. Good luck everyone, hope to see a few of you staying around (with or without a GSoC internship) :crossed_fingers:
