How to combine pyAFQ bundles and MRtrix SIFT2 streamline weights?

Summary of what happened:

Hello everyone,
I have already get the whole-brain tractogram with QSIRecon (mrtrix_multishell_msmt_ACT-hsvs). It has generated the track file (*model-ifod2_streamlines.tck,10 million) and the corresponding sift2 weight for each streamline (*model-sift2_streamlineweights.csv). Then I want to use pyAFQ to select some bundle of interest (BOI) by custom waypoint ROIs and probmaps.
According to my understanding, when calculating the metrics of a fiber bundle, a weighted average based on the sift2 weights is required. So the question is: how can I calculate scalar metrics (e.g. FA, MD) for each streamline and each bundle of my BOIs using the streamline weights from sift2? Alternatively, would it be possible to carry the sift2 weights along with the streamlines during pyAFQ's bundle-filtering process, so that both are filtered simultaneously?

Thank you,
Wei

Hi Wei,

That is a great question. I am not sure that a weighted average based on sift2 weights is "required", but on the face of it, using sift2 weights seems like a great idea, and I would be quite interested to learn how/whether tract profiles weighted by sift2 weights differ from those weighted by each streamline's deviation from the core of the tract (which is pyAFQ's current default).

Fortunately, this should be possible to calculate with currently available software: the DIPY afq_profile function (see example here) takes an optional 'weights' key-word argument. I believe that if you provide the sift2 weights as an array ordered in the same order as your streamlines, the result will be a tract profile weighted proportionally by the sift2 weight assigned to each streamline.
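For concreteness, here is a numpy-only sketch of what such a weighted profile amounts to. This is not the DIPY API itself, just the underlying arithmetic: sample a scalar map at n points along each (consistently oriented) streamline, then take a weighted average across streamlines at each node. The function name and toy numbers are illustrative.

```python
import numpy as np

def weighted_profile(values, weights):
    """Weighted mean tract profile.

    values:  (n_streamlines, n_points) scalar samples (e.g. FA) along
             each streamline, all oriented the same way.
    weights: (n_streamlines,) weight per streamline (e.g. SIFT2).
    Returns a (n_points,) profile: at each node, the weighted average
    across streamlines.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                    # normalize weights to sum to 1
    return w @ np.asarray(values, dtype=float)

# toy example: 3 streamlines, 5 nodes each
vals = np.array([[0.4, 0.5, 0.6, 0.5, 0.4],
                 [0.2, 0.3, 0.4, 0.3, 0.2],
                 [0.6, 0.7, 0.8, 0.7, 0.6]])
w = np.array([1.0, 2.0, 1.0])
profile = weighted_profile(vals, w)    # e.g. profile[0] == 0.35
```

Passing the sift2 weights (in streamline order) as the `weights` argument to DIPY's afq_profile should produce the analogous weighted profile.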

I’d be quite interested to see a comparison if you conduct one.

Best,
Ariel

Hi Ariel,

Thank you very much for your kind reply. I agree that using the DIPY afq_profile function for the weighted calculation makes sense. But before that, we need to determine the indices of the BOI streamlines in the original whole-brain tractogram, so that we can extract the sift2 weights in the right order. This is the problem that troubles me :frowning: .

To achieve this, I tried writing the original index into the metadata (data_per_streamline) of the whole-brain tractogram (in both .tck and .trk format) with DIPY. The tract file was then provided to pyAFQ for bundle recognition, in the hope that the indices would be carried through the filtering.

# Codes for writing the original index into the metadata
import numpy as np
from dipy.io.streamline import load_tractogram, save_tractogram

raw_file = "*desc-tracks_ifod2_100k.tck"    # the same procedure applies to .trk files
out_file = "*desc-tracks_ifod2_100k_withidx.tck"

sft = load_tractogram(raw_file, reference="*space-T1w_FA.nii.gz")
n_streamlines = len(sft.streamlines)
orig_idx = np.arange(n_streamlines)
sft.data_per_streamline["orig_idx"] = orig_idx

save_tractogram(sft, out_file)

Unfortunately, it failed. I found that the .tck format does not seem to support metadata writing (this information is automatically erased when saving with DIPY's save_tractogram function). Although the .trk format supports metadata writing, the metadata is not retained after pyAFQ bundle recognition…

Now I have another idea: for each streamline in the BOI, compare it one by one with the streamlines in the original whole-brain tractogram, thereby determining its index. But this seems to involve a very large amount of computation (thousands of subjects × 10 million streamlines each). Do you think this plan is feasible? Or are there better alternatives?
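One way to sidestep the full pairwise comparison might be a hash lookup: build a dictionary keyed on a rounded coordinate signature of each whole-brain streamline once (O(N)), then look each bundle streamline up in it (O(M)). The sketch below uses only the endpoints as the signature, which could in principle collide for distinct streamlines sharing both endpoints; the function name and tolerance are illustrative.

```python
import numpy as np

def index_lookup(whole_brain, bundle, decimals=4):
    """Match each bundle streamline back to its index in the whole-brain
    tractogram via a hash of its rounded endpoints, instead of comparing
    every pair. Returns -1 for streamlines with no match."""
    table = {}
    for i, sl in enumerate(whole_brain):
        key = tuple(np.round(np.vstack([sl[0], sl[-1]]), decimals).ravel())
        table[key] = i
    idx = []
    for sl in bundle:
        key = tuple(np.round(np.vstack([sl[0], sl[-1]]), decimals).ravel())
        # also try the reversed orientation, since bundle recognition
        # may flip streamlines
        flip = tuple(np.round(np.vstack([sl[-1], sl[0]]), decimals).ravel())
        idx.append(table.get(key, table.get(flip, -1)))
    return idx

# toy example: three straight-line "streamlines"
wb = [np.array([[0., 0, 0], [1, 0, 0], [2, 0, 0]]),
      np.array([[0., 1, 0], [1, 1, 0], [2, 1, 0]]),
      np.array([[0., 2, 0], [1, 2, 0], [2, 2, 0]])]
bundle = [wb[2], wb[0][::-1]]          # second one is flipped
found = index_lookup(wb, bundle)       # -> [2, 0]
```

For 10 million streamlines this keeps the matching linear rather than quadratic, though a metadata- or index-propagating solution would still be cleaner if one exists.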

Many thanks,
Wei

By the way, although I had not previously thought about directly comparing tract profiles weighted by sift2 weights with those produced by pyAFQ's default method (I had mainly appreciated pyAFQ's bundle-cleaning process), I now realize it is an interesting question. Once the above problem is solved, I'd like to run the comparison and share the results with you all :slight_smile:

Hi Wei,

The recognize function, which is the work-horse of pyAFQ bundle recognition, has a return_idx key-word argument that is designed specifically for this kind of use-case, where indices in the original tractogram need to be propagated. This does mean that you will have to set up everything and call it directly rather than depending on pyAFQ’s data management and workflow API. Does that provide a good enough solution?

Cheers,
Ariel

Correction: I think that I might be wrong, and you can actually pass return_idx=True to the GroupAFQ class and also set that up as part of the config file, if you are using the CLI. I believe that this would save the original indices in the individual tract trk files, just like what you were trying to do. So no need to write all the low-level code :slight_smile:

Hi Ariel,

Yes, just as you noticed, I found that the clean function also has a return_idx parameter.

return_idx: bool
    Whether to return indices in the original streamlines. Default: False.

In general, the cleaning process runs over multiple iterations (five by default in pyAFQ). So the indices recorded in the log file appear to be relative to the tractogram of the most recent iteration, as shown in the screenshots below.

[screenshots of the cleaning log output]
It seems that some modifications to the pyAFQ code are necessary (although for me, a novice at coding, this might take some time). Nevertheless, this is a promising discovery.

Thanks again,
Wei

Well spotted and diagnosed. That’s a bug, now recorded here and hopefully to be fixed before too long (but no promises :slight_smile: )

Thanks for your timely answer and looking forward to seeing this bug fixed :smiley:

Hello Wei,
What version of pyAFQ are you running? I believe this is an error from an older version of pyAFQ, as we no longer print "Removed indices"; we print "Kept indicies". I tested this on the latest pyAFQ version and it seems to work. Try updating to a newer version and see if this helps. Let me know how it goes.

Best,
John

Hi @JKruper ,

Thank you for the reply. Yes, you are right. The logfile now records "Kept indicies" (as shown below) after updating pyAFQ to the latest version :partying_face:

1709 DEBUG:AFQ:Rounds elapsed: 5, num kept: 174
1710 DEBUG:AFQ:Kept indicies: [ 8 12 16 20 23 24 25 29 30 60 71 79 83 89 93 32 34 42
1711 94 100 103 116 119 120 125 127 128 138 143 153 155 156 157 160 166 167
1712 176 179 180 188 190 200 203 204 209 210 219 221 222 229 231 237 249 251
1713 253 265 266 270 274 281 288 289 300 301 303 316 318 321 327 328 335 340
1714 353 357 368 371 373 374 377 379 381 407 417 420 421 423 437 442 445 449
1715 454 455 457 458 464 470 471 472 474 476 478 480 481 486 488 493 500 503
1716 504 509 511 512 515 522 541 553 558 562 563 571 577 578 581 583 586 588
1717 594 600 602 604 607 614 620 624 626 630 633 637 645 651 653 655 658 659
1718 663 664 666 667 676 684 686 688 693 694 696 708 716 724 726 746 749 752
1719 753 756 757 760 765 767 769 774 776 778 785 787]

However, these kept indices in the logfile (DEBUG:AFQ) do not seem to correspond to the indices in the original whole-brain tractogram, which I estimated with another pipeline and then fed into the pyAFQ workflow for bundle recognition. I then tried specifying the return_idx argument in the workflow (ParticipantAFQ) like this:

# ===== Create a participantAFQ wrapper class =====

afq = ParticipantAFQ(
    dwi_data_file=dwi_file,
    bval_file=bval_file,
    bvec_file=bvec_file,
    bids_path=data_dir,
    output_dir=output_dir,
    mapping_definition=fnirt_map,
    import_tract=tract_file,
    bundle_info=bundle_dict,
    preproc_pipeline="none",
    segmentation_params=dict( 
        return_idx=True,
        cleaning_params=dict(
            clean_rounds=5,
            distance_threshold=3,
            length_threshold=4,
            return_idx=True))
)

Is my parameter configuration correct? It completed smoothly, and I found the "tracking_idx" field in the result file "sub-*_acq-hardi_coordsys-RASMM_trkmethod-probCSD_recogmethod-AFQ_tractography.json".

Are these indices relative to the original tractogram? If so, is there a parameter setting that can output the indices for each bundle separately? And by the way, what are the definitions of, and relationships between, these three sets of indices: (1) the "Kept indicies" in the logfile (DEBUG:AFQ), (2) the "tracking_idx" field in the result .json file, and (3) the indices in the original tractogram?

Thanks,
Wei

A new attempt at checking the indices (failed):

To verify the correctness of the indices (i.e. the "tracking_idx" field in the result file "sub-*_acq-hardi_coordsys-RASMM_trkmethod-probCSD_recogmethod-AFQ_tractography.json") generated with the return_idx argument, I manually selected the streamlines from the original whole-brain tractogram according to those indices, using DIPY. Then I compared the streamline coordinates between the set I selected and the bundle recognized by pyAFQ. Unfortunately, roughly one-third to one-half of the streamlines do not match! Visually, however, the two bundles appear similar.

I’m confused about where the problem lies. Is there a bug in the indices generated by pyAFQ, or did I make a mistake in my inspection process?

I should have said this earlier: the DEBUG indices in the logfile are really only useful for debugging. They are relative to the input that the cleaning step received, not to the original tractogram, so ignore (1). However, the tracking_idx you found in the JSON file should index into the original tractogram, so (2) and (3) should be the same. Manually selecting from the original tractogram using tracking_idx should therefore have worked, and I am not sure what is going wrong.

One question: do they visually look similar or the same? Because one thing pyAFQ does is flip streamlines so they are oriented in the same direction. So roughly half of the streamlines in the bundle output by pyAFQ should be flipped relative to the original tractogram. If you account for the flipping, the streamlines should match exactly.
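A flip-aware comparison can make this check concrete. The sketch below (function name and tolerance are illustrative) treats two streamlines as matching if their coordinates agree either in the original or in the reversed point order:

```python
import numpy as np

def matches(sl_a, sl_b, tol=1e-4):
    """True if two streamlines have the same coordinates within tol,
    allowing for an orientation flip (reversed point order)."""
    a, b = np.asarray(sl_a), np.asarray(sl_b)
    if a.shape != b.shape:
        return False
    return bool(np.allclose(a, b, atol=tol)
                or np.allclose(a, b[::-1], atol=tol))

# toy check: a streamline matches itself and its reversed copy
sl = np.array([[0., 0, 0], [1, 0, 0], [2, 0, 0]])
same = matches(sl, sl)            # True
flipped = matches(sl, sl[::-1])   # True
other = matches(sl, sl + 1.0)     # False
```

With a check like this, "roughly half do not match" would be the expected signature of flipping alone, while residual mismatches would point to an indexing problem.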

Please keep me posted. If you keep running into issues and you are free to share the data, you could also send me the trk/json files at my email which can be found here: People - University of Washington Department of Psychology


Correction of my previously inaccurate statement: the two bundles do visually look the same. And thanks for pointing out the "streamline flipping" procedure in pyAFQ; once the flipping is accounted for, the streamlines match exactly within a floating-point tolerance of 1e-4.

Some additional details I hope to confirm with you:
(1) Considering fiber-bundle format conversion, is a tolerance of 1e-4 acceptable? (pyAFQ converts .tck to .trk; the manually selected bundle was saved as .tck.)
(2) Does the "streamline flipping" procedure affect subsequent streamline/bundle-level analysis (e.g., calculating streamline/bundle metrics based on SIFT2 weights, or extracting metrics for specific segments of streamlines/bundles)? Or does subsequent analysis require additional handling of the flipping?
(3) When identifying multiple fiber bundles simultaneously, is there a parameter setting that can extract the streamline indices for each individual bundle separately, rather than storing them all together in "tracking_idx"? If not, what is the best way to separate them manually?

(1) I am not entirely sure where these differences come from, but I think that 1e-4 differences are acceptable. To put this in perspective: assume the coordinates are in units of voxels, with a voxel size of 2 mm isotropic. A 1e-4 difference then corresponds to 0.2 microns between the coordinates in the two files. I doubt that the measurement has the precision where 0.2 microns makes any practical difference.
(2) I am not sure that I understand the question, so let me say this: the flipping is specifically intended for extracting tract profiles from the tracts and from maps of tissue properties such as FA/MD/etc. The reason we do this is that we collect information from the maps along the length of each streamline, and we want the streamlines to be co-extensive in space when we do that, so that we don't collect from one end of the tract in one streamline and from the other end in another. I'm not sure whether this answers your question.
(3) Not with current versions, but we’re working on it for future versions here. For now, you will have to rely on the order of the tracts in “bundle_ids” and the number of streamlines in each tract to know where one tract starts and ends in “tracking_idx”.
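The bookkeeping described in (3) can be sketched as follows; the function name, bundle names, and counts are illustrative, and the assumption (taken from the advice above) is that streamlines are stored bundle-by-bundle in the order given by the bundle list:

```python
import numpy as np

def split_tracking_idx(tracking_idx, bundle_names, counts):
    """Split a flat list of original-tractogram indices into per-bundle
    arrays, assuming the streamlines are stored bundle-by-bundle in the
    order of bundle_names, with counts[i] streamlines per bundle."""
    out, start = {}, 0
    for name, n in zip(bundle_names, counts):
        out[name] = np.asarray(tracking_idx[start:start + n])
        start += n
    return out

# toy example: 2 bundles with 2 and 3 streamlines respectively
parts = split_tracking_idx([5, 7, 9, 11, 13],
                           ["CST_L", "ARC_L"], [2, 3])
# parts["CST_L"] -> [5, 7]; parts["ARC_L"] -> [9, 11, 13]
```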


Hi Ariel,

Thank you and John. I think these did answer my questions.

Additional explanation for (2): I am considering analyses beyond tract profiles, such as calculating the overall FA/MD of bundles recognized by pyAFQ, combined with the MRtrix SIFT2 weights. As you mentioned, flipping merely reverses the point order of streamlines to make them spatially consistent, so I agree this procedure should not affect such analyses.
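For that bundle-level use case, the computation reduces to a standard weighted mean; the sketch below assumes one scalar value per streamline (e.g. its mean FA) and the matching SIFT2 weights, and is unaffected by point-order flips since each streamline contributes a single number:

```python
import numpy as np

def weighted_bundle_mean(values_per_streamline, sift2_weights):
    """SIFT2-weighted bundle-level mean of a scalar:
    sum(w_i * v_i) / sum(w_i)."""
    v = np.asarray(values_per_streamline, dtype=float)
    w = np.asarray(sift2_weights, dtype=float)
    return float((w * v).sum() / w.sum())

# toy example: two streamlines with mean FA 0.5 and 0.3
mean_fa = weighted_bundle_mean([0.5, 0.3], [1.0, 3.0])  # -> 0.35
```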

Cheers,
Wei