QSIRecon fails on automatic shell classification: silhouette_score error when bvals cannot form discrete shells

Summary of what happened:

Dear experts,

When running QSIRecon with dsi_studio_gqi on data with continuous, non‑shell b‑values, the workflow fails in _classify_shell_scheme due to a silhouette_score error from scikit‑learn.

Command used (and if a helper script was used, a link to the helper script or the command generated):

# Docker container setup
docker run -ti --rm --name qsirecon \
-v"D:\Project_File\Docker_workspace\Shanghai_5th_DTI_data:/workspace" \
-v D:\Project_File\Docker_workspace\Shanghai_5th_DTI_data\license.txt:/opt/freesurfer/license.txt \
--entrypoint /bin/bash pennlinc/qsirecon:1.1.1

# Reconstruction command
qsirecon QSIRecon_24B_input/ QSIRecon_24B_output/ participant \
--n-cpus 12 \
--b0-threshold 100 \
--output-resolution 2 \
--fs-license-file /opt/freesurfer/license.txt \
--recon-spec dsi_studio_gqi \
--input-type qsiprep \
--atlases 4S456Parcels \
-w QSIRecon_24B_workdir/ \
 --resource-monitor

Version:

QSIRecon version 1.1.1

Environment (Docker, Singularity / Apptainer, custom installation):

Docker

Data formatted according to a validatable standard? Please provide the output of the validator:

PASTE VALIDATOR OUTPUT HERE

Relevant log outputs (up to 20 lines):

Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/envs/qsiprep/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/envs/qsiprep/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsirecon/cli/workflow.py", line 132, in build_workflow
    retval["workflow"] = init_qsirecon_wf()
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsirecon/workflows/base.py", line 44, in init_qsirecon_wf
    single_subject_wf = init_single_subject_recon_wf(subject_id=subject_id)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsirecon/workflows/base.py", line 270, in init_single_subject_recon_wf
    shell_scheme = _classify_shell_scheme(bval_file, 5)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsirecon/interfaces/gradients.py", line 427, in _classify_shell_scheme
    score = silhouette_score(X, shells)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 140, in silhouette_score
    return np.mean(silhouette_samples(X, labels, metric=metric, **kwds))
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 186, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 286, in silhouette_samples
    check_number_of_labels(len(le.classes_), n_samples)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 37, in check_number_of_labels
    raise ValueError(
ValueError: Number of labels is 24. Valid values are 2 to n_samples - 1 (inclusive)

Screenshots / relevant information:

The DWI data were obtained from the OASIS dataset database, and the corresponding b-values and b-vectors are shown below:
bval:

0
50
350
600
900
1150
100
400
650
950
150
450
700
1000
1300
200
500
800
1050
1350
300
850
1100
1400

bvec:

-0.00000000 -0.99825638 -0.94633920 -0.92534410 -0.79029252 -0.58419934 -0.60244829 -0.56673688 -0.50539370 -0.33841074 -0.27808398 -0.33247143 -0.28013253 0.03402919 -0.00448077 -0.01687484 -0.02620551 0.32860071 0.49470433 0.51204901 0.48409445 0.82262356 0.79201283 0.83369384
-0.00000000 0.05206002 -0.11507651 0.17759139 0.61224768 0.28900612 -0.75841484 -0.26478201 -0.68723894 0.85445427 0.96049027 -0.41048121 0.41712831 0.84331375 0.53098025 -0.07792818 -0.73339192 0.94446857 0.15978116 -0.20024485 -0.72087341 0.56850721 -0.28596385 0.21018086
0.00000000 -0.02782001 0.30199257 -0.33496208 -0.02430052 0.75841057 0.24872271 -0.78019215 -0.52180442 0.39419040 -0.01130271 0.84909830 -0.86459802 -0.53634312 0.84737235 0.99681615 0.67930079 0.00083285 0.85424681 -0.83528906 -0.49597792 0.00948833 0.53938887 -0.51066484

This is a sampling scheme I never imagined - it’s assigning every coordinate to its own cluster so the silhouette score can’t actually compute. This is a bug, we’ll fix in the next release

Hi,

Thank you very much for your clarification and for pointing out the issue.

In this case, could you kindly suggest any practical workaround or alternative reconstruction strategies that would be more suitable for data where b-values do not form clear discrete shells (e.g., DSI-like or irregular sampling)?

I would really appreciate any recommendations.

Thanks again for your help!