QSIRecon fails on automatic shell classification: silhouette_score error when bvals cannot form discrete shells

Summary of what happened:

Dear experts,

When running QSIRecon with dsi_studio_gqi on data with continuous, non‑shell b‑values, the workflow fails in _classify_shell_scheme due to a silhouette_score error from scikit‑learn.

Command used (and if a helper script was used, a link to the helper script or the command generated):

# Docker container setup
docker run -ti --rm --name qsirecon \
-v"D:\Project_File\Docker_workspace\Shanghai_5th_DTI_data:/workspace" \
-v D:\Project_File\Docker_workspace\Shanghai_5th_DTI_data\license.txt:/opt/freesurfer/license.txt \
--entrypoint /bin/bash pennlinc/qsirecon:1.1.1

# Reconstruction command
qsirecon QSIRecon_24B_input/ QSIRecon_24B_output/ participant \
--n-cpus 12 \
--b0-threshold 100 \
--output-resolution 2 \
--fs-license-file /opt/freesurfer/license.txt \
--recon-spec dsi_studio_gqi \
--input-type qsiprep \
--atlases 4S456Parcels \
-w QSIRecon_24B_workdir/ \
 --resource-monitor

Version:

QSIRecon version 1.1.1

Environment (Docker, Singularity / Apptainer, custom installation):

Docker

Data formatted according to a validatable standard? Please provide the output of the validator:

PASTE VALIDATOR OUTPUT HERE

Relevant log outputs (up to 20 lines):

Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/envs/qsiprep/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/envs/qsiprep/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsirecon/cli/workflow.py", line 132, in build_workflow
    retval["workflow"] = init_qsirecon_wf()
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsirecon/workflows/base.py", line 44, in init_qsirecon_wf
    single_subject_wf = init_single_subject_recon_wf(subject_id=subject_id)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsirecon/workflows/base.py", line 270, in init_single_subject_recon_wf
    shell_scheme = _classify_shell_scheme(bval_file, 5)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/qsirecon/interfaces/gradients.py", line 427, in _classify_shell_scheme
    score = silhouette_score(X, shells)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 140, in silhouette_score
    return np.mean(silhouette_samples(X, labels, metric=metric, **kwds))
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 186, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 286, in silhouette_samples
    check_number_of_labels(len(le.classes_), n_samples)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 37, in check_number_of_labels
    raise ValueError(
ValueError: Number of labels is 24. Valid values are 2 to n_samples - 1 (inclusive)

Screenshots / relevant information:

The DWI data were obtained from the OASIS dataset database, and the corresponding b-values and b-vectors are shown below:
bval:

0
50
350
600
900
1150
100
400
650
950
150
450
700
1000
1300
200
500
800
1050
1350
300
850
1100
1400

bvec:

-0.00000000 -0.99825638 -0.94633920 -0.92534410 -0.79029252 -0.58419934 -0.60244829 -0.56673688 -0.50539370 -0.33841074 -0.27808398 -0.33247143 -0.28013253 0.03402919 -0.00448077 -0.01687484 -0.02620551 0.32860071 0.49470433 0.51204901 0.48409445 0.82262356 0.79201283 0.83369384
-0.00000000 0.05206002 -0.11507651 0.17759139 0.61224768 0.28900612 -0.75841484 -0.26478201 -0.68723894 0.85445427 0.96049027 -0.41048121 0.41712831 0.84331375 0.53098025 -0.07792818 -0.73339192 0.94446857 0.15978116 -0.20024485 -0.72087341 0.56850721 -0.28596385 0.21018086
0.00000000 -0.02782001 0.30199257 -0.33496208 -0.02430052 0.75841057 0.24872271 -0.78019215 -0.52180442 0.39419040 -0.01130271 0.84909830 -0.86459802 -0.53634312 0.84737235 0.99681615 0.67930079 0.00083285 0.85424681 -0.83528906 -0.49597792 0.00948833 0.53938887 -0.51066484

This is a sampling scheme I never imagined - it’s assigning every coordinate to its own cluster so the silhouette score can’t actually compute. This is a bug, we’ll fix in the next release