Debugging duplicate study sessions in DICOM to BIDS conversion

jhlegarreta · October 30, 2024, 6:38pm

Hi,
I have been given a set of DICOM data and I am using heudiconv to convert them to NIfTI and BIDS. Unfortunately, I have very limited experience converting DICOM data.

I have a custom heuristic file that I developed based on the reproin example available in the heudiconv repository. The data I have contains T1w data, dMRI data (AP and PA) and SE field maps.

I am using a singularity image built on Oct 29 from the latest version. My call to heudiconv is simple:

singularity exec \
  --bind ${data_dirname} \
  ${heudiconv_singularity_fname} \
  heudiconv \
    -f ${config_fname} \
    --bids \
    -o ${out_dirname} \
    --files ${in_dirname}

where the variables have been appropriately set to the folders of interest.

The DICOM files are located in four separate folders (T1, DWI_AP, DWI_PA, Spin_Echo_Maps), each containing one subfolder per participant with all of that participant's DICOM files.

When running the conversion, the task raises an assertion error because when checking that the study session that is being parsed is not already in the list,

nipy/heudiconv/blob/2eb52910af77c890e40847e8dd6bcea09f67179f/heudiconv/parser.py#L302

                  # split into multiple sessions!!!! but then it should be provided
                  # full seqinfo with files which it would place into multiple groups
                  study_session_info = StudySessionInfo(
                      ids.get("locator"),
                      ids.get("session", session) or session,
                      sid or ids.get("subject", None),
                  )
                  lgr.info("Study session for %r", study_session_info)
          
                  if grouping != "all":
                      assert study_session_info not in study_sessions, (
                          f"Existing study session {study_session_info} "
                          f"already in analyzed sessions {study_sessions.keys()}"
                      )
                  study_sessions[study_session_info] = seqinfo
          return study_sessions

it finds that the session is already in the list of study sessions.
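
To make the failure mode concrete, here is a minimal sketch of that check in plain Python. It is a simplified stand-in and not heudiconv's actual code: `collect_sessions` and its `groups` input are hypothetical names, and the locator and subject values are placeholders. Only the three `StudySessionInfo` fields and the assertion mirror the excerpt above.

```python
from collections import namedtuple

# Same three fields as in the excerpt above; everything else is illustrative.
StudySessionInfo = namedtuple("StudySessionInfo", ["locator", "session", "subject"])


def collect_sessions(groups, grouping="studyUID"):
    """Map each group of series to a (locator, session, subject) key.

    `groups` is a hypothetical list of (ids, payload) pairs, where `ids`
    plays the role of the dict produced by the heuristic for one group.
    """
    study_sessions = {}
    for ids, payload in groups:
        key = StudySessionInfo(
            ids.get("locator"), ids.get("session"), ids.get("subject")
        )
        if grouping != "all":
            # Two groups that resolve to the same key trip the assertion.
            assert key not in study_sessions, f"Existing study session {key}"
        study_sessions[key] = payload
    return study_sessions


# Placeholder identifiers: two groups of series that resolve to the same key.
ids = {"locator": "Investigators/MyStudy", "session": None, "subject": "000001"}
try:
    collect_sessions([(ids, "T1 series"), (ids, "DWI series")])
except AssertionError as exc:
    print("AssertionError:", exc)
```

In this model the error does not require two participants: it is enough that two separately grouped sets of series end up with the same locator, session and subject.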

The data belongs to the same study (at least for my purposes), but the StudySessionInfo instances show that they have no session information, and the locator (study) is different, e.g.

StudySessionInfo(locator='Investigators/MyStudySZPain10820', session=None, subject='092743')
(...)
StudySessionInfo(locator='Investigators/MyStudySZPain8232021', session=None, subject='197668')
(...)

There seem to be 6 different locators (MyStudy is a substitute for the real name):

Investigators/MyStudy
Investigators/MyStudyFDNeuro08242021
Investigators/MyStudyFDPain08242021
Investigators/MyStudySZPain
Investigators/MyStudySZPain10820
Investigators/MyStudySZPain8232021

If I keep only the DWI folders in my in_dirname, heudiconv is able to complete the conversion, and organizes the data into a main Investigators folder with MyStudy* subfolders, each containing a subset of the 68 participants in the dataset.

If I keep the T1 and DWI_AP folder in the in_dirname, I get the error mentioned above.

I found this PR

ENH: grouping by mgxd · Pull Request #359 · nipy/heudiconv · GitHub

and thought that maybe the --grouping all flag could help solve this. When using the flag I get an assertion error from my heuristic file, because the seqinfo values are not unique (the check, as said, is inherited from the reproin example):

nipy/heudiconv/blob/2eb52910af77c890e40847e8dd6bcea09f67179f/heudiconv/heuristics/reproin.py#L727

                                  "{} is already known to info={}. "
                                  "May be a bug for per_series=True handling?"
                                  "".format(dup_template, info)
                              )
                          info[dup_template] = [dup_series_id]
                      info[template] = series_ids[-1:]
                  assert len(info[template]) == 1
              return info
          
          
          def get_unique(seqinfos: list[SeqInfo], attr: str) -> Any:
              """Given a list of seqinfos, which must have come from a single study,
              get specific attr, which must be unique across all of the entries
          
              If not -- fail!
          
              """
              values = set(getattr(si, attr) for si in seqinfos)
              if len(values) != 1:
                  raise AssertionError(
                      f"Was expecting a single value for attribute {attr!r} "

I do not know how to go about this or how to debug this.

Any help is highly appreciated.

Thanks.

egor.levchenko · November 1, 2024, 12:22pm

Your data was probably acquired during different MRI sessions. If so, I think you need to convert them separately and save them under different ses folders in a final bids-valid folder. For example, in my study, I had two sessions per participant and I converted each session separately using the code provided below:

for subj_id in $subjects; do
  subj_dir="${project_path}/raw_data/sub-$subj_id/"
  sess_i=1

  # Check if the subject directory exists
  if [ -d "$subj_dir" ]; then
    sess_ids=$(ls "$subj_dir")

    for full_sess_id in $sess_ids; do
      sess_id="${full_sess_id:5}"

      docker run --rm -v ${PWD}:/base nipy/heudiconv:latest \
       -d /base/raw_data/sub-{subject}/sess-${sess_id}/*/*.dcm \
       -o /base/bids_data/ \
       -f /base/analysis/heuristic_sess0"${sess_i}".py \
       -s "$subj_id" -ss "00${sess_i}" \
       -c dcm2niix -b

      sess_i=$((sess_i+1))
    done
  else
    echo "Directory $subj_dir does not exist."
  fi
done

I hope it helps!

jhlegarreta · November 3, 2024, 5:32pm

Thanks for the answer @egor.levchenko.

I applied heudiconv to incremental subsets of the data, starting from the first participant, until I was able to locate where heudiconv would error out telling me that the current study session had already been analyzed. I removed that participant and heudiconv ran without any apparent errors. Not sure if that is the appropriate strategy, but I cannot think of other ways to make heudiconv proceed.

I had tried cloning the heudiconv source code and debugging it, but it was failing on simple things such as importing the __version__, the queue module shadowing some other system module, etc., so I did not follow that path.

"Your data was probably acquired during different MRI sessions."

They were definitely acquired on different days but using the same protocol (in theory), and there is no functional data, so not sure how a “session” would be defined here.

After heudiconv has finished the conversion (after I had removed the allegedly duplicate participant), participants are distributed across 6 folders/study names:

Investigators/MyStudy
Investigators/MyStudyFDNeuro08242021
Investigators/MyStudyFDPain08242021
Investigators/MyStudySZPain
Investigators/MyStudySZPain10820
Investigators/MyStudySZPain8232021

each having its own BIDS structure files (CHANGES, dataset_description.json, participants.json, participants.tsv, README, etc.).

"If so, I think you need to convert them separately and save them under different ses folders in a final bids-valid folder."

Thanks for the snippet, but I do not know which participants were acquired in which session. I was given all the DICOM files split by modality/acquisition (T1, DWI_AP, DWI_PA, Spin_Echo_Maps), and each participant has a folder within each of them whose name is an arbitrary identifier that does not match the participant ID in the DICOM data.

So unless I am missing something, there is no way for me to distinguish session folders.
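
One way to recover that information may be to read it from the DICOM headers instead of the folder names. The following is a minimal sketch, assuming pydicom is installed and assuming a `<modality>/<participant folder>/...` layout under the input directory. The `read_tags` and `group_folders` helpers are hypothetical names, and only the first file of each folder is read, which assumes every folder holds a single participant and study.

```python
from collections import defaultdict
from pathlib import Path


def read_tags(path, tags):
    """Read a few header tags from one DICOM file (requires pydicom)."""
    import pydicom  # third-party; imported lazily so the grouping logic stays testable

    ds = pydicom.dcmread(path, stop_before_pixels=True, specific_tags=list(tags))
    return tuple(str(ds.get(tag, "")) for tag in tags)


def group_folders(
    root,
    tags=("PatientID", "StudyDate", "StudyInstanceUID"),
    reader=read_tags,
):
    """Group participant folders under `root` by the tag values of their first file.

    Assumes the layout root/<modality>/<participant folder>/... and that all
    files in one participant folder share the same values for `tags`.
    """
    groups = defaultdict(list)
    for folder in sorted(p for p in Path(root).glob("*/*") if p.is_dir()):
        files = sorted(f for f in folder.rglob("*") if f.is_file())
        if files:
            key = reader(files[0], tags)
            groups[key].append(str(folder.relative_to(root)))
    return dict(groups)


# Usage (path is a placeholder):
#   for key, folders in group_folders("/path/to/in_dirname").items():
#       print(key, folders)
```

Folders that share a PatientID but differ in StudyDate or StudyInstanceUID would then show up as separate groups, which gives a candidate definition of "session" for each participant.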

jhlegarreta · November 8, 2024, 3:48pm

Not happy with my previous conclusion, I dug into this a little more. It turns out that my thinking was wrong: the error does not arise from duplicate participants but from a single participant. I ran heudiconv on the data of the participant at issue alone, and it fails with essentially the same message:

INFO: Running heudiconv version 1.3.1 latest 1.3.2
INFO: Analyzing 714 dicoms
INFO: Filtering out 0 dicoms based on their filename
INFO: Generated sequence info for 2 studies with 5 entries total
INFO: Processing sequence infos to deduce study/session
INFO: Considering study (ac6b6cc555964ad21e74ec331a03fb3a) specific substitutions
INFO:  protocol_name: 'MEMPRAGE_gr2' -> 'anat-T1w'
INFO:  series_description: 'MEMPRAGE_gr2' -> 'anat-T1w'
INFO: Study session for StudySessionInfo(locator='Investigators/MyStudySZPain8232021', session=None, subject='256979')
INFO: Processing sequence infos to deduce study/session
INFO: Considering study (ac6b6cc555964ad21e74ec331a03fb3a) specific substitutions
INFO:  protocol_name: 'SpinEchoFieldMap_AP' -> 'fmap-epi_dir-ap_acq-dir99_run+'
INFO:  series_description: 'SpinEchoFieldMap_AP' -> 'fmap-epi_dir-ap_acq-dir99_run+'
INFO:  protocol_name: 'SpinEchoFieldMap_PA' -> 'fmap-epi_dir-pa_acq-dir99_run+'
INFO:  series_description: 'SpinEchoFieldMap_PA' -> 'fmap-epi_dir-pa_acq-dir99_run+'
INFO:  protocol_name: 'dMRI_dir99_AP' -> 'dwi_dir-ap_acq-dir99'
INFO:  series_description: 'dMRI_dir99_AP' -> 'dwi_dir-ap_acq-dir99'
INFO:  protocol_name: 'dMRI_dir99_PA' -> 'dwi_dir-pa_acq-dir99'
INFO:  series_description: 'dMRI_dir99_PA' -> 'dwi_dir-pa_acq-dir99'
INFO: Study session for StudySessionInfo(locator='Investigators/MyStudySZPain8232021', session=None, subject='256979')
Traceback (most recent call last):
  File "/opt/miniconda-py39_4.12.0/bin/heudiconv", line 8, in <module>
    sys.exit(main())
  File "/src/heudiconv/heudiconv/cli/run.py", line 30, in main
    workflow(**kwargs)
  File "/src/heudiconv/heudiconv/main.py", line 410, in workflow
    study_sessions = get_study_sessions(
  File "/src/heudiconv/heudiconv/parser.py", line 283, in get_study_sessions
    assert study_session_info not in study_sessions, (
AssertionError: Existing study session StudySessionInfo(locator='Investigators/MyStudySZPain8232021', session=None, subject='256979') already in analyzed sessions dict_keys([StudySessionInfo(locator='Investigators/MyStudySZPain8232021', session=None, subject='256979')])

So the protocols2fix dictionary key from my heuristic file that is being used is the same in both cases:

(...)
    "ac6b6cc555964ad21e74ec331a03fb3a":
        [
            ('^MEMPRAGE.*', 'anat-T1w'),
            ('^dMRI_dir99_AP', 'dwi_dir-ap_acq-dir99'),
            ('^dMRI_dir99_PA', 'dwi_dir-pa_acq-dir99'),
            ('SpinEchoFieldMap_AP', r'fmap-epi_dir-ap_acq-dir99_run+'),
            ('SpinEchoFieldMap_PA', r'fmap-epi_dir-pa_acq-dir99_run+'),
        ],
(...)

but the data is not identified all at once for some reason, despite all data (T1, DWI AP/PA and SE field maps) belonging to the same participant.

My original heuristics file is exactly the same as the one in commit a77541c with my protocols2fix dictionary additions and an additional change to remove whitespaces and “^” characters from the study name:

        # Remove all remaining whitespaces and ^
        split = [elem.replace(" ", "").replace("^", "") for elem in split]

in this block.

I have also tried the 2eb529 version with the dictionary addition and got the same error.

The log says that there are 2 studies there, which I do not understand either, as the locator being printed is exactly the same.

What am I missing here to get it working?

Thank you.

jhlegarreta · November 8, 2024, 10:27pm

Inspecting the DICOM tags for the participant at issue, I’ve seen that the T1 and DWI data have different values for the StudyID tag.

Can this be related (maybe these are the different sessions Egor meant)? If yes, can this be fixed using the BIDS heuristic?

For any downstream analysis (e.g. using QSIPrep) I’d need both the T1 and DWI to be picked together, and ideally I’d like the participant to be included with the rest of the participants that belong to the same locator.
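
On the heuristic question: in the `get_study_sessions` excerpt quoted earlier in this thread, the session part of the key is taken from `ids.get("session", ...)`, and as far as I can tell `ids` is what the heuristic's `infotoids` returns. One possible, untested direction is therefore to let the heuristic fall back to the acquisition date as the session label, so that two scan days for the same subject no longer produce the same key. The sketch below only shows that fallback logic. `add_date_session` is a hypothetical helper, the `date` attribute on the seqinfo entries is an assumption, and the identifiers and dates are made up.

```python
from types import SimpleNamespace


def add_date_session(ids, seqinfos):
    """Return a copy of `ids` whose session falls back to the acquisition date.

    Assumes every entry in `seqinfos` exposes a `date` attribute
    (e.g. "20210101"); an existing non-empty session is left untouched.
    """
    ids = dict(ids)
    if not ids.get("session"):
        dates = sorted({si.date for si in seqinfos if getattr(si, "date", None)})
        if dates:
            # Use the earliest date found in this group as the session label.
            ids["session"] = dates[0]
    return ids


# Made-up stand-ins for two groups of the same subject scanned on different days.
base = {"locator": "Investigators/MyStudy", "session": None, "subject": "000001"}
t1 = add_date_session(base, [SimpleNamespace(date="20210101")])
dwi = add_date_session(base, [SimpleNamespace(date="20210102")])
print(t1["session"], dwi["session"])
```

In a real heuristic this would presumably be applied inside `infotoids` to the dict the reproin implementation returns. Note the consequence: the T1 and DWI data would land in different `ses-` folders of the same subject rather than side by side, so whether the downstream tool accepts that layout would need to be checked.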

Thanks.

jhlegarreta · November 13, 2024, 2:59pm

@yarikoptic any chance to comment on how to fix/work around the above issue?

egor.levchenko · November 22, 2024, 1:56pm

Yes, if data were acquired on different days then it’s different sessions!

I believe you need to process files with different study tags separately. If your T1 and DWI have different values for the StudyID tag, my guess is that they were acquired during different sessions.

I hope it helps!

jhlegarreta · January 17, 2025, 10:18pm

"If your T1 and DWI have different values for the StudyID tag, my guess is that it was acquired during different sessions."

After speaking to one of the people that were involved in the acquisition, that is what happened.

I ended up discarding the participant. A poor solution, I know, but I had other, more pressing issues. Thanks anyway for all your insightful explanations and help @egor.levchenko, and sorry for my very late reply.