Problem assigning sub / ses to DICOMDIR layout dataset

Hello,

I have a dataset in the DICOMDIR layout:
sourcedata
└── patient-01
├── DICOM
│ └── 0000AC37
│ ├── AA140CCD
│ │ └── AABD7737
│ │ ├── 000009F8
│ │ ├── 00003FCD
│ │ ├── 000053FD
│ │ ├── 000066BD
│ │ ├── 00006949
│ │ ├── 000071EF
│ │ ├── 00008532
│ │ ├── 00008629
│ │ ├── 00008D55
│ │ ├── 000098FB
│ │ ├── 0000AFC0
│ │ ├── 0000D518
│ │ └── 0000FD10
│ ├── AA3268B8
│ │ └── AAFA7B6D
│ │ ├── 00001A46
│ │ ├── 00001C04
│ │ ├── 00002FF4
│ │ ├── 00003036
│ │ ├── 000051BB
│ │ ├── 00005AF1
│ │ ├── 00007033
│ │ ├── 00008ED6
│ │ ├── 0000D787
│ │ ├── 0000DD61
│ │ ├── 0000DF02
│ │ ├── 0000E626
│ │ └── 0000EDD7
│ ├── AA5C483A
│ │ └── AA5BB5F3
│ │ ├── 00000459
│ │ ├── 000017A8
│ │ ├── 00001ACB
│ │ ├── 000020CA
│ │ ├── 00003723
│ │ ├── 00003A72
│ │ ├── 00003A91
│ │ ├── 00003E18
│ │ ├── 00003FB1
│ │ ├── 000044E0
│ │ ├── 00004638
│ │ ├── 000050C5
│ │ ├── 000059BA
│ │ ├── 00005B99
│ │ ├── 00006DD6
│ │ ├── 0000770A
│ │ ├── 0000794E
│ │ ├── 000079EC
│ │ ├── 00009751
│ │ ├── 00009842
│ │ ├── 000098FA
│ │ ├── 00009C61
│ │ ├── 0000A3A7
│ │ ├── 0000ADCA
│ │ ├── 0000B2BC
│ │ ├── 0000D325
│ │ └── 0000F288
│ ├── AA5C6C3B
│ │ └── AA2EF4AD
│ │ ├── 00003E6D
│ │ ├── 000048D9
│ │ ├── 000074A8
│ │ ├── 0000874F
│ │ ├── 000096C4
│ │ ├── 00009A36
│ │ ├── 00009B42
│ │ ├── 0000A2CC
│ │ ├── 0000AA55
│ │ ├── 0000AEB1
│ │ ├── 0000B3E3
│ │ ├── 0000C5A0
│ │ ├── 0000C612
│ │ ├── 0000EA33
│ │ └── 0000EF26
│ ├── AA740734
│ │ └── AAB9BE77
│ │ ├── 0000084E
│ │ ├── 00002BB4
│ │ ├── 0000489F
│ │ ├── 000053F8
│ │ ├── 000060D5
│ │ ├── 00007A3B
│ │ ├── 00008CF2
│ │ ├── 0000FEEC
│ │ └── 0000FF54
│ └── AAD32921
│ └── AA448862
│ ├── 00002009
│ ├── 00003120
│ ├── 00004F36
│ ├── 000050D8
│ ├── 0000730C
│ ├── 000095FB
│ └── 0000EB19
└── DICOMDIR

I am struggling to organise my data into a sub- and ses- hierarchy.
I first used bidsmapper in the following way:
bidsmapper sourcedata BIDS -n '' -m '*'
This gives a few errors including:
ERROR | Missing ‘SeriesDescription’ DICOM field specified in the ‘{SeriesNumber:03d}-{SeriesDescription}’ folder/naming scheme, cannot find a safe name for: …
ERROR | Cannot create subfolders, aborting dicomsort()…
WARNING | Not all -files in … have the same size. This may be OK but can also be indicative of a truncated acquisition or file corruption(s)
WARNING | /var/folders//… not found

In the bidseditor GUI, participant data was set as:
Participant_id: <filepath:/sourcedata/(.*?)/>
Session_id: <filepath:/sourcedata/.*?/(.*?)/>
Giving rise to samples, for example, as:
anat/sub-patient01_ses-002HeadSpiral10AxHr383_acq-HeadSpiral10AxHr383_ct.*
anat/sub-patient01_ses-DICOM_acq-DEPPDEHeadAngio10Qr403BSn150kV_ct.*
I changed the participant data according to the bidscoin documentation tip:
<filepath:/sourcedata/patient(.*?)/>
Which now shows the sub correctly: anat/sub-01_ses-DICOM_acq-DEHeadAngio10Hr383F08_ct.*
However, the samples do not show a session number. In fact a few have a session number but most do not and seem to have ‘DICOM’ assigned as session, e.g.
has session number: anat/sub-01_ses-002HeadSpiral10AxHr383_acq-HeadSpiral10AxHr383_ct.*
does not have session number: anat/sub-patient01_ses-DICOM_acq-DEPPDEHeadAngio10Qr403BSn150kV_ct.*

What is the reason for this? I have also tried to use dicomsort to no avail.

Thank you!

First, to get things clear for me: is the DICOM folder inside your patient-01 directory or is it next to it? It seems to be the latter, which I presume is going to make it difficult to add more patients. I would put the DICOM folder inside the patient-01 folder and use

bidsmapper sourcedata BIDS -s -n 'patient-' -m '*'

If the bidsmapper finds a DICOMDIR file (as in your case), it re-organizes the data by running dicomsort using default values. These default values use the SeriesDescription field in the DICOM header to create new names/subfolders. That field does not exist in your case, presumably because your data has been anonymized, right? Your errors in the GUI will be solved using the right prefixes (as shown above).

To run dicomsort yourself, you need to come up with a naming scheme using the DICOM fields that have not been erased by your anonymization software. Please let me know if you didn’t anonymize your data.

p.s. if you don’t want to have a session folder, simply delete the session pattern altogether

Dear Marcel, thanks for the speedy reply!

The hierarchy appears misformatted once pasted into this reply box - I can confirm that the DICOM folder and the DICOMDIR file are both within the patient-01 folder.

My dataset has been anonymised - however, I’ve checked the DICOM headers, and the SeriesDescription field does still exist so it has not been deleted, e.g. t2_tse_cor_448.

I’ve used the prefixes as you’ve suggested: bidsmapper sourcedata BIDS -s -n 'patient-' -m '*'. It assigns the subject correctly. However, it still does not assign sessions, e.g. dwi/sub-01_ses-DICOM_acq-DTI2x2x22Multishell20ch2020negpefftscale_sbref. It appears to be incorrectly assigning ‘DICOM’ (maybe from the folder hierarchy) as the session ID - is that right?

For patient-01, there are 6 sessions, so I would like to be able to retain a session folder.
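
To see where ‘DICOM’ may come from, here is a minimal sketch that applies the filepath patterns quoted earlier to one of the file paths in the tree. It assumes plain regular-expression search semantics and a cleanup step that drops non-alphanumeric characters; the helper names extract and clean are hypothetical and this is not BIDScoin's own code.

```python
import re

def extract(pattern: str, path: str) -> str:
    """Return the first captured group of `pattern` in `path`, or '' if there is no match."""
    match = re.search(pattern, path)
    return match.group(1) if match else ""

def clean(label: str) -> str:
    """Assumed cleanup step: keep only letters and digits in a label."""
    return re.sub(r"[^a-zA-Z0-9]", "", label)

# One file path taken from the folder tree in the first post
path = "/sourcedata/patient-01/DICOM/0000AC37/AA140CCD/AABD7737/000009F8"

subject = clean(extract(r"/sourcedata/(.*?)/", path))             # original pattern
subject_tip = clean(extract(r"/sourcedata/patient(.*?)/", path))  # pattern from the tip
session = clean(extract(r"/sourcedata/.*?/(.*?)/", path))         # second folder level

print(f"sub-{subject}", f"sub-{subject_tip}", f"ses-{session}")
```

Under these assumptions the session pattern simply captures the second folder level below sourcedata, which in this layout is the literal folder name DICOM.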

DICOMs typically have file names that are hard to decipher. If you want to get a better idea of what is in a DICOM dataset, you may want to rename the files. dcm2niix provides a renaming argument (-r y) that renames rather than converts the images. You can specify the names using the dcm2niix file-naming options. For example, to create nested folders for session date/time (%t) and series number (%s) with protocol name (%p), containing files named by instance number (%r) and mediaObjectInstanceUID (%o), you could use:

dcm2niix -r y -f %t/%s_%p/%4r_%o.dcm /path/to/DICOMs

If you are sure that your data does not come from a Siemens scanner, you could omit the %o - it is there to prevent name clashes because Siemens instance numbers are not unique.

Other useful tags that might help sort data but also reveal identifying features would be the name of patient (%n from 0010,0010) and ID of patient (%i from 0010,0020).

I’ve seen this before and then the SeriesDescription was present, but not for all sequences. In the error message that you copied over you left out the file in which the SeriesDescription field was missing. Perhaps you can check the header of that file?

Sure, that should be possible. Is the session info stored in the DICOMDIR or where is this information available?

I’ve checked the headers of the files that are missing SeriesDescription. These appear to be Structured Report Objects - I believe radiology reports. They only have 4 dicom tags.
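
A quick way to check which files cannot satisfy a naming scheme is to list the fields the scheme needs and compare them with the header. The sketch below does this for the ‘{SeriesNumber:03d}-{SeriesDescription}’ scheme from the error message, using plain dictionaries as stand-ins for DICOM headers. The header values and function names are made up for illustration; reading real headers would need a DICOM library.

```python
from string import Formatter

def required_fields(scheme: str) -> list[str]:
    """List the field names used in a str.format-style naming scheme."""
    return [name for _, name, _, _ in Formatter().parse(scheme) if name]

def missing_fields(scheme: str, header: dict) -> list[str]:
    """Return the scheme fields that are absent or empty in the header."""
    return [field for field in required_fields(scheme) if header.get(field) in (None, "")]

SCHEME = "{SeriesNumber:03d}-{SeriesDescription}"

# Stand-in headers (hypothetical values): an image series and a report object
image_header = {"SeriesNumber": 2, "SeriesDescription": "t2_tse_cor_448"}
report_header = {"SeriesNumber": 501}

print(missing_fields(SCHEME, image_header))
print(missing_fields(SCHEME, report_header))
```

Files for which the second list is non-empty are the ones a sorter would have to skip or name differently.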

I can see that there are 6 sessions (or ‘studies’) when I load the raw files in a DICOM viewer, so I assume that there should be 6 sessions: ses-01, ses-02, etc.

Ah, that’s good to know, I’ve never encountered those, dicomsort should just skip those files. Can you perhaps share those 4 tags/values with me? As for your session prefix, perhaps you can tell me how the output structure looks when you run:

cp -r patient-01 tmp_data
rm tmp_data/your_report_files_here        # I hope these are only a few files?
dicomsort tmp_data
tree tmp_data

Hi, thank you for your advice.
I have renamed my dataset as you have suggested, and now the session folders are named by date/time. How do I translate this into a BIDS directory, i.e. sub-01/ses-01, etc.? Because the folders do not follow an ascending numerical order, I cannot seem to edit the SessionID property correctly.
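
One possible way to map date/time folder names onto sequential session labels is sketched below. It assumes the folder names are fixed-width timestamps (such as YYYYMMDDhhmmss), so that alphabetical order equals chronological order. The folder names and the session_labels helper are hypothetical.

```python
def session_labels(folder_names):
    """Map timestamp-like folder names to ses-01, ses-02, ... in chronological order."""
    ordered = sorted(folder_names)             # fixed-width timestamps sort chronologically
    width = max(2, len(str(len(ordered))))     # at least two digits, more if needed
    return {name: f"ses-{index:0{width}d}" for index, name in enumerate(ordered, start=1)}

# Hypothetical date/time folders for one patient
folders = ["20210105101500", "20190312093000", "20200720141000"]
labels = session_labels(folders)

for name in sorted(labels):
    print(name, "->", labels[name])
```

The resulting mapping could then be used to rename the folders before running bidsmapper, so that a simple session pattern picks up the label.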

The 4 tags and values are:
SeriesDate, SeriesTime, SeriesInstanceUID, SeriesNumber - with their numerical values.

I’ve run the code above (after already having run your bidsmapper code). It generates:

  • rm: your_report_files_here: No such file or directory
  • Then error for missing SeriesDescription for 5 files.
  • ERROR | Cannot create subfolders, aborting dicomsort()…

It then gives a very long tree - how should I upload the output tree structure here?

This is just the first part that I’ve copied:
tmp_data
├── Anonymous
│ ├── 01-
│ ├── 02-
│ ├── 03-
│ ├── 04-
│ │ └── 501-Dose Report
│ │ └── FFE86D52
│ ├── 05-
│ └── 06-
│ ├── 001-Topogram 1.0 Tr20
│ │ └── EE13C29D
│ ├── 002-Head Spiral 1.0 Ax Hr38 3

The last part leads to DICOMDIR

Ah, that helps. You still got that error because you didn’t remove all report files I suppose? Anyhow, that’s not the issue here. The default scheme for DICOMDIR sorting is: {PatientName}/{n:02}-{StudyDescription}, which in your case is problematic because PatientName is replaced with a fixed “Anonymous” string. The session folders are ok (although the description is missing of course), I guess. I’ll see if I can do something smarter in the code, but what you could try in the meantime is to remove the DICOMDIR file (next to the report files with the 4 tags) and then run dicomsort. Let me know what comes out (tree …)
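
To illustrate why that default scheme yields folders such as Anonymous/04-, here is a small sketch that fills in the two schemes mentioned in this thread with Python's str.format. The field values are taken from the tree output above; how dicomsort fills the schemes internally may differ.

```python
STUDY_SCHEME = "{PatientName}/{n:02}-{StudyDescription}"
SERIES_SCHEME = "{SeriesNumber:03d}-{SeriesDescription}"

def study_folder(patient_name: str, n: int, study_description: str) -> str:
    """Build the study-level folder name from the DICOMDIR sorting scheme."""
    return STUDY_SCHEME.format(PatientName=patient_name, n=n, StudyDescription=study_description)

def series_folder(series_number: int, series_description: str) -> str:
    """Build the series-level folder name."""
    return SERIES_SCHEME.format(SeriesNumber=series_number, SeriesDescription=series_description)

# An anonymized PatientName and an empty StudyDescription, as in the tree output
study = study_folder("Anonymous", 4, "")
series = series_folder(501, "Dose Report")

print(f"{study}/{series}")
```

Every anonymized patient would end up under the same Anonymous folder, which is why the subject label cannot be recovered from the sorted path.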

Hi Marcel, I don’t quite understand which DICOMDIR file (next to the report files with the 4 tags) you’re referring to? I can see only one DICOMDIR in sourcedata/tmp_data.
If I delete the DICOMDIR file in sourcedata/tmp_data, and I run dicomsort tmp_data, there is no output.

Ah, ok, try this (it is needed because your DICOM files have no .dcm/.IMA file extension):

dicomsort tmp_data -p '.*'

This is what I get:
INFO | >> Sorting: tmp_data (6504 files)
WARNING | tmp_data/.DS_Store is not a DICOM file, cannot read SeriesNumber
WARNING | tmp_data/.DS_Store is not a DICOM file, cannot read SeriesInstanceUID
Traceback (most recent call last):
  File "/opt/anaconda3/envs/bidscoin/bin/dicomsort", line 8, in <module>
    sys.exit(main())
             ~~~~^^
  File "/opt/anaconda3/envs/bidscoin/lib/python3.13/site-packages/bidscoin/utilities/dicomsort.py", line 219, in main
    raise error
  File "/opt/anaconda3/envs/bidscoin/lib/python3.13/site-packages/bidscoin/utilities/dicomsort.py", line 215, in main
    sortsessions(**vars(args))
    ~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/bidscoin/lib/python3.13/site-packages/bidscoin/utilities/dicomsort.py", line 197, in sortsessions
    sortsession(sourcefolder, dicomfiles, folderscheme, namescheme, force, dryrun)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/bidscoin/lib/python3.13/site-packages/bidscoin/utilities/dicomsort.py", line 112, in sortsession
    subfolder = construct_name(folderscheme, dicomfile, force)
  File "/opt/anaconda3/envs/bidscoin/lib/python3.13/site-packages/bidscoin/utilities/dicomsort.py", line 41, in construct_name
    value = int(value.replace('.','')) # Convert the SeriesInstanceUID to an int
ValueError: invalid literal for int() with base 10: ''

Mhhh, so SeriesNumber/SeriesInstanceUID is also anonymized, I guess. I’ll at least make sure dicomsort will never crash on that in the future.
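
The last line of the traceback shows int() receiving an empty string. A guarded conversion could look like the sketch below. This is a hypothetical illustration of the idea, not the actual BIDScoin fix, and the uid_to_int name is made up.

```python
def uid_to_int(value: str):
    """Convert a dotted UID such as '1.2.840.10008' to an int, or return None if that fails."""
    try:
        return int(value.replace(".", ""))
    except ValueError:
        # An empty or non-numeric value (e.g. read from a non-DICOM file) ends up here
        return None

print(uid_to_int("1.2.840.10008"))
print(uid_to_int(""))
```

The caller can then skip or warn about files for which None is returned instead of aborting the whole sort.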

I am inspecting the DICOM tags, and the SeriesInstanceUID and SeriesNumber are present - they both have values in them.

tmp_data/.DS_Store is not a DICOM file, cannot read SeriesNumber

I see that your Mac has dropped .DS_Store files in your data folder; these files are now causing dicomsort to crash.

Normally this is not a problem because only .dcm and .IMA files are considered, but in your extensionless case all files are considered (with the -p option). Add a Mac that drops .DS_Store files into the folders, and that combination is the root of the problem.
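
As a workaround sketch for extensionless data, files can be screened on content rather than on name. Standard DICOM files carry the marker ‘DICM’ right after a 128-byte preamble, so a catch-all pattern such as '.*' can be combined with that check. The function names below are hypothetical, and some older DICOM files lack the preamble and would be rejected by this test.

```python
import re
import tempfile
from pathlib import Path

def is_dicom(path: Path) -> bool:
    """True if the file has the 'DICM' marker after the 128-byte preamble."""
    with path.open("rb") as stream:
        stream.seek(128)
        return stream.read(4) == b"DICM"

def select_files(folder: Path, pattern: str = r".*"):
    """Return files whose name matches `pattern` and whose content looks like DICOM."""
    return sorted(
        path for path in folder.rglob("*")
        if path.is_file() and re.fullmatch(pattern, path.name) and is_dicom(path)
    )

# Demo with a fake DICOM file and a macOS .DS_Store file in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "000009F8").write_bytes(b"\0" * 128 + b"DICM" + b"payload")
    (folder / ".DS_Store").write_bytes(b"Bud1")
    selected = [path.name for path in select_files(folder)]

print(selected)
```

Deleting the .DS_Store files from the copy before sorting would have the same effect for this dataset.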