fMRIPrep JSONDecodeError in dataset_description.json

Summary of what happened:

Hi all,

We are running fMRIPrep through a Singularity container with the exact commands we have run before, with no changes as far as I am aware. The failure occurs very early on, inside the JSON decoder, which reports an error at line 1 column 1 (char 0) of dataset_description.json.

Command used (and if a helper script was used, a link to the helper script or the command generated):

The following bash script is getting called through our SLURM scheduler:

#!/bin/bash

# usage: bash fmriprep.sh /path/to/BIDS/dataset sub-12345

ml singularity 
ml freesurfer

export FREESURFER_HOME="/export/freesurfer/freesurfer-7.2.0/"
export TEMPLATEFLOWHOME="/path/to/tlbx/templateflow"
export SINGULARITYENV_TEMPLATEFLOW_HOME="/templateflow"

unset PYTHONPATH; singularity run --cleanenv \
    --no-home \
    --home $1 \
    -B $1:/data \
    -B $FREESURFER_HOME \
    -B $TEMPLATEFLOWHOME:/templateflow \
    -B /path/to/tmp/fmriprep_work:/work \
    /path/to/tlbx/singularity/fmriprep-22.0.2.simg \
    /data /data/derivatives/fmriprep \
    participant --participant-label "$2" \
    -w /work/"${2}" \
    --fs-license-file $FREESURFER_HOME/.license \
    --skip_bids_validation \
    --use-aroma \
    --nthreads 8 \
    --mem-mb 60000 \
    --output-spaces MNIPediatricAsym:cohort-2:res-2

Version:

fMRIPrep-22.0.2

Environment (Docker, Singularity, custom installation):

Singularity
We are using an SBATCH script to call fMRIPrep through a Singularity container. Additional note: some workarounds were used in the past to allow use of the MNIPediatricAsym template and TemplateFlow with Singularity.

Data formatted according to a validatable standard? Please provide the output of the validator:

I don’t have access to the BIDS validator output as a .txt file (no server access for me anymore, I’m retired), and am reluctant to share a colleague’s screenshots. There are only warnings, related to some subjects having completed movieA vs. movieB, etc.

Relevant log outputs (up to 20 lines):

Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.9/site-packages/fmriprep/cli/workflow.py", line 68, in build_workflow
    msg = check_pipeline_version(version, fmriprep_dir / "dataset_description.json")
  File "/opt/conda/lib/python3.9/site-packages/niworkflows/utils/bids.py", line 471, in check_pipeline_version
    desc = json.loads(data_desc.read_text())
  File "/opt/conda/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
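
For reference, here is a minimal sketch of which inputs make Python's standard json module raise this exact message. It uses only the standard library; the helper name decode_error_message and the sample strings are illustrative and not part of fMRIPrep or niworkflows. An empty string, or a string whose first character cannot start a JSON value, fails at char 0 with "Expecting value", while a leading byte-order mark in an already-decoded string is reported with a different message.

```python
import json


def decode_error_message(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


# Illustrative inputs only; none of these come from the real dataset.
candidates = {
    "empty file": "",
    "starts with a curly quote": "\u201cName\u201d",
    "leading BOM": "\ufeff{}",
    "valid JSON": '{"Name": ""}',
}

for label, text in candidates.items():
    print(f"{label}: {decode_error_message(text)}")
```

If the message in the traceback is taken at face value, an empty or non-JSON file at the path being read may be a more likely cause than a BOM, though that is only an inference from the message text.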

Screenshots / relevant information:

It seems that the dataset_description.json file is causing a problem, and at line 1 column 1 on top of that… We ran it through a JSON validator, and it validated just fine. We ran iconv -f utf-8 -t utf-8 -c FILE.txt -o NEW_FILE to strip any non-UTF-8 characters, since we had seen on a Stack Overflow thread that there may be some sneaky characters hiding at the start of the file, but this did not result in any change.

We are able to load the dataset_description.json file in a python3 shell using the json library, with the following syntax:

import json 

with open("dataset_description.json", 'r') as j:
    data = json.loads(j.read())
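
Since the file loads fine in an interactive shell, it may help to look at the raw bytes of the exact file that fails rather than at its rendered text. The following is a rough sketch using only the standard library; the function name inspect_json_file and the report keys are hypothetical.

```python
import json
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def inspect_json_file(path):
    """Summarise the raw bytes of a file and whether it parses as JSON."""
    raw = Path(path).read_bytes()
    report = {
        "size_bytes": len(raw),          # 0 means the file is empty
        "first_bytes": raw[:16],         # hidden characters show up here
        "has_utf8_bom": raw.startswith(UTF8_BOM),
        "parses": False,
        "error": None,
    }
    try:
        json.loads(raw.decode("utf-8"))
        report["parses"] = True
    except (UnicodeDecodeError, json.JSONDecodeError) as err:
        report["error"] = str(err)
    return report
```

Calling inspect_json_file("dataset_description.json") on the server where the job runs should show whether the file is empty, starts with unexpected bytes, or parses cleanly.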

Any help is appreciated…

I actually no longer work for this lab and am currently in industry, but I’m trying to lend a hand to my comrades over in the lab to help them get through preprocessing. This error is throwing me for a loop, so I know they will struggle without assistance. I may be slow to respond on that account, but I’ll try to get them to follow this thread and attempt any suggestions, etc.

To be clear, no funny whitespace is seen in the .json file, everything is validated, all strings are occupied (not empty). We are at a loss here. I know you will all be tempted to ask for the dataset_description.json file itself, but as there’s identifying information, I’m reluctant to share it. I trust that it is at least visually correct, we ran it through multiple JSON validators without error. If there is a hidden character or the encoding is incorrect, it does not display anywhere. We saved as ‘UTF-8’ in Matlab, and used iconv to ensure the absence of any non-UTF-8 characters. Not sure how to proceed.

Thanks,
Clayton

Hi @claytonjschneider.

I have retagged your post as software support and added in the corresponding question template. As you can see, there is some information missing that would help us address your issue. Please fill in this information (you can edit your post) so we can help.

Best,
Steven

Thanks for doing that @Steven, I’ve updated with the additional info.

Thanks. Can you try just creating a blank dataset_description.json?

Yep, I’ll have them run rm dataset_description.json, touch dataset_description.json and give fMRIPrep another whirl.

They don’t need to delete the dataset description, especially if there’s important info, but they can temporarily put it somewhere else to see if the JSON really is the problem.

I’d had them make a backup of it while we were doing tests and the iconv stripping anyhow.

@Steven Creating an empty dataset_description.json caught a different error.

bids.exceptions.BIDSValidationError: 'dataset_description.json' is not a valid json file. There is likely a typo in your 'dataset_description.json'.
Example contents of 'dataset_description.json':
{"Name": "Example dataset", "BIDSVersion": "1.0.2"}

Sorry, I should have been clearer: by blank I meant a description with the valid fields but empty values. Put the following in your dataset description:

{
    "Name": "",
    "BIDSVersion": "1.2.1",
    "License": "",
    "Authors": [""
    ],
    "Acknowledgments": "",
    "HowToAcknowledge": "",
    "Funding": [
        ""
    ],
    "ReferencesAndLinks": [
        ""
    ],
    "DatasetDOI": ""
}

I wondered if that’s what you meant - I should have been clear in my original post that we had tried this previously, and still got the same JSON error on line 1 character 1.

Hmm that is strange. My next guess would be perhaps a copy/paste error where there are some different text encodings between source and destination (e.g., curly vs straight quotes). I’ll try sending you a dataset_description file, and instruct the users to download it and not make any modifications.
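
To rule out copy/paste damage such as curly quotes, one option is to scan the text for non-ASCII characters and to write the file programmatically instead of pasting it. This is only a sketch with standard-library calls; the function names are hypothetical, and the field values are taken from the examples earlier in this thread.

```python
import json
import unicodedata
from pathlib import Path


def find_non_ascii(text):
    """Return (line, column, character, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_no, col_no, char, unicodedata.name(char, "UNKNOWN")))
    return hits


def write_minimal_description(path, name="Example dataset", bids_version="1.2.1"):
    """Write a small dataset_description.json as plain ASCII, with no BOM."""
    description = {"Name": name, "BIDSVersion": bids_version}
    text = json.dumps(description, indent=4, ensure_ascii=True) + "\n"
    Path(path).write_text(text, encoding="ascii")
    return text
```

An empty list from find_non_ascii(Path("dataset_description.json").read_text(encoding="utf-8")) would suggest that curly quotes are not the problem.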

Probably a byte-order mark, which is invalid JSON. If you’re creating files with Microsoft notepad, it will add a BOM. If you run dos2unix, that will fix it.
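
If dos2unix is not available on the server, a BOM can also be removed with a few lines of Python. This is a minimal sketch, assuming the file is small enough to read into memory; strip_utf8_bom is a hypothetical helper name, and codecs.BOM_UTF8 is the standard-library constant for the three BOM bytes.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Rewrite path without a leading UTF-8 BOM; return True if one was removed."""
    target = Path(path)
    raw = target.read_bytes()
    if not raw.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, file left untouched
    target.write_bytes(raw[len(codecs.BOM_UTF8):])
    return True
```

A return value of False means the file did not start with a BOM, in which case the cause is probably elsewhere.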

As far as I know, all changes to the file have been made in Vim and MATLAB running on Linux, CentOS I believe. But I will pass this along anyhow. Thank you

If you’re using Linux, then running the file command could be a quick diagnostic. With the BOM, it won’t be recognized as JSON:

x.json:              ASCII text
y.json:              JSON data

I would ask the people with the file to post it here so someone can look at it. I’m not sure how much more help a game of telephone will be.

Good afternoon! My name is Khalil and I’m the individual working in the lab specified above that is experiencing the issue with the JSON file. When I ran the file command it tagged the file as ASCII text. The current copy of the JSON file we are using is below. There should be 3 spaces between the curly brackets and the contents of the file, it just didn’t translate correctly. And it seems as a new user I cannot send it as an attachment currently. Thanks!

{
    "Name": "",
    "BIDSVersion": "1.2.1",
    "License": "",
    "Authors": [""
    ],
    "Acknowledgments": "",
    "HowToAcknowledge": "",
    "Funding": [
        ""
    ],
    "ReferencesAndLinks": [
        ""
    ],
    "DatasetDOI": ""
}

Feel free to email me the attachment at this username @ gmail.com.

Sounds good I am preparing the email now, thanks!

Hi @khalilt, when I look at the files you sent, both appear as JSON files to me:

$ file dataset-description.*
dataset-description.json: JSON data
dataset-description.txt:  JSON data

And loading with Python:

In [1]: import json

In [2]: with open('dataset-description.json') as fobj:
   ...:     dd = json.load(fobj)
   ...: 

In [3]: dd
Out[3]: 
{'Name': '',
 'BIDSVersion': '1.2.1',
 'License': '',
 'Authors': [''],
 'Acknowledgments': '',
 'HowToAcknowledge': '',
 'Funding': [''],
 'ReferencesAndLinks': [''],
 'DatasetDOI': ''}

The main thing I can see is that what you sent has a hyphen (-) instead of an underscore (_). Is it possible that you still have a bad dataset_description.json file?
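
One way to check for a leftover bad copy is to try parsing every dataset_description.json under the dataset root, including any under derivatives/. The traceback earlier in the thread reads fmriprep_dir / "dataset_description.json", which may refer to the copy in the fMRIPrep output directory rather than the one at the BIDS root; that reading is an assumption. The function name below is hypothetical, and only standard-library calls are used.

```python
import json
from pathlib import Path


def check_dataset_descriptions(bids_root):
    """Map each dataset_description.json under bids_root to None (OK) or an error string."""
    results = {}
    for path in sorted(Path(bids_root).rglob("dataset_description.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
            results[str(path)] = None
        except (OSError, UnicodeDecodeError, json.JSONDecodeError) as err:
            results[str(path)] = f"{type(err).__name__}: {err}"
    return results
```

Running check_dataset_descriptions("/path/to/BIDS/dataset") and printing the entries whose value is not None would list any copy that fails to parse. The recursive search can be slow on a large dataset.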

Hey @effigies !

Sorry about that. The files do in fact have an underscore in their names on our hard drive here, so that shouldn’t be the issue. What you found is very interesting, however, since the file was still reported as ASCII text when I tried the file command in my local terminal. If it is actually being recognized as JSON, that makes the issue even stranger. However, the new imaging engineer we hired just heard from one of the people who manage the remote server we use for preprocessing and analysis, who says that some metadata on the server itself may have been corrupted and may be interfering with the software’s ability to recognize the JSON. Is this something you have ever experienced?

Khalil

@effigies I’m going to email you another copy of the JSON, this one will be directly from the local drive.