Behavior Metadata without (tsv) Event Data Related to a Neuroimaging Data

What is the best way to store hierarchical behavior data related to neuroimaging data that doesn’t have any accompanying events?

In my specific case, I have some resting-state EEG data (which doesn’t need an events tsv because there are no events), but I also have metadata that can only be stored hierarchically: survey data broken down into subgroups, and medication data (med -> when was it last given, what was the dose).

Currently I have it stored as a json sidecar file to the sub-{sub}/beh/{basename}_events.tsv with the tsv file empty (with one dummy variable). This seems very hacky (although it passes the validator) and so I suppose there is a better structure with which to store these data.

Is this section about phenotypic data in the bids spec relevant at all to you?

If not, can you perhaps provide a mock-example of your folder structure?

Something like this, using the tree tool:

foo/
└── bar
    └── no.txt

that might help to better understand your problem :slight_smile:

Perfect! I had never seen the phenotype directory used before.

Perhaps it would be useful to have this directory in some of the example datasets…

Hmmm, now that I’m actually trying to use it, it looks like it has to be under root/phenotype. The issue is that for each subject there are different numbers of medications/questionnaires given. For instance, here are two different tsv files that could work, but they would need to be in a beh/phenotype directory specific to that EEG data:

        Dose (mg)   times/day   hours since meds
Lev     25/100      3           3
Ras     1           1           11.5

        Dose (mg)   times/day   hours since meds
Sel     5           2           2.5
Pr      1.5         3           2.5

Perhaps it would be useful to have this directory in some of the example datasets…

yes, the phenotype directory is not very well advertised, I agree. See also this issue and the PR linked therein. Further improvements in the form of a PR are welcome.

this is one example: https://openneuro.org/datasets/ds000030/versions/1.0.0

the issue is that for each subject there are different numbers of medications/questionnaires given

mmmh, I see. Why would it not work if each participant is a row and you make a column for each medication?

I assume that Lev, Ras, Sel, Pr are some kinds of medication … because that’s what they seem to me from your example.

Example: /phenotype/myfile.tsv

participant_id	dose_lev	dose_ras	dose_sel	dose_pr
sub-01	0.25	1	n/a	n/a
sub-02	n/a	n/a	5	1.5

and then in the accompanying /phenotype/myfile.json you put something like:

{
    "dose_lev": {
        "Description": "Dosage of Lev",
        "Units": "mg"
    }
    ...
}
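
As a minimal sketch of how this TSV + JSON pair could be written with Python's standard library: the helper name `write_phenotype` is hypothetical, and the rows simply repeat the example values above. Missing entries are written as n/a.

```python
import csv
import json
from pathlib import Path


def write_phenotype(root, rows, sidecar, name="myfile"):
    """Write <root>/phenotype/<name>.tsv plus its JSON sidecar.

    `rows` is a list of dicts keyed by column name; columns a participant
    has no value for are written as "n/a".
    """
    pheno_dir = Path(root) / "phenotype"
    pheno_dir.mkdir(parents=True, exist_ok=True)

    # participant_id first, then the other columns in first-seen order.
    columns = ["participant_id"]
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    tsv_path = pheno_dir / f"{name}.tsv"
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t",
                                restval="n/a", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)

    json_path = pheno_dir / f"{name}.json"
    json_path.write_text(json.dumps(sidecar, indent=4), encoding="utf-8")
    return tsv_path, json_path


# Example values copied from the table above.
rows = [
    {"participant_id": "sub-01", "dose_lev": 0.25, "dose_ras": 1},
    {"participant_id": "sub-02", "dose_sel": 5, "dose_pr": 1.5},
]
sidecar = {"dose_lev": {"Description": "Dosage of Lev", "Units": "mg"}}
```

Calling `write_phenotype("path/to/dataset", rows, sidecar)` would then create both files under the dataset's phenotype directory.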

Thanks for the suggestion, I like the idea but I think with dose, times/day and time since dose all paired with each of the ~10 medications in the dataset I’m working with, 30 columns becomes rather inelegant. Not to mention the ambiguity between missing data (which exists or rather does not exist) and medications not being prescribed.

I can’t pass the validator and post the data on OpenNeuro until https://github.com/bids-standard/bids-validator/pull/946 is resolved but the beh json without a matching tsv works for now and I can post an issue in the BIDS specification and we can discuss whether there is a better way to store this information. I think it’s hard to figure out what’s best without seeing an example.

Ok it’s up! https://openneuro.org/datasets/ds002778/versions/1.0.0

it’s interesting that you got a participants.json.swp file in there and the validator didn’t tell you to remove it :thinking:

cc @rwblair

What can I say, I have a talent :wink:

Thanks for noticing that, I’ll take it off.

I just checked and the validator didn’t warn you because it’s a hidden file: .participants.json.swp

I think that’s an issue …

see here: https://github.com/bids-standard/bids-validator/issues/951

I could try to fix that but it would take someone else probably much less time.
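
In the meantime, a quick local check before uploading might look like the sketch below. `find_hidden_files` is a hypothetical helper and not part of the bids-validator; it only lists files whose name, or any parent folder below the dataset root, starts with a dot.

```python
import os
from pathlib import Path


def find_hidden_files(dataset_root):
    """Return relative paths of files that are hidden, i.e. whose name or
    any parent folder below dataset_root starts with a dot."""
    root = Path(dataset_root)
    hidden = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            relative = (Path(dirpath) / filename).relative_to(root)
            if any(part.startswith(".") for part in relative.parts):
                hidden.append(relative.as_posix())
    return sorted(hidden)
```

Some hidden files, such as `.bidsignore`, are there on purpose, so the output is a list to review by hand rather than a list to delete.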

@sappelhoff any ideas about how best to store the data that’s in the sub-{sub}/beh/…_beh.json file?

I don’t have new ideas beyond the ones I mentioned before (e.g., in Behavior Metadata without (tsv) Event Data Related to a Neuroimaging Data)

But how you do it currently is not very self-explanatory, I think:

[image: screenshot of the JSON file]

This JSON accompanies an “empty” TSV.

This is how you currently do it.

On the other hand, there is “my method” that I outlined above. I know that my attempt would involve many columns and a “sparse coding” (many n/as), but at least it could be documented well with the TSV + JSON combo :slight_smile:

but I don’t know how it’d be best to be honest :grimacing: perhaps someone else can help?

Oh gosh you found a typo in my data entry. That subject’s meds are messed up bc I missed a space in comma separated values. The other ones should make more sense.

Thanks for catching that, all fixed. Now it should be more intuitive/the json files should make more sense.

@sappelhoff, anyone you know that might have a minute to weigh in on a suggestion? @ChrisGorgolewski maybe? (sorry to bother)

@robert may have dealt with this problem. Maybe he can spend a few minutes to read about your problem and suggest a solution.

Hi @alexrockhill

Perhaps I can clarify w.r.t. OpenNeuro. The file was able to sneak through our validator for the reason @sappelhoff mentioned; we are working to resolve this bug with hidden folders. Your dataset on OpenNeuro looks good! Perhaps there was another concern?

Thank you,
Franklin

Hey @franklin ,

The concern is that I have metadata that is hierarchical in nature (meds, where for each med there is a dose, last time taken and number of times taken per day), which I have put in a json sidecar to a beh/events.tsv file that is empty. To my knowledge the sidecar should traditionally be used to describe the data file (in this case an events tsv), so this is a really hacky way to store this data. I wanted to start a thread on whether there is a better way to store this type of data. See for instance:

{
    "meds": {
        "Lev": {
            "Dose (mg)": "25/100",
            "hours since meds": "3",
            "times/day": "3"
        },
        "Ras": {
            "Dose (mg)": "1",
            "hours since meds": "11.5",
            "times/day": "1"
        }
    },
    "questionairres": {
        "Beck": "n/a",
        "Brady kinesia UPDRS": 8,
        "H&Y": 2,
        "Left UPDRS": 4,
        "Rest Tremor UPDRS": 2,
        "Right UPDRS": 10,
        "Rigidity UPDRS": 4,
        "Total UPDRS": 20,
        "UPDRS 18-26": 17
    }
}

where there are really too many meds in the entire dataset for a tsv with a med+dose, med+time since last dose, med+number of doses a day structure; it would end up being 90% n/a because most patients don’t take all the meds that exist within the group. That was a good suggestion by Stefan, but I think the result is no longer human readable. The question is whether there is a better way.

Thanks,
Alex

Hey @alexrockhill

Ahh I see, thank you for clarifying! My opinion: it seems more challenging to keep up subject-level bookkeeping and maintenance than to maintain one phenotype file (very much echoing @sappelhoff’s suggestion). It’ll be pretty sparse, but it may also be easier to access programmatically. The additional information you mentioned can be described in the sidecar json.
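
A rough sketch of how the per-subject JSON could be collected into one sparse phenotype row, assuming the nested "meds" layout from the example above. The function name `meds_to_wide_row`, the column naming scheme and the subject label `sub-01` are hypothetical and only for illustration.

```python
# Hypothetical mapping from the JSON field names to column prefixes.
FIELD_TO_PREFIX = {
    "Dose (mg)": "dose",
    "times/day": "times_per_day",
    "hours since meds": "hours_since_meds",
}


def meds_to_wide_row(participant_id, beh_json, all_meds):
    """Flatten the nested "meds" entry into one wide row.

    Every medication in `all_meds` gets one column per field; medications
    the participant does not take are filled with "n/a".
    """
    row = {"participant_id": participant_id}
    meds = beh_json.get("meds", {})
    for med in all_meds:
        for field, prefix in FIELD_TO_PREFIX.items():
            row[f"{prefix}_{med.lower()}"] = meds.get(med, {}).get(field, "n/a")
    return row


# Values copied from the JSON example above; the subject label is made up.
sub_json = {
    "meds": {
        "Lev": {"Dose (mg)": "25/100", "hours since meds": "3", "times/day": "3"},
        "Ras": {"Dose (mg)": "1", "hours since meds": "11.5", "times/day": "1"},
    }
}
row = meds_to_wide_row("sub-01", sub_json, all_meds=["Lev", "Ras", "Sel", "Pr"])
```

With four medications and three fields this already gives twelve columns besides `participant_id`, which shows how quickly the table gets wide, but every row has the same shape and can be read back with any TSV reader.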

I would be hesitant to further specify these files (maybe BIDS 2.0?).

Thank you,
Franklin