Behavior Metadata without (tsv) Event Data Related to a Neuroimaging Data

alexrockhill · April 27, 2020, 6:20pm

Thanks for the suggestion, I like the idea but I think with dose, times/day and time since dose all paired with each of the ~10 medications in the dataset I’m working with, 30 columns becomes rather inelegant. Not to mention the ambiguity between missing data (which exists or rather does not exist) and medications not being prescribed.

alexrockhill · April 27, 2020, 6:49pm

I can’t pass the validator and post the data on OpenNeuro until https://github.com/bids-standard/bids-validator/pull/946 is resolved, but the beh json without a matching tsv works for now. I can post an issue in the BIDS specification and we can discuss whether there is a better way to store this information. I think it’s hard to figure out what’s best without seeing an example.

alexrockhill · May 6, 2020, 5:56am

Ok it’s up! https://openneuro.org/datasets/ds002778/versions/1.0.0

sappelhoff · May 6, 2020, 7:56am

it’s interesting that you got a participants.json.swp file in there and the validator didn’t tell you to remove it

cc @rwblair

alexrockhill · May 6, 2020, 8:16am

What can I say, I have a talent

Thanks for noticing that, I’ll take it off.

sappelhoff · May 7, 2020, 8:37am

I just checked and the validator didn’t warn you because it’s a hidden file: .participants.json.swp

I think that’s an issue …

see here: https://github.com/bids-standard/bids-validator/issues/951
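
In the meantime, a minimal sketch for checking a dataset locally before uploading. It only lists files whose name starts with a dot and says nothing about how the validator itself works; `find_hidden_files` is a hypothetical helper name, not part of any BIDS tool.

```python
import os


def find_hidden_files(dataset_root):
    """Return sorted relative paths of files whose name starts with a dot.

    Hypothetical helper for illustration. It walks every directory under
    `dataset_root` (including hidden ones) and reports dot-prefixed files.
    """
    hidden = []
    for dirpath, _dirnames, filenames in os.walk(dataset_root):
        for name in filenames:
            if name.startswith("."):
                full_path = os.path.join(dirpath, name)
                hidden.append(os.path.relpath(full_path, dataset_root))
    return sorted(hidden)
```

Calling `find_hidden_files("path/to/dataset")` on a dataset that contains a leftover editor swap file such as .participants.json.swp would return it in the list, so it can be removed before upload.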

alexrockhill · May 7, 2020, 4:01pm

I could try to fix that but it would take someone else probably much less time.

@sappelhoff any ideas about how best to store the data that’s in the sub-{sub}/beh/…_beh.json file?

sappelhoff · May 7, 2020, 4:33pm

I don’t have new ideas beyond the ones I mentioned before (e.g., in Behavior Metadata without (tsv) Event Data Related to a Neuroimaging Data)

But how you do it currently is not very self-explanatory, I think:

This JSON accompanies an “empty” TSV.

This is how you currently do it.

On the other hand, there is “my method” that I outlined above. I know that my attempt would involve many columns and a “sparse coding” (many n/as), but at least it could be documented well with the TSV + JSON combo

But I don’t know what would be best, to be honest. Perhaps someone else can help?

alexrockhill · May 7, 2020, 10:39pm

Oh gosh you found a typo in my data entry. That subject’s meds are messed up bc I missed a space in comma separated values. The other ones should make more sense.

alexrockhill · May 8, 2020, 12:13am

Thanks for catching that, all fixed. Now it should be more intuitive/the json files should make more sense.

alexrockhill · May 12, 2020, 4:59pm

@sappelhoff, anyone you know that might have a minute to weigh in on a suggestion? @ChrisGorgolewski maybe? (sorry to bother)

sappelhoff · May 12, 2020, 5:49pm

@robert may have dealt with this problem. Maybe he can spend a few minutes to read about your problem and suggest a solution.

franklin · May 12, 2020, 6:12pm

Hi @alexrockhill

Perhaps I can clarify w.r.t. OpenNeuro. The file was able to sneak through our validator for the reason @sappelhoff mentioned - we are working to resolve this bug with hidden folders. Your dataset on OpenNeuro looks good! Perhaps there was another concern?

Thank you,
Franklin

alexrockhill · May 12, 2020, 6:35pm

Hey @franklin ,

The concern is that I have metadata that is hierarchical in nature (meds, where for each med there is a dose, last time taken and number of times taken per day) that I have put in a json sidecar to a beh/events.tsv file that is empty. The sidecar, to my knowledge, should traditionally be used to describe the data file (in this case an events tsv), so this is a really hacky way to store this data. I wanted to start a thread on whether there was a better way to store this type of data. See for instance:

{
    "meds": {
        "Lev": {
            "Dose (mg)": "25/100",
            "hours since meds": "3",
            "times/day": "3"
        },
        "Ras": {
            "Dose (mg)": "1",
            "hours since meds": "11.5",
            "times/day": "1"
        }
    },
    "questionairres": {
        "Beck": "n/a",
        "Brady kinesia UPDRS": 8,
        "H&Y": 2,
        "Left UPDRS": 4,
        "Rest Tremor UPDRS": 2,
        "Right UPDRS": 10,
        "Rigidity UPDRS": 4,
        "Total UPDRS": 20,
        "UPDRS 18-26": 17
    }
}

where there are really too many meds in the entire dataset for a tsv with a med+dose, med+time since last dose, med+number of doses a day structure; it would end up being 90% n/a because most patients don’t take all the meds that exist within the group. That was a good suggestion by Stefan but I think it becomes not human readable. The question is if there is a better way.
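
To make the sparsity concrete, here is a rough Python sketch of the wide layout, assuming one column per medication and field. Only "Lev" and "Ras" and their values come from the example above; the `sub-01` label, the other medication names, the column naming scheme and the `wide_row` helper are placeholders for illustration.

```python
import csv
import io

# Per-session metadata shaped like the "meds" entry in the JSON above.
session_meds = {
    "Lev": {"Dose (mg)": "25/100", "hours since meds": "3", "times/day": "3"},
    "Ras": {"Dose (mg)": "1", "hours since meds": "11.5", "times/day": "1"},
}

FIELDS = ["Dose (mg)", "hours since meds", "times/day"]


def wide_row(participant_id, meds, all_meds):
    """Flatten one nested meds dict into a single wide row.

    Every medication in `all_meds` gets one column per field; medications
    the participant does not take are filled with "n/a".
    """
    row = {"participant_id": participant_id}
    for med in all_meds:
        for field in FIELDS:
            row[f"{med}_{field}"] = meds.get(med, {}).get(field, "n/a")
    return row


# Hypothetical dataset-wide medication list: "MedC", "MedD" and "MedE"
# are placeholder names, not medications from the dataset.
all_meds = ["Lev", "Ras", "MedC", "MedD", "MedE"]
row = wide_row("sub-01", session_meds, all_meds)

# Write the header and the single row as tab-separated text.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=list(row), delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())

# Count how many medication columns carry no information for this row.
n_missing = sum(value == "n/a" for value in row.values())
print(f"{n_missing} of {len(row) - 1} medication columns are n/a")
```

Even with only five medications, most of the columns for this participant are n/a, and the n/a cannot distinguish "not prescribed" from "prescribed but not recorded".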

Thanks,
Alex

franklin · May 12, 2020, 9:23pm

Hey @alexrockhill

Ahh I see, thank you for clarifying! In my opinion, it seems more challenging to ensure the subject-level bookkeeping and maintenance than to maintain one phenotype file (very much echoing @sappelhoff 's suggestion). It’ll be pretty sparse but it may also be easier to access programmatically. The additional information you mentioned can be described in the sidecar json.

I would be hesitant to further specify these files (maybe BIDS 2.0?).

Thank you,
Franklin

alexrockhill · May 12, 2020, 9:27pm

Hmm, they also have values for each of these medication fields specific to the session: i.e. Parkinson’s disease patients on and off medications so that doubles the amount of fields but also, more importantly, makes it ambiguous which session the phenotype data applies to. I don’t think that is a solution that works in this instance unfortunately.

franklin · May 12, 2020, 9:42pm

Would it work to split along these two groups (on and off meds within your sample)? In this case there will be 2 phenotype files. Is there further hierarchy built into your design, say within the on-meds group? Though it appears your dataset has 1 session (ses-hc)?

I may be missing something? It may be worth taking a step back and clarifying your experimental design w.r.t. how the groups were designed. This will assist in thinking through how to build in and maintain this hierarchy and ease reusability of your dataset

alexrockhill · May 12, 2020, 9:45pm

I’m not sure what you mean. The on meds and off meds groups are the same subjects recorded on two sessions: on med and off meds. I’m not sure that splitting those up would make sense.

The healthy controls (hc) don’t have meds so they only do one session which I called hc to disambiguate and head off any misunderstandings.

franklin · May 12, 2020, 10:44pm

That was what I was interested in - insights into how this experiment was designed

If I am interpreting correctly (the behavioral portion) -
You have 2 groups: PD and Controls
Controls had 1 session
PD had 2 sessions: on and off meds
Within a session there was behavioral testing

Please correct me if I misunderstood

In this case, to account for the two different sessions and address the potential ambiguity between the 2 sessions it seems reasonable to split. They will be in the same phenotype folder with the filename clearly differentiating these two sessions. You’ll be using the same subject identifiers across the two files. This would also help remedy the sparse matrix concern (still sparse but not as much as if they were pooled together). Controls will also have their own file. The human readability of this data is derived in the sidecar json. Downstream scripts can extract and collect the pertinent data for modeling.
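
As a rough sketch of that downstream step, assuming two per-session files in the phenotype folder: the filenames, column names and values below are all invented for illustration (nothing here is taken from the dataset or prescribed by the specification), and `read_phenotype` and `collect` are hypothetical helpers.

```python
import csv
import io

# Stand-ins for two hypothetical files, e.g. phenotype/meds_ses-on.tsv and
# phenotype/meds_ses-off.tsv. Names, columns and values are invented.
ON_TSV = (
    "participant_id\tLev_dose_mg\tRas_dose_mg\n"
    "sub-01\t100\tn/a\n"
    "sub-02\tn/a\t1\n"
)
OFF_TSV = (
    "participant_id\tLev_dose_mg\tRas_dose_mg\n"
    "sub-01\tn/a\tn/a\n"
    "sub-02\tn/a\tn/a\n"
)


def read_phenotype(text, session):
    """Parse one phenotype TSV and tag every row with its session label."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row, session=session) for row in reader]


def collect(sessions):
    """Pool rows from all per-session files, keyed by (participant, session)."""
    table = {}
    for session, text in sessions.items():
        for row in read_phenotype(text, session):
            table[(row["participant_id"], session)] = row
    return table


table = collect({"on": ON_TSV, "off": OFF_TSV})
for key in sorted(table):
    print(key, table[key])
```

Because the same subject identifiers are used in both files, the (participant, session) key is enough to line the rows up for modeling.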

This may come down to preference and style for representing your dataset. Perhaps others in our community have addressed this coding previously

alexrockhill · May 13, 2020, 12:09am

Ok, great, I think we’re on the same page. My concern is that I don’t see how phenotype data can be linked specifically to sessions from the documentation here: https://bids-specification.readthedocs.io/en/latest/03-modality-agnostic-files.html#phenotypic-and-assessment-data. You could include ‘ses-{on|off|hc}’ in the name, but this is unstructured/not in the specification, so it seems to me likely to cause ambiguity. What I like about the way it is structured now is that the data only relevant to the session of interest is in that particular session directory. In my interpretation, that is a general principle of BIDS: data/metadata ideally goes in the most upstream directory in which it applies to everything downstream. For example, it wouldn’t make sense to me to have a bids_root/task-rest_ses-on-beh.json file because it doesn’t apply to other sessions which are downstream of the root directory. In that case it would be better to have a sidecar for each behavior file in the session folder it applies to, for clarity, even if it’s a bit redundant. Maybe that’s not essential, but I think it’s better to err on the side of putting metadata in the same directory/paired behavior structure as the data rather than putting everything upstream and relying on someone to sort out which downstream structures it applies to.