Place for stimulus lists used during the experiment

Hi there!

In my experiment, I had a CSV file with a stimulus list prepared for each participant before the experiment started. I then used PsychoPy, which loaded one of the CSVs and ran the experiment. I am confused as to where I should put those CSVs.

I put the PsychoPy output files under e.g. sub-a68d5xp5/beh/sub-a68d5xp5_task-listeningToSpeech_beh.csv and wanted to put the stimulus list alongside them, but I didn’t know what suffix to use. It also does not really belong there, because it is not the recorded behavioral output; it is the input.

It kind of makes sense to put it under “stimuli”, but it is not really a stimulus file that I can connect via the stim_file column in events.tsv, unless I repeat its name for every event.

Could someone please advise me as to where I can put these files?

Depending on what users of the dataset will do with the stimulus lists, it might be appropriate to put them in code; we could think of them as configuration files for the PsychoPy programs that were used to generate the data:
https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#code

Thanks, @rwblair, that makes sense! If someone wanted to re-run the experiment, they would go in “code” and find the necessary files.

The only problem is that the same files are used during analysis because they contain information that is not present in the PsychoPy output files (correct answers to two control questions at the end of each trial). Would you still put the files in “code” in this case?

BTW, do you think it would be terribly confusing if I put them in the “beh” folder after all and use acq to differentiate the files? Something like this:

sub-1/beh/sub-1_task-memory_acq-PsychoPy_beh.tsv
sub-1/beh/sub-1_task-memory_acq-stimulusList_beh.tsv

Since it is needed in the analysis, this makes things harder. The spec may need to be updated to handle this case comfortably. acq, as is, isn’t quite appropriate:

The OPTIONAL acq-<label> key/value pair corresponds to a custom label to distinguish different conditions present during multiple runs of the same task. For example, if a study includes runs of an n-back task, with deep brain stimulation turned on or off, the data files may be labelled sub-01_task-nback_acq-dbson_beh.tsv and sub-01_task-nback_acq-dbsoff_beh.tsv.

That being said, the validator won’t complain if you do use acq for this purpose, and if it’s well documented for the dataset users then it may be OK. (Never let an ambiguous rule get in the way of a good dataset.)

I’m curious what the shape of the data in the results and stimulus list files is. I’m not too familiar with PsychoPy; are there any examples similar to what you have online that I could look at?

As a dataset creator and user what would the ideal solution for you look like to handle this?

Calling @effigies, you mentioned in the meeting running into a similar issue some time ago, does this sound similar to your experience?

The way I did this was to include the CSV files generated during the PsychoPy stimulus presentation in sourcedata/, write a small script to convert them to events.tsv files, and place that script in the code/ directory.
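For illustration, a converter along these lines might look something like the sketch below. This is not the actual script; the file paths and the column names onset_s, duration_s, and condition are placeholders that would need to match the real PsychoPy output.

```python
"""Hypothetical converter sketch for code/ -- not the actual script.

Reads a PsychoPy output CSV from sourcedata/ and writes a BIDS events.tsv.
Column names (onset_s, duration_s, condition) and paths are placeholders.
"""
import pandas as pd


def convert(src_csv, events_tsv):
    df = pd.read_csv(src_csv)
    events = pd.DataFrame({
        "onset": df["onset_s"],        # seconds from run start
        "duration": df["duration_s"],  # seconds
        "trial_type": df["condition"],
    })
    # BIDS events files are tab-separated and use 'n/a' for missing values
    events.to_csv(events_tsv, sep="\t", index=False, na_rep="n/a")


if __name__ == "__main__":
    convert(
        "sourcedata/sub-01/beh/sub-01_task-memory_psychopy.csv",
        "sub-01/func/sub-01_task-memory_events.tsv",
    )
```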

If it’s useful, I can put an example CSV file in a gist, along with the converter. That computer is suspended and in a room with someone on a call at the moment, so it would probably be tomorrow if you remind me…

Anyway, I’m not sure how helpful that is without knowing more about what these CSV files look like. But if the data correspond to a functional run, they should be events.tsv files placed in func/, not beh.tsv files in beh/.

Yeah, you are right. I felt the same but was hoping you would say it’s actually fine 🙂

Where would one document this? In the sidecar json files? Or somewhere else?

The stimulus list file is n_trials x n_columns. The PsychoPy output file has a few more rows (training trials, intermissions, etc.) but could be trimmed to n_trials rows as well, I guess. The extra rows are unlikely to contain any information anyone would be interested in. So I guess I could just dump the contents of both files into events.tsv.

The problem is that they contain a lot of rather confusing columns that are irrelevant to the analysis, so I don’t want those columns in events.tsv. But I do want to preserve the information somewhere, because someone else might find some of it important.
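Concretely, something like the sketch below is what I have in mind: trim the extra rows, then keep only the analysis-relevant columns from each file. The file names and the columns trial, onset, duration, response, and correct_answer are made up for the example.

```python
import pandas as pd

# Made-up file and column names, just to illustrate the idea.
stim = pd.read_csv("stimulus_list.csv")    # n_trials rows
out = pd.read_csv("psychopy_output.csv")   # n_trials rows plus training/intermissions

# Trim the extra rows so both tables have n_trials rows; here 'trial'
# stands for whatever column identifies the experimental trials.
out = out[out["trial"].notna()]

# Keep only the analysis-relevant columns from each file and align on trial.
merged = out[["trial", "onset", "duration", "response"]].merge(
    stim[["trial", "correct_answer"]], on="trial"
)
```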

I’ve shared files from one subject: stimulus list and PsychoPy output on Google Drive (I couldn’t figure out how to do it directly here, sorry).

I guess I would prefer if:

  • the original files were saved somewhere (anywhere, really) in their original form,
  • the necessary information from them was copied to events.tsv, and
  • the connection was documented in events.json so the user didn’t have to guess where it came from.

So, putting the files in code as you originally suggested would totally work. I would then add the necessary columns to events.tsv and explain how they got there in events.json.
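For example, the provenance of the added columns could be spelled out in the sidecar roughly like this (a sketch only; the column names, sidecar file name, and script reference are hypothetical):

```python
import json

# Hypothetical events.json entries documenting where the added columns came from.
sidecar = {
    "response": {
        "Description": "Participant's answer, taken from the PsychoPy output CSV."
    },
    "correct_answer": {
        "Description": "Correct answer to the control question, copied from the "
                       "per-participant stimulus list in code/ (see the converter "
                       "script there)."
    },
}

with open("task-memory_events.json", "w") as f:
    json.dump(sidecar, f, indent=2)
```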

While replying, I realized that you already gave me all the information on how to solve this! Thank you!

Thank you! Putting a script there simplifies explaining the connection between the original csv files and events.tsv a lot!

That is really not necessary. I think the converter script is very data-specific.

You are absolutely right, of course! I resorted to putting files in beh/ because it was the only place I could find where bids-validator would allow me to have arbitrarily structured data. I’ll add some of the data to events.tsv files in the folder with the recordings.

It really is! Your answer perfectly complements @rwblair’s advice. Thank you for your help!
