Example Code for Batch (analyzing multiple subjects)

func_self_filename = os.path.join(data_path_self, sub+'sess001_task-CupsSelf_space-MNI152NLin2009cAsym_desc-preproc_bold_postproc.nii.gz')

That way the variable sub, which, as explained above, gets passed into the python script, is concatenated with the rest of the path. As long as your data organization is consistent, it should work.

I cannot say what exactly the path will be, since I do not know your data organization. But essentially, for anything that is subject-specific, just do a string concatenation like the one shown above so that the path varies with whatever subject name is input.
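A minimal sketch of that pattern (the base directory here is a placeholder, not your actual layout):

import os

# sub is the subject label passed in from the bash job array,
# e.g. 'sub-001' -- hard-coded here only for illustration
sub = 'sub-001'

# hypothetical base directory; substitute your own organization
data_path_self = '/proj/mystudy/derivatives/func'

# because sub is concatenated in, the filename varies with the subject
func_self_filename = os.path.join(
    data_path_self,
    sub + 'sess001_task-CupsSelf_space-MNI152NLin2009cAsym_desc-preproc_bold_postproc.nii.gz'
)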

Hi Steven,

Thanks again - I have a follow-up question about this. In my data there are three different conditions, and in my script I wrote three repeated analysis steps, one for each condition. However, we don't have all three conditions for every subject; a small portion of participants only have two. Do you think that might be an issue with the scripts you provided? Can I skip the errors when the corresponding functional and confound files are not found for some subjects?

Thanks
Jacob

You can surround the appropriate code with:

try:
    # CODE THAT WON'T WORK FOR EVERYONE HERE
    ...
except Exception:
    print('code did not work for this subject, continuing')

Hi Steven,

Sorry for getting back to you after a while; I was busy with something else. I will be trying this script in the coming days. I wondered where I should put this - should I put it at the end of the python script I am working on? And if I do, will the python script run for every subject, even those for whom I don't have all the conditions?

You can put that block around any code that won't work for everyone and that you would not want to crash everything.
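For example, a rough sketch assuming three condition-specific analysis blocks (the function and condition names here are placeholders for your actual steps):

def analyze_condition(sub, condition):
    # placeholder for your actual analysis steps for one condition
    ...

sub = 'sub-001'  # in the real script this comes from argparse

for condition in ['CupsSelf', 'CupsOther', 'CupsControl']:  # placeholder names
    try:
        analyze_condition(sub, condition)  # may fail if this subject lacks the condition
    except Exception:
        print(condition + ' did not work for ' + sub + ', continuing')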

Hi Steven, thank you so much, it works out and I already know how to adjust it for my purposes. Quick question about the .sh files. In these two scripts, the subject ID has been defined as "sub". My understanding is that any "sub" in my python analysis script should refer to that defined variable, right? I think I can use "sub" whenever I need it in the python script, even saving the data frame with the corresponding ID by using sub_RSA_df.to_csv ('/proj/"path"/sub+leftNACC.csv', index=False, header=True, sep =',')? We have a large dataset, and your information would help me plan how to organize the data and files. Thanks in advance!

I don’t know what you mean by this, but I will say that python does not care what variables are called in the bash scripts. Whatever string is passed into the python script in the 2nd bash script will be renamed to whatever you want it to be called by the argparse code block in the python script.
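For reference, a minimal sketch of such an argparse block, assuming the bash script passes the subject label as the only positional argument (the script name below is hypothetical):

import argparse

# the 2nd bash script would call something like: python my_script.py sub-001sess001
parser = argparse.ArgumentParser()
parser.add_argument('sub', help='subject label passed in from the job array')
args = parser.parse_args()
sub = args.sub  # from here on, sub holds whatever string bash passed in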

I do not know what this dataframe is, so I cannot answer that. It looks like you would run into errors in the outpath, as string concatenation is not being done right. A + should separate all string objects. So maybe something like ('/proj/'+path+'/'+sub+'leftNACC.csv')? Again, I do not know your specific needs and what all these variables are, so you will have to use your best judgement. In general, as long as you aren't overwriting variables accidentally you should be fine.
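As an alternative sketch, os.path.join or an f-string avoids most of these concatenation slips (the path pieces below are placeholders):

import os

path = 'mystudy/output'   # placeholder
sub = 'sub-001sess001'    # placeholder

# equivalent to '/proj/'+path+'/'+sub+'leftNACC.csv', but with separators handled for you
out_csv = os.path.join('/proj', path, sub + 'leftNACC.csv')
# or: out_csv = f'/proj/{path}/{sub}leftNACC.csv'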

Hi Steven, thank you so much again. I ran the .sh file (submit_job_array) to collect subjects, which, based on my understanding, will call the run_python .sh file. I encountered some errors. Each created slurm .out file says, for example: sub-003sess001 python: can't open file '.py': [Errno 2] No such file or directory. I wondered whether you have any idea about this? I think the two .sh files identify the subjects correctly, as I have 169 subjects in that folder and it says "Spawning 169 sub-jobs" after I ran the submit_job_array .sh file.

For the two .sh files I basically did not change anything, just added the path /proj/name/projname/data_fmriprep/Cups_RSA_JD_T1 for pushd in the submit file. For the run_python shell script, nothing was changed except the CPU settings, which I guess would not matter too much.

For my python script, here is part of it:
data_path_self = ('/proj/name/projname/data_fmriprep/Cups_RSA_JD_T1/sub/func')
func_self_filename = os.path.join(data_path_self, sub+'_task-CupsSelf_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz')
confound_self_filename = os.path.join(data_path_self, sub+'_task-CupsSelf_desc-confounds_regressors.tsv')
confound_self = pd.read_csv(confound_self_filename, delimiter='\t')

The data structure looks like this for each subject:
/proj/name/projname/data_fmriprep/Cups_RSA_JD_T1/sub-001sess001:
anat
figures
func (this folder contains the func images, confound tsv, event tsv, etc.)

Do you think my python script will work?

Did you add the name of the python script to the python call? It sounds to me like you just left it as $YOUR_PYTHON_FUNCTION.py, where you did not define $YOUR_PYTHON_FUNCTION

Yes - I changed it already; I even tried adding a path name for that python script. I saved all the shell files and python scripts in the same folder.

This seems to suggest otherwise, can you attach or paste the text from that script?

Sure, thank you Steven, see below:

There is something I cannot directly copy over, so I am just showing you a screenshot.

You have to remove the $. That symbol means that you are referencing a variable name.

Hi Steven, thanks for helping! Yes, I think it works now - the only error I am getting now seems to be about the compatibility between my python environment and the python on the HPC server: ImportError: A scipy version of at least 1.2 is required to use nilearn. 1.1.0 was found. Please upgrade scipy

Hi Steven, I have a quick question that I hope you can help with. I have successfully run the shell files to call my python analysis script and save data in the right folders. However, one issue is that no data is calculated and recorded in the corresponding data files. I guess this might be because I use a "try/except" block (in this case, if the "except" branch runs, no data gets recorded in the data frame file)? I tested the python file with some data and it worked for individual subjects before I adjusted it for sbatch. Do you think the whole python script will be executed as one script, so that the block can work?

I am not sure what you mean by this, sorry.

If you change the except code block to:

except Exception as e:
    print(e)
    # and other things you want to do in case of exception

then you can see what caused the failure.

Steven

So for example, here is a section of my script:

If the script is run line by line instead of as a whole, the code under try fails, execution jumps to "except", and then there is no defined variable for the further calculations, which ultimately results in no data in the data frame. I think that's why all the data frame files for the subjects are blank. Did I make it clear?

If there is no data for the subject, then why is it a problem that the data frames are empty? Using my earlier suggestion, you can change the print statement to show what caused the error if you think the subject should have run successfully.

Hey Steven, thanks again! This is really great and helped me identify the errors. I had some issues in defining the subject folder; after I found that and revised the script, everything works well.
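In case it is useful to anyone following along: the folder path in the script quoted above had a literal sub in it, so the kind of revision involved is presumably along these lines (the exact fix was not posted):

import os

# before: the subject folder was hard-coded
# data_path_self = '/proj/name/projname/data_fmriprep/Cups_RSA_JD_T1/sub/func'
# after: concatenate the sub variable (from argparse) so the path matches e.g. .../sub-001sess001/func
data_path_self = os.path.join('/proj/name/projname/data_fmriprep/Cups_RSA_JD_T1', sub, 'func')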


Quick question: could you programmatically create SLURM job scripts from within python?
E.g. write a function that accepts (1) job parameters and (2) commands to run, which will then create the job file with the necessary header and commands? Or would this not be good practice?

Examples of what I mean can be found here: SLURM Job Submission with R, Python, Bash | Research Computing Lessons
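For illustration, one minimal sketch of that pattern (the header fields, paths, and names below are placeholders; whether generating job files this way is good practice mostly depends on your cluster's conventions):

import subprocess
from pathlib import Path

def write_slurm_job(job_name, commands, time='01:00:00', mem='4G', out_dir='.'):
    # build a SLURM batch script from job parameters plus a list of shell commands
    header = [
        '#!/bin/bash',
        '#SBATCH --job-name=' + job_name,
        '#SBATCH --time=' + time,
        '#SBATCH --mem=' + mem,
        '#SBATCH --output=' + job_name + '_%j.out',
    ]
    script = Path(out_dir) / (job_name + '.sh')
    script.write_text('\n'.join(header + [''] + list(commands)) + '\n')
    return script

# usage: write one job per subject and hand it to sbatch
job = write_slurm_job('sub-001_rsa', ['python analysis.py sub-001sess001'])
subprocess.run(['sbatch', str(job)], check=True)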