It would help to know the matrix size (3 integer values, describing number of voxels in x-, y- and z-directions) and voxel sizes.
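If you have AFNI available, you can check those grid properties quickly with 3dinfo (a sketch; DSET1 and DSET2 are placeholder names for your datasets):

```shell
# Print prefix, matrix dims (nx, ny, nz, nt), voxel sizes, and grid
# origin for each dataset, one line per dataset, for easy comparison.
3dinfo -prefix -n4 -ad3 -o3 DSET1 DSET2
```

Comparing those lines across subjects will show exactly how the grids differ (matrix size only, voxel size, origin, or all three).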
Before contemplating “merging” (or regridding) the data, it would be good to know why your data are on different grids. Namely, are you sure that they were really acquired under comparable conditions and can be merged? Also, since this sounds like FMRI data, does the TR differ, as well?
Another consideration is that any regridding process necessarily smooths your data, due to spatial interpolation. That is something we typically try to avoid as much as possible during processing-- in AFNI pipelines, we calculate all necessary alignments (EPI motion correction, EPI->anat, anat->template and anything else), concatenate those transforms, and then apply the single combined transform to the EPI. If you regrid your EPI data at the beginning, then you are changing its properties (smoothing it) from the get-go, and any further processing/alignment will add in more smoothing.
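To see why interpolation smooths, here is a toy 1-D illustration in pure Python (hypothetical values, not real data): a single-voxel "spike" resampled onto a grid shifted by half a voxel, using linear interpolation, gets spread across two voxels with a lower peak.

```python
# Toy example: linear interpolation onto a half-voxel-shifted grid
# smooths the data (hypothetical 1D "image" for illustration only).

def resample_linear(signal, shift):
    """Linearly interpolate `signal` at positions i + shift,
    treating values outside the array as zero."""
    out = []
    for i in range(len(signal)):
        x = i + shift          # position on the new grid
        i0 = int(x)            # index of the voxel to the "left"
        frac = x - i0          # fractional distance into that voxel
        v0 = signal[i0] if 0 <= i0 < len(signal) else 0.0
        v1 = signal[i0 + 1] if 0 <= i0 + 1 < len(signal) else 0.0
        out.append((1 - frac) * v0 + frac * v1)
    return out

spike = [0.0, 0.0, 1.0, 0.0, 0.0]
print(resample_linear(spike, 0.5))
# -> [0.0, 0.5, 0.5, 0.0, 0.0]: the peak drops from 1.0 to 0.5
```

The total intensity is preserved, but the sharp feature is blurred-- and each additional regridding step blurs it further, which is why concatenating transforms and regridding once is preferable.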
If you are sure it makes sense to combine the data, there are probably several ways to merge them: 1) zeropadding or removing rows, if they are on similar grids that only differ in matrix dimensions; 2) resampling, leaving each subject's data in its own space, but regridding the voxels to different sizes/centroids; or 3) aligning the data to a standard space and using the final space's grid.
Case 1 does not involve smoothing-- if your voxel sizes are the same, and the centroids of the voxels are the same where the FOVs do overlap across your dsets, this might be an option. AFNI’s 3dZeropad can add/subtract slices in any direction of the volume.
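For example (a sketch, with placeholder dataset names), you could pad or trim slices explicitly, or match another dataset's slice coverage directly:

```shell
# Pad 3 slices of zeros on the inferior and superior ends of the volume
# (a negative count would remove slices instead of adding them):
3dZeropad -I 3 -S 3 -prefix DSET_PADDED DSET_SMALL

# Or pad/cut to match a reference dataset's extents (requires the same
# orientation and voxel size, which is exactly the case-1 scenario):
3dZeropad -master REF_DSET -prefix DSET_MATCHED DSET_SMALL
```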
Cases 2 and 3 do involve smoothing, necessarily, and should be approached with some caution, then. You mention putting data into a learning model-- does that assume your data are in standard space (case 3)? Or can they be in individual space (case 2)? Also, are you taking the raw EPIs into your model, or processed/derived quantities like betas and statistics from modelling? If using derived quantities, your processing might naturally put these into a final standard space for all subjects (so, case 3; e.g., with afni_proc.py, including the “tlrc” block).
For case 2, you could use AFNI:
3dresample -master REF_DSET -input ODD_DSET -prefix ODD_DSET_IN_REF
to regrid the ODD_DSET to the REF_DSET grid.
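Note that 3dresample lets you choose the interpolation with -rmode (NN, Li, Cu or Bk). For anything containing discrete labels (masks, atlases), NN is the safe choice, since it avoids blending label values; for example (placeholder names again):

```shell
# Regrid a mask with nearest-neighbor interpolation, so that label
# values are copied, not averaged, onto the new grid:
3dresample -master REF_DSET -rmode NN -input ODD_MASK -prefix ODD_MASK_IN_REF
```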
For case 3, you could use full processing (afni_proc.py), or perhaps just nonlinear alignment (3dQwarp or @SSwarper, where the latter does both skullstripping (ss) and nonlinear warping).
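A sketch of the @SSwarper route (ANAT_DSET and SUBJ_ID are placeholders; the template name is one that ships with AFNI):

```shell
# Skullstrip the anatomical and nonlinearly warp it to a standard
# template, in one go:
@SSwarper                                     \
    -input  ANAT_DSET                         \
    -base   MNI152_2009_template_SSW.nii.gz   \
    -subid  SUBJ_ID

# The output warps can then be given to afni_proc.py's tlrc block
# (via -tlrc_NL_warped_dsets), so they are not recomputed there.
```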
Sorry for the probably much-longer-than-desired answer, but there are a lot of considerations for trying to answer this simply-phrased question.