Update: I found some helpful info about this issue here:
It seems that although B0 spatial distortion may vary with echo time (depending on the sequence), the currently recommended approach is to use a single transform (e.g. the one estimated from echo 1) for all images.
(Sorry, I previously searched for "pepolar" rather than "blip", so I missed this response!)
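
In case it helps anyone else, here is a rough toy sketch of what "one transform for all echoes" means in practice. It is not taken from any particular pipeline: the function names, the `shift_map` array (a per-voxel displacement in voxels, assumed to have been estimated once, e.g. from echo 1) and the choice of axis 0 as the phase-encode direction are all assumptions made for illustration. A real workflow would apply the warp with the distortion-correction tool's own resampling.

```python
import numpy as np


def unwarp_along_pe(image, shift_map):
    """Resample a 2D image along axis 0 (assumed phase-encode axis).

    shift_map holds a hypothetical per-voxel displacement, in voxels,
    telling us where in the distorted image each output voxel comes from.
    """
    n_pe, n_ro = image.shape
    grid = np.arange(n_pe, dtype=float)
    out = np.empty_like(image, dtype=float)
    for col in range(n_ro):
        # Linear interpolation along the phase-encode axis;
        # np.interp clamps samples that fall outside the image.
        out[:, col] = np.interp(grid + shift_map[:, col], grid, image[:, col])
    return out


def unwarp_all_echoes(echoes, shift_map):
    """Apply the same shift map (estimated once) to every echo."""
    return [unwarp_along_pe(echo, shift_map) for echo in echoes]


# Toy data: three "echoes" of a 4x3 slice and a uniform one-voxel shift.
echo_1 = np.arange(12, dtype=float).reshape(4, 3)
echoes = [echo_1, echo_1 * 0.5, echo_1 * 0.25]
shift_map = np.ones((4, 3))

corrected = unwarp_all_echoes(echoes, shift_map)
```

The point is only that the displacement is estimated once and reused, so every echo ends up on the same grid, rather than each echo getting its own separately estimated correction.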