Hi all,

in the (really nice) intro video of W1D4 about GLMs, Cristina Savin mentions briefly that besides L1 and L2 norms, there are other regularization methods, including ones that encourage neighboring parameters to be similar. I thought that was really interesting and wanted to ask: Does anyone have any more information/links to relevant publications on that?

Thanks in advance!

I think you can find more in books on statistical learning theory (e.g., The Elements of Statistical Learning by Hastie and Tibshirani).

Cheers,

You can use the same L1 and L2 regularization methods on *differences* of nearby parameters, instead of on the parameters themselves. I.e., instead of the normal L2 regularization:

cost_function + lam * \sum_t |param_t|^2

you would do

cost_function + lam * \sum_t |param_t - param_{t+1}|^2

If your parameters are arranged on a 2D grid, like an image, then you’d replace t with (x, y) positions and do the sum over nearby (x, y) pairs. If you are using an optimization framework like PyTorch to fit your model, then it is easy to add this kind of regularization, and you don’t have to work out the gradients yourself.
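To make that concrete, here is a minimal sketch of the smoothness penalty in NumPy (NumPy rather than PyTorch just to keep it dependency-light; the function name, the example vectors, and the lam value are all my own illustrative choices, not from the video):

```python
import numpy as np

def smoothness_penalty(params, lam=0.1):
    """L2 penalty on differences of neighboring parameters:
    lam * sum_t |param_t - param_{t+1}|^2."""
    diffs = params[1:] - params[:-1]  # param_{t+1} - param_t for each t
    return lam * np.sum(diffs ** 2)

# A smooth ramp incurs a much smaller penalty than a jagged
# vector of the same overall scale:
smooth = np.linspace(0.0, 1.0, 11)          # steps of 0.1
jagged = np.array([0.0, 1.0] * 5 + [0.0])   # alternates 0 and 1

print(smoothness_penalty(smooth))  # 0.01
print(smoothness_penalty(jagged))  # 1.0
```

You would simply add this term to your usual cost function; in PyTorch the same expression on a tensor would be differentiated for you by autograd.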

Cool, can’t wait for the PyTorch introduction day of the NMA then. Thanks for answering!