Learning to diagnose failure modes in time-series ML models — seeking guidance & best practices

Hello everyone,

I’m a B.Tech student in Electronics, currently preparing for Google Summer of Code 2026, and I’ve recently started exploring open-source work related to machine learning and time-series analysis.

As part of my learning, I’m working on a small exploratory prototype that focuses on understanding why time-series ML models fail during training and evaluation. Right now, this includes very basic aspects such as:

  • observing overfitting and underfitting via loss curves (a rough sketch of what I mean follows this list),
  • simple sanity checks on training vs validation behavior,
  • and structuring the code in a way that makes debugging more systematic.
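To give a concrete sense of the first two bullets, here is a minimal sketch of the kind of check I'm experimenting with. The function name, the patience parameter, and the loss values are all just illustrative placeholders, not my actual prototype; in practice the loss histories would come from a real training loop.

```python
# Minimal sketch: flag a possible overfitting point by comparing training
# and validation loss histories. The loss lists below are placeholder values.

def find_overfit_epoch(train_losses, val_losses, patience=3):
    """Return the first epoch after which validation loss rises for
    `patience` consecutive epochs while training loss keeps falling,
    or None if no such point is found."""
    for i in range(1, len(val_losses) - patience + 1):
        val_rising = all(
            val_losses[i + k] > val_losses[i + k - 1] for k in range(patience)
        )
        train_falling = train_losses[i + patience - 1] < train_losses[i - 1]
        if val_rising and train_falling:
            return i
    return None


if __name__ == "__main__":
    # Placeholder curves: training keeps improving, validation turns around.
    train = [1.00, 0.70, 0.50, 0.38, 0.30, 0.24, 0.20, 0.17]
    val = [1.05, 0.80, 0.65, 0.60, 0.62, 0.66, 0.71, 0.78]
    print("Suspected overfitting around epoch:", find_overfit_epoch(train, val))
```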

I’m aware that these are foundational ideas and not novel by themselves. My goal at this stage is to build a solid conceptual understanding before moving towards more advanced issues like data leakage, distribution shift, and robustness in real neural or signal-based datasets.
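To make concrete what I mean by data leakage in the time-series setting, here is a toy illustration (all function names and numbers are made up for the example) contrasting a chronological train/validation split with the random split that beginners like me often reach for first:

```python
# Illustrative sketch of the leakage concern: with time-series data, a random
# train/validation split lets "future" samples leak into training, while a
# chronological split keeps evaluation strictly after training in time.

import random

def chronological_split(samples, val_fraction=0.2):
    """Split an already time-ordered list: earliest part for training,
    latest part for validation. No shuffling, so no future leakage."""
    cut = int(len(samples) * (1 - val_fraction))
    return samples[:cut], samples[cut:]

def random_split(samples, val_fraction=0.2, seed=0):
    """Random split, shown only as the anti-pattern for time series."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - val_fraction))
    return shuffled[:cut], shuffled[cut:]

if __name__ == "__main__":
    timestamps = list(range(100))  # stand-in for a time-ordered dataset
    train, val = chronological_split(timestamps)
    print("chronological: max(train) < min(val)?", max(train) < min(val))
    train_r, val_r = random_split(timestamps)
    print("random:        max(train) < min(val)?", max(train_r) < min(val_r))
```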

I’d really appreciate:

  • pointers to recommended practices or case studies in this area,
  • common pitfalls beginners often miss when analyzing time-series models,
  • or suggestions on how to meaningfully extend such basic tools toward research-relevant workflows.

Thank you for your time, and I’m looking forward to learning from the community.