DR. ATABAK KH
Cloud Platform Modernization Architect specializing in transforming legacy systems into reliable, observable, and cost-efficient cloud platforms.
Certified: Google Professional Cloud Architect, AWS Solutions Architect, MapR Cluster Administrator
Purpose: A short checklist to avoid inflated or unstable Gene Ontology (GO) benchmark results.
1) Random splits ≠ real life
Use time-based splits; report the T0/T1 annotation cutoff dates explicitly.
2) Ancestor leakage in labels
Propagate labels up the DAG in both train and eval.
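A sketch of ancestor propagation (the GO "true-path" closure). It assumes a `parents` dict mapping each term to its direct is-a/part-of parents; run it identically on train and eval labels so the two sides agree:

```python
def propagate_labels(labels, parents):
    """Close a protein's GO term set under the ancestor relation:
    if a term is annotated, all of its ancestors are annotated too."""
    closed = set(labels)
    stack = list(labels)
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            if parent not in closed:
                closed.add(parent)
                stack.append(parent)
    return closed
```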
3) Non-hierarchical inference
Post-process to enforce ancestor closure or use hierarchical losses.
4) Cherry-picked metrics
Always report Fmax + micro/macro-auPRC, coverage, ECE.
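For reference, a protein-centric Fmax sketch in the CAFA style: sweep a score threshold, average precision over proteins with at least one prediction and recall over all proteins, and keep the best F1. Input shapes are my own convention:

```python
def fmax(predictions, truth, thresholds):
    """predictions: {protein: {term: score}}; truth: {protein: set(terms)}.
    Returns the maximum protein-centric F1 over the threshold sweep."""
    best = 0.0
    for tau in thresholds:
        precisions, recalls = [], []
        for prot, true_terms in truth.items():
            pred = {t for t, s in predictions.get(prot, {}).items() if s >= tau}
            recalls.append(len(pred & true_terms) / len(true_terms))
            if pred:  # precision only over proteins that predicted something
                precisions.append(len(pred & true_terms) / len(pred))
        if precisions:
            pr = sum(precisions) / len(precisions)
            rc = sum(recalls) / len(recalls)
            if pr + rc:
                best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Fmax alone hides class imbalance, which is why the checklist pairs it with micro/macro-auPRC, coverage, and ECE.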
5) Long-tail collapse
Balance classes (weights), evaluate by IC bins, and show rare-term PR.
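IC binning can be sketched directly from annotation frequency: IC(t) = -log2 p(t), with rare (high-IC) terms landing in the upper bins. The bin-edge convention here is an assumption:

```python
import math
from collections import Counter

def ic_bins(annotations, edges):
    """annotations: list of per-protein GO term sets. Computes per-term
    information content IC(t) = -log2(freq) and buckets terms into bins
    delimited by the ascending-sorted IC values in `edges`."""
    counts = Counter(t for terms in annotations for t in terms)
    n = len(annotations)
    ic = {t: -math.log2(c / n) for t, c in counts.items()}
    bins = {i: [] for i in range(len(edges) + 1)}
    for term, value in ic.items():
        i = sum(value >= e for e in edges)  # index of the bin the IC falls in
        bins[i].append(term)
    return ic, bins
```

Reporting PR curves per IC bin is what exposes long-tail collapse: a model can look strong on frequent shallow terms while predicting nothing useful for rare specific ones.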
6) No calibration
Add isotonic/temperature scaling; include reliability plots.
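A dependency-free sketch of temperature scaling for the binary per-term case: fit a single T on a held-out split by minimizing negative log-likelihood, then apply sigmoid(logit / T) at inference. The grid-search approach (rather than gradient fitting) is a simplification of mine:

```python
import math

def fit_temperature(logits, labels, grid=None):
    """Grid-search a scalar temperature T minimizing binary NLL.
    T > 1 softens overconfident scores; T < 1 sharpens underconfident ones."""
    grid = grid or [0.25 * k for k in range(1, 41)]  # T in (0, 10]
    def nll(T):
        total = 0.0
        for z, y in zip(logits, labels):
            p = 1.0 / (1.0 + math.exp(-z / T))
            p = min(max(p, 1e-12), 1 - 1e-12)  # clip for numerical safety
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total
    return min(grid, key=nll)
```

Whichever calibrator you use, fit it on a split disjoint from the eval set, and show the reliability plot before and after.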
7) Irreproducible environment
Pin versions; export results.json, seeds, and configs.
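A minimal sketch of the export step. Field names and the `export_run` helper are my own; the point is that the metrics file carries everything needed to reproduce it (pin library versions in a lockfile as well):

```python
import json
import platform
import sys

def export_run(results, cfg, seed, path="results.json"):
    """Write metrics plus the seed, config, and environment fingerprint
    needed to re-run the evaluation and compare results.json byte-for-byte."""
    payload = {
        "results": results,
        "cfg": cfg,
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    with open(path, "w") as fh:
        json.dump(payload, fh, indent=2, sort_keys=True)
    return payload
```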
Rule of thumb: if someone else can’t re-run your eval.py and get the same results.json, the benchmark isn’t done.
This is a personal blog. The views, thoughts, and opinions expressed here are my own and do not represent, reflect, or constitute the views, policies, or positions of any employer, university, client, or organization I am associated with or have been associated with.