DR. ATABAK KH
Cloud Platform Modernization Architect specializing in transforming legacy systems into reliable, observable, and cost-efficient cloud platforms.
Certified: Google Professional Cloud Architect, AWS Solutions Architect, MapR Cluster Administrator
Takeaway: Frozen PLM embeddings + linear classifier = strong, fast baseline for GO prediction.
1) Embed proteins (batchable; GPU helpful but not required).
2) Train a one-vs-rest logistic regression (class-balanced).
3) Calibrate probabilities; close predictions under GO ancestors.
4) Tune per-class thresholds for Fmax.
# Load precomputed PLM embeddings and multi-hot GO labels
X_tr, Y_tr = ...   # [N_tr, D] embeddings, [N_tr, C] binary label matrix
X_val, Y_val = ...

from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# One independent, class-balanced logistic regression per GO term
clf = OneVsRestClassifier(LogisticRegression(max_iter=4000, class_weight="balanced"))
clf.fit(X_tr, Y_tr)

P = clf.predict_proba(X_val)           # [N_val, C] per-term probabilities
P = close_under_ancestors(P, go_dag)   # enforce GO hierarchy consistency
th = tune_thresholds(P, Y_val, metric="Fmax")  # one threshold per class
Y_hat = (P >= th).astype(int)
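The `close_under_ancestors` step enforces the GO true-path rule: an ancestor term's score must be at least as large as any of its descendants'. A minimal sketch, assuming `go_dag` maps each term's column index to the column indices of its direct parents (a hypothetical representation for illustration; real pipelines typically build the DAG with goatools or obonet):

```python
import numpy as np

def close_under_ancestors(P, go_dag):
    """Propagate each term's score up to its ancestors so that
    score(ancestor) >= score(descendant).
    P: [N, C] probabilities; go_dag: {child_idx: [parent_idx, ...]}."""
    P = P.copy()
    # Iterate to a fixpoint; fine for modest DAGs. A topological order
    # (children before parents) would do it in a single pass.
    changed = True
    while changed:
        changed = False
        for child, parents in go_dag.items():
            for parent in parents:
                upd = np.maximum(P[:, parent], P[:, child])
                if not np.array_equal(upd, P[:, parent]):
                    P[:, parent] = upd
                    changed = True
    return P
```

After this step a predicted child term can never out-score its parents, so thresholding at any level yields a hierarchy-consistent annotation set.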
Hyperparameters: C=1.0 (grid-search 0.1-10); early stop on validation Fmax; class_weight="balanced". If you swap in an MLP head, consider focal loss for the imbalanced classes. Tip: keep the PLM frozen at first; fine-tune only with careful regularization and time-split evaluation.
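The per-class thresholding in step 4 can be as simple as a grid search maximizing validation F1 for each term independently. A sketch (the `metric="Fmax"` interface above is the author's; this hypothetical version hard-codes per-class F1):

```python
import numpy as np

def tune_thresholds(P, Y, grid=np.linspace(0.05, 0.95, 19)):
    """Pick one decision threshold per class by maximizing F1
    on validation data. P: [N, C] probabilities; Y: [N, C] binary labels."""
    C = P.shape[1]
    th = np.full(C, 0.5)
    for c in range(C):
        best_f1 = -1.0
        for t in grid:
            pred = P[:, c] >= t
            tp = np.sum(pred & (Y[:, c] == 1))
            fp = np.sum(pred & (Y[:, c] == 0))
            fn = np.sum(~pred & (Y[:, c] == 1))
            denom = 2 * tp + fp + fn
            f1 = 2 * tp / denom if denom else 0.0
            if f1 > best_f1:
                best_f1, th[c] = f1, t
    return th
```

Note this differs from the CAFA-style Fmax computation, which sweeps a single shared threshold; tuning per class on a held-out split usually improves the final Fmax but risks overfitting rare terms, so keep the calibration step before it.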
This is a personal blog. The views, thoughts, and opinions expressed here are my own and do not represent, reflect, or constitute the views, policies, or positions of any employer, university, client, or organization I am associated with or have been associated with.