DR. ATABAK KH
Cloud Platform Modernization Architect specializing in transforming legacy systems into reliable, observable, and cost-efficient cloud platforms.
Certified: Google Professional Cloud Architect, AWS Solutions Architect, MapR Cluster Administrator
Takeaway: Frozen protein language model (PLM) embeddings + a linear classifier = a strong, fast baseline for GO prediction.
1) Embed proteins (batchable; GPU helpful but not required). 2) Train one-vs-rest Logistic Regression (balanced). 3) Calibrate; close predictions under GO ancestors. 4) Threshold per class for Fmax.
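Step 1 can be sketched as follows, assuming you already have per-residue embeddings from a frozen PLM (e.g., ESM-2) and want one fixed-size vector per protein. Mean pooling over the true residue positions (ignoring padding) is a common choice; the function name `mean_pool` is illustrative.

```python
import numpy as np

def mean_pool(residue_embs, lengths):
    """Mean-pool per-residue embeddings [B, L_max, D] into per-protein
    vectors [B, D], ignoring padded positions beyond each true length."""
    B, L_max, D = residue_embs.shape
    lengths = np.asarray(lengths)
    mask = np.arange(L_max)[None, :] < lengths[:, None]      # [B, L_max]
    summed = (residue_embs * mask[:, :, None]).sum(axis=1)   # [B, D]
    return summed / lengths[:, None].astype(float)

# Toy usage: two "proteins" of lengths 3 and 2, embedding dim 4
embs = np.random.rand(2, 5, 4)
X = mean_pool(embs, [3, 2])  # shape [2, 4]
```

Batching proteins of similar length keeps the padding overhead small; the pooled matrix is the `X_tr` / `X_val` used below.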
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Load precomputed PLM embeddings and multi-hot GO labels
X_tr, Y_tr = ...   # [N_tr, D], [N_tr, C]
X_val, Y_val = ...

# One-vs-rest logistic regression; "balanced" counters the sparse positives per GO term
clf = OneVsRestClassifier(LogisticRegression(max_iter=4000, class_weight="balanced"))
clf.fit(X_tr, Y_tr)

# Per-class probabilities on the validation set
P = clf.predict_proba(X_val)
P = close_under_ancestors(P, go_dag)           # enforce GO hierarchy consistency
th = tune_thresholds(P, Y_val, metric="Fmax")  # per-class thresholds
Y_hat = (P >= th).astype(int)
Hyperparameters: C=1.0 (grid: 0.1-10); early stop on validation Fmax.
Imbalance: class_weight="balanced"; consider focal loss if you swap in an MLP head.
Tip: Keep the PLM frozen at first. Fine-tune only with careful regularization and time-split evaluation.
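The C grid can be swept directly against held-out performance. A minimal sketch, using synthetic data in place of PLM embeddings and macro-F1 as a stand-in for Fmax:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for PLM embeddings + multi-hot GO labels
X, Y = make_multilabel_classification(n_samples=400, n_features=32,
                                      n_classes=5, random_state=0)
X_tr, X_val, Y_tr, Y_val = train_test_split(X, Y, test_size=0.25,
                                            random_state=0)

best_C, best_f1 = None, -1.0
for C in [0.1, 0.3, 1.0, 3.0, 10.0]:  # log-spaced grid over 0.1-10
    clf = OneVsRestClassifier(
        LogisticRegression(C=C, max_iter=4000, class_weight="balanced"))
    clf.fit(X_tr, Y_tr)
    f1 = f1_score(Y_val, clf.predict(X_val), average="macro", zero_division=0)
    if f1 > best_f1:
        best_C, best_f1 = C, f1
```

With frozen embeddings each fit is cheap, so re-embedding is never needed while sweeping C; only the linear heads are retrained.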