Problem. Independent multilabel classifiers for Gene Ontology (GO) often violate the ancestor rule: if a child term is predicted "on," all of its ancestors must be on as well. Curators then fix outputs by hand, and evaluation metrics inflate or deflate unpredictably.
This note shows fast, reproducible ways to enforce hierarchy consistency at inference time, plus alternatives and measurement guidance.
GO has several relation types: is_a, part_of, and others (e.g., regulates). Ancestor closure is usually computed over is_a and optionally over selected transitive relations. You must state which relations you close under (e.g., is_a only, or is_a + part_of) and ensure your train/eval labels were also ancestor-closed.
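To make the relation choice concrete, here is a minimal sketch of building transitive ancestor sets from typed edges. The edge list, the CLOSE_UNDER set, and the ancestors helper are all hypothetical stand-ins for whatever your OBO parser (e.g., obonet or goatools) provides:

```python
from functools import lru_cache

# Hypothetical typed edge list: (child, relation, parent).
edges = [
    ("GO:B", "is_a", "GO:A"),
    ("GO:C", "is_a", "GO:B"),
    ("GO:C", "part_of", "GO:A"),
]
CLOSE_UNDER = {"is_a", "part_of"}  # state this choice explicitly

parents = {}
for child, rel, parent in edges:
    if rel in CLOSE_UNDER:
        parents.setdefault(child, set()).add(parent)

@lru_cache(maxsize=None)
def ancestors(term):
    """All transitive ancestors of term (term itself excluded)."""
    out = set()
    for p in parents.get(term, ()):
        out.add(p)
        out |= ancestors(p)
    return frozenset(out)

assert ancestors("GO:C") == {"GO:A", "GO:B"}
```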
Post-threshold label closure: ensures every predicted child has all of its ancestors turned on after thresholding.
```python
import numpy as np

def close_under_ancestors(scores, go_dag, thr):
    """
    scores: np.ndarray [N, C] class probabilities (0..1)
    go_dag: object exposing .ancestors(term) and .topo_order()
    thr: float or per-class array; threshold(s) in [0,1]
    returns: np.uint8 matrix [N, C] with ancestor-closed multilabels
    """
    Y = (scores >= thr).astype(np.uint8)
    # topo_order() lists parents before children
    order = go_dag.topo_order()  # e.g., list of term indices
    for t in reversed(order):  # visit children before parents
        for a in go_dag.ancestors(t):  # ancestors(t) excludes t itself
            # If child t is on, force ancestor a on
            Y[:, a] = np.maximum(Y[:, a], Y[:, t])
    return Y
```
When to use: simple, transparent, and fast for most setups; great as a default.
Precompute a binary ancestor matrix \(A \in \{0,1\}^{C \times C}\) where \(A_{a,t} = 1\) if \(a \in \mathrm{Anc}(t)\).
```python
# A_anc: [C, C] with 1 if row=ancestor, col=child
# Y: [N, C] initial post-threshold labels
# Y_closed = Y OR (Y @ A_anc^T)
Y_closed = (Y | (Y.dot(A_anc.T) > 0)).astype(np.uint8)
```
This does the same “turn on all ancestors” in a single matrix multiply (Boolean semiring approximated with dot+>0). Build A_anc once from your DAG (careful with memory if C is large; use scipy.sparse).
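As a self-contained illustration of the same closure in one multiply (the anc dict, sizes, and labels below are toy values, not real GO terms):

```python
import numpy as np

C = 4
# Hypothetical ancestor index sets: anc[t] = indices of all ancestors of t
anc = {0: [], 1: [0], 2: [0, 1], 3: [0]}

A_anc = np.zeros((C, C), dtype=np.uint8)
for t, ancs in anc.items():
    A_anc[ancs, t] = 1  # row = ancestor, col = child

Y = np.array([[0, 0, 1, 0]], dtype=np.uint8)  # only term 2 is on
Y_closed = (Y | (Y.dot(A_anc.T) > 0)).astype(np.uint8)
assert Y_closed.tolist() == [[1, 1, 1, 0]]  # term 2 forces ancestors 0 and 1 on
```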
If you keep probabilities hierarchy-consistent before thresholding, a single global threshold cannot re-violate parents. With per-class thresholds, violations can still reappear unless each parent's threshold is no higher than its children's, so keep a final closure pass as a safety net.
Simple projection: for each edge \(a \to t\), enforce \(p(t) \le p(a)\):
```python
import numpy as np

def monotone_project_probs(P, edges, iters=1):
    """
    P: np.ndarray [N, C] class probabilities
    edges: list of (parent, child) index pairs consistent with topo order
    """
    P = P.copy()
    for _ in range(iters):  # a few passes often suffice
        for a, t in edges:  # parent, child
            # clamp child to parent if it violates monotonicity
            P[:, t] = np.minimum(P[:, t], P[:, a])
    return P
```
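A quick sanity check on a toy two-term chain (term 0 is the parent of term 1; the numbers are illustrative):

```python
import numpy as np

P = np.array([[0.3, 0.7]])  # parent=0.3, child=0.7 violates monotonicity
P_proj = monotone_project_probs(P, edges=[(0, 1)])
assert np.allclose(P_proj, [[0.3, 0.3]])  # child clamped to its parent
```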
Hierarchical loss (train-time): add a penalty or constraint that encourages \(p(\text{child}) \le p(\text{parent})\) (a sketch follows after this list).
Lowest-ancestor prediction (LA): predict only the minimal set (no predicted term has a predicted child). Expand to ancestors for the final set.
Energy-based / projected inference: solve \(\min \|p - \hat{p}\|^2\) s.t. \(p(\text{child}) \le p(\text{parent})\) over the DAG (isotonic regression on a DAG).
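For the train-time option above, one common realization is a hinge penalty on each parent-child edge. This is a sketch, not a prescribed formulation; it assumes the model outputs probabilities P, and lambda_h is a hypothetical weight:

```python
import torch
import torch.nn.functional as F

def hierarchy_hinge_penalty(P, edges):
    """Mean hinge over edges; zero iff p(child) <= p(parent) for every edge.
    P: [B, C] probabilities; edges: LongTensor [E, 2] of (parent, child)."""
    parents, children = edges[:, 0], edges[:, 1]
    return F.relu(P[:, children] - P[:, parents]).mean()

# total loss (sketch): loss = bce + lambda_h * hierarchy_hinge_penalty(P, edges)
```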
Pre-threshold (probabilities): monotone projection -> then per-class thresholds.
Post-threshold (labels): fastest, easiest; takes your Y and enforces ancestor closure.
Recommendation: keep the post-threshold closure as your mainline (transparent), and optionally provide pre-threshold monotone projection for calibration-sensitive users.
Close under is_a only (common), or is_a + part_of (state it). If C is large, store the ancestor matrix as a scipy.sparse.csr_matrix. Sparse example (post-threshold, vectorized):
```python
import numpy as np
from scipy.sparse import csr_matrix

# Build sparse ancestor matrix A_anc [C, C] once:
# rows = ancestor indices, cols = child indices (value=1)
# (data, rows, cols, C come from your DAG; see the construction sketch below)
A_anc = csr_matrix((data, (rows, cols)), shape=(C, C))

def close_labels_sparse(Y):
    # Y: [N, C] uint8
    # Y_closed = Y OR (Y @ A_anc^T) (Boolean-OR semantics)
    # use sparse @ dense; ndarray.dot with a sparse argument misbehaves
    propagated = ((A_anc @ Y.T).T > 0).astype(np.uint8)
    return np.maximum(Y, propagated)
```
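Constructing data, rows, and cols once from ancestor sets might look like this (the anc dict mirrors the hypothetical structure used earlier):

```python
import numpy as np
from scipy.sparse import csr_matrix

C = 4
anc = {0: [], 1: [0], 2: [0, 1], 3: [0]}  # hypothetical ancestor index sets

rows = [a for t, ancs in anc.items() for a in ancs]  # ancestor indices
cols = [t for t, ancs in anc.items() for a in ancs]  # child indices
data = np.ones(len(rows), dtype=np.uint8)
A_anc = csr_matrix((data, (rows, cols)), shape=(C, C))
```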
```python
import torch

# P: [B, C] probs; edges: LongTensor [E, 2] (parent, child) topo-sorted
def monotone_project_probs_torch(P, edges, iters=1):
    P = P.clone()
    parents, children = edges[:, 0], edges[:, 1]
    for _ in range(iters):
        # candidate child values clamped to their parents
        cand = torch.minimum(P[:, children], P[:, parents])  # [B, E]
        # children with several parents appear more than once in `children`;
        # plain indexed assignment keeps an arbitrary write, so reduce
        # duplicates with an explicit min-scatter instead
        idx = children.unsqueeze(0).expand_as(cand)
        P.scatter_reduce_(1, idx, cand, reduce="amin", include_self=True)
    return P
```
```python
# Post-threshold closure using a sparse COO ancestor matrix A_anc: [C, C]
def close_labels_torch(Y, A_anc):
    # Y: [B, C] uint8 or bool; A_anc: torch.sparse_coo_tensor (row=ancestor, col=child)
    # propagate up: (A_anc @ Y^T)^T is positive wherever some descendant is on
    propagated = torch.sparse.mm(A_anc, Y.t().float()).t() > 0
    return (Y.bool() | propagated).to(torch.uint8)
```
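A matching construction and sanity check for the torch version (the indices encode the same toy hierarchy as before; terms and sizes are illustrative):

```python
import torch

C = 4
# (ancestor, child) pairs for the toy DAG
idx = torch.tensor([[0, 0, 1, 0],   # row = ancestor
                    [1, 2, 2, 3]])  # col = child
vals = torch.ones(idx.shape[1])
A_anc = torch.sparse_coo_tensor(idx, vals, (C, C)).coalesce()

Y = torch.tensor([[0, 0, 1, 0]], dtype=torch.uint8)  # only term 2 on
assert close_labels_torch(Y, A_anc).tolist() == [[1, 1, 1, 0]]
```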
Always report both the raw (possibly inconsistent) and the consistent results in ablations.
Ablation table template
| Method | Viol% | Fmax_BP | Fmax_MF | Fmax_CC | micro-auPRC |
|--------------------------------|-------|----------|---------|---------|-------------|
| PLM + per-class thr (raw) | 12.4 | 0.41 | 0.52 | 0.49 | 0.36 |
| + Post-thr closure (set) | 0.0 | 0.43 | 0.54 | 0.50 | 0.37 |
| + Pre-thr monotone projection | 0.0 | 0.44 | 0.55 | 0.51 | 0.38 |
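To make the Viol% column reproducible, here is one way to compute it. This sketch defines the rate as the share of on-child predictions whose parent is off, which is an assumption; state whichever definition you use:

```python
import numpy as np

def violation_rate(Y, edges):
    """Y: [N, C] binary labels; edges: iterable of (parent, child) index pairs.
    Returns percent of on-child predictions whose parent is off."""
    child_on, violated = 0, 0
    for a, t in edges:
        on = Y[:, t] == 1
        child_on += int(on.sum())
        violated += int((on & (Y[:, a] == 0)).sum())
    return 100.0 * violated / max(child_on, 1)
```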
Over-activation of very general terms (high up the DAG) can reduce precision. Mitigation: use per-class thresholds (higher for general terms), or apply closure after minimal-set selection (lowest-ancestor strategy).
Noisy children forcing noisy parents on. Mitigation: apply probability-space projection first, which tends to dampen child scores that exceed parents.