Problem. Independent multilabel classifiers for Gene Ontology (GO) often violate the ancestor rule: if a child term is predicted “on,” all its ancestors must also be on. Curators then fix outputs by hand, and metrics inflate/deflate unpredictably.
This note shows fast, reproducible ways to enforce hierarchy consistency at inference time, plus alternatives and measurement guidance.
GO defines several relation types: is_a, part_of (and others, e.g., regulates). Ancestor closure is typically taken over is_a and, optionally, selected transitive relations. You must state which relations you close under (e.g., is_a only, or is_a + part_of) and ensure your train/eval labels were also ancestor-closed.
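For concreteness, here is one way to extract ancestor sets closed under a chosen relation set. This is a sketch assuming the obonet package, whose read_obo returns a networkx MultiDiGraph with child-to-parent edges keyed by relation type; the file name and the keep set are illustrative.

```python
import networkx as nx
import obonet  # assumption: obonet for OBO parsing (pip install obonet)

# go-basic.obo is the filtered, guaranteed-acyclic GO release
g = obonet.read_obo("go-basic.obo")

# Keep only the relations we close under -- and state this choice
keep = {"is_a", "part_of"}
sub = nx.MultiDiGraph()
sub.add_nodes_from(g.nodes)
sub.add_edges_from((u, v, k) for u, v, k in g.edges(keys=True) if k in keep)

# obonet edges point child -> parent, so ontology ancestors are
# networkx *descendants* (nodes reachable from the term)
def go_ancestors(term):
    return nx.descendants(sub, term)
```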
Post-threshold closure ensures every predicted child has all of its ancestors switched on after thresholding.
import numpy as np

def close_under_ancestors(scores, go_dag, thr):
    """
    scores: np.ndarray [N, C] class probabilities (0..1)
    go_dag: object exposing .ancestors(term) and .topo_order()
    thr: float or per-class array; threshold(s) in [0,1]
    returns: np.uint8 matrix [N, C] with ancestor-closed multilabels
    """
    Y = (scores >= thr).astype(np.uint8)
    # parents appear before children in topo order
    order = go_dag.topo_order()        # e.g., list of term indices
    for t in reversed(order):          # visit children before parents
        for a in go_dag.ancestors(t):  # ancestors(t) excludes t itself
            # If child t is on, force ancestor a on
            Y[:, a] = np.maximum(Y[:, a], Y[:, t])
    return Y
When to use: simple, transparent, and fast for most setups; great as a default.
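A toy check of the function above, with a hypothetical three-term chain. The ToyDAG stub only exists to illustrate the assumed go_dag interface:

```python
import numpy as np

class ToyDAG:
    # hypothetical stub: term 0 is the root, 1 its child, 2 a grandchild
    _anc = {0: [], 1: [0], 2: [0, 1]}        # transitive ancestor sets
    def ancestors(self, t): return self._anc[t]
    def topo_order(self): return [0, 1, 2]   # parents before children

scores = np.array([[0.1, 0.2, 0.9]])         # only the leaf clears thr
Y = close_under_ancestors(scores, ToyDAG(), thr=0.5)
print(Y)  # [[1 1 1]] -- ancestors 0 and 1 forced on by the leaf
```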
Precompute a binary ancestor matrix $A \in \{0,1\}^{C \times C}$ where $A_{a,t} = 1$ if $a \in \mathrm{Anc}(t)$.
# A_anc: [C, C] with 1 if row=ancestor, col=child
# Y: [N, C] initial post-threshold labels
# Y_closed = Y OR (Y @ A_anc^T)
Y_closed = (Y | (Y.dot(A_anc.T) > 0)).astype(np.uint8)
This performs the same "turn on all ancestors" step in a single matrix multiply (a Boolean semiring approximated with dot followed by > 0). Build A_anc once from your DAG; be careful with memory if C is large, and switch to scipy.sparse in that case.
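If your DAG object exposes the same .ancestors interface as above, one way to materialize A_anc sparsely (a sketch; build_ancestor_matrix is not a library function):

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_ancestor_matrix(go_dag, C):
    # A_anc[a, t] = 1 iff a is a (transitive) ancestor of t
    rows, cols = [], []
    for t in range(C):
        for a in go_dag.ancestors(t):
            rows.append(a)
            cols.append(t)
    data = np.ones(len(rows), dtype=np.int32)  # int to avoid uint8 overflow
    return csr_matrix((data, (rows, cols)), shape=(C, C))
```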
If you keep probabilities consistent before thresholding, you can apply class-specific thresholds later without re-violating parents.
Simple projection: for each edge $a \to t$, enforce $p(t) \le p(a)$:
import numpy as np

def monotone_project_probs(P, edges, iters=1):
    """
    P: np.ndarray [N, C] class probabilities
    edges: list of (parent, child) index pairs consistent with topo order
    """
    P = P.copy()
    for _ in range(iters):  # one pass suffices if edges are topo-sorted
        for a, t in edges:  # parent, child
            # clamp child to parent if it violates monotonicity
            P[:, t] = np.minimum(P[:, t], P[:, a])
    return P
- Hierarchical loss (train-time): add a penalty or constraint that encourages $p(\text{child}) \le p(\text{parent})$; see the sketch after this list.
- Lowest-ancestor prediction (LA): predict only the minimal set (no predicted term has a predicted child), then expand to ancestors for the final set.
- Energy-based / projected inference: solve $\min \| p - \hat{p} \|^2$ s.t. $p(\text{child}) \le p(\text{parent})$ over the DAG (isotonic regression on a DAG).
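A minimal sketch of the train-time penalty (the hinge form and the weight are illustrative choices, not the only option):

```python
import torch

def hierarchy_penalty(P, edges, weight=0.1):
    # P: [B, C] probabilities; edges: LongTensor [E, 2] of (parent, child)
    parents, children = edges[:, 0], edges[:, 1]
    # hinge on p(child) - p(parent): zero whenever the edge is consistent
    violation = torch.relu(P[:, children] - P[:, parents])
    return weight * violation.mean()

# usage: loss = bce_loss + hierarchy_penalty(probs, edges)
```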
- Pre-threshold (probabilities): monotone projection, then per-class thresholds.
- Post-threshold (labels): fastest and easiest; takes your Y and enforces ancestor closure.
Recommendation: keep the post-threshold closure as your mainline (transparent), and optionally provide pre-threshold monotone projection for calibration-sensitive users.
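Wiring the recommendation together, a sketch reusing the functions defined above (thresholds is assumed to be a per-class array):

```python
def predict_consistent(scores, edges, go_dag, thresholds):
    # optional pre-threshold projection for calibration-sensitive users
    P = monotone_project_probs(scores, edges, iters=1)
    # mainline: per-class thresholds + post-threshold ancestor closure
    return close_under_ancestors(P, go_dag, thresholds)
```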
Relations: is_a only (common), or is_a + part_of (state it). For large C, store the ancestor matrix as a scipy.sparse.csr_matrix. Sparse example (post-threshold, vectorized):
import numpy as np
from scipy.sparse import csr_matrix

# Build sparse ancestor matrix A_anc [C, C] once:
# rows = ancestor indices, cols = child indices (value = 1)
A_anc = csr_matrix((data, (rows, cols)), shape=(C, C))

def close_labels_sparse(Y):
    # Y: [N, C] uint8
    # Y_closed = Y OR (A_anc @ Y^T)^T  (Boolean-OR semantics)
    # note: sparse @ dense returns a dense array; ndarray.dot(sparse) does not
    # behave as expected, and int32 avoids uint8 overflow in the sums
    propagated = ((A_anc @ Y.T.astype(np.int32)).T > 0).astype(np.uint8)
    return np.maximum(Y, propagated)
import torch

# P: [B, C] probs; edges: LongTensor [E, 2] of (parent, child), topo-sorted
def monotone_project_probs_torch(P, edges, iters=1):
    P = P.clone()
    parents, children = edges[:, 0], edges[:, 1]
    for _ in range(iters):
        # index_reduce_ takes the min over duplicate child indices; a plain
        # indexed assignment would keep an arbitrary parent instead. Each
        # pass clamps children to direct parents only, so run roughly
        # DAG-depth passes for full propagation along chains.
        P.index_reduce_(1, children, P[:, parents], "amin")
    return P
# Post-threshold closure using a sparse COO ancestor matrix A_anc: [C, C]
def close_labels_torch(Y, A_anc):
    # Y: [B, C] uint8 or bool; A_anc: torch.sparse_coo_tensor with
    # A[a, t] = 1 iff a is an ancestor of t, so propagation is (A_anc @ Y^T)^T
    # (transposing A_anc here would propagate to descendants instead)
    propagated = torch.sparse.mm(A_anc, Y.T.float()).T > 0
    return (Y.bool() | propagated).to(torch.uint8)
Always report both the raw (possibly inconsistent) and the consistent results in ablations.
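One way to compute the Viol% column in the template below (a sketch; it counts (sample, edge) pairs where a child is on while its parent is off):

```python
import numpy as np

def violation_rate(Y, edges):
    # Y: [N, C] uint8 labels; edges: list of (parent, child) pairs
    parents = np.array([a for a, _ in edges])
    children = np.array([t for _, t in edges])
    viol = (Y[:, children] == 1) & (Y[:, parents] == 0)
    return 100.0 * viol.mean()  # percentage over all (sample, edge) pairs
```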
Ablation table template
| Method | Viol% | Fmax_BP | Fmax_MF | Fmax_CC | micro-auPRC |
|--------------------------------|-------|----------|---------|---------|-------------|
| PLM + per-class thr (raw) | 12.4 | 0.41 | 0.52 | 0.49 | 0.36 |
| + Post-thr closure (set) | 0.0 | 0.43 | 0.54 | 0.50 | 0.37 |
| + Pre-thr monotone projection | 0.0 | 0.44 | 0.55 | 0.51 | 0.38 |
Over-activation of very general terms (high up the DAG) can reduce precision. Mitigation: use per-class thresholds (higher for general terms), or apply closure after minimal-set selection (lowest-ancestor strategy).
Noisy children forcing noisy parents on. Mitigation: apply probability-space projection first, which tends to dampen child scores that exceed parents.
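A hypothetical sketch of the per-class-threshold mitigation: raise thresholds for shallow (general) terms. The linear depth scaling and the lo/hi bounds are illustrative assumptions, not tuned values.

```python
import numpy as np

def depth_scaled_thresholds(depth, lo=0.3, hi=0.7):
    # depth: [C] int array of term depths in the DAG (root = 0)
    d = depth / max(depth.max(), 1)  # normalize depths to [0, 1]
    return hi - (hi - lo) * d        # general terms get higher thresholds
```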