Ideal Parent Structure Learning. Gal Elidan, with Iftach Nachman and Nir Friedman. School of Engineering & Computer Science, The Hebrew University, Jerusalem, Israel
Learning Structure. Input: data instances over the variables S, C, E, D. Output: a network structure over these variables. Init: start with an initial structure, then repeat: (1) consider local changes, (2) score each candidate, (3) apply the best modification. Problems: we need to score many candidates, and each one requires costly parameter optimization, so structure learning is often impractical. The Ideal Parent approach: approximate the improvements of changes (fast), then optimize and score only the promising candidates (slow).
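As a point of reference, the greedy loop on this slide can be sketched generically in Python. This is a minimal sketch, not the authors' code: `greedy_search`, `neighbors` and `score` are hypothetical names. In structure learning the states would be network structures, `neighbors` would enumerate the local edge changes, and `score` would be the costly optimized score; the usage line searches over integers only to keep the example runnable.

```python
from typing import Callable, Hashable, Iterable, TypeVar

S = TypeVar("S", bound=Hashable)


def greedy_search(initial: S,
                  neighbors: Callable[[S], Iterable[S]],
                  score: Callable[[S], float],
                  max_steps: int = 1000) -> S:
    """Hill climbing: score every local change and apply the best one,
    stopping when no change improves the current score."""
    current, current_score = initial, score(initial)
    for _ in range(max_steps):
        # (1) consider local changes and (2) score each candidate
        scored = [(score(c), c) for c in neighbors(current)]
        if not scored:
            break
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score <= current_score:
            break  # no improving change: local optimum
        # (3) apply the best modification
        current, current_score = best, best_score
    return current


# Toy usage: integer states, neighbors x-1 and x+1, score peaked at x = 3.
print(greedy_search(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2))  # 3
```

Every neighbor is passed to `score` at each step; this is the cost the ideal parent approach aims to cut.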
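Linear Gaussian Networks. Each variable is a linear function of its parents plus Gaussian noise: $X = \sum_i \alpha_i u_i + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma^2)$. [Figure: example network over A, B, C, D and E, with the CPD P(E | C).]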
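A minimal sketch of a linear Gaussian CPD, assuming the form above with no intercept term. `predict`, `log_likelihood` and `ml_variance` are illustrative helper names, and each parent is given as a list of its values over the instances.

```python
import math
from typing import Sequence


def predict(parents: Sequence[Sequence[float]],
            alphas: Sequence[float], m: int) -> float:
    """Mean of X on instance m: sum_i alpha_i * u_i[m]."""
    return sum(a * u[m] for a, u in zip(alphas, parents))


def log_likelihood(x: Sequence[float], parents, alphas, sigma2: float) -> float:
    """Sum over instances of log N(x[m]; sum_i alpha_i u_i[m], sigma2)."""
    total = 0.0
    for m, xm in enumerate(x):
        r = xm - predict(parents, alphas, m)  # residual on instance m
        total += -0.5 * math.log(2 * math.pi * sigma2) - r * r / (2 * sigma2)
    return total


def ml_variance(x: Sequence[float], parents, alphas) -> float:
    """Maximum-likelihood sigma^2 for fixed weights: the mean squared residual."""
    return sum((xm - predict(parents, alphas, m)) ** 2
               for m, xm in enumerate(x)) / len(x)


x = [1.0, 2.0]          # values of X on two instances
parents = [[1.0, 1.0]]  # a single parent U1
print(ml_variance(x, parents, [1.0]))  # residuals 0 and 1 -> 0.5
```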
The Ideal Parent Idea. Goal: score only promising candidates. [Figure: over the instances, the parent profile U, the child profile X, and the prediction Pred(X|U) of X from its current parents.]
Step 1: compute the optimal hypothetical parent Y, whose ideal profile makes the prediction Pred(X|U,Y) match the child profile exactly on every instance. Step 2: search the potential parents (Z1, Z2, Z3, Z4) for the one whose profile is most similar to the ideal profile.
Step 3: add the new parent (here Z2) and optimize the parameters; the resulting prediction is Pred(X|U,Z).
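The three steps can be sketched for the linear Gaussian case as follows. Assumptions: the ideal profile is the residual x - Pred(X|U) (unit weight on Y), similarity is the squared cosine between profiles (a simple stand-in), and step 3 fits only the new weight while the other weights stay fixed, whereas the slides optimize all parameters. All function names and values are hypothetical.

```python
from typing import Dict, List, Sequence


def ideal_profile(x: Sequence[float], pred: Sequence[float]) -> List[float]:
    """Step 1 (linear Gaussian, unit weight on Y): y[m] = x[m] - Pred(X|U)[m],
    so that adding y to the current prediction reproduces X exactly."""
    return [xm - pm for xm, pm in zip(x, pred)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def cos2(y: Sequence[float], z: Sequence[float]) -> float:
    """Squared cosine between two profiles (assumes neither is all zeros)."""
    return dot(y, z) ** 2 / (dot(y, y) * dot(z, z))


def most_similar(y: Sequence[float], candidates: Dict[str, List[float]]) -> str:
    """Step 2: the candidate parent whose profile best matches y."""
    return max(candidates, key=lambda name: cos2(y, candidates[name]))


def fit_weight(y: Sequence[float], z: Sequence[float]) -> float:
    """Step 3 (simplified): least-squares weight of z, other weights held fixed."""
    return dot(y, z) / dot(z, z)


x = [1.0, 3.0, 2.0]
pred = [0.5, 1.0, 1.0]               # Pred(X|U) from the current parents
y = ideal_profile(x, pred)           # [0.5, 2.0, 1.0]
zs = {"Z1": [1.0, 0.0, 0.0], "Z2": [1.0, 4.0, 2.0], "Z3": [0.0, 0.0, 1.0]}
best = most_similar(y, zs)
print(best, fit_weight(y, zs[best]))  # Z2 0.5
```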
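Choosing the best parent Z. Our goal: choose the Z that maximizes the likelihood of X given U and Z. Let $\mathbf{y}$ be the ideal profile and $\mathbf{z}$ the profile of Z over the M instances. Theorem: the likelihood improvement when only $\alpha_z$ is optimized ($\sigma$ fixed) is $\Delta_{X|U}(Z : \alpha_z) = \frac{1}{2\sigma^2} C_1(\mathbf{y},\mathbf{z})$, and when both $\alpha_z$ and $\sigma$ are optimized it is $\Delta_{X|U}(Z : \alpha_z, \sigma) = C_2(\mathbf{y},\mathbf{z})$. We define: $C_1(\mathbf{y},\mathbf{z}) = \frac{(\mathbf{y}\cdot\mathbf{z})^2}{\mathbf{z}\cdot\mathbf{z}}$ and $C_2(\mathbf{y},\mathbf{z}) = -\frac{M}{2}\log\sin^2\theta_{\mathbf{y},\mathbf{z}}$, where $\theta_{\mathbf{y},\mathbf{z}}$ is the angle between $\mathbf{y}$ and $\mathbf{z}$.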
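A sketch of the two similarity measures, assuming the forms $C_1(\mathbf{y},\mathbf{z}) = (\mathbf{y}\cdot\mathbf{z})^2/(\mathbf{z}\cdot\mathbf{z})$ and $C_2(\mathbf{y},\mathbf{z}) = -\frac{M}{2}\log\sin^2\theta_{\mathbf{y},\mathbf{z}}$. The helper names are illustrative; `c2` returns infinity when z is proportional to y, since the log term then diverges.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def c1(y: Sequence[float], z: Sequence[float]) -> float:
    """C1(y, z) = (y.z)^2 / (z.z)."""
    return dot(y, z) ** 2 / dot(z, z)


def c2(y: Sequence[float], z: Sequence[float]) -> float:
    """C2(y, z) = -(M/2) log sin^2(theta), theta the angle between y and z."""
    cos2 = dot(y, z) ** 2 / (dot(y, y) * dot(z, z))
    if cos2 >= 1.0:
        return math.inf  # z proportional to y: X would be explained exactly
    return -0.5 * len(y) * math.log(1.0 - cos2)


def gain_fixed_variance(y, z, sigma2: float) -> float:
    """Likelihood improvement when only alpha_z is optimized (sigma fixed)."""
    return c1(y, z) / (2.0 * sigma2)


y, z = [1.0, 0.0], [1.0, 1.0]
print(c1(y, z), c2(y, z))  # c1 = 0.5, c2 = log 2
```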
Similarity vs. Score. [Figure: the true score of each candidate parent plotted against its C1 similarity and against its C2 similarity.] C2 is more accurate, since the effect of fixing the variance is large; C1 will be useful later. We now have an efficient approximation for the score.
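The two measures can be checked against a direct likelihood computation. This sketch assumes the linear Gaussian case with the current parents' weights held fixed and $\sigma$ at its maximum-likelihood value before Z is added, and it uses the forms $C_1 = (\mathbf{y}\cdot\mathbf{z})^2/(\mathbf{z}\cdot\mathbf{z})$ and $C_2 = -\frac{M}{2}\log\sin^2\theta$. Under those assumptions the fixed-variance gain matches $C_1/(2\sigma^2)$ and the gain with $\sigma$ re-optimized matches $C_2$. The profile values are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def gauss_loglik(residuals: Sequence[float], sigma2: float) -> float:
    return sum(-0.5 * math.log(2 * math.pi * sigma2) - r * r / (2 * sigma2)
               for r in residuals)


def exact_gains(y: Sequence[float], z: Sequence[float]) -> Tuple[float, float]:
    """Directly compute the likelihood gain of adding parent Z.
    y is the ideal profile, i.e. the residual of X given U (U's weights fixed).
    Returns (gain with sigma fixed at its current ML value,
             gain with sigma re-optimized)."""
    M = len(y)
    sigma2_old = dot(y, y) / M                      # ML variance before adding Z
    alpha = dot(y, z) / dot(z, z)                   # ML weight of Z
    res = [ym - alpha * zm for ym, zm in zip(y, z)]
    sigma2_new = dot(res, res) / M                  # ML variance after adding Z
    before = gauss_loglik(y, sigma2_old)
    fixed = gauss_loglik(res, sigma2_old) - before
    full = gauss_loglik(res, sigma2_new) - before
    return fixed, full


def c1(y, z):
    return dot(y, z) ** 2 / dot(z, z)


def c2(y, z):
    cos2 = dot(y, z) ** 2 / (dot(y, y) * dot(z, z))
    return -0.5 * len(y) * math.log(1.0 - cos2)


y = [1.0, -2.0, 0.5, 3.0]
z = [0.5, -1.0, 2.0, 1.0]
fixed, full = exact_gains(y, z)
sigma2 = dot(y, y) / len(y)
print(fixed, c1(y, z) / (2 * sigma2))  # the two values agree
print(full, c2(y, z))                  # the two values agree
```

Since -log(1 - c) >= c, the variance-optimized gain is never smaller than the fixed-variance gain (here about 1.29 vs. 0.95).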
Ideal Parent in Search. For N variables and E edges, structure search involves O(N^2) add-parent, O(NE) replace-parent, O(E) delete-parent and O(E) reverse-edge evaluations. [Figure: candidate structures over S, C, E, D.] The vast majority of these evaluations are replaced by the ideal parent approximation; only K candidates per family are optimized and scored.
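A sketch of how only K candidates per family reach the expensive step: all candidate parents are ranked by a cheap similarity to the ideal profile, and only the top `k` are passed to `expensive_score`, which stands in for parameter optimization plus scoring. The names and toy profiles are hypothetical; ranking by squared cosine gives the same order as $C_2 = -\frac{M}{2}\log(1-\cos^2\theta)$, which is monotone in it.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cos2(y, z):
    """Cheap similarity: squared cosine between two profiles."""
    return dot(y, z) ** 2 / (dot(y, y) * dot(z, z))


def ideal_parent_step(y: Sequence[float],
                      candidates: Dict[str, List[float]],
                      k: int,
                      expensive_score: Callable[[str], float]) -> Tuple[str, List[str]]:
    """Rank all candidates by similarity to the ideal profile y, then
    optimize and score only the top k with the expensive score."""
    top = heapq.nlargest(k, candidates, key=lambda n: cos2(y, candidates[n]))
    scored = {n: expensive_score(n) for n in top}
    return max(scored, key=lambda n: scored[n]), top


cands = {"A": [1.0, 2.0, 3.1], "B": [3.0, 2.0, 1.0],
         "C": [0.0, 1.0, 0.0], "D": [1.0, 0.0, 0.0]}
true_scores = {"A": 5.0, "B": 1.0, "C": 9.0, "D": 0.0}  # toy "expensive" scores
calls = []


def expensive(name: str) -> float:
    calls.append(name)  # record which candidates were actually evaluated
    return true_scores[name]


best, top = ideal_parent_step([1.0, 2.0, 3.0], cands, 2, expensive)
print(best, top, calls)  # A ['A', 'B'] ['A', 'B']
```

In the toy run, candidate C has the highest expensive score but is never evaluated: this is the time vs. accuracy tradeoff controlled by K.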
Gene Expression Experiment. Four gene expression datasets with 44 (Amino), 89 (Metabolism) and 173 (Conditions, two datasets) variables. [Figure: test -log-likelihood and speedup over greedy search as a function of K, for Amino, Metabolism, Conditions (AA) and Conditions (Met).] Only 0.4%-3.6% of the changes are evaluated.
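Scope. Conditional probability distributions (CPDs) of the form $X = g(\alpha_1 u_1, \ldots, \alpha_k u_k) + \epsilon$, with link function g and white noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$. General requirement: g(U) can be any function that is invertible w.r.t. each $u_i$. Examples: linear Gaussian, chemical reaction, sigmoid Gaussian.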
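For a non-linear link the ideal profile is obtained by inverting g. A minimal sketch, assuming one concrete sigmoid Gaussian form, $X = \mathrm{sigmoid}(\sum_i \alpha_i u_i + \alpha_y y) + \epsilon$, and noise-free toy data. With real noisy data an observed x outside (0, 1) has no ideal value under this form, and the sketch simply raises an error. Function names are illustrative.

```python
import math
from typing import List


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def logit(p: float) -> float:
    """Inverse of the sigmoid; defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit needs 0 < p < 1")
    return math.log(p / (1.0 - p))


def ideal_profile_sigmoid(x, parents, alphas, alpha_y: float = 1.0) -> List[float]:
    """Solve x[m] = sigmoid(sum_i alpha_i u_i[m] + alpha_y * y[m]) for y[m]."""
    y = []
    for m, xm in enumerate(x):
        linear = sum(a * u[m] for a, u in zip(alphas, parents))
        y.append((logit(xm) - linear) / alpha_y)
    return y


parents, alphas, alpha_y = [[0.5, -1.0, 2.0]], [1.5], 2.0
y_true = [0.3, -0.7, 1.1]
x = [sigmoid(alphas[0] * u + alpha_y * yt) for u, yt in zip(parents[0], y_true)]
print(ideal_profile_sigmoid(x, parents, alphas, alpha_y))  # recovers y_true
```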
Sigmoid Gaussian CPD. Problem: there is no simple form for the similarity measures. [Figure: likelihoods P(X=0.5|Z) and P(X=0.85|Z) as functions of Z; the sigmoid g(z) with the ideal values Y(0.5) and Y(0.85); exact likelihood vs. a linear approximation around Y = 0.] Solution: the sensitivity of the likelihood to Z depends on the gradient of g at the specific instance.
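Sigmoid Gaussian CPD (continued). [Figure: equi-likelihood potential in Z for the instances X = 0.5 and X = 0.85, before and after gradient correction; Z is scaled by g'(0.5) = 0.25 and by g'(0.85), the gradient of g where it takes the instance's value.] After the gradient correction, we can use the same similarity measure for all instances.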
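One way to apply the gradient correction, sketched under assumptions: g is the logistic sigmoid, whose slope where it takes the value x is x(1 - x) (0.25 at x = 0.5), and each instance's entries of both the ideal profile and the candidate profile are scaled by that slope before computing $C_1$. This follows from a first-order expansion, $x[m] - g(\ldots, \alpha z[m]) \approx g'[m]\,(y[m] - \alpha z[m])$ for a unit weight on the ideal parent. `corrected_c1` is a hypothetical name.

```python
import math
from typing import Sequence


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def sigmoid_slope_at(x_value: float) -> float:
    """Derivative of the sigmoid at the point where its output equals x_value:
    g' = g (1 - g)."""
    return x_value * (1.0 - x_value)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def corrected_c1(y: Sequence[float], z: Sequence[float], x: Sequence[float]) -> float:
    """C1 on gradient-scaled profiles: instance m is weighted by g' at x[m],
    the local sensitivity of the prediction to the parent value."""
    w = [sigmoid_slope_at(xm) for xm in x]
    yw = [wm * ym for wm, ym in zip(w, y)]
    zw = [wm * zm for wm, zm in zip(w, z)]
    return dot(yw, zw) ** 2 / dot(zw, zw)


print(sigmoid_slope_at(0.5), sigmoid_slope_at(0.85))  # 0.25 and about 0.1275
```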
Sigmoid Gene Expression. Four gene expression datasets with 44 (Amino), 89 (Metabolism) and 173 (Conditions) variables. [Figure: test -log-likelihood and speedup (times faster than greedy) as a function of K, for Amino, Metabolism, Conditions (AA) and Conditions (Met).] Only 2.2%-6.1% of the moves are evaluated.
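Adding New Hidden Variables. Idea: introduce a hidden parent H for nodes with similar ideal profiles. [Figure: variables X1 to X5 with ideal profiles Y1 to Y5 over the instances; a hidden parent H is added over X1, X2 and X4.] For the linear Gaussian case, the likelihood improvement from adding H as a parent of each $X_i$ in a cluster is bounded from below by $\sum_i \frac{1}{2\sigma_i^2} C_1(\mathbf{y}_i, \mathbf{h})$, where $\mathbf{h}$ is the profile of H. Challenge: find the $\mathbf{h}$ that maximizes this bound.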
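Scoring a parent: with $C_1(\mathbf{y}_i,\mathbf{h}) = \frac{(\mathbf{y}_i\cdot\mathbf{h})^2}{\mathbf{h}\cdot\mathbf{h}}$, the bound equals $\frac{1}{2}\frac{\mathbf{h}^\top Y Y^\top \mathbf{h}}{\mathbf{h}^\top \mathbf{h}}$, where $Y$ is the matrix whose columns are $\mathbf{y}_i/\sigma_i$. The optimal $\mathbf{h}^*$ must lie in the span of the columns of $Y$. Setting $\mathbf{h} = Y\mathbf{v}$ and $A = Y^\top Y$ (with $A$ invertible), the bound becomes $\frac{1}{2}\frac{\mathbf{v}^\top A^2 \mathbf{v}}{\mathbf{v}^\top A \mathbf{v}}$, a Rayleigh quotient of the matrices $A^2$ and $A$. Finding $\mathbf{h}^*$ therefore amounts to solving an eigenvector problem whose dimension is the size of the cluster: $\mathbf{v}^*$ is the eigenvector of $A$ with the largest eigenvalue, and $\mathbf{h}^* = Y\mathbf{v}^*$.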
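A sketch of the eigenvector solution for the linear Gaussian case, assuming the bound is $\frac{1}{2}\sum_i (\mathbf{y}_i\cdot\mathbf{h})^2/(\sigma_i^2\,\mathbf{h}\cdot\mathbf{h})$, i.e. the columns of Y are the ideal profiles scaled by $1/\sigma_i$. It forms the small matrix $A = Y^\top Y$ (one row per child in the cluster), finds its top eigenvector by plain power iteration (a library symmetric eigensolver would normally be used instead), and returns $\mathbf{h}^* = Y\mathbf{v}^*$ together with the bound, half the largest eigenvalue. Names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def top_eigenvector(A: List[List[float]], iters: int = 1000,
                    tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Power iteration for a symmetric positive semidefinite matrix A.
    Returns (largest eigenvalue, unit eigenvector)."""
    k = len(A)
    v = [1.0 / (i + 1) for i in range(k)]  # deterministic start vector
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of A")
        w = [t / norm for t in w]
        done = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if done:
            break
    Av = [sum(A[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, Av)), v  # Rayleigh quotient, vector


def hidden_parent(profiles: Sequence[Sequence[float]],
                  sigma2s: Sequence[float]) -> Tuple[List[float], float]:
    """Best hidden-parent profile h* = Y v* for a cluster, and its bound.
    Columns of Y are y_i / sigma_i; A = Y^T Y; v* is A's top eigenvector."""
    cols = [[val / math.sqrt(s2) for val in y] for y, s2 in zip(profiles, sigma2s)]
    k, M = len(cols), len(cols[0])
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    lam, v = top_eigenvector(A)
    h = [sum(cols[i][m] * v[i] for i in range(k)) for m in range(M)]
    return h, lam / 2.0  # bound = half the largest eigenvalue of A


h, bound = hidden_parent([[1.0, 2.0, 2.0], [1.0, 2.0, 2.0]], [1.0, 1.0])
print(bound)  # ~9.0: two identical profiles with y.y = 9
```

Power iteration assumes the start vector is not orthogonal to the top eigenvector and converges slowly when the two largest eigenvalues are close.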
Finding the Best Cluster. [Figure: variables X1 to X4 and their ideal profiles, which are computed only once; candidate clusters such as {X1, X2}, {X1, X3} and {X3, X4} are then scored using them.]
Finding the Best Cluster (continued). [Figure: clusters grown by merging: {X1, X3}, then {X1, X3, X2}, then {X1, X3, X2, X4}.] Select the cluster with the highest score, add a hidden parent for it, and continue with the search.
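The slides do not spell out how candidate clusters are generated or compared, so the following is only one possible reading: agglomerative merging that, at each step, joins the two clusters whose union loses the least of the summed bound, then returns the merged cluster with the highest bound minus a hypothetical per-child `penalty`. Without some penalty the largest cluster would always win, since adding children never lowers the bound. Each cluster's bound is cached, so it is computed only once. Names, the penalty, and the toy profiles are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List


def top_eigenvalue(A: List[List[float]], iters: int = 1000, tol: float = 1e-12) -> float:
    """Largest eigenvalue of a symmetric PSD matrix, by power iteration."""
    k = len(A)
    v = [1.0 / (i + 1) for i in range(k)]  # deterministic start vector
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return 0.0  # A v = 0: treat as a zero eigenvalue
        w = [t / norm for t in w]
        done = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if done:
            break
    return sum(v[i] * A[i][j] * v[j] for i in range(k) for j in range(k))


def cluster_bound(cluster: FrozenSet[str], cols: Dict[str, List[float]]) -> float:
    """Hidden-parent bound of a cluster: half the top eigenvalue of Y^T Y."""
    names = sorted(cluster)
    A = [[sum(a * b for a, b in zip(cols[p], cols[q])) for q in names]
         for p in names]
    return top_eigenvalue(A) / 2.0


def best_cluster(profiles: Dict[str, List[float]],
                 sigma2s: Dict[str, float],
                 penalty: float) -> FrozenSet[str]:
    """Agglomerative search over clusters of children for a new hidden parent.
    `penalty` (per child) is a hypothetical complexity cost, not from the slides."""
    cols = {n: [val / math.sqrt(sigma2s[n]) for val in y] for n, y in profiles.items()}
    clusters = [frozenset([n]) for n in profiles]
    score = {c: cluster_bound(c, cols) for c in clusters}  # cached bounds

    def loss(pair) -> float:
        # Bound lost by explaining both clusters with a single hidden parent.
        a, b = pair
        union = a | b
        if union not in score:
            score[union] = cluster_bound(union, cols)
        return score[a] + score[b] - score[union]

    merged = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=loss)
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        merged.append(a | b)
    # Select the merged cluster with the highest penalized score.
    return max(merged, key=lambda c: score[c] - penalty * len(c))


profiles = {"X1": [1.0, 2.0, 3.0, 0.0], "X2": [1.0, 2.0, 2.9, 0.1],
            "X3": [1.0, 2.0, 3.1, 0.0], "X4": [0.0, 0.0, 0.0, 5.0]}
sigma2s = {n: 1.0 for n in profiles}
print(sorted(best_cluster(profiles, sigma2s, penalty=1.0)))  # ['X1', 'X2', 'X3']
```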
Bipartite Network. Instances from a biological expert network with 7 (hidden) parents and 141 (observed) children. [Figure: test log-likelihood and train log-likelihood as a function of the number of instances, for Greedy, Ideal K=2, Ideal K=5 and Gold.] The speedup is roughly x10: greedy takes over 2.5 days!
Summary. A new method for significantly speeding up structure learning in continuous-variable networks; offers a promising time vs. performance tradeoff; guides the insertion of new hidden variables. Future work: improve cluster identification for the non-linear case; explore additional distributions and the relation to GLMs; combine the ideal parent approach, as a plug-in, with other search approaches.