
1 Knowledge, Data, and Search in Computational Discovery
Pat Langley
Computational Learning Laboratory, Center for the Study of Language and Information, Stanford University, Stanford, California
http://cll.stanford.edu/~langley
Thanks to Kevin Arrigo, Stuart Borrett, Will Bridewell, and Ljupco Todorovski for their contributions to this work, and to the National Science Foundation for funding.

2 Qualitative Laws of Intelligence
In their 1975 Turing Award speech, Newell and Simon claimed that intelligence depends on two factors:
 the ability to store, retrieve, and manipulate list structures, since computers are general symbol manipulators
 the ability to solve novel problems by heuristic search, with problem spaces defined by states and operators
Moreover, one can constrain search with knowledge that is cast as symbolic list structures. These insights underlie the fields of artificial intelligence and cognitive science.

3 Two Basic Claims
Newell and Simon’s insights suggest the two claims of this talk:
 knowledge structures are important results of machine learning and discovery
 knowledge structures are important inputs to machine learning and discovery
In other words, knowledge plays as crucial a role as data in the automation of discovery. I will illustrate these ideas using recent work on induction of scientific process models.

4 The Mainstream View
[Diagram: Training Data → Learning/Discovery Process → Predictive Model]
Nearly all current research in machine learning and data mining takes this perspective.

5 An Alternative View
[Diagram: Training Data + Existing Knowledge → Learning/Discovery Process → Acquired Knowledge]
This perspective is now uncommon, but the ideas themselves are not new to machine learning and discovery.

6 Historical Landmarks in Machine Learning
 1980 – Machine learning launched as an outgrowth of symbolic AI
 1983 – Early emphasis on knowledge-guided approaches to learning
 1986 – First issue of the journal Machine Learning published
 1989 – Advent of the UCI repository and routine experimental evaluation
 1989 – Introduction of statistical methods from pattern recognition
 1993 – Workshop on fielded applications of machine learning
 1995 – First conference on knowledge discovery and data mining
 1997 – Explosion of the Web and associated research on text mining
 2001 – Strong focus on predictive accuracy over understandability
 2004 – Prevalence of statistical methods over symbolic approaches

7 Knowledge as Output of Discovery Systems
Discovery systems produce models that are useful for prediction, but they should also produce models that:
 are stated in some declarative format
 can be communicated clearly and precisely
 help people understand observations
 do so in terms that people find plausible and familiar
We typically refer to the content of such models as knowledge.

8 What is Knowledge?
Knowledge can be cast in many different formalisms, such as:
 criteria tables (M-of-N rules) in diagnostic medicine
 molecular structures and reaction pathways in chemistry
 qualitative causal models in biology and geology
 structural equations in economics and sociology
 differential equations in physics and ecology
Discovery systems should generate knowledge in a format that is familiar to domain users. Fortunately, computers can encode all such forms of knowledge, as the sketch below illustrates for the first formalism.
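
Here is a minimal sketch of an M-of-N criteria table in Python; the clinical criteria and thresholds are hypothetical, chosen only for illustration.

```python
# A minimal sketch of an M-of-N criteria table: the rule fires when at
# least m of its n criteria hold. The criteria and thresholds below are
# hypothetical, for illustration only.

def m_of_n(criteria, m, case):
    """Return True if at least m of the given criteria hold for this case."""
    return sum(1 for test in criteria if test(case)) >= m

criteria = [
    lambda p: p["temperature"] > 38.0,   # fever
    lambda p: p["wbc_count"] > 11000,    # elevated white-cell count
    lambda p: p["heart_rate"] > 90,      # tachycardia
]

patient = {"temperature": 38.4, "wbc_count": 9500, "heart_rate": 102}
print(m_of_n(criteria, 2, patient))      # True: 2 of the 3 criteria are met
```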

9 Successes of Scientific Knowledge Discovery
Over the past decade, computational discovery systems have helped uncover new knowledge in many scientific fields:
 qualitative chemical factors in mutagenesis (King et al., 1996)
 quantitative laws of metallic behavior (Sleeman et al., 1997)
 qualitative conjectures in number theory (Colton et al., 2000)
 temporal laws of ecological behavior (Todorovski et al., 2000)
 reaction pathways in catalytic chemistry (Valdes-Perez, 1994)
Each has led to publications in the refereed scientific literature, the key measure of academic success. For a review of these scientific results, see Langley (IJHCS, 2000).

10 Description vs. Explanation
Traditional discovery systems have focused on descriptive models that summarize data and make accurate predictions. But many sciences are concerned with explanatory models that:
 move beyond superficial descriptive summaries
 account for observations at a deeper theoretical level
 appeal to unobserved concepts and mechanisms
 do so in terms familiar and plausible to domain experts
Explanations may or may not have quantitative aspects, but they invariably have qualitative structure not captured by statistics.

11 Two Accounts of the Ross Sea Ecosystem
d[phyto,t,1] = −0.307 × phyto − 0.495 × zoo + 0.411 × phyto
d[zoo,t,1] = −0.251 × zoo + 0.615 × 0.495 × zoo
d[detritus,t,1] = 0.307 × phyto + 0.251 × zoo + 0.385 × 0.495 × zoo − 0.005 × detritus
d[nitro,t,1] = −0.098 × 0.411 × phyto + 0.005 × detritus
As phytoplankton takes up nitrogen, its concentration increases and the nitrogen decreases. This continues until the nitrogen supply is exhausted, which leads to a phytoplankton die-off. This produces detritus, which gradually remineralizes to replenish the nitrogen. Zooplankton grazes on phytoplankton, which slows the latter’s increase and also produces detritus.
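
To make the quantitative account concrete, here is a minimal sketch that integrates these four equations with SciPy, exactly as written above; the initial concentrations and time span are hypothetical, chosen only to illustrate the dynamics.

```python
# A minimal sketch that integrates the four Ross Sea equations above.
# The initial concentrations and time span are hypothetical.
import numpy as np
from scipy.integrate import solve_ivp

def ross_sea(t, y):
    phyto, zoo, detritus, nitro = y
    d_phyto = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
    d_zoo = -0.251 * zoo + 0.615 * 0.495 * zoo
    d_detritus = (0.307 * phyto + 0.251 * zoo
                  + 0.385 * 0.495 * zoo - 0.005 * detritus)
    d_nitro = -0.098 * 0.411 * phyto + 0.005 * detritus
    return [d_phyto, d_zoo, d_detritus, d_nitro]

y0 = [0.5, 0.2, 0.1, 10.0]   # hypothetical initial concentrations
sol = solve_ivp(ross_sea, (0, 50), y0, t_eval=np.linspace(0, 50, 100))
print(sol.y[:, -1])          # concentrations at the final time step
```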

12 Relating Equation Terms to Processes
d[phyto,t,1] = −0.307 × phyto − 0.495 × zoo + 0.411 × phyto
d[zoo,t,1] = −0.251 × zoo + 0.615 × 0.495 × zoo
d[detritus,t,1] = 0.307 × phyto + 0.251 × zoo + 0.385 × 0.495 × zoo − 0.005 × detritus
d[nitro,t,1] = −0.098 × 0.411 × phyto + 0.005 × detritus
Each term in these equations corresponds to a distinct process in the verbal account: the 0.307 × phyto and 0.251 × zoo terms to loss, the 0.495 × zoo terms to grazing, the 0.411 × phyto terms to nutrient uptake, and the 0.005 × detritus terms to remineralization. The next slide makes this mapping explicit.

13 A Process Model for the Ross Sea
model Ross_Sea_Ecosystem
variables: phyto, zoo, nitro, detritus
observables: phyto, nitro
process phyto_loss
  equations: d[phyto,t,1] = −0.307 × phyto
             d[detritus,t,1] = 0.307 × phyto
process zoo_loss
  equations: d[zoo,t,1] = −0.251 × zoo
             d[detritus,t,1] = 0.251 × zoo
process zoo_phyto_grazing
  equations: d[zoo,t,1] = 0.615 × 0.495 × zoo
             d[detritus,t,1] = 0.385 × 0.495 × zoo
             d[phyto,t,1] = −0.495 × zoo
process nitro_uptake
  equations: d[phyto,t,1] = 0.411 × phyto
             d[nitro,t,1] = −0.098 × 0.411 × phyto
process nitro_remineralization
  equations: d[nitro,t,1] = 0.005 × detritus
             d[detritus,t,1] = −0.005 × detritus
This model is equivalent to a standard differential equation model, but it makes explicit assumptions about which processes are involved. For completeness, we must also make assumptions about how to combine influences from multiple processes; one simple scheme is sketched below.
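
One common assumption, and the one the sketch below adopts, is that influences from multiple processes on the same variable combine additively. The encoding of processes as functions returning derivative contributions is my own simplification for illustration, not the IPM notation itself.

```python
# A minimal sketch of one combination scheme: derivative contributions
# from all active processes are summed for each variable.

def combined_derivatives(processes, state):
    """Sum each process's contribution to d[var]/dt across all processes."""
    totals = {var: 0.0 for var in state}
    for process in processes:
        for var, contribution in process(state).items():
            totals[var] += contribution
    return totals

# Two of the Ross Sea processes, written as functions from state to the
# derivative terms they contribute.
def phyto_loss(s):
    return {"phyto": -0.307 * s["phyto"], "detritus": 0.307 * s["phyto"]}

def nitro_uptake(s):
    return {"phyto": 0.411 * s["phyto"], "nitro": -0.098 * 0.411 * s["phyto"]}

state = {"phyto": 0.5, "zoo": 0.2, "detritus": 0.1, "nitro": 10.0}
print(combined_derivatives([phyto_loss, nitro_uptake], state))
```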

14 Advantages of Process Models
Process models are a promising representational scheme because:
 they embed quantitative relations within qualitative structure;
 they refer to notations and mechanisms familiar to experts;
 they provide dynamical predictions of changes over time;
 they offer causal and explanatory accounts of phenomena;
 they retain the modularity that is needed for induction.
Quantitative process models provide an important alternative to the formalisms typically used in modeling and discovery.

15 The Task of Inductive Process Modeling
We can use these ideas to reformulate the modeling problem:
 Given: a set of variables of interest to the scientist;
 Given: observations of how these variables change over time;
 Given: background knowledge about plausible processes;
 Find: a process model that explains these variations and that generalizes well to future observations.
The resulting model encodes new knowledge about the domain.

16 Challenges of Inductive Process Modeling
We can use ideas from machine learning to induce process models, but the task differs from typical learning tasks in that:
 process models characterize the behavior of dynamical systems;
 variables are continuous but can have discontinuous behavior;
 observations are not independently and identically distributed;
 models may contain unobservable processes and variables;
 multiple processes can interact to produce complex behavior.
Compensating factors include a focus on deterministic systems and ways to constrain the search for models.

17 Machine Learning as Heuristic Search
[Diagram: a space of candidate hypotheses explored by search]
Machine learning can be viewed as heuristic search through a space of candidate models. Heuristic search depends on ways to guide exploration of the space.
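
To ground the search metaphor, here is a minimal, generic best-first search skeleton; the successor generator and scoring function are placeholders of the kind a model-induction system would supply.

```python
# A minimal, generic best-first search skeleton: keep a frontier of
# candidate hypotheses, always expand the highest-scoring one, and track
# the best candidate seen so far.
import heapq

def best_first_search(initial, successors, score, max_expansions=200):
    counter = 0                                  # tie-breaker for the heap
    frontier = [(-score(initial), counter, initial)]
    best = initial
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, current = heapq.heappop(frontier)
        if score(current) > score(best):
            best = current
        for child in successors(current):
            counter += 1
            heapq.heappush(frontier, (-score(child), counter, child))
    return best

# Toy usage: walk the integers toward the highest-scoring value, 42.
print(best_first_search(0,
                        successors=lambda n: [n - 1, n + 1],
                        score=lambda n: -abs(n - 42)))  # prints 42
```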

18 Knowledge as Input to Discovery Systems
One can also use knowledge to guide discovery mechanisms:
 by providing constraints on the space searched, as in work on declarative bias for induction
 by providing operators used during search, as in ILP research on relational clichés
 by providing a starting point for heuristic search, as in work on theory revision and refinement
Using knowledge to influence discovery can not only reduce prediction error but also improve model understandability.

19 Background Knowledge as Constraints
We can use background knowledge about the domain to constrain the search for candidate models. Previous work has encoded background knowledge in terms of:
 Horn clause programs (e.g., King et al., 1996)
 context-free grammars (e.g., Dzeroski & Todorovski, 1997)
 prior probability distributions (e.g., Friedman et al., 2000)
However, none of these notations is familiar to domain scientists, which suggests the need for another approach.

20 Generic Processes as Background Knowledge
We cast background knowledge as generic processes that specify:
 the variables involved in a process and their types;
 the parameters appearing in a process and their ranges;
 the forms of conditions on the process; and
 the forms of associated equations and their parameters.
Generic processes are building blocks from which one can compose specific process models. One possible encoding is sketched below.
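
Here is a minimal sketch of one way a generic process might be encoded; the dataclass fields and the string form of the equation templates are my own simplifications, not the actual IPM syntax.

```python
# A minimal sketch of a generic process: typed variable slots, parameter
# ranges, equation templates, and optional conditions. This encoding is a
# simplification for illustration only.
from dataclasses import dataclass, field

@dataclass
class GenericProcess:
    name: str
    variables: dict[str, str]        # slot name -> required variable type
    parameters: dict[str, tuple]     # parameter name -> (low, high) range
    equations: list[str]             # equation templates over the slots
    conditions: list[str] = field(default_factory=list)

exponential_loss = GenericProcess(
    name="exponential_loss",
    variables={"S": "species", "D": "detritus"},
    parameters={"alpha": (0.0, 1.0)},
    equations=["d[S,t,1] = -1 * alpha * S", "d[D,t,1] = alpha * S"],
)
print(exponential_loss)
```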

21 Generic Processes for Aquatic Ecosystems
generic process exponential_loss
  variables: S{species}, D{detritus}
  parameters: α ∈ [0, 1]
  equations: d[S,t,1] = −1 × α × S
             d[D,t,1] = α × S
generic process remineralization
  variables: N{nutrient}, D{detritus}
  parameters: α ∈ [0, 1]
  equations: d[N,t,1] = α × D
             d[D,t,1] = −1 × α × D
generic process grazing
  variables: S1{species}, S2{species}, D{detritus}
  parameters: α ∈ [0, 1], γ ∈ [0, 1]
  equations: d[S1,t,1] = γ × α × S1
             d[D,t,1] = (1 − γ) × α × S1
             d[S2,t,1] = −1 × α × S1
generic process constant_inflow
  variables: N{nutrient}
  parameters: ν ∈ [0, 1]
  equations: d[N,t,1] = ν
generic process nutrient_uptake
  variables: S{species}, N{nutrient}
  parameters: τ ∈ [0, ∞], β ∈ [0, 1], μ ∈ [0, 1]
  conditions: N > τ
  equations: d[S,t,1] = μ × S
             d[N,t,1] = −1 × β × μ × S
Our current library contains about 20 generic processes, including ones with alternative functional forms for loss and grazing processes.

22 A Method for Process Model Construction
We have developed IPM, a system that constructs explanatory process models from generic components in four stages:
1. Find all ways to instantiate known generic processes with specific variables, subject to type constraints;
2. Combine instantiated processes into candidate generic models, subject to additional constraints (e.g., on the number of processes);
3. For each generic model, carry out a search through parameter space to find good coefficients;
4. Return the parameterized model with the best overall score.
Our typical evaluation metric is squared error, but we have also explored other measures of explanatory adequacy. The first two stages are sketched below.
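
As a rough illustration of stages 1 and 2, the sketch below instantiates a process template under type constraints and then enumerates candidate structures as bounded subsets of the instances; the encoding is a simplification, not the IPM implementation.

```python
# A rough sketch of IPM's first two stages: instantiate a template with
# every type-compatible assignment of system variables, then enumerate
# candidate model structures as bounded subsets of the instances.
from itertools import combinations, permutations

variable_types = {"phyto": "species", "zoo": "species",
                  "nitro": "nutrient", "detritus": "detritus"}

def instantiations(slots, variable_types):
    """Yield type-consistent bindings of template slots to distinct variables."""
    for perm in permutations(variable_types, len(slots)):
        if all(variable_types[var] == slot_type
               for (slot, slot_type), var in zip(slots.items(), perm)):
            yield dict(zip(slots, perm))

# Stage 1: instances of a grazing-like template with two species slots.
grazing_slots = {"S1": "species", "S2": "species", "D": "detritus"}
instances = list(instantiations(grazing_slots, variable_types))

# Stage 2: candidate structures combine instances, up to a size bound.
structures = [set(c) for k in range(1, 3)
              for c in combinations(range(len(instances)), k)]
print(len(instances), len(structures))   # 2 instances, 3 candidate structures
```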

23 Estimating Parameters in Process Models
To estimate the parameters for each generic model structure, the IPM algorithm:
1. selects random initial values that fall within the ranges specified in the generic processes;
2. improves these parameters using the Levenberg-Marquardt method until it reaches a local optimum;
3. generates new candidate values through random jumps along dimensions of the parameter vector and continues the search;
4. restarts the search from a new random initial point if no improvement occurs after N jumps.
This multi-level method gives reasonable fits to time-series data from a number of domains, but it is computationally intensive. A sketch of the restart loop appears below.
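
The sketch below captures the spirit of this multi-level loop using SciPy's Levenberg-Marquardt implementation, with random initialization inside the specified ranges and repeated restarts; the toy residual function stands in for a real simulate-and-compare objective.

```python
# A minimal sketch of multi-restart parameter fitting: random starts drawn
# from the specified ranges, local refinement with Levenberg-Marquardt,
# and retention of the best local optimum found.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

def fit_with_restarts(residuals, bounds, n_restarts=10):
    low, high = np.asarray(bounds[0]), np.asarray(bounds[1])
    best = None
    for _ in range(n_restarts):
        theta0 = rng.uniform(low, high)              # random start in range
        result = least_squares(residuals, theta0, method="lm")
        if best is None or result.cost < best.cost:  # keep the best optimum
            best = result
    return best

# Toy objective: recover parameters (a, b) of y = a * exp(b * t) from data.
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(0.5 * t)
best = fit_with_restarts(lambda p: p[0] * np.exp(p[1] * t) - y,
                         bounds=([0, 0], [5, 1]))
print(best.x)   # close to [2.0, 0.5]
```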

24 Results on Training Data from the Ross Sea
We provided IPM with 188 samples of phytoplankton, nitrogen, light, and ice measures for the Ross Sea. From 2035 distinct model structures, it found accurate models that limited phytoplankton growth by the available nitrate and light. Some high-ranking models incorporated zooplankton, whereas others did not.

25 Results on a Protist Ecosystem
We also ran the system on protist data from Veilleux (1979), using 54 samples of two variables (P. aurelia and D. nasutum). In this run, IPM considered a space of 470 distinct model structures and reproduced the basic trends in the data.

26 Results on Ringkøbing Fjord
Data from this Danish fjord included measurements of fjord height, sea level, water inflow, and wind direction and speed. We used 1100 samples for training and 551 samples for testing, over a space of 32 model structures.

27 Results on Battery Data from the Space Station
Data from the Space Station batteries included current, voltage, and temperature, with resistance and state of charge unobserved. We used 6000 samples for training and 2640 samples for testing, over a space of 162 model structures.

28 Results on Biochemical Kinetics
We also ran IPM on 14 samples of six chemicals involved in glycolysis, taken from a pulse-response study. Here the system considered some 172 model structures. The best model fit the data but reproduced only part of the known pathway.

29 Hierarchical Induction of Process Models
Despite its success, we have observed IPM produce models that lack required components or include mutually exclusive ones. In response, we have developed an extended system, HIPM, that:
 organizes background knowledge into a hierarchy of processes;
 specifies required vs. optional components and mutual exclusion;
 associates variables with entities that occur in processes;
 carries out beam search through the resulting AND/OR space.
We hypothesized that this additional knowledge would reduce search effort and variance, thus improving generalization error. For more details about HIPM, see Todorovski et al. (AAAI-2005). The structural constraints are sketched below.
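
As an illustration of the structural constraints involved, the sketch below checks that a candidate structure covers every required group of processes and includes at most one member of each mutually exclusive group; the process names and rule encoding are hypothetical.

```python
# A minimal sketch of HIPM-style structural constraints: required process
# groups and mutually exclusive alternatives. The rule encoding and the
# process names are hypothetical, for illustration only.

def satisfies_constraints(structure, required_groups, exclusive_groups):
    """structure: the set of process names chosen for a candidate model."""
    # Every required group must contribute at least one process...
    if any(not (structure & group) for group in required_groups):
        return False
    # ...and mutually exclusive alternatives may contribute at most one.
    return all(len(structure & group) <= 1 for group in exclusive_groups)

required = [{"exponential_loss", "logistic_loss"}, {"nutrient_uptake"}]
exclusive = [{"exponential_loss", "logistic_loss"}]

print(satisfies_constraints({"nutrient_uptake", "exponential_loss"},
                            required, exclusive))               # True
print(satisfies_constraints({"nutrient_uptake", "exponential_loss",
                             "logistic_loss"}, required, exclusive))  # False
```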

30 HIPM Results on Ross Sea Data

Beam   System   # models   Test SSE   Test r²
  4    IPM           492       2126      0.53
  4    HIPM           68       1577      0.68
  8    IPM           845       1787      0.62
  8    HIPM          117       1564      0.68
 32    IPM          3572       1633      0.68
 32    HIPM          447       1113      0.84

HIPM examines fewer models and has better predictive accuracy.

31 Research on Theory Revision
We can also use background knowledge to specify initial models from which to start the search. Research on theory revision has applied this idea to models cast as:
 Horn clause programs (e.g., Ourston & Mooney, 1990)
 diagnostic fault hierarchies (e.g., Langley et al., 1994)
 qualitative causal models (e.g., Bay et al., 2003)
 sets of quantitative equations (e.g., Todorovski et al., 2003)
This approach typically produces models that are more accurate and easier to comprehend than ones induced from scratch.

32 Inductive Revision of Process Models
Revision takes an initial model and observations and produces a revised model. The initial model here is the Ross Sea model from before (with detritus renamed residue):
model RossSeaEcosystem
variables: phyto, zoo, nitro, residue
observables: phyto, nitro
d[phyto,t,1] = −0.307 × phyto − 0.495 × zoo + 0.411 × phyto
d[zoo,t,1] = −0.251 × zoo + 0.615 × 0.495 × zoo
d[residue,t,1] = 0.307 × phyto + 0.251 × zoo + 0.385 × 0.495 × zoo − 0.005 × residue
d[nitro,t,1] = −0.098 × 0.411 × phyto + 0.005 × residue
The revision draws on a library of generic processes:
process exponential_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] × P
process logistic_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] × P × (1 − P / [0, 1, ∞])
process constant_inflow
  variables: I {inorganic_nutrient}
  equations: d[I,t] = [0, 1, ∞]
process consumption
  variables: P1 {population}, P2 {population}, nutrient_P2
  equations: d[P1,t] = [0, 1, ∞] × P1 × nutrient_P2
             d[P2,t] = −[0, 1, ∞] × P1 × nutrient_P2
process no_saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P
process saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P / (P + [0, 1, ∞])

33 Comprehensible Bagging of Process Models
We have seen HIPM produce models that fit the training data but generalize poorly, so we created another system, FUSE, that:
 creates multiple training sets by sampling the original data;
 uses HIPM to induce one process model from each training set;
 creates a new model structure that includes the common processes;
 estimates parameters for this structure from the original data.
We hypothesized that this method would reduce generalization error while keeping models understandable, unlike standard bagging. This shows that one can combine ideas about knowledge and statistics. For more details about FUSE, see Bridewell et al. (ICML-2005). The scheme is sketched below.
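
Here is a minimal sketch of the scheme under simplifying assumptions: induce_model and fit_parameters stand in for HIPM and its parameter-estimation routine, and a model is treated simply as the set of process names it contains.

```python
# A minimal sketch of the FUSE-style scheme: induce a model on each
# bootstrap sample, keep the processes shared by most induced models, and
# refit that single structure on the original data. `induce_model` returns
# a set of process names; `data` is a NumPy array of samples.
from collections import Counter
import numpy as np

def fuse(data, induce_model, fit_parameters, n_bags=10, threshold=0.5):
    rng = np.random.default_rng(0)
    counts = Counter()
    for _ in range(n_bags):
        sample = data[rng.integers(0, len(data), size=len(data))]  # bootstrap
        counts.update(induce_model(sample))
    # Keep processes that appear in at least `threshold` of the bagged models.
    structure = {p for p, c in counts.items() if c >= threshold * n_bags}
    return fit_parameters(structure, data)   # refit on the original data
```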

34 FUSE Results on Ross Sea Data
[Chart: SSE and r² across five cross-validation folds on 188 measurements of two variables]

35 Process Modeling and Missing Data
Our initial algorithms assumed that the variables have no missing samples, so we have developed another extension to HIPM that:
1. replaces missing values with interpolated estimates;
2. uses HIPM to find the model that minimizes squared error;
3. replaces the estimated values with ones the model predicts;
4. returns to Step 2 if some values have changed.
Experiments suggest that this expectation-maximization variant reduces error substantially on unseen data. This shows another way to combine knowledge with statistics. For more details, see Bridewell et al. (submitted to ICML-2006). A sketch of the loop appears below.
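
Here is a minimal sketch of the loop; induce_model stands in for HIPM and returns a callable model, and y is a NumPy array with NaNs marking the missing samples.

```python
# A minimal sketch of the EM-style loop above: initialize missing values
# by interpolation, refit, then impute from the model's own predictions
# until the imputed values stabilize.
import numpy as np

def fit_with_missing(t, y, induce_model, tol=1e-6, max_iters=50):
    missing = np.isnan(y)
    filled = y.copy()
    # Step 1: replace missing values with interpolated estimates.
    filled[missing] = np.interp(t[missing], t[~missing], y[~missing])
    for _ in range(max_iters):
        model = induce_model(t, filled)      # Step 2: fit to current values
        predicted = model(t)                 # Step 3: model's own estimates
        change = np.max(np.abs(predicted[missing] - filled[missing]))
        filled[missing] = predicted[missing]
        if change < tol:                     # Step 4: stop once values settle
            break
    return model, filled
```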

36 Contributions of the Research
In summary, our work on computational discovery has produced:
 a formalism that states scientific knowledge as process models
 an encoding for background knowledge as generic processes
 a computational method for inducing process models
 a related technique for revising initial process models
 extended methods that combine knowledge with statistics
Inductive process modeling has great potential to help scientists construct explanatory models of dynamical systems.

37 Future Research on Process Modeling
Despite our progress to date, we need further work to:
 produce additional results on other scientific data sets
 develop more efficient methods for fitting model parameters
 extend the framework to handle partial differential equations
 explore evaluation metrics like match to trajectory shape
 introduce subsystems to support large-scale modeling
Taken together, these steps will make inductive process modeling a more robust approach to scientific knowledge discovery.

38 Relevance for Feature Selection
Knowledge can also assist in the search for useful features by:
 placing constraints on acceptable combinations of features
 providing an initial set of features from which to start search
 biasing selection to produce understandable models
We can apply these ideas to any representation of discovered knowledge, since all such representations include features as components.

39 Feature Selection in Process Modeling
We hope to extend our methods for inducing process models to:
 construct initial models that include only a few variables
 use generic processes, type constraints, and available terms to expand the best-scoring models, either by adding new terms between ones in the current model or by adding new terms to the fringe of the current model
 continue this forward selection scheme to construct ever more inclusive process models
This strategy mirrors the incremental way that scientists improve their models over time. A generic version of the loop is sketched below.
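
Since this extension is proposed rather than implemented, the sketch below shows only a generic forward-selection loop of the kind envisioned; candidates and score are placeholders for generic-process expansion and model evaluation.

```python
# A minimal, generic forward-selection loop: start from a small model,
# repeatedly add the candidate term that most improves the score, and
# stop when no addition helps. `candidates` and `score` are placeholders.

def forward_selection(initial, candidates, score, max_steps=20):
    current, current_score = initial, score(initial)
    for _ in range(max_steps):
        expansions = [current | {c} for c in candidates(current)]
        if not expansions:
            break
        best = max(expansions, key=score)
        if score(best) <= current_score:
            break                      # no expansion improves the model
        current, current_score = best, score(best)
    return current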

40 Concluding Remarks
In summary, ideas from symbolic AI remain highly relevant to machine learning and discovery. These ideas revolve around using structural knowledge that can:
 serve as understandable results of discovery systems
 provide useful inputs to discovery systems that guide search
One can combine knowledge-based approaches with statistical techniques to gain the benefits of both paradigms. Taken together, they offer a balanced and productive approach to computational induction.
