Pat Langley Center for the Study of Language and Information Stanford University, Stanford, California

Presentation transcript:

Pat Langley Center for the Study of Language and Information Stanford University, Stanford, California Computational Discovery of Communicable Scientific Models Thanks to N. Asgharbeygi, K. Arrigo, S. Bay, S. Dzeroski, J. Sanchez, Oren Shiran, and L. Todorovski for their contributions to this research, which is funded by a grant from the National Science Foundation.

Data Mining vs. Scientific Discovery
There exist two computational paradigms for discovering explicit knowledge from data:
Data mining generates knowledge cast as decision trees, logical rules, or other notations invented by AI researchers;
Computational scientific discovery instead uses equations, structural models, reaction pathways, or other formalisms invented by scientists and engineers.
Both approaches draw on heuristic search to find regularities in data, but they differ considerably in their emphases.

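Lesson 1
Traditional notations from machine learning are not communicated easily to domain scientists.
Ecosystem model:
$$\text{NPPc} = \sum_{\text{month}} \max(E \cdot \text{IPAR},\, 0)$$
$$E = 0.56 \cdot T_1 \cdot T_2 \cdot W$$
$$T_1 = \ldots \cdot \text{Topt} - \ldots \cdot \text{Topt}^2$$
$$T_2 = 1.18 \,/\, \left[\left(1 + e^{0.2\,(\text{Topt} - \text{Tempc} - 10)}\right) \cdot \left(1 + e^{0.3\,(\text{Tempc} - \text{Topt} - 10)}\right)\right]$$
$$W = \ldots \cdot \text{EET} / \text{PET}$$
$$\text{PET} = 1.6 \cdot (10 \cdot \text{Tempc} / \text{AHI})^{A} \cdot \text{PET-TW-M} \quad \text{if Tempc} > 0$$
$$\text{PET} = 0 \quad \text{if Tempc} < 0$$
$$A = \ldots \cdot \text{AHI}^3 - \ldots \cdot \text{AHI} \cdot \text{AHI}$$
$$\text{IPAR} = 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver}$$
$$\text{FPAR-FAS} = \min\left[(\text{SR-FAS} - 1.08) / \text{SR}(\text{UMD-VEG}),\, 0.95\right]$$
$$\text{SR-FAS} = (\text{Mon-FAS-NDVI} \ldots) / (\text{Mon-FAS-NDVI} - 1000)$$
Gene regulation model:
[Figure: regulatory network over DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, cpcB, and Light]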
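Only some pieces of the ecosystem model above survive with all their coefficients: the temperature factor T2, the efficiency E, and the monthly sum for NPPc. As a minimal sketch of how those pieces compose, the Python below implements just them; T1 and W are taken as plain inputs because their coefficients are missing, and the function names are hypothetical.

```python
import math
from typing import Sequence

def t2(tempc: float, topt: float) -> float:
    """Temperature factor T2, as written on the slide."""
    return 1.18 / ((1 + math.exp(0.2 * (topt - tempc - 10)))
                   * (1 + math.exp(0.3 * (tempc - topt - 10))))

def efficiency(t1: float, t2_value: float, w: float) -> float:
    """E = 0.56 * T1 * T2 * W (T1 and W supplied by the caller)."""
    return 0.56 * t1 * t2_value * w

def npp(monthly_e: Sequence[float], monthly_ipar: Sequence[float]) -> float:
    """NPPc: sum over months of max(E * IPAR, 0)."""
    return sum(max(e * ipar, 0.0) for e, ipar in zip(monthly_e, monthly_ipar))

# Example: a month at the optimum temperature, with hypothetical T1 = W = 1.
e = efficiency(1.0, t2(20.0, 20.0), 1.0)
print(e, npp([e, -0.2], [10.0, 10.0]))
```

The clipping in `npp` means a month with negative E * IPAR contributes nothing rather than subtracting from the annual total.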

Lesson 2
Scientists often have initial models that should influence the discovery process.
[Figure: an initial gene regulation model (DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, cpcB, Light) and observations feed into discovery, which outputs a revised model with some links marked ×]

Lesson 3
Scientific data are often rare and difficult to obtain rather than being plentiful.
[Table: for the gene regulation model, the number of variables, initial links, possible links, and samples; for the ecosystem model, the number of variables, equations, parameters, and samples]

Lesson 4
Scientists want models that move beyond description to provide explanations of their data.
[Figure: the gene regulation model (DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, cpcB, Light) and the ecosystem model (NPPc, IPAR, PET, T1, T2, W, e_max, E, EET, Tempc, Topt, NDVI, SOLAR, AHI, A, PETTWM, SR, FPAR, VEG) shown as networks of variables]

Lesson 5
Scientists want computational assistance rather than automated discovery systems.
[Figure: an initial gene regulation model (DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, cpcB, Light) and observations feed into discovery, which outputs a revised model with some links marked ×]

The Nature of Systems Science
Disciplines like Earth science and computational biology differ from traditional fields in that they:
focus on synthesis rather than analysis in their operation;
rely on computer modeling as one of their central methods;
develop system-level models with many variables and relations;
require that models make contact with known mechanisms.
However, existing methods for computational scientific discovery were not designed with systems science in mind.

Time Series from the Ross Sea Ecosystem

Inductive Process Modeling
Our approach is to design and implement computational methods for inductive process modeling, which:
represent scientific models as sets of quantitative processes;
use these models to predict and explain observational data;
search a space of process models to find good candidates;
utilize background knowledge to constrain this search.
This framework has great potential both for modeling scientific reasoning and for aiding practicing scientists.

Existing Formalisms Are Inadequate
Systems of equations:
  d[ice_mass,t] = (18 heat) / 6.02
  d[water_mass,t] = (18 heat) / 6.02
Regression trees:
  [Figure: a regression tree with tests such as B > 6 and C > 0]
Horn clause programs:
  gcd(X,X,X).
  gcd(X,Y,D) :- X<Y, Z is Y-X, gcd(X,Z,D).
  gcd(X,Y,D) :- Y<X, gcd(Y,X,D).
Hidden Markov models:
  [Figure: a hidden Markov model with example sequences of states and observed values]

A Process Model for an Aquatic Ecosystem
model AquaticEcosystem
  variables: phyto, zoo, nitro, residue
  observables: phyto, nitro
  process phyto_loss
    equations: d[phyto,t,1] = phyto
               d[residue,t,1] = phyto
  process zoo_loss
    equations: d[zoo,t,1] = zoo
               d[residue,t,1] =
  process zoo_phyto_grazing
    equations: d[zoo,t,1] = zoo
               d[residue,t,1] = zoo
               d[phyto,t,1] = zoo
  process nitro_uptake
    conditions: nitro > 0
    equations: d[phyto,t,1] = phyto
               d[nitro,t,1] = phyto
  process nitro_remineralization
    equations: d[nitro,t,1] = residue
               d[residue,t,1] = residue
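To make the structure of this model concrete, here is a minimal Python sketch that simulates it. It assumes each process adds its terms to the derivatives of the variables it mentions, and that a process contributes only while its conditions hold. The rate constants (`LOSS_P`, `GRAZE`, and so on) and the signs of the loss terms are hypothetical, since the slide's coefficients did not survive, and forward Euler stands in for whatever integrator the authors actually used.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Term = Tuple[str, Callable[[State], float]]          # (variable, contribution to its derivative)
Process = Tuple[str, Callable[[State], bool], List[Term]]

# Hypothetical rate constants: the coefficients on the original slide were lost.
LOSS_P, LOSS_Z, GRAZE, ASSIM, UPTAKE, REMIN = 0.3, 0.1, 0.2, 0.7, 0.5, 0.05

def always(_: State) -> bool:
    return True

PROCESSES: List[Process] = [
    ("phyto_loss", always, [("phyto", lambda s: -LOSS_P * s["phyto"]),
                            ("residue", lambda s: LOSS_P * s["phyto"])]),
    ("zoo_loss", always, [("zoo", lambda s: -LOSS_Z * s["zoo"]),
                          ("residue", lambda s: LOSS_Z * s["zoo"])]),
    ("zoo_phyto_grazing", always, [("zoo", lambda s: ASSIM * GRAZE * s["zoo"]),
                                   ("residue", lambda s: (1 - ASSIM) * GRAZE * s["zoo"]),
                                   ("phyto", lambda s: -GRAZE * s["zoo"])]),
    ("nitro_uptake", lambda s: s["nitro"] > 0,
     [("phyto", lambda s: UPTAKE * s["phyto"]),
      ("nitro", lambda s: -UPTAKE * s["phyto"])]),
    ("nitro_remineralization", always,
     [("nitro", lambda s: REMIN * s["residue"]),
      ("residue", lambda s: -REMIN * s["residue"])]),
]

def derivatives(state: State) -> State:
    """Sum the contributions of every process whose conditions currently hold."""
    d = {var: 0.0 for var in state}
    for _, condition, terms in PROCESSES:
        if condition(state):
            for var, term in terms:
                d[var] += term(state)
    return d

def simulate(state: State, dt: float = 0.01, steps: int = 1000) -> List[State]:
    """Forward-Euler integration, a simple stand-in for a proper ODE solver."""
    trajectory = [dict(state)]
    for _ in range(steps):
        d = derivatives(state)
        state = {var: state[var] + dt * d[var] for var in state}
        trajectory.append(state)
    return trajectory

final = simulate({"phyto": 1.0, "zoo": 0.5, "nitro": 2.0, "residue": 0.0})[-1]
print(final)
```

Because every term that removes material from one variable adds it to another in this sketch, the total across the four variables stays constant. That is a convenient sanity check on the assumed signs.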

Advantages of Quantitative Process Models
Process models offer scientists a promising framework because:
they embed quantitative relations within a qualitative structure that refers to notations and mechanisms familiar to experts;
they provide dynamical predictions of changes over time;
they offer causal and explanatory accounts of phenomena;
while retaining the modularity needed for induction/abduction.
Quantitative process models provide an important alternative to formalisms used currently in computational discovery.

Challenges of Inductive Process Modeling
Process model induction differs from typical learning tasks in that:
process models characterize behavior of dynamical systems;
variables are continuous but can have discontinuous behavior;
observations are not independently and identically distributed;
models may contain unobservable processes and variables;
multiple processes can interact to produce complex behavior.
Compensating factors include a focus on deterministic systems and the availability of background knowledge.

Encoding Background Knowledge
To constrain candidate models, we can utilize available background knowledge about the domain. Previous work has encoded background knowledge in terms of:
Horn clause programs (e.g., Towell & Shavlik, 1990);
context-free grammars (e.g., Dzeroski & Todorovski, 1997);
prior probability distributions (e.g., Friedman et al., 2000).
However, none of these notations are familiar to domain scientists, which suggests the need for another approach.

Generic Processes as Background Knowledge
We cast background knowledge as generic processes that specify:
the variables involved in a process and their types;
the parameters appearing in a process and their ranges;
the forms of conditions on the process; and
the forms of associated equations and their parameters.
Generic processes are building blocks from which one can compose a specific process model.
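The four kinds of information listed above map naturally onto a small record. The Python sketch below is one hypothetical way to hold a generic process; the class name, field names, and parameter name `alpha` are illustrative assumptions, and conditions and equations are kept as text rather than parsed. The example is loosely modeled on the `exponential_loss` generic process from the aquatic ecosystem slide, with an assumed minus sign on the loss term.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class GenericProcess:
    """Hypothetical record for a generic process; field names are illustrative."""
    name: str
    variables: Dict[str, str]                    # role -> type, e.g. "S" -> "species"
    parameters: Dict[str, Tuple[float, float]]   # parameter -> (low, high) range
    conditions: List[str] = field(default_factory=list)  # condition forms, kept as text
    equations: List[str] = field(default_factory=list)   # equation forms, kept as text

    def sample_parameters(self, rng: random.Random) -> Dict[str, float]:
        """Draw one value per parameter uniformly from its allowed range."""
        return {p: rng.uniform(lo, hi) for p, (lo, hi) in self.parameters.items()}

# Loosely modeled on exponential_loss; "alpha" and the minus sign are assumptions.
exponential_loss = GenericProcess(
    name="exponential_loss",
    variables={"S": "species", "D": "detritus"},
    parameters={"alpha": (0.0, 1.0)},
    equations=["d[S,t,1] = -alpha * S", "d[D,t,1] = alpha * S"],
)

print(exponential_loss.sample_parameters(random.Random(0)))
```

Keeping parameter ranges on the generic process is what lets a later fitting step draw initial values that respect the background knowledge.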

Generic Processes for Aquatic Ecosystems
generic process exponential_loss
  variables: S{species}, D{detritus}
  parameters: [0, 1]
  equations: d[S,t,1] = 1 S
             d[D,t,1] = S
generic process remineralization
  variables: N{nutrient}, D{detritus}
  parameters: [0, 1]
  equations: d[N,t,1] = D
             d[D,t,1] = 1 D
generic process grazing
  variables: S1{species}, S2{species}, D{detritus}
  parameters: [0, 1], [0, 1]
  equations: d[S1,t,1] = S1
             d[D,t,1] = (1 ) S1
             d[S2,t,1] = 1 S1
generic process constant_inflow
  variables: N{nutrient}
  parameters: [0, 1]
  equations: d[N,t,1] =
generic process nutrient_uptake
  variables: S{species}, N{nutrient}
  parameters: [0, ], [0, 1], [0, 1]
  conditions: N >
  equations: d[S,t,1] = S
             d[N,t,1] = 1 S
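Before these generic processes can be combined, each one must be bound to concrete model variables whose types match its typed roles. The Python sketch below enumerates such bindings for the variables of the aquatic ecosystem model. The type assignments (phyto and zoo as species, nitro as nutrient, residue as detritus) and the rule that one process may not bind the same variable to two roles are assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, Iterator, List, Tuple

def instantiate(roles: List[Tuple[str, str]],
                model_vars: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every binding of typed roles to distinct model variables of matching type."""
    for combo in permutations(model_vars, len(roles)):
        if all(model_vars[var] == vtype for (_, vtype), var in zip(roles, combo)):
            yield {role: var for (role, _), var in zip(roles, combo)}

# Assumed variable types for the aquatic ecosystem model.
MODEL_VARS = {"phyto": "species", "zoo": "species", "nitro": "nutrient", "residue": "detritus"}
GENERIC_ROLES = {
    "exponential_loss": [("S", "species"), ("D", "detritus")],
    "remineralization": [("N", "nutrient"), ("D", "detritus")],
    "grazing": [("S1", "species"), ("S2", "species"), ("D", "detritus")],
    "constant_inflow": [("N", "nutrient")],
    "nutrient_uptake": [("S", "species"), ("N", "nutrient")],
}

for name, roles in GENERIC_ROLES.items():
    for binding in instantiate(roles, MODEL_VARS):
        print(name, binding)
```

Type constraints alone still admit phytoplankton "grazing" on zooplankton as well as the reverse. This is one reason the construction method also applies further constraints when it combines instantiated processes.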

Inducing Process Models
[Figure: training data and generic processes are inputs to induction, which outputs a process model]
Generic processes:
process exponential_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ] P
process logistic_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ] P (1 – P / [0, 1, ])
process constant_inflow
  variables: I {inorganic_nutrient}
  equations: d[I,t] = [0, 1, ]
process consumption
  variables: P1 {population}, P2 {population}, nutrient_P2
  equations: d[P1,t] = [0, 1, ] P1 nutrient_P2,
             d[P2,t] = – [0, 1, ] P1 nutrient_P2
process no_saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P
process saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P / (P + [0, 1, ])
Process model:
model AquaticEcosystem
  variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto
  observables: nitro, phyto, zoo
  process phyto_exponential_growth
    equations: d[phyto,t] = 0.1 phyto
  process zoo_logistic_growth
    equations: d[zoo,t] = 0.1 zoo (1 – zoo / 1.5)
  process phyto_nitro_consumption
    equations: d[nitro,t] = – 1 phyto nutrient_nitro,
               d[phyto,t] = 1 phyto nutrient_nitro
  process phyto_nitro_no_saturation
    equations: nutrient_nitro = nitro
  process zoo_phyto_consumption
    equations: d[phyto,t] = – 1 zoo nutrient_phyto,
               d[zoo,t] = 1 zoo nutrient_phyto
  process zoo_phyto_saturation
    equations: nutrient_phyto = phyto / (phyto + 0.5)

A Method for Process Model Construction
The IPM algorithm constructs explanatory models from generic components in four stages:
1. Find all ways to instantiate known generic processes with specific variables, subject to type constraints;
2. Combine instantiated processes into candidate generic models, subject to additional constraints (e.g., number of processes);
3. For each generic model, carry out search through parameter space to find good coefficients;
4. Return the parameterized model with the best overall score.
Our typical evaluation metric is squared error, but we have also explored other measures of explanatory adequacy.
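Stages 2 through 4 amount to an exhaustive loop over candidate structures. The Python sketch below shows that control flow under simplifying assumptions. Instantiated processes are plain names, the only structural constraint is a maximum number of processes, and parameter search is delegated to a caller-supplied `fit` function that returns fitted parameters and a predicted trajectory. The toy `EFFECTS` table and `toy_fit` are hypothetical stand-ins for real simulation and fitting.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Structure = Tuple[str, ...]
# A fit function returns (fitted parameters, predicted trajectory) for one structure.
FitFn = Callable[[Structure], Tuple[Dict[str, float], List[float]]]

def candidate_structures(processes: Sequence[str], max_processes: int) -> Iterator[Structure]:
    """Stage 2: every combination of instantiated processes, up to a size limit."""
    for k in range(1, max_processes + 1):
        yield from combinations(processes, k)

def squared_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def ipm(processes: Sequence[str], max_processes: int, fit: FitFn,
        observed: Sequence[float]):
    """Stages 2-4: fit each candidate structure and keep the lowest-error model."""
    best = None
    for structure in candidate_structures(processes, max_processes):
        params, predicted = fit(structure)            # stage 3: parameter search
        score = squared_error(predicted, observed)
        if best is None or score < best[2]:
            best = (structure, params, score)
    return best                                       # stage 4

# Toy usage: each hypothetical "process" adds a fixed effect to a 3-point trajectory.
EFFECTS = {"growth": [1.0, 2.0, 3.0], "loss": [-0.5, -1.0, -1.5], "inflow": [0.5, 0.5, 0.5]}

def toy_fit(structure: Structure):
    predicted = [sum(EFFECTS[p][i] for p in structure) for i in range(3)]
    return {}, predicted

print(ipm(list(EFFECTS), 2, toy_fit, [0.5, 1.0, 1.5]))  # prints (('growth', 'loss'), {}, 0.0)
```

The number of structures grows combinatorially with the size limit, which is why the parameter search inside `fit` dominates the cost in practice.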

Estimating Parameters in Process Models
To estimate the parameters for each generic model structure, the IPM algorithm:
1. Selects random initial values that fall within ranges specified in the generic processes;
2. Improves these parameters using the Levenberg-Marquardt method until it reaches a local optimum;
3. Generates new candidate values through random jumps along dimensions of the parameter vector and continues the search;
4. If no improvement occurs after N jumps, restarts the search from a new random initial point.
This multi-level method gives reasonable fits to time-series data from a number of domains, but it is computationally intensive.
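The four steps above can be sketched in pure Python as a hand-written Levenberg-Marquardt loop wrapped in random jumps and restarts. This is an illustration of the scheme, not the IPM code. The forward-difference Jacobian, the damping schedule, the jump size (`jump_scale`), the improvement tolerance, the hard cap on jumps, and the number of restarts are all assumptions. The model being fit is reduced to a residual function, and parameter ranges are used only to draw initial values.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Residuals = Callable[[List[float]], List[float]]

def safe_sse(res: Residuals, x: List[float]) -> float:
    """Sum of squared residuals; numerically invalid points count as infinitely bad."""
    try:
        total = sum(r * r for r in res(x))
    except (OverflowError, ValueError, ZeroDivisionError):
        return math.inf
    return total if math.isfinite(total) else math.inf

def jacobian(res: Residuals, x: List[float], h: float = 1e-6):
    """Forward-difference Jacobian, returned column by column."""
    base = res(x)
    cols = []
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += h
        cols.append([(a - b) / h for a, b in zip(res(shifted), base)])
    return base, cols

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def levenberg_marquardt(res: Residuals, x: Sequence[float], iters: int = 100, lam: float = 1e-3):
    """Step 2: damped Gauss-Newton steps, accepted only when they reduce the error."""
    x = list(x)
    cost = safe_sse(res, x)
    if not math.isfinite(cost):
        return x, cost
    for _ in range(iters):
        r, cols = jacobian(res, x)
        n = len(x)
        a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(n)] for i in range(n)]
        g = [sum(p * q for p, q in zip(cols[i], r)) for i in range(n)]
        for i in range(n):
            a[i][i] = a[i][i] * (1 + lam) + 1e-12   # Marquardt damping plus a tiny floor
        try:
            step = solve(a, [-gi for gi in g])
        except ZeroDivisionError:
            break
        cand = [xi + si for xi, si in zip(x, step)]
        new_cost = safe_sse(res, cand)
        if new_cost < cost:
            x, cost, lam = cand, new_cost, lam / 10
        else:
            lam *= 10
    return x, cost

def fit(res: Residuals, ranges: Sequence[Tuple[float, float]], jumps: int = 20,
        restarts: int = 3, jump_scale: float = 0.1, tol: float = 1e-12, seed: int = 0):
    rng = random.Random(seed)
    best_x, best_cost = None, math.inf
    for _ in range(restarts):                                  # step 4: restarts
        x = [rng.uniform(lo, hi) for lo, hi in ranges]         # step 1: random start
        x, cost = levenberg_marquardt(res, x)                  # step 2: local optimum
        failures, budget = 0, 10 * jumps                       # budget: hypothetical cap
        while failures < jumps and budget > 0:                 # step 3: random jumps
            budget -= 1
            j = rng.randrange(len(x))
            lo, hi = ranges[j]
            trial = list(x)
            trial[j] += rng.gauss(0.0, jump_scale * (hi - lo))
            trial, trial_cost = levenberg_marquardt(res, trial)
            if trial_cost < cost - tol:
                x, cost, failures = trial, trial_cost, 0
            else:
                failures += 1
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost
```

For example, calling `fit` with residuals of a hypothetical exponential-decay model a·e^(−kt) against noise-free samples, with ranges [0, 5] for a and [0, 1] for k, should recover the generating values. The nesting (local optimizer inside jumps inside restarts) is what makes the method robust to local optima, and it is also why it is computationally intensive.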

Observations from the Ross Sea

Results on Training Data from Ross Sea

Results on Test Data from Ross Sea

Results on a Protist Ecosystem

Results on Rinkobing Fjord

Results on Biochemical Kinetics
[Figure: observed and predicted trajectories]

Interfacing with Scientists
Because few scientists want to be replaced, we are developing an interactive environment, PROMETHEUS, that lets users:
specify a quantitative process model of the target system;
display and edit the model's structure and details graphically;
simulate the model's behavior over time and situations;
compare the model's predicted behavior to observations;
invoke a revision module in response to detected anomalies.
The environment offers computational assistance in forming and evaluating models but lets the user retain control.

Viewing a Process Model Graphically

Indicating Processes to Consider Adding

Specifying Data and Search Parameters

Inspecting Revised Process Models

Intellectual Influences
Our approach to computational discovery incorporates ideas from many traditions:
computational scientific discovery (e.g., Langley et al., 1983);
theory revision in machine learning (e.g., Towell, 1991);
qualitative physics and simulation (e.g., Forbus, 1984);
languages for scientific simulation (e.g., STELLA, MATLAB);
interactive tools for data analysis (e.g., Shneiderman, 2001).
Our work combines, in novel ways, insights from machine learning, AI, programming languages, and human-computer interaction.

Contributions of the Research
In summary, our work on computational scientific discovery has, in responding to various challenges, produced:
a new formalism for representing scientific process models;
a computational method for simulating these models' behavior;
an encoding for background knowledge as generic processes;
an algorithm for inducing process models from time-series data;
an interactive environment for model construction/utilization.
We have demonstrated this approach to model creation on domains from Earth science, microbiology, and engineering.

Some Recent Extensions
In recent work, we have extended our approach to incorporate:
heuristic beam search through the space of process models;
hierarchical generic processes that further constrain search;
an ensemble-like method that mitigates overfitting effects;
metrics for explanatory adequacy based on trajectory shapes.
Inductive process modeling has great potential to speed progress in systems science and engineering.
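Heuristic beam search replaces the exhaustive enumeration of model structures with greedy growth. Starting from an empty model, each step adds one instantiated process to every model in the beam and keeps only the best few children. The Python sketch below shows that idea under stated assumptions. The `score` function is a caller-supplied placeholder (in a real system it would presumably be the error after parameter fitting), and the beam width and size limit are hypothetical defaults.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Model = FrozenSet[str]

def beam_search(processes: Iterable[str], score: Callable[[Model], float],
                beam_width: int = 3, max_size: int = 4) -> Tuple[Model, float]:
    """Grow models one process at a time, keeping only the best few at each level."""
    processes = list(processes)
    empty: Model = frozenset()
    beam: List[Tuple[float, Model]] = [(score(empty), empty)]
    best = beam[0]
    for _ in range(max_size):
        children: Dict[Model, float] = {}
        for _, model in beam:
            for p in processes:
                if p in model:
                    continue
                child = model | {p}
                if child not in children:
                    children[child] = score(child)
        if not children:
            break
        ranked = sorted(children.items(), key=lambda item: item[1])  # lower score is better
        beam = [(s, m) for m, s in ranked[:beam_width]]
        if beam[0][0] < best[0]:
            best = beam[0]
    return best[1], best[0]

# Toy usage: a hypothetical score counting disagreements with a target structure.
TARGET = frozenset({"nitro_uptake", "phyto_loss"})
CANDIDATES = ["phyto_loss", "zoo_loss", "nitro_uptake", "zoo_phyto_grazing"]
model, s = beam_search(CANDIDATES, lambda m: len(m ^ TARGET), beam_width=2)
print(sorted(model), s)  # prints ['nitro_uptake', 'phyto_loss'] 0
```

Sorting on the score alone, rather than on (score, model) pairs, avoids comparing frozensets, whose `<` operator tests for subsets rather than defining a total order.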
