Pat Langley Center for the Study of Language and Information Stanford University, Stanford, California

Computational Discovery of Explanatory Process Models
Thanks to N. Asgharbeygi, K. Arrigo, S. Bay, A. Pohorille, J. Sanchez, K. Saito, and J. Shrager for their contributions to this research.

Data Mining vs. Scientific Discovery
There exist two computational paradigms for discovering explicit knowledge from data. The data mining movement develops computational methods that:
- induce predictive models from large, often business, data sets;
- cast models as decision trees, logical rules, or other notations invented by AI researchers.
In contrast, computational scientific discovery focuses on:
- constructing models from (often small) scientific data sets;
- stating those models in formalisms invented by scientists and engineers.
Both approaches draw on heuristic search to find regularities in data, but they differ considerably in their emphases.

In Memoriam
Three years ago, computational scientific discovery lost two of its founding fathers: Herbert A. Simon (1916–2001) and Jan M. Zytkow (1945–2001).
Both contributed to the field in many ways: posing new problems, inventing methods, training students, and organizing meetings. Moreover, both were interdisciplinary researchers who contributed to computer science, psychology, philosophy, and statistics.
Herb Simon and Jan Zytkow were excellent role models whom we should all aim to emulate.

Time Line for Research on Computational Scientific Discovery
[Figure: timeline of discovery systems, with a legend distinguishing numeric laws, qualitative laws, structural models, and process models. Systems shown: Bacon.1–Bacon.5, Abacus, Coper, Fahrenheit, E*, Tetrad, IDS N, Hume, ARC, DST, GP N, LaGrange, SDS, SSF, RF5, LaGramge, Dalton, Stahl, RL, Progol, Gell-Mann, BR-3, Mendel, Pauli, Stahlp, Revolver, Dendral, AM, Glauber, NGlauber, IDS Q, Live, IE, Coast, Phineas, AbE, Kekada, Mechem, CDP, Astra, GP M, HR, BR-4.]

Successes of Computational Scientific Discovery
Over the past decade, systems of this type have helped discover new knowledge in many scientific fields:
- qualitative chemical factors in mutagenesis (King et al., 1996);
- quantitative laws of metallic behavior (Sleeman et al., 1997);
- qualitative conjectures in number theory (Colton et al., 2000);
- temporal laws of ecological behavior (Todorovski et al., 2000);
- reaction pathways in catalytic chemistry (Valdes-Perez, 1994).
Each has led to publications in the refereed scientific literature (e.g., Langley, 2000), but they did not focus on systems science.

The Nature of Systems Science
Disciplines like Earth science and computational biology differ from traditional fields in that they:
- focus on synthesis rather than analysis in their operation;
- rely on computer modeling as one of their central methods;
- develop system-level models with many variables and relations;
- evaluate their models on observational, not experimental, data.
Developing and testing such models are complex tasks that would benefit from computational aids. However, existing methods for computational scientific discovery were not designed with systems science in mind.

Observations from the Ross Sea

Inductive Process Modeling
Our response is to design, construct, and evaluate computational methods for inductive process modeling, which:
- represent scientific models as sets of quantitative processes;
- use these models to predict and explain observational data;
- search a space of process models to find good candidates;
- utilize background knowledge to constrain this search.
This framework has great potential for aiding systems science, but it raises new computational challenges.

Challenges of Inductive Process Modeling
Process model induction differs from typical learning tasks in that:
- process models characterize the behavior of dynamical systems;
- variables are continuous but can have discontinuous behavior;
- observations are not independently and identically distributed;
- models may contain unobservable processes and variables;
- multiple processes can interact to produce complex behavior.
Compensating factors include a focus on deterministic systems and the availability of background knowledge.

Issue 1: Representing Scientific Models
To assist system scientists' modeling efforts, we must first encode candidate models that:
- address observational rather than experimental data;
- deal with dynamic systems that change over time;
- have an explanatory rather than a descriptive character;
- are causal in that they describe chains of effects;
- contain quantitative relations and qualitative structure.
We need some formal way to represent such models that can be interpreted computationally.

Why Are Existing Formalisms Inadequate?
[Figure: examples of formalisms used in earlier computational discovery:
- systems of equations, e.g., d[ice_mass,t] = (18 heat) / 6.02 and d[water_mass,t] = (18 heat) / 6.02;
- regression trees, with tests such as B > 6 and C > 0;
- Horn clause programs, e.g.,
  gcd(X,X,X).
  gcd(X,Y,D) :- X<Y, Z is Y-X, gcd(X,Z,D).
  gcd(X,Y,D) :- Y<X, gcd(Y,X,D).
- hidden Markov models.]

A Process Model for an Aquatic Ecosystem
model Ross_Sea_Ecosystem
  variables: phyto, nitro, residue, light, growth_rate, effective_light, ice_factor
  observables: phyto, nitro, light, ice_factor
  process phyto_loss
    equations: d[phyto,t,1] = -0.1 * phyto
               d[residue,t,1] = 0.1 * phyto
  process phyto_growth
    equations: d[phyto,t,1] = growth_rate * phyto
  process phyto_uptakes_nitro
    conditions: nitro > 0
    equations: d[nitro,t,1] = -growth_rate * phyto
  process growth_limitation
    equations: growth_rate = 0.23 * min(nitrate_rate, light_rate)
  process nitrate_availability
    equations: nitrate_rate = nitrate / (nitrate + 5)
  process light_availability
    equations: light_rate = effective_light / (effective_light + 50)
  process light_attenuation
    equations: effective_light = light * ice_factor
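The algebraic processes in this model form a chain that can be evaluated in dependency order to obtain growth_rate. The Python sketch below is illustrative only: the function name is hypothetical, light and ice_factor are assumed to be supplied as inputs, and the slide's nitrate is assumed to denote the declared variable nitro.

```python
def growth_rate(nitro: float, light: float, ice_factor: float) -> float:
    """Evaluate the Ross Sea model's algebraic processes in dependency order.

    Hypothetical helper; assumes the slide's `nitrate` denotes `nitro`.
    """
    # light_attenuation: ice cover scales the incoming light
    effective_light = light * ice_factor
    # light_availability: saturating response with half-saturation constant 50
    light_rate = effective_light / (effective_light + 50)
    # nitrate_availability: saturating response with half-saturation constant 5
    nitrate_rate = nitro / (nitro + 5)
    # growth_limitation: growth is set by whichever resource is scarcer
    return 0.23 * min(nitrate_rate, light_rate)


print(growth_rate(nitro=5.0, light=100.0, ice_factor=0.5))  # prints 0.115
```

Here both resources are equally limiting (each rate is 0.5); raising ice_factor to 1.0 lifts light_rate to about 0.67, leaving nitrogen as the binding constraint.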

Advantages of Quantitative Process Models
Process models are a good target for discovery systems because:
- they embed quantitative relations within qualitative structure;
- they refer to notations and mechanisms familiar to scientists;
- they provide dynamical predictions of changes over time;
- they offer causal and explanatory accounts of phenomena;
- they retain the modularity needed to support induction.
Quantitative process models provide an important alternative to formalisms used currently in computational discovery.

Issue 2: Generating Predictions and Explanations
To utilize or evaluate a given process model, we must simulate its behavior over time:
- specify initial values for input variables and the time step size;
- on each time step, determine which processes are active;
- solve active algebraic/differential equations with known values;
- propagate values and recursively solve other active equations;
- when multiple processes influence the same variable, assume their effects are additive.
This performance method makes specific predictions that we can compare to observations.
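The simulation steps above can be sketched as a forward-Euler loop that evaluates the algebraic equations first, checks each process's conditions, and sums the derivative contributions of all active processes. This is a minimal Python illustration, not the authors' simulator: the Process class, the simulate function, the choice of Euler integration, and the treatment of light and ice_factor as constant inputs are all assumptions, and the Ross Sea processes are encoded with nitrate read as nitro.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Process:
    """Hypothetical encoding of one process: an activation condition plus rate terms."""
    name: str
    rates: Callable[[State], Dict[str, float]]  # contribution to d[var,t] per variable
    condition: Callable[[State], bool] = lambda s: True  # active unless stated otherwise


def simulate(processes: List[Process], algebra: Callable[[State], State],
             init: State, dt: float, steps: int) -> List[State]:
    """Forward-Euler simulation in which the effects of active processes add up."""
    state = dict(init)
    trajectory = [dict(state)]
    for _ in range(steps):
        known = algebra(state)  # solve algebraic equations from current values
        totals: Dict[str, float] = {}
        for proc in processes:
            if proc.condition(known):  # skip processes whose conditions fail
                for var, rate in proc.rates(known).items():
                    totals[var] = totals.get(var, 0.0) + rate  # additive effects
        for var, rate in totals.items():
            state[var] += dt * rate
        trajectory.append(dict(state))
    return trajectory


def ross_sea_algebra(s: State) -> State:
    """Algebraic processes of the Ross Sea model (nitrate read as nitro)."""
    effective_light = s["light"] * s["ice_factor"]
    light_rate = effective_light / (effective_light + 50)
    nitrate_rate = s["nitro"] / (s["nitro"] + 5)
    return {**s, "growth_rate": 0.23 * min(nitrate_rate, light_rate)}


ROSS_SEA = [
    Process("phyto_loss", lambda s: {"phyto": -0.1 * s["phyto"],
                                     "residue": 0.1 * s["phyto"]}),
    Process("phyto_growth", lambda s: {"phyto": s["growth_rate"] * s["phyto"]}),
    Process("phyto_uptakes_nitro",
            lambda s: {"nitro": -s["growth_rate"] * s["phyto"]},
            condition=lambda s: s["nitro"] > 0),
]

trajectory = simulate(ROSS_SEA, ross_sea_algebra,
                      {"phyto": 1.0, "nitro": 10.0, "residue": 0.0,
                       "light": 100.0, "ice_factor": 1.0}, dt=0.1, steps=50)
```

Because phyto_loss moves biomass into residue and uptake moves nitrogen into phytoplankton, phyto + nitro + residue stays constant while nitro remains positive, which is a handy sanity check on the additive combination rule.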

Issue 3: Encoding Background Knowledge
To constrain candidate models, we can utilize available background knowledge about the domain. Previous work has encoded background knowledge in terms of:
- Horn clause programs (e.g., Towell & Shavlik, 1990);
- context-free grammars (e.g., Dzeroski & Todorovski, 1997);
- prior probability distributions (e.g., Friedman et al., 2000).
However, none of these notations are familiar to domain scientists, which suggests the need for another approach.

Generic Processes as Background Knowledge
Our framework casts background knowledge as generic processes that specify:
- the variables involved in a process and their types;
- the parameters appearing in a process and their ranges;
- the forms of conditions on the process; and
- the forms of associated equations and their parameters.
Generic processes are building blocks from which one can compose a specific process model.

Generic Processes for Aquatic Ecosystems
generic process exponential_loss
  variables: S{species}, D{detritus}
  parameters: α [0, 1]
  equations: d[S,t,1] = -1 * α * S
             d[D,t,1] = α * S
generic process remineralization
  variables: N{nutrient}, D{detritus}
  parameters: β [0, 1]
  equations: d[N,t,1] = β * D
             d[D,t,1] = -1 * β * D
generic process grazing
  variables: S1{species}, S2{species}, D{detritus}
  parameters: ρ [0, 1], γ [0, 1]
  equations: d[S1,t,1] = γ * ρ * S1
             d[D,t,1] = (1 - γ) * ρ * S1
             d[S2,t,1] = -1 * ρ * S1
generic process constant_inflow
  variables: N{nutrient}
  parameters: ν [0, 1]
  equations: d[N,t,1] = ν
generic process nutrient_uptake
  variables: S{species}, N{nutrient}
  parameters: τ [0, ∞], β [0, 1], μ [0, 1]
  conditions: N > τ
  equations: d[S,t,1] = μ * S
             d[N,t,1] = -1 * β * μ * S
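The first step in composing a model from these building blocks is binding each generic process's typed variable slots to concrete variables. A minimal Python sketch of that type-constrained instantiation follows; the GenericProcess class, the instantiate function, the parameter names rho and gamma, the example variable-to-type table, and the rule that distinct slots need distinct variables are all illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GenericProcess:
    """Hypothetical record of a generic process's typed slots and parameter ranges."""
    name: str
    slots: Tuple[Tuple[str, str], ...]                # (slot name, required type)
    parameters: Tuple[Tuple[str, float, float], ...]  # (name, lower, upper)


def instantiate(gp: GenericProcess, var_types: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every binding of gp's slots to variables of the required types."""
    options = [[v for v, t in var_types.items() if t == slot_type]
               for _, slot_type in gp.slots]
    bindings = []
    for combo in product(*options):
        if len(set(combo)) == len(combo):  # assumed: one variable per slot
            bindings.append({slot: v for (slot, _), v in zip(gp.slots, combo)})
    return bindings


GRAZING = GenericProcess(
    "grazing",
    (("S1", "species"), ("S2", "species"), ("D", "detritus")),
    (("rho", 0.0, 1.0), ("gamma", 0.0, 1.0)),
)
VAR_TYPES = {"phyto": "species", "zoo": "species",
             "nitro": "nutrient", "residue": "detritus"}

for binding in instantiate(GRAZING, VAR_TYPES):
    print(binding)
```

With two species and one detritus variable, grazing yields exactly two bindings (phyto grazing on zoo and zoo grazing on phyto); type constraints are what keep nitro out of the species slots.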

process exponential_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] * P
process logistic_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] * P * (1 - P / [0, 1, ∞])
process constant_inflow
  variables: I {inorganic_nutrient}
  equations: d[I,t] = [0, 1, ∞]
process consumption
  variables: P1 {population}, P2 {population}, nutrient_P2
  equations: d[P1,t] = [0, 1, ∞] * P1 * nutrient_P2,
             d[P2,t] = -[0, 1, ∞] * P1 * nutrient_P2
process no_saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P
process saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P / (P + [0, 1, ∞])

Issue 4: Inducing Process Models
[Figure: training data and generic processes are inputs to induction, which produces a process model such as the following.]
model AquaticEcosystem
  variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto
  observables: nitro, phyto, zoo
  process phyto_exponential_growth
    equations: d[phyto,t] = 0.1 * phyto
  process zoo_logistic_growth
    equations: d[zoo,t] = 0.1 * zoo * (1 - zoo / 1.5)
  process phyto_nitro_consumption
    equations: d[nitro,t] = -1 * phyto * nutrient_nitro,
               d[phyto,t] = 1 * phyto * nutrient_nitro
  process phyto_nitro_no_saturation
    equations: nutrient_nitro = nitro
  process zoo_phyto_consumption
    equations: d[phyto,t] = -1 * zoo * nutrient_phyto,
               d[zoo,t] = 1 * zoo * nutrient_phyto
  process zoo_phyto_saturation
    equations: nutrient_phyto = phyto / (phyto + 0.5)

A Method for Process Model Induction
We have implemented the IPM algorithm, which induces process models from generic components in four stages:
1. Find all ways to instantiate known generic processes with specific variables, subject to type constraints;
2. Combine instantiated processes into candidate generic models, subject to additional constraints (e.g., the number of processes);
3. For each generic model, carry out search through parameter space to find good coefficients;
4. Return the parameterized model with the best overall score.
The evaluation metric can be squared error or description length (e.g., M_D = (M_V + M_C) · log(n) + n · log(M_E)).
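The four stages and the description-length metric can be combined into a short search loop. In this hedged Python sketch, exhaustive enumeration of process subsets stands in for stage 2, the fit callback is a hypothetical placeholder for stage 3's parameter search, and we assume M_V counts model variables, M_C counts fitted parameters, n is the number of observations, and M_E is the mean squared error (which must be positive for the logarithm).

```python
import math
from itertools import combinations
from typing import Callable, Sequence, Tuple

Structure = Tuple[str, ...]


def description_length(m_v: int, m_c: int, n: int, m_e: float) -> float:
    """M_D = (M_V + M_C) * log(n) + n * log(M_E); smaller is better."""
    return (m_v + m_c) * math.log(n) + n * math.log(m_e)


def induce(candidates: Sequence[str], max_size: int, n: int,
           fit: Callable[[Structure], Tuple[int, int, float]]) -> Tuple[Structure, float]:
    """Enumerate process subsets, fit each one, and keep the lowest-MDL model."""
    best: Structure = ()
    best_score = math.inf
    for size in range(1, max_size + 1):
        for structure in combinations(candidates, size):  # stage 2
            m_v, m_c, m_e = fit(structure)                 # stage 3 (placeholder)
            score = description_length(m_v, m_c, n, m_e)
            if score < best_score:                          # stage 4
                best, best_score = structure, score
    return best, best_score


def toy_fit(structure: Structure) -> Tuple[int, int, float]:
    """Hypothetical fit: error drops only when processes 'a' and 'b' are both present."""
    m_e = 1.0 if {"a", "b"} <= set(structure) else 4.0
    return 2, len(structure), m_e


print(induce(["a", "b", "c"], max_size=3, n=100, fit=toy_fit))
```

In this toy run the extra process c adds a parameter without lowering the error, so the complexity term makes MDL prefer ('a', 'b') over ('a', 'b', 'c'); that penalty is what guards against overfitting.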

Estimating Parameters in Process Models
To estimate the parameters for each generic model structure, the IPM algorithm:
1. Selects random initial values that fall within ranges specified in the generic processes;
2. Improves these parameters using the Levenberg-Marquardt method until it reaches a local optimum;
3. Generates new candidate values through random jumps along dimensions of the parameter vector and continues the search;
4. If no improvement occurs after N jumps, restarts the search from a new random initial point.
This multi-level method gives reasonable fits to time-series data from a number of domains, but it is computationally intensive.
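The restart-and-jump scheme can be sketched as follows. This Python version is an assumption-laden illustration: to stay dependency-free it swaps a simple shrinking-step coordinate descent in for Levenberg-Marquardt at stage 2, fixes the number of restarts, and uses a hypothetical exponential-decay loss (recovering α in d[S,t] = -α S) as the example.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]
Loss = Callable[[List[float]], float]


def local_search(loss: Loss, x: List[float], bounds: Bounds,
                 step: float = 0.1, tol: float = 1e-7) -> Tuple[List[float], float]:
    """Stand-in for Levenberg-Marquardt: coordinate descent with a shrinking step."""
    fx = loss(x)
    while step > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for direction in (1.0, -1.0):
                y = list(x)
                y[i] = min(hi, max(lo, y[i] + direction * step * (hi - lo)))
                fy = loss(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2  # refine only once no move at this scale helps
    return x, fx


def estimate(loss: Loss, bounds: Bounds, restarts: int = 3, max_failures: int = 5,
             rng: Optional[random.Random] = None) -> Tuple[List[float], float]:
    """Random starts, local optimization, random jumps, and restarts."""
    rng = rng or random.Random(0)
    best_x: List[float] = []
    best_f = math.inf
    for _ in range(restarts):
        # Stages 1-2: random initial values inside the ranges, then local search.
        x, fx = local_search(loss, [rng.uniform(lo, hi) for lo, hi in bounds], bounds)
        failures = 0
        while failures < max_failures:
            # Stage 3: jump along one randomly chosen dimension, then re-optimize.
            y = list(x)
            i = rng.randrange(len(bounds))
            y[i] = rng.uniform(*bounds[i])
            y, fy = local_search(loss, y, bounds)
            if fy < fx:
                x, fx, failures = y, fy, 0
            else:
                failures += 1
        # Stage 4: after too many failed jumps, restart; keep the best result.
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


# Hypothetical example: recover alpha in d[S,t] = -alpha * S from exact data.
TIMES = [0.5 * k for k in range(20)]
OBSERVED = [2.0 * math.exp(-0.25 * t) for t in TIMES]


def decay_loss(params: List[float]) -> float:
    return sum((2.0 * math.exp(-params[0] * t) - obs) ** 2
               for t, obs in zip(TIMES, OBSERVED))


alpha, error = estimate(decay_loss, bounds=[(0.0, 1.0)])
```

For a one-parameter, unimodal loss like this one the jumps and restarts are overkill; they earn their cost on multi-parameter process models whose error surfaces have many local optima.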

More Issues in Process Model Induction
Inductive process modeling raises a number of issues that have clear analogues in other paradigms:
- identifying conditions on component processes;
- inferring initial values of unobservable variables;
- keeping the structural search space tractable;
- reducing variance to mitigate overfitting effects.
We have demonstrated promising responses to these problems within the IPM framework.

Evaluation of the IPM Algorithm
To demonstrate IPM's ability to induce process models, we ran it on synthetic data for a known system:
1. We used the aquatic ecosystem model to generate data sets over 100 time steps for the variables nitro and phyto;
2. We replaced each true value x with x · (1 + r · n), where r followed a Gaussian distribution (μ = 0, σ = 1) and n > 0;
3. We ran IPM on these noisy data, giving it type constraints and generic processes as background knowledge.
In two experiments, we let IPM determine the initial values and thresholds given the correct structure; in a third study, we let it search through a space of 256 generic model structures.
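The noise model in step 2 is easy to reproduce. A minimal Python sketch, assuming a seeded generator for repeatability and reading a noise level such as 5% as n = 0.05 (the function name is hypothetical):

```python
import random
from typing import List, Sequence


def add_relative_noise(values: Sequence[float], n: float, seed: int = 0) -> List[float]:
    """Replace each true value x with x * (1 + r * n), where r ~ Gaussian(0, 1)."""
    rng = random.Random(seed)
    return [x * (1.0 + rng.gauss(0.0, 1.0) * n) for x in values]


clean = [1.0, 2.0, 4.0]
noisy = add_relative_noise(clean, n=0.05)
```

Because the perturbation is proportional to x, large values receive proportionally larger absolute errors, and a value of zero is never perturbed.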

Experimental Results with IPM
The main results of our studies with IPM on synthetic data were:
1. The system infers accurate estimates for the initial values of unobservable variables like zoo and residue;
2. The system induces estimates of condition thresholds on nitro that are close to the target values; and
3. The MDL criterion selects the correct model structure in all runs with 5% noise, but in only 40% of runs with 10% noise.
These results suggest that the basic approach is sound, but that we should consider alternative MDL schemes and other responses to overfitting.

Observations from the Ross Sea

Results on Training Data from Ross Sea

Results on Test Data from Ross Sea

Collecting Data on Photosynthetic Processes
[Figure: timeline for a continuous culture (chemostat), showing an equilibrium period, external stimuli (e.g., light), an adaptation period, sampling of mRNA/cDNA for microarray analysis, and a trace of the health of the culture over time. Image source: /wwwscience.murdoch.edu.au/teach]

Gene Expressions for Cyanobacteria

Generic Processes for Photosynthesis Regulation
generic process translation
  variables: P{protein}, M{mRNA}
  parameters: α [0, 1]
  equations: d[P,t,1] = α * M
generic process transcription
  variables: M{mRNA}, R{rate}
  parameters: (none)
  equations: d[M,t,1] = R
generic process regulate_one
  variables: R{rate}, S{signal}
  parameters: α [-1, 1]
  equations: R = α * S
generic process regulate_two
  variables: R{rate}, S{signal}
  parameters: α [-1, 1], β [0, 1]
  equations: R = α * S
             d[S,t,1] = -1 * β * S
generic process automatic_degradation
  variables: C{concentration}
  conditions: C > 0
  parameters: α [0, 1]
  equations: d[C,t,1] = -1 * α * C
generic process controlled_degradation
  variables: D{concentration}, E{concentration}
  conditions: D > 0, E > 0
  parameters: α [0, 1]
  equations: d[D,t,1] = -1 * α * E
             d[E,t,1] = -1 * α * E
generic process photosynthesis
  variables: L{light}, P{protein}, R{redox}, S{ROS}
  parameters: α [0, 1], β [0, 1]
  equations: d[R,t,1] = α * L * P
             d[S,t,1] = β * L * P

A Process Model for Photosynthetic Regulation
model photo_regulation
  variables: light, mRNA, protein, ROS, redox, transcription_rate
  observables: light, mRNA
  process photosynthesis
    equations: d[redox,t,1] = … * light * protein
               d[ROS,t,1] = … * light * protein
  process protein_translation
    equations: d[protein,t,1] = 7.54 * mRNA
  process mRNA_transcription
    equations: d[mRNA,t,1] = transcription_rate
  process regulate_one_1
    equations: transcription_rate = 0.99 * light
  process regulate_two_2
    equations: transcription_rate = … * redox
               d[redox,t,1] = -1 * … * redox
  process automatic_degradation_1
    conditions: protein > 0
    equations: d[protein,t,1] = -1.91 * protein
  process controlled_degradation_1
    conditions: redox > 0, ROS > 0
    equations: d[redox,t,1] = -1 * … * ROS
               d[ROS,t,1] = -1 * … * ROS

Predictions from Best Parameterized Model

Electric Power on the International Space Station

Results on Battery Test Data

Results on Data from Rinkobing Fjord

Issue 5: Interfacing with Scientists
Because few scientists want to be replaced, we are developing an interactive environment that lets users:
- specify a quantitative process model of the target system;
- display and edit the model's structure and details graphically;
- simulate the model's behavior over time and situations;
- compare the model's predicted behavior to observations;
- invoke a revision module in response to detected anomalies.
The environment offers computational assistance in forming and evaluating models but lets the user retain control.

Viewing and Editing a Process Model

Results of Revising the NPP Model
Initial model:
  E = 0.56 · T1 · T2 · W
  T2 = 1.18 / [(1 + e^(0.2 · (Topt - Tempc - 10))) · (1 + e^(0.3 · (Tempc - Topt - 10)))]
  PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M
  SR {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05}
  RMSE on training data = … and r² = …
Revised model:
  E = … · T1 · T2 · W … 0.00
  T2 = 0.83 / [(1 + e^(1.0 · (Topt - Tempc - 6.34))) · (1 + e^(1.0 · (Tempc - Topt - 11.52)))]
  PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M
  SR {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61}
  Cross-validated RMSE = … and r² = … [15% reduction]
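To see how the revision reshaped the temperature response, the two T2 expressions can be evaluated directly. This small Python sketch is illustrative: the function and argument names are hypothetical, Tempc and Topt are assumed to be in degrees Celsius, Topt = 20 is an arbitrary example value, and the other terms of the NPP model (T1, W, PET, SR) are not modeled.

```python
import math


def t2_scalar(temp_c: float, t_opt: float, scale: float,
              k_low: float, k_high: float, off_low: float, off_high: float) -> float:
    """T2 = scale / [(1 + e^(k_low*(Topt-Tempc-off_low))) * (1 + e^(k_high*(Tempc-Topt-off_high)))]."""
    cold = 1.0 + math.exp(k_low * (t_opt - temp_c - off_low))    # penalty below Topt
    hot = 1.0 + math.exp(k_high * (temp_c - t_opt - off_high))   # penalty above Topt
    return scale / (cold * hot)


INITIAL = {"scale": 1.18, "k_low": 0.2, "k_high": 0.3, "off_low": 10.0, "off_high": 10.0}
REVISED = {"scale": 0.83, "k_low": 1.0, "k_high": 1.0, "off_low": 6.34, "off_high": 11.52}

for temp in (0.0, 10.0, 20.0, 30.0, 40.0):
    print(temp, round(t2_scalar(temp, 20.0, **INITIAL), 3),
          round(t2_scalar(temp, 20.0, **REVISED), 3))
```

At Tempc = Topt the initial curve gives about 0.99 and the revised one about 0.83; with its steeper slope of 1.0, the revised T2 drops to roughly half its peak once Tempc falls 6.34 degrees below Topt.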

Intellectual Influences
Our approach to computational discovery incorporates ideas from many traditions:
- computational scientific discovery (e.g., Langley et al., 1983);
- theory revision in machine learning (e.g., Towell, 1991);
- qualitative physics and simulation (e.g., Forbus, 1984);
- languages for scientific simulation (e.g., STELLA, MATLAB);
- interactive tools for data analysis (e.g., Shneiderman, 2001).
Our work combines, in novel ways, insights from machine learning, AI, programming languages, and human-computer interaction.

Contributions of the Research
In summary, our work on computational scientific discovery has, in responding to various challenges, produced:
- a new formalism for representing scientific process models;
- a computational method for simulating these models' behavior;
- an encoding for background knowledge as generic processes;
- an algorithm for inducing process models from time-series data;
- an interactive environment for model construction/utilization.
We have demonstrated this approach to model creation on domains from Earth science, microbiology, and engineering.

Directions for Future Research
Despite our progress to date, we need further work in order to:
- produce additional results on other scientific data sets;
- develop improved methods for fitting model parameters;
- extend the approach to handle data sets with missing values;
- implement heuristic methods for searching the structure space;
- utilize knowledge of subsystems to further constrain search;
- augment the modeling environment to make it more usable.
Inductive process modeling has great potential to speed progress in systems science and engineering.

End of Presentation