1 Introduction to Predictive Learning Electrical and Computer Engineering LECTURE SET 1 INTRODUCTION and OVERVIEW.

1 Introduction to Predictive Learning Electrical and Computer Engineering LECTURE SET 1 INTRODUCTION and OVERVIEW

2 OUTLINE of Set 1 1.1 Overview: what is this course about: - subject matter - philosophical connections - prerequisites and HW1 - expected outcomes of this course 1.2 Historical Perspective 1.3 Motivation for Empirical Knowledge 1.4 General Experimental Procedure for Estimating Models from Data

3 1.1.1 Subject Matter Uncertainty and Learning Decision making under uncertainty Biological learning (adaptation) (examples and discussion) Induction, Statistics and Philosophy Ex. 1: Many old men are bald Ex. 2: Sun rises on the East every day

4 (cont’d) Many old men are bald Psychological Induction: - inductive statement based on experience - has certain predictive aspect - no scientific explanation Statistical View: - the lack of hair = random variable - estimate its distribution (depending on age) from past observations (training sample) Philosophy of Science Approach: - find scientific theory to explain the lack of hair - explanation itself is not sufficient - true theory needs to make non-trivial predictions

5 Conceptual Issues Any theory (or model) has two aspects: 1. explanation of past data (observations) 2. prediction of future (unobserved) data Achieving both goals perfectly not possible Important issues to be addressed: - quality of explanation and prediction - is good prediction possible at all ? - if two models explain past data equally well, which one is better? - how to distinguish between true scientific and pseudo- scientific theories?

6 Beliefs vs True Theories Men have lower life expectancy than women Because they choose to do so Because they make more money (on average) and experience higher stress managing it Because they engage in risky activities Because ….. Demarcation problem in philosophy

7 1.1.2 Philosophical Connections Oxford English dictionary: Induction is the process of inferring a general law or principle from the observations of particular instances. Clearly related to Predictive Learning. All science and (most of) human knowledge involves induction How to form ‘good’ inductive theories?

8 Challenge of Predictive Learning Explain the past and predict the future

9 Background: philosophy William of Ockham: entities should not be multiplied beyond necessity Epicurus of Samos: If more than one theory is consistent with the observations, keep all theories

10 Background: philosophy Thomas Bayes: How to update/ revise beliefs in light of new evidence Karl Popper: Every true (inductive) theory prohibits certain events or occurences, i.e. it should be falsifiable

11 Background: philosophy George W. Bush: I am The Decider

12 Background: philosophy Bill Clinton: I told the Truth

13 1.1.3 Prerequisites and Hwk1 Math: working knowledge of basic Probability + Linear Algebra Statistical software (of your choice): - MATLAB, also R-project, Mathematica etc. Note: you will be using s/w implementations of learning algorithm (not writing programs) Writing: no special requirements Philosophy: no special requirements

14 Homework 1 Purpose: testing background on probability and computer skills Goal: estimate pdf of a random variable X Real Data: X=daily price changes of SP500 i.e.where Z(t) = closing price Typical + Useful Statistics - Histogram (empirical pdf) - mean, standard deviation

15 (cont’d) Homework 1 Histogram = estimated pdf (from data) Example: histograms of 5 and 30 bins to model N(0,1) also mean and standard deviation (estimated from data)

16 (cont’d) Homework 1 NOTE: histogram ~ empirical pdf, i.e. y-axis scale is in % (frequency). Example: histogram of SP500 daily price changes in 1981:

17 1.1.4 Expected Outcomes (of this course) Scientific/technical: Learning = generalization, concepts and issues Math theory: Statistical Learning Theory aka VC-theory Implications for Philosophy and Applications Philosophical: Nature of human knowledge and intelligence Demarcation principle Human learning Practical: Financial engineering Security Genomics Predicting successful marriage, climate modeling etc., etc. What is this course NOT about

18 Other Related Fields The field of Pattern Recognition is concerned with the automatic discovery of regularities in data. Data Mining is the process of automatically discovering useful information in large data repositories. This book (on Statistical Learning) is about learning from data. The field of Machine Learning is concerned with the question of how to construct computer programs that automatically improve with experience. Artificial Neural Networks perform useful computations through the process of learning.  (1) unnecessary fragmentation  confusion (2) all fields estimate useful models from data, i.e. extract knowledge from data (the same as in classical statistics) Real Issues: what is ‘useful’? What is ‘knowledge’?

19 OUTLINE of Set 1 1.1 Overview: what is this course about 1.2 Historical Perspective Handling uncertainty and risk: - probabilistic vs - risk minimization 1.3 Motivation for Empirical Knowledge 1.4 General Experimental Procedure for Estimating Models from Data

20 1.2.1Handling Uncertainty and Risk Ancient times Probability for quantifying uncertainty - degree-of-belief - frequentist (Cardano-1525, Pascale, Fermat) Newton and causal determinism Probability theory and statistics (20 th century) Modern science (A. Einstein)  Goal of science: estimating a true model or system identification Scientific knowledge ~ deterministic+causal(explicable, assured)

21 Gods, Prophets and Shamans

22 Handling Uncertainty and Risk(2) Making decisions under uncertainty = risk management Probabilistic approach: - estimate probabilities (of future events) - assign costs and minimize expected risk Risk minimization approach: - apply decisions to known past events - select one minimizing expected risk Common in all living things: learning, generalization

23 Cultural and Psychological Aspects All men by nature desire knowledge Man has an intense desire for assured knowledge Assured Knowledge ~ belief in - religion - reason (causal determinism) - science / pseudoscience - data-analytic models (~ Big Data)

24 Human Generalization Example 1: continue given sequence 6, 10, 14, 18,…. Example 2: Sceitnitss osbevred: it is nt inptrant how lteters are msspled isnide the word. It is ipmoratnt that the fisrt and lsat letetrs do not chngae, tehn the txet is itneprted corrcetly

25 OUTLINE of Set 1 1.1 Overview: what is this course about 1.2 Historical Perspective 1.3 Motivation for Empirical Knowledge - scientific knowledge - growth of empirical knowledge - the nature of human knowledge 1.4 General Experimental Procedure for Estimating Models from Data

26 1.3.1 Scientific Knowledge Combines ideas/models and facts/data First-principle knowledge: hypothesis  experiment  theory ~ deterministic, simple causal models Modern data-driven discovery: Computer program + DATA  knowledge ~ statistical, complex systems Two different philosophies

27 Scientific Knowledge Classical Knowledge (last 3-4 centuries): - objective - recurrent events (repeatable by others) - quantifiable (described by math models) Knowledge ~ causal, deterministic, logical Humans cannot reason well about - noisy/random data - multivariate high-dimensional data

28 Knowledge Discovery in Digital Age Most information in the form of digital data Can we get assured knowledge from data? Big Data ~ technological nirvana data + connectivity  more knowledge Wired Magazine, 16/07: We can stop looking for (scientific) models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

29 Reality: many questionable studies Duke biologists discovered an unusual link between the popular singer and a new species of fern, i.e. - bisexual reproductive stage of the ferns; - the team found the sequence GAGA when analyzing the fern’s DNA base pairs

30 Scientific Data Mining: Kepler’s Laws How planets move among the stars? - Ptolemaic system (geocentric) - Copernican system (heliocentric) Tycho Brahe (16 century) - measure positions of the planets in the sky - use experimental data to support one’s view (hypothesis) Johannes Kepler: - used volumes of Tycho’s data to discover three remarkably simple laws

31 Kepler’s Laws (1) The orbit is an ellipse with sun at its focus (2) The line joining a planet to the sun sweeps equal areas during the same time (3) The ratio P 2 /D 3 is constant, where P is the orbit period and D is the orbit size. NO computers, statistics, machine learning or Big Data

32 Kepler’s Laws vs. ‘Lady Gaga’ knowledge Both search for assured knowledge Kepler’s Laws - well-defined hypothesis stated a priori - prediction capability - human intelligence - no marketing Lady Gaga knowledge ~ belief - no hypothesis stated a priori - no prediction capability - computer intelligence (software program) - very marketable (to wide audience)

33 1.3.2Growth of empirical knowledge Huge growth of the amount of data in 20 th century (computers and sensors) Complex systems (engineering, life sciences and social) Classical first-principle science is inadequate for empirical knowledge Need for Data-Analytic Modeling: How to estimate good predictive models from noisy data

34 1.3.3 Nature of human knowledge Three types of knowledge - scientific (first-principle, deterministic) - empirical - metaphysical (beliefs) Boundaries are poorly understood

35 More on Empirical Knowledge Demarcation: Empirical Knowledge vs Beliefs Examples Empirical vs First Principle Knowledge Examples, Discussion

36 Representation of Empirical Knowledge Facts, observations ~ data points (x, y) Knowledge: predictive relationship ~ function f(x) Examples:

37 Example: Polynomial Curve Fitting The Problem: estimate the best polynomial model - true (target) function ~ second-order (in blue) - But the best model is linear

38 Modeling Assumptions Future data is similar to the past Explanation is different from prediction Prediction implies many future (test) inputs) Multiplicity of good solutions Estimation of inductive models that can predict (generalize) is difficult (~ill-posed)

39 1.4 General Experimental Procedure 1. Understand Application Goals/ Requirements 2. Hypothesis Formulation (Problem Formalization) 3. Data Generation/Collection/ Experiment Design 4. Data Cleaning and Preprocessing 5. Model Estimation (learning) 6. Model Interpretion/ Assessment and Drawing Conclusions Note: - each step is complex and usually involves several iterations - final model depends on all previous steps

40 Honest Disclosure of Results Recall Tycho Brahe (16 th century) Modern drug studies Review of studies submitted to FDA Of 74 studies reviewed, 38 were judged to be positive by the FDA. All but one were published. Most of the studies found to have negative or questionable results were not published, researchers found. Source: The New England Journal of Medicine, WSJ Jan 17, 2008) Publication bias: widespread in modern research

1 Introduction to Predictive Learning Electrical and Computer Engineering LECTURE SET 1 INTRODUCTION and OVERVIEW.

Similar presentations

Presentation on theme: "1 Introduction to Predictive Learning Electrical and Computer Engineering LECTURE SET 1 INTRODUCTION and OVERVIEW."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Introduction to Predictive Learning Electrical and Computer Engineering LECTURE SET 1 INTRODUCTION and OVERVIEW.

Similar presentations

Presentation on theme: "1 Introduction to Predictive Learning Electrical and Computer Engineering LECTURE SET 1 INTRODUCTION and OVERVIEW."— Presentation transcript:

Similar presentations

About project

Feedback