Statistical Data Mining


Statistical Data Mining – Short Lecture

Virtual Patient Approach in SMITH

- Patient + state parameters: individualisation (machine learning)
- Generic model (mechanistic)
- Adaptation of the generic model to available data (machine learning)

Slide courtesy of Prof. Dr. Andreas Schuppert – RWTH Aachen University Hospital & Bayer AG

Short Lecture – Markov Chains Monte Carlo in SMITH

Probability Theory – Contributions by Bernoulli

Key question: how likely is a future event? Probability theory has its roots in gambling, and research into understanding uncertainty has continued ever since: understanding odds as the ratio of favorable to unfavorable outcomes, probabilities connected to dice, cards, coins, marbles, balls, etc.

Simple example: a jar contains an unknown number of X black balls and Y white balls. How can the proportions of black and white balls be determined? Approach: perform a series of random draws (aka trials) from the jar. The observed ratio of white vs. black draws converges toward the real ratio in the jar as the number of draws increases [1].

In short: probability is a measure of how likely a future 'event' is (e.g. the probability of drawing a white or black ball). After many random draws / trials, the observations (white balls vs. black balls) converge toward the real ratio of elements in the jar.

Probability Theory – Contributions by Markov

In Bernoulli's probability theory, outcomes of previous events do not change the outcome of future events. Markov's key addition: this is not always correct, because in many cases events are not independent. In short, he added dependent events / dependent variables: situations in which the probability of an event is conditioned by the events that took place in the past (adding an 'events over time' dimension) [1].

From 'Bernoulli' to 'Markov': consider two jars, 'a white jar' and 'a black jar'. Observing one jar on its own, draws are independent of each other (Bernoulli). Now a draw rule is imposed: the color of the current ball indicates the color of the jar from which the next draw will be made; successive draws thereby become dependent (Markov).

Simple Markov Model – Stochastic Model & Probability

Based on probability theory, a Markov model is used to model randomly changing systems. A two-state 'Markov chain' is the most basic Markov model that illustrates the Markov process. A 'Markov matrix' is the stochastic matrix used to describe the transitions of a 'Markov chain' (modified from [1]).

Markov diagram (an example of a 'randomly changing system'): two states, W ('a white state') and B ('a black state'), with the draw rule imposed: the color of the current ball indicates the color of the jar ('a white jar' / 'a black jar') from which the next draw will be made.

Markov matrix (simplified, based on colors); each row sums to 1:

              To W          To B
    From W    P(W -> W)     P(W -> B)
    From B    P(B -> W)     P(B -> B)

Monte Carlo Methods – Approach

Monte Carlo (MC) methods rely on repeated random sampling to obtain numerical results: MC runs simulations many times over to obtain the distribution of an unknown probabilistic entity (the name originates from techniques for playing and recording results in gambling casinos) [2]. The method is useful when it is difficult or impossible to apply a deterministic algorithm.

General approach:
1. Define a domain of possible inputs.
2. Generate inputs randomly from a probability distribution over the domain.
3. Perform a deterministic computation on the inputs and aggregate the results (also known as the MC ensemble method).

Without Monte Carlo: pointwise real data inputs -> model -> output. With Monte Carlo: pointwise sampling from a probability distribution -> inputs -> model -> output.

Example: approximating the value of pi. Count the number of random points that fall inside the quarter circle and the total number of points; the ratio of the two counts estimates the ratio of the two areas, ~pi/4, so multiplying by 4 gives an estimate of pi. After placing 30000 random sampling points, the estimate for pi is within 0.07% of the true value with an approximate probability of 20%.

Monte Carlo Methods – MPI Example

An important part of MC is a parallel random number generator (PRNG); here the GSL interface to SPRNG is used. The code initializes the random number generator (with a specific type), uses 10000 variates on each processor to create a 'local sum' (gsl_rng_uniform(r) returns a double-precision floating-point number uniformly distributed in the range [0,1), the idea of sampling from a probability distribution), then uses MPI_Reduce to create a 'global sum' and prints the estimate of the integral. ('Simplified demo code', modified from [3].)

    #include <math.h>
    #include <stdio.h>
    #include <mpi.h>
    #include <gsl/gsl_rng.h>
    #include "gsl-sprng.h"

    int main(int argc, char *argv[])
    {
        int i, k, N;
        double u, ksum = 0.0, Nsum;
        gsl_rng *r;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &N);
        MPI_Comm_rank(MPI_COMM_WORLD, &k);
        r = gsl_rng_alloc(gsl_rng_sprng20);   /* parallel RNG: SPRNG via GSL */
        for (i = 0; i < 10000; i++) {
            u = gsl_rng_uniform(r);           /* uniform in [0,1) */
            ksum += exp(-u * u);              /* local sum on this rank */
        }
        MPI_Reduce(&ksum, &Nsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (k == 0)
            printf("MC estimate is %f\n", Nsum / 10000 / N);
        MPI_Finalize();
        return 0;
    }

Markov Chain Monte Carlo (MCMC) – ASIC Use Case

Unsupervised patient stratification: dynamic clustering, critical state detection, predictive modelling.

(Diagram components: ASIC system; virtual patient (VP); patient data (RDR); DEA; machine learning, patient association, subgroup-specific prediction; patient subgroups & classifiers; prognosis for the individual patient.)

Slide courtesy of Prof. Dr. Andreas Schuppert – RWTH Aachen University Hospital & Bayer AG

Lecture Bibliography

[1] P.A. Gagniuc, 'Markov Chains: From Theory to Implementation and Experimentation', John Wiley & Sons, ISBN 1119387558, 2017
[2] Wikipedia, 'Monte Carlo method', online: http://en.wikipedia.org/wiki/Monte_Carlo_method
[3] 'Getting started with parallel MCMC', online: http://darrenjw.wordpress.com/2010/12/14/getting-started-with-parallel-mcmc/