Analytics – Statistical Approaches

Presentation transcript:

Analytics – Statistical Approaches CSC 453, CSC-591/ECE592, Spring 2016 Rudra Dutta

Overview
- Extracting “insight” or “knowledge” from sensor readings is a key value-add for IoT.
- The “insight” aspect comes from knowing or hypothesizing which available data may be correlated with which required information (the origin of business analytics).
- The actual analysis may be more traditional: algorithmic and statistical.
  - Algorithmic challenge: what IS the data?
  - Statistical challenge: how to process it?
Copyright Rudra Dutta, CSC, NCSU, Spring 2016

R Programming Language
- A scripting-like environment, similar to MATLAB.
- Openly available across platforms; GUI through X Windows.
- Easy manipulation of vectors and matrices.
- Rich statistical library and tables.
- Most of what follows can be done (comparatively) painlessly with R, without having to implement the detailed steps.
  - E.g. linear regression is a single simple call, “lm”.

Datasets in IoT Context
- Data often has a “stochastic” nature, with many factors of random variation.
- Some are artifacts of the system itself:
  - Noise
  - Environmental conditions for sensors
  - Differences between individual sensors
- Uncertainty:
  - Actual variation due to complex environmental factors
  - Often secondary, but with some effect nevertheless

Estimation, Forecast, Extrapolation
When we measure some quantity or quantities in the physical world, we might be interested in:
- Knowing what the “real” underlying quantity is, from the measurements (estimation)
- Knowing how one quantity might be determined from other(s) (forecast)
- Knowing what the quantity is when/where we have NOT measured it (extrapolation)

Arithmetic Mean (Expectation)
- A simple and straightforward estimate of the underlying quantity.
- Population: all the instances that might be measured (potentially infinite).
- Sample: the ones we DO measure in an experiment (necessarily finite).
  - In some cases, these are repeated measurements of the same quantity (to reduce the effect of noise).
  - In others, the measurements apply to different instances (e.g. heights of people).
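As a rough illustration of the repeated-measurement case, the sketch below averages simulated sensor readings. It is written in Python with only the standard library rather than R; the function names, the "true" value of 21.5, and the Gaussian noise level are all hypothetical choices made for this example.

```python
import random


def simulate_readings(true_value, noise_sd, n, seed=0):
    """Return n hypothetical sensor readings: true_value plus Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]


def sample_mean(readings):
    """Arithmetic mean of a finite sample."""
    return sum(readings) / len(readings)


# The sample is the finite set of readings we actually took; the population
# is every reading the sensor could have produced.
readings = simulate_readings(true_value=21.5, noise_sd=0.8, n=200)
estimate = sample_mean(readings)
print(f"sample mean of {len(readings)} readings = {estimate:.3f}")
```

With more readings the noise tends to average out, so the sample mean should move closer to the underlying value.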

Confidence Interval
- Based on a sample, a “confidence interval” can be extracted for any degree of confidence (e.g. 95%).
- Something of a misnomer: “If we took more samples, and derived this interval each time, 95% of the time the actual mean would lie inside the interval.”
- Calculated under the assumption of an underlying Gaussian distribution.
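A minimal sketch of this calculation and of the repeated-sampling interpretation, in Python with only the standard library. It assumes the normal (z) approximation, which is reasonable for larger samples; small samples would normally use Student's t instead. The function names and the simulated data are hypothetical.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Two-sided confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage(true_mean, noise_sd, n, experiments=1000, seed=0):
    """Fraction of repeated experiments whose interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        sample = [rng.gauss(true_mean, noise_sd) for _ in range(n)]
        low, high = mean_confidence_interval(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / experiments


low, high = mean_confidence_interval([1, 2, 3, 4, 5])
print(f"95% interval: ({low:.3f}, {high:.3f})")

rate = coverage(true_mean=21.5, noise_sd=0.8, n=50)
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The second function is the point of the "misnomer" remark: the 95% describes how often the procedure captures the true mean over many samples, not the probability that one particular interval does.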

Regression
- General-purpose model-fitting approach to collected data.
- Hypothesis: there is an underlying reality (the “model”) that causes an observed variable to take on values in response to a different one.
  - The relationship may or may not be causal.
  - Examples: heater setting and temperature; height and weight.
- Approach: assuming so, the actual observed sample points must contain “error”, due to noise or random variation.
  - What parameters of the model would minimize the overall error?
- Linear model: one independent and one dependent variable.
  - Generalizes to multiple variables (a matrix statement of the same problem).
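For the one-variable linear model, "minimizing the overall error" is usually taken to mean minimizing the sum of squared errors, which has a closed-form solution. The sketch below shows that ordinary least-squares fit in plain Python. The slides point to R's `lm` for this, so this is only an illustration of the idea; the function name and the heater/temperature numbers are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Chooses the intercept and slope that minimize the sum of squared
    vertical distances between the observed points and the line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: heater setting (independent) vs. temperature (dependent).
heater_setting = [1, 2, 3, 4, 5]
temperature = [18.1, 20.2, 21.9, 24.1, 25.8]

intercept, slope = fit_line(heater_setting, temperature)
print(f"temperature ~ {intercept:.2f} + {slope:.2f} * setting")

# The residuals are the "error" each observed point carries under the model.
residuals = [y - (intercept + slope * x)
             for x, y in zip(heater_setting, temperature)]
```

The fitted line can then be used for the forecast and extrapolation tasks, with the usual caution that it is only as good as the linear assumption.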

Regression – Linear Model

Hypothesis Testing
- Original hypothesis: the hypothesis of the model.
- Null hypothesis: no such correlation.
- Pre-determine the desired error likelihood, e.g. 5%.
- Test: draw a sample, then determine the probability that this sample could have come from the null-hypothesis population.
- If that probability is less than the pre-determined level, we say we can “reject the null hypothesis” at the 5% level.
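One way to make this procedure concrete is a permutation test on the correlation coefficient. This is a technique chosen for the illustration, not necessarily the test used in the lecture: shuffling one variable simulates the null hypothesis of no correlation, and the p-value estimates how often such shuffled data looks at least as correlated as the real sample. The Python sketch uses only the standard library; the function names and the data are hypothetical.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, trials=5000, seed=0):
    """Estimate P(|r| at least as large as observed) if x and y are unrelated."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one correction avoids reporting 0


alpha = 0.05  # pre-determined error likelihood (5%)
setting = list(range(1, 11))
temperature = [18.2, 19.9, 22.1, 24.0, 26.1, 27.8, 30.2, 31.9, 34.1, 36.0]
unrelated = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

p_related = permutation_p_value(setting, temperature)
p_unrelated = permutation_p_value(setting, unrelated)
print("reject null for temperature:", p_related < alpha)
print("reject null for unrelated data:", p_unrelated < alpha)
```

Rejecting the null hypothesis at the 5% level only says the sample would be unlikely under "no correlation"; it does not by itself show that the relationship is causal.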

Hypothesis Testing

Time-Series

FSM-based Modeling

Smoothing

Clustering and Classification

Summary
- Statistical techniques can be useful tools in the toolkit of the IoT engineer.
  - They extract meaningful sense from the available data.
- Many such techniques are sufficiently mature that they can simply be plugged in programmatically.
- It is still necessary to understand the basic concepts behind the tools and techniques.