For Big Data sets and Data Science Applications


Linear Regression for Big Data Sets and Data Science Applications

Average (Mean) and Median
Suppose we have 3 numbers, 3, 2, 10, and want to know their mean and median.
Mean: add them up and divide by n: $\frac{3 + 2 + 10}{3} = 5$
Median: sort the numbers to get (2, 3, 10) and pick the middle number to get 3
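A minimal Python sketch of the two recipes above, assuming the data is a plain list of numbers. The helper names `mean` and `median` are illustrative only (the standard library also offers `statistics.mean` and `statistics.median`). The even-length branch of `median`, which averages the two middle numbers, is a common convention that this slide does not cover.

```python
def mean(values):
    # Mean: add the numbers up and divide by n
    return sum(values) / len(values)


def median(values):
    # Median: sort, then pick the middle number.
    # Assumed convention for an even count: average the two middle numbers.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [3, 2, 10]
print(mean(data))    # 5.0
print(median(data))  # 3
```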

Finding the mean (sum of distances = 0)
Given the numbers (3, 2, 10):
- Guess at the mean: 3 (maybe the median is the mean). Sum the differences from all values to the guessed value: $(3-3) + (2-3) + (10-3) = 0 - 1 + 7 = +6$
- Guess again: 4 and sum $(3-4) + (2-4) + (10-4) = -1 - 2 + 6 = +3$
- Guess again: 5 and sum $(3-5) + (2-5) + (10-5) = -2 - 3 + 5 = 0$ (the sum is zero, so 5 is the mean)
- Guess again: 6 and sum $(3-6) + (2-6) + (10-6) = -3 - 4 + 4 = -3$
- Guess again: 7 and sum $(3-7) + (2-7) + (10-7) = -4 - 5 + 3 = -6$
We started with a guess of 3, then made progress guessing toward 5; after 5, our guesses moved away from zero again. We are assuming integer values only.
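A small Python sketch that tabulates the signed sum for the same integer guesses; the helper name `signed_deviation_sum` is hypothetical. Algebraically the sum equals sum(values) - n * guess, so it drops by n = 3 for each unit step of the guess and reaches zero exactly at guess = sum(values) / n, the mean.

```python
def signed_deviation_sum(values, guess):
    # Sum of (value - guess); equals sum(values) - n * guess,
    # so it is zero exactly when guess is the mean
    return sum(v - guess for v in values)


data = [3, 2, 10]
for guess in range(3, 8):  # integer guesses 3..7, as on the slide
    print(guess, signed_deviation_sum(data, guess))
# 3 6
# 4 3
# 5 0
# 6 -3
# 7 -6
```

A positive sum means the guess is too low and a negative sum means it is too high, which is why the search can step toward zero from either side.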

Finding the median (sum of |distances|)
Given the numbers (3, 2, 10):
- Guess at the median: 3. Sum the absolute differences from all values to the guessed value: $|3-3| + |2-3| + |10-3| = |0| + |-1| + |+7| = 8$ (minimal = median)
- Guess again: 4 and sum $|3-4| + |2-4| + |10-4| = |-1| + |-2| + |+6| = 9$
- Guess again: 5 and sum $|3-5| + |2-5| + |10-5| = |-2| + |-3| + |+5| = 10$
- Guess again: 2 and sum $|3-2| + |2-2| + |10-2| = |+1| + |0| + |+8| = 9$ (regressing)
The smallest sum, 8, occurs at the guess 3, which is the median. We are assuming integer values only.
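The same search can be sketched in Python, assuming integer guesses as the slide does; the helper name `abs_deviation_sum` is illustrative. It scores every guess from 2 to 5 and keeps the one with the smallest total absolute distance.

```python
def abs_deviation_sum(values, guess):
    # Sum of |value - guess|; this is smallest at the median
    return sum(abs(v - guess) for v in values)


data = [3, 2, 10]
costs = {g: abs_deviation_sum(data, g) for g in range(2, 6)}  # guesses 2..5
best = min(costs, key=costs.get)
print(costs)  # {2: 9, 3: 8, 4: 9, 5: 10}
print(best)   # 3
```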

Finding the mean (least sum of squares)
Given the numbers (3, 2, 10):
- Guess at the mean: 3 (maybe the median is the mean). Sum the squared differences from all values to the guessed value: $(3-3)^2 + (2-3)^2 + (10-3)^2 = 0 + 1 + 49 = 50$
- Guess again: 4 and sum $(3-4)^2 + (2-4)^2 + (10-4)^2 = 1 + 4 + 36 = 41$
- Guess again: 5 and sum $(3-5)^2 + (2-5)^2 + (10-5)^2 = 4 + 9 + 25 = 38$ (minimal = mean)
- Guess again: 6 and sum $(3-6)^2 + (2-6)^2 + (10-6)^2 = 9 + 16 + 16 = 41$
- Guess again: 7 and sum $(3-7)^2 + (2-7)^2 + (10-7)^2 = 16 + 25 + 9 = 50$
We started with a guess of 3, then made progress guessing toward 5; after 5, our guesses regressed away from the minimal value. We are assuming integer values only.
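A minimal Python sketch of the least-squares search, again over integer guesses only; `squared_deviation_sum` is an illustrative name. The derivative of the sum of squares with respect to the guess is -2 times the signed sum of differences, so the least-squares guess and the zero-sum guess from the first method land on the same point, the mean.

```python
def squared_deviation_sum(values, guess):
    # Sum of (value - guess)^2; this is smallest at the mean
    return sum((v - guess) ** 2 for v in values)


data = [3, 2, 10]
costs = {g: squared_deviation_sum(data, g) for g in range(3, 8)}  # guesses 3..7
best = min(costs, key=costs.get)
print(costs)  # {3: 50, 4: 41, 5: 38, 6: 41, 7: 50}
print(best)   # 5
```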

Finding the new mean (least sum of squares)
Given the numbers (3, 2, 10), we now add a new number, 1, to the vector to get (3, 2, 10, 1):
- Guess: 3 and sum $(3-3)^2 + (2-3)^2 + (10-3)^2 = 0 + 1 + 49 = 50$, then $50 + (1-3)^2 = 54$
- Guess again: 4 and sum $(3-4)^2 + (2-4)^2 + (10-4)^2 = 1 + 4 + 36 = 41$, then $41 + (1-4)^2 = 50$ (new mean)
- Guess again: 5 and sum $(3-5)^2 + (2-5)^2 + (10-5)^2 = 4 + 9 + 25 = 38$, then $38 + (1-5)^2 = 54$
- Guess again: 6 and sum $(3-6)^2 + (2-6)^2 + (10-6)^2 = 9 + 16 + 16 = 41$, then $41 + (1-6)^2 = 66$
- Guess again: 7 and sum $(3-7)^2 + (2-7)^2 + (10-7)^2 = 16 + 25 + 9 = 50$, then $50 + (1-7)^2 = 86$
We start off knowing the sums of squares for (3, 2, 10) listed above, and a new number, 1, is added to the set. Here we are searching for the new mean of the vector (3, 2, 10, 1) while doing as little work as possible: each stored sum needs only one extra squared term. This is very popular in data science; in statistics we would just start the entire computation over, because data size and time are irrelevant.
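A Python sketch of the incremental idea, assuming the per-guess sums for (3, 2, 10) were cached from the previous pass; the names `cached` and `updated` are hypothetical. Only the new value's squared term is added to each cached sum, and an assertion checks that this matches recomputing from scratch. The last three lines show a different technique not on the slide, the standard running-mean update new_mean = old_mean + (x - old_mean) / (n + 1), included only for comparison.

```python
def squared_deviation_sum(values, guess):
    # Sum of (value - guess)^2 over all values
    return sum((v - guess) ** 2 for v in values)


old_data = [3, 2, 10]
guesses = range(3, 8)

# Cached work from the earlier pass over (3, 2, 10)
cached = {g: squared_deviation_sum(old_data, g) for g in guesses}

# A new number arrives: add only its own squared term to each cached sum
new_value = 1
updated = {g: cached[g] + (new_value - g) ** 2 for g in guesses}
print(updated)                        # {3: 54, 4: 50, 5: 54, 6: 66, 7: 86}
print(min(updated, key=updated.get))  # 4

# Sanity check: same result as recomputing from scratch on (3, 2, 10, 1)
assert updated == {g: squared_deviation_sum(old_data + [new_value], g) for g in guesses}

# Alternative (standard running-mean update, not shown on the slide)
old_mean, n = 5, 3
new_mean = old_mean + (new_value - old_mean) / (n + 1)
print(new_mean)  # 4.0
```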

Linear Regression