Short crash on correlations


Propositional logic
Example of a statement: if it's raining, the street is wet.
You have two statements:
If it's raining, the street is wet.
If it's snowing, the street is wet.
What can you deduce from each of the following observations?
The street is wet.
The street is not wet.
It has rained.
It has not rained.

Answers:
The street is wet => nothing about rain; it could have rained or snowed, or something else could have happened, such as flooding.
The street is not wet => it has neither rained nor snowed.
It has rained => the street is wet.
It has not rained => nothing; the street could be dry, or wet from snow.
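The four answers can be checked mechanically by listing every combination of rain, snow, and wet street that is consistent with the two statements. The following is a minimal sketch; the helper names `implies`, `consistent_worlds`, and `deduce` are made up for illustration and are not part of the slides.

```python
from itertools import product

def implies(p, q):
    # Material implication: "if p then q" is false only when p is true and q is false.
    return (not p) or q

def consistent_worlds():
    # Every (rain, snow, wet) combination that satisfies both statements:
    # rain => wet and snow => wet.
    return [
        (rain, snow, wet)
        for rain, snow, wet in product([False, True], repeat=3)
        if implies(rain, wet) and implies(snow, wet)
    ]

def deduce(observation, conclusion):
    # The conclusion follows only if it holds in every consistent world
    # in which the observation holds.
    worlds = [w for w in consistent_worlds() if observation(*w)]
    return all(conclusion(*w) for w in worlds)

wet_implies_rain = deduce(lambda r, s, w: w, lambda r, s, w: r)
dry_implies_no_rain_no_snow = deduce(lambda r, s, w: not w, lambda r, s, w: not r and not s)
rain_implies_wet = deduce(lambda r, s, w: r, lambda r, s, w: w)
no_rain_implies_dry = deduce(lambda r, s, w: not r, lambda r, s, w: not w)

print(wet_implies_rain, dry_implies_no_rain_no_snow, rain_implies_wet, no_rain_implies_dry)
```

A result of `False` here means "nothing can be deduced", not "the opposite is true".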

Causation and Correlation
If two quantities, call them X and Y, appear to depend on each other, there are four possible explanations:
X causes Y.
Y causes X.
Both X and Y are caused by something else, Z.
There is no causation going on; it's just a coincidence.
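The third case (a common cause Z) can be illustrated with a small simulation. This is only a sketch on made-up data: the coefficients, the noise level, and the seed are arbitrary assumptions, and `pearson` is a hand-written helper, not a library call.

```python
import random
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient of two equally long sequences.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]
    return sum(p * q for p, q in zip(a, b)) / sqrt(
        sum(p * p for p in a) * sum(q * q for q in b)
    )

random.seed(0)  # fixed seed so the run is repeatable

# Z is the hidden common cause.
z = [random.uniform(0.0, 10.0) for _ in range(1000)]
# X and Y each depend on Z plus independent noise; neither depends on the other.
x = [2.0 * v + random.gauss(0.0, 1.0) for v in z]
y = [3.0 * v + random.gauss(0.0, 1.0) for v in z]

r_xy = pearson(x, y)
print("correlation between X and Y:", r_xy)
```

X and Y come out strongly correlated even though changing one would not change the other.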

Correlations - linear correlations
Correlation is positive when the values increase together.
Correlation is negative when one value decreases as the other increases.
Source: https://www.mathsisfun.com/data/correlation.html

How to calculate correlation yourself
Let us call the two sets of data "x" and "y" (in the example, Temperature is x and Ice Cream Sales is y):
Step 1: Find the mean of x, and the mean of y.
Step 2: Subtract the mean of x from every x value (call the results "a"), and do the same for y (call the results "b").
Step 3: Calculate a × b, a² and b² for every value.
Step 4: Sum up a × b, sum up a² and sum up b².
Step 5: Divide the sum of a × b by the square root of [(sum of a²) × (sum of b²)].
Source: https://www.mathsisfun.com/data/correlation.html
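The five steps translate almost line by line into code. The sketch below is one possible way to write them in plain Python; the function name `correlation` and the sample values are made up for illustration and are not the Ice Cream data.

```python
from math import sqrt

def correlation(xs, ys):
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two values")
    n = len(xs)

    # Step 1: mean of x and mean of y
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 2: deviations from the mean ("a" for x, "b" for y)
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]

    # Step 3: a * b, a squared and b squared for every value
    ab = [p * q for p, q in zip(a, b)]
    a2 = [p * p for p in a]
    b2 = [q * q for q in b]

    # Step 4: sum each of the three columns
    sum_ab, sum_a2, sum_b2 = sum(ab), sum(a2), sum(b2)

    # Step 5: divide the sum of a * b by sqrt(sum of a^2 * sum of b^2)
    denominator = sqrt(sum_a2 * sum_b2)
    if denominator == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sum_ab / denominator

r_up = correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])    # values rise together
r_down = correlation([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])  # one falls as the other rises
print(r_up, r_down)
```

The explicit check on the denominator covers the case where one series never changes, for which step 5 would divide by zero.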

How to calculate … cont. Here is how I calculated the first Ice Cream example (values rounded to 1 or 0 decimal places): https://www.mathsisfun.com/data/correlation.html

How to calculate … cont.
As a formula: r = Σ(xi − x̄)(yi − ȳ) / √[Σ(xi − x̄)² × Σ(yi − ȳ)²]
Where:
Σ is Sigma, the symbol for "sum up"
(xi − x̄) is each x-value minus the mean of x (called "a" above)
(yi − ȳ) is each y-value minus the mean of y (called "b" above)
Source: https://www.mathsisfun.com/data/correlation.html

Let's move to practice
Suppose you choose humidity (x) and particles (y). For the same timestamp you have xi and yi.
Now do the 5 steps; the result is r (the correlation coefficient).
Example: r = 0.999, i.e. a high correlation in this example.

x      y     avg(x)   avg(y)   a = x - avg(x)   b = y - avg(y)   a*b       a²         b²
1.6    20    1.58     19.66     0.0166           0.3333          0.0055    0.000275   0.1111
1.7    22                       0.1166           2.3333          0.2720    0.013595   5.444
1.45   17                      -0.1333          -2.6666          0.35545   0.01776    7.1107
Sum                                                              0.63295   0.03163    12.6662
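The humidity and particle values above can be run through the same five steps in a few lines of Python. This is a sketch for checking the arithmetic; the variable names are made up for illustration.

```python
from math import sqrt

humidity = [1.6, 1.7, 1.45]      # x
particles = [20.0, 22.0, 17.0]   # y

# Step 1: means
mean_x = sum(humidity) / len(humidity)
mean_y = sum(particles) / len(particles)

# Step 2: deviations from the mean
a = [x - mean_x for x in humidity]
b = [y - mean_y for y in particles]

# Steps 3 and 4: products, squares and their sums
sum_ab = sum(p * q for p, q in zip(a, b))
sum_a2 = sum(p * p for p in a)
sum_b2 = sum(q * q for q in b)

# Step 5: correlation coefficient
r = sum_ab / sqrt(sum_a2 * sum_b2)

print(f"avg(x) = {mean_x:.4f}, avg(y) = {mean_y:.4f}")
print(f"sum(a*b) = {sum_ab:.5f}, sum(a^2) = {sum_a2:.5f}, sum(b^2) = {sum_b2:.4f}")
print(f"r = {r:.6f}")
```

The three sample points happen to lie exactly on the line y = 20x − 12, so without intermediate rounding r equals 1 up to floating-point error. The 0.999 quoted in the example is consistent with the rounded values in the table.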

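Pig support
-- Load the input; amnt is the numeric value, id is the grouping key
inpt = load '~/pig_data/…' as (amnt:double, id:chararray, c2:chararray);
-- Collect all records that share the same id
grp = group inpt by id;
-- For each group, compute the sum, the count and the mean of amnt
mean = foreach grp {
    sum = SUM(inpt.amnt);
    count = COUNT(inpt);
    generate group as id, sum/count as mean, sum as sum, count as count;
};
Source: http://stackoverflow.com/questions/12593527/finding-mean-using-pig-or-hadoop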

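Spark support
import numpy as np
from pyspark import SparkContext
from pyspark.mllib.stat import Statistics

sc = SparkContext.getOrCreate()

# Two series; they must have the same length and the same number of partitions
seriesX = sc.parallelize([1.0, 2.0, 3.0, 3.0, 5.0])
seriesY = sc.parallelize([11.0, 22.0, 33.0, 33.0, 555.0])
# Pearson correlation between the two series
print("Correlation is: " + str(Statistics.corr(seriesX, seriesY, method="pearson")))

# An RDD of vectors: each array is one row, each position is one column
data = sc.parallelize(
    [np.array([1.0, 10.0, 100.0]), np.array([2.0, 20.0, 200.0]), np.array([5.0, 33.0, 366.0])])
# Pearson correlation matrix between the columns
print(Statistics.corr(data, method="pearson"))
Source: https://spark.apache.org/docs/latest/mllib-statistics.html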