CS 204423: Assignment 1
Play around with your data


The dataset
You are given a data set named ‘Auto MPG’.
The data concerns the fuel consumption of cars, in miles per gallon.
Attribute information:
– 1. mpg: continuous
– 2. cylinders: multi-valued discrete
– 3. displacement: continuous
– 4. horsepower: continuous
– 5. weight: continuous
– 6. acceleration: continuous
– 7. model year: multi-valued discrete
– 8. origin: multi-valued discrete (Europe, Asia, America)
– 9. car name: string (unique for each instance)
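One way to read a record with these nine attributes is sketched below. The slides name helpers such as textread(), which appear to be MATLAB functions; this sketch uses plain Python instead. It assumes each record is one whitespace-separated line, that the car name is double-quoted, and that a missing value is written as "?". The column identifiers, the parse_row helper and the two sample rows are made up for illustration and are not taken from the real file.

```python
import shlex

# Column names follow the attribute list above; the identifiers are our own choice.
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower",
           "weight", "acceleration", "model_year", "origin", "car_name"]


def parse_row(line):
    """Split one whitespace-separated record into a dict keyed by COLUMNS.

    Assumes the car name is double-quoted and that "?" marks a missing value.
    """
    fields = shlex.split(line)  # keeps the quoted car name as a single field
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = {}
    for name, raw in zip(COLUMNS, fields):
        if name == "car_name":
            record[name] = raw          # string attribute, kept as text
        elif raw == "?":
            record[name] = None         # missing value, to be handled later
        else:
            record[name] = float(raw)   # every other attribute is numeric
    return record


# Made-up rows for illustration only; they are not taken from the real file.
SAMPLE = [
    '20.0 4 120.0 90.0 2500. 15.0 75 1 "example car a"',
    '30.0 4 100.0 ? 2000. 16.0 80 2 "example car b"',
]

records = [parse_row(line) for line in SAMPLE]
print(len(records), "data points,", len(COLUMNS), "attributes")
```

Keeping missing values as None, rather than dropping the row straight away, leaves the choice of how to treat them open until preprocessing.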

Tasks
1) Examine the basic characteristics of your data
1.1 How many data points are there?
1.2 What is the dimensionality of the data?
2) Preprocess your data into matrix form
2.1 How will you deal with the missing attribute values? There is no single correct answer. Express your idea!
2.2 The last field is string data, which can’t be stored in a numeric matrix. What should you do with it?
3) Find the correlations between all pairs of attributes
3.1 Take the most positively correlated pair of variables and plot them in a scatter plot. Does the result make sense? Discuss your findings.
3.2 Take the most negatively correlated pair of variables and plot them in a scatter plot. Does the result make sense? Discuss your findings.
3.3 Can you find any spurious correlations?
4) Standardise your data (z-normalisation)
4.1 What is the mean of your standardised dataset?
4.2 What is the standard deviation of your standardised dataset?
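The computations behind tasks 2 to 4 can be sketched in plain Python as follows. This is only one possible approach, not the expected answer. Mean imputation is just one of several ways to treat missing values, the correlation is assumed to be the Pearson coefficient, and the standard deviation uses the n - 1 (sample) denominator. The function names and the toy columns are made up for illustration.

```python
import math
from itertools import combinations


def impute_mean(column):
    """Replace None entries with the mean of the observed values (one option of many)."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def z_normalise(column):
    """Subtract the mean and divide by the sample standard deviation (n - 1)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]


# Made-up toy values for illustration only; not taken from the Auto MPG file.
data = {
    "mpg": [35.0, 28.0, 22.0, 17.0],
    "weight": [2000.0, 2500.0, 3000.0, 3500.0],
    "horsepower": impute_mean([70.0, None, 110.0, 150.0]),
}

# Correlation for every pair of attributes, then the two extremes.
correlations = {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}
most_positive = max(correlations, key=correlations.get)
most_negative = min(correlations, key=correlations.get)
print("most positive:", most_positive, round(correlations[most_positive], 3))
print("most negative:", most_negative, round(correlations[most_negative], 3))

# Check the mean and standard deviation of one standardised column.
z_weight = z_normalise(data["weight"])
z_mean = sum(z_weight) / len(z_weight)
z_std = math.sqrt(sum((v - z_mean) ** 2 for v in z_weight) / (len(z_weight) - 1))
print("standardised mean:", round(z_mean, 6), "standardised std:", round(z_std, 6))
```

On the real data set, the same pair search would run over all numeric attributes, and the two selected pairs are the ones to draw as scatter plots.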

Useful things
mean(), std()
max(), min()
corr()
textread()
scatter()
help …
help textread – gives you the manual for that function

For more data sets
Visit the UCI data repository
Or collect your own