INFORMATION TECHNOLOGY IN BUSINESS AND SOCIETY SESSION 19 – GETTING DATA AND VISUALIZING IT SEAN J. TAYLOR.



ADMINISTRIVIA
Assignment 3: still grading.
Assignment 4: GREAT JOB EVERYONE!

GROUP PROJECT 1 (DUE 4/13)
1. Find some data.
2. Load it into Access and/or Excel (or anything).
3. Explore the data set and find something interesting.
4. Create interesting visualizations of the data.
5. Use your exploration to help define a question you’d like to answer.
6. Answer that question as best you can using the data.

FINDING DATA
1. Infochimps
2. NYC Open Data
3. ScraperWiki
4. Google Insights for Search
5. World Bank Data
6. Many more: Million Song data set, movies and ratings, Census data, Enron emails, Tweets, Bit.ly link clicks, etc.

LOADING DATA
Formats: csv, tab-delimited, fixed-width, many more.
Unstructured: html or web API data.
Import into Excel first, then Access.
May have to clean it first!

TOOLS

WHY VISUALIZE DATA?
Same average for X
Same variance for X
Same average for Y
Same variance for Y (approx)
Same correlation between X and Y
Same linear regression line

ANSCOMBE’S QUARTET Your brain can efficiently process properly visualized data.
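
The claim that the four data sets share the same summary statistics can be checked directly. The sketch below uses the values from Anscombe's published quartet, typed in by hand, and a hypothetical helper `summarize` that is not from the slides. It computes the mean, sample variance, correlation, and least-squares line for each data set so they can be compared side by side.

```python
from math import sqrt

# Anscombe's quartet (data sets I-III share the same x values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, correlation, and least-squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

for name, s in summaries.items():
    # Print each statistic rounded to two decimals for easy comparison.
    print(name, {key: round(value, 2) for key, value in s.items()})
```

The printed rows should look nearly identical, which is the point: only a plot of each data set reveals how different they are.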

EDA: EXPLORATORY DATA ANALYSIS An approach to analyzing data sets to summarize their main characteristics in an easy-to-understand form. Often done with visual graphs, without using a statistical model or having formulated a hypothesis. Helps to formulate hypotheses that could be tested on new data sets.

RULE #1: NO PIE CHARTS!

RULE #1: NO 3-D PIE CHARTS!

HISTOGRAMS Show the entire distribution of one particular variable. Each column’s height is the number of items that fall into that bin. Bin width is a parameter you can play with: wider bins give a smoother plot, while narrower bins can yield erratic plots.
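
To make the effect of bin width concrete, here is a minimal sketch of the counting step behind a histogram. The function name `histogram`, the `start` parameter, and the sample data are illustrative assumptions, not part of the slides.

```python
from collections import Counter


def histogram(values, bin_width, start=0.0):
    """Count values per bin; bin k covers [start + k*w, start + (k+1)*w)."""
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin that holds v
        counts[k] += 1
    # Key each bin by its left edge, in increasing order.
    return {start + k * bin_width: counts[k] for k in sorted(counts)}


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up sample

narrow = histogram(data, bin_width=1)  # many bins, more detail, more noise
wide = histogram(data, bin_width=5)    # few bins, smoother shape

for label, bins in (("narrow", narrow), ("wide", wide)):
    print(label)
    for left_edge, count in bins.items():
        print(f"  {left_edge:>4}: {'#' * count}")
```

With the narrow bins the isolated value at 9 stands out as its own column; with the wide bins it is absorbed into a neighbor, which is the smoothing trade-off described above.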

DENSITY PLOTS A population of women who were at least 21 years old, of Pima Indian heritage and living near Phoenix, Arizona, was tested for diabetes according to World Health Organization criteria. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases. We used the 532 complete records. Red: Diabetes = 0 Blue: Diabetes = 1 Black: Diabetes = 0 or 1

BOX PLOTS Display differences between subpopulations in your data. The outermost lines are the min and max. The box spans the 25th to 75th percentiles. The thick line shows the 50th percentile (the median).
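
The five numbers a box plot draws can be computed per subpopulation before plotting anything. This is a sketch with made-up groups: the helper `five_number_summary` is hypothetical, and the "inclusive" quantile method is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (min, 25th percentile, median, 75th percentile, max)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Made-up measurements for two subpopulations.
groups = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "B": [2, 4, 4, 5, 6, 8, 10, 12, 14],
}

summaries = {name: five_number_summary(vals) for name, vals in groups.items()}

for name, summary in summaries.items():
    print(name, summary)
```

Putting the tuples for each group side by side is the numeric equivalent of comparing the boxes visually.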

SCATTER PLOTS Suggest correlation between two variables. Correlations may be positive (rising), negative (falling), or null (uncorrelated). A line of best fit (also called a trendline) can be drawn. Can also reveal nonlinear relationships between variables.

PARETO CHARTS MUCH better than a pie chart. Show individual components as well as the cumulative total.
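
The data behind a Pareto chart is simple to build: sort the components from largest to smallest (the usual convention for the bars) and keep a running cumulative percentage (the line). The sketch below uses a hypothetical `pareto_table` helper and made-up complaint counts.

```python
def pareto_table(counts):
    """Return rows of (name, count, cumulative percent), largest first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, count, 100.0 * running / total))
    return rows


# Made-up counts of customer complaints by category.
complaints = {"wrong item": 15, "late delivery": 50, "other": 5, "damaged": 30}

table = pareto_table(complaints)

for name, count, cumulative in table:
    print(f"{name:<15} {count:>3} {cumulative:6.1f}%")
```

Each row carries both pieces a pie chart would hide: the individual size of a component and how much of the total has been accounted for so far.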

RUN CHART Shows a variable over time. Allows comparison between different variables. Can show trends or time relationships between variables.

USING AREA/VOLUME/SHAPE Don’t: it is hard for our brains to compare the total area of odd shapes. If you must, use regular bars (in some kind of… bar chart).

USING COLOR Colors have no natural scale. Bad: Better:

USING PLACEMENT

USING PLACEMENT: BAD

TRANSFORM/COMBINE YOUR VARIABLES!
Relationships can exist between your variables and computed variables.
Height => Height^2
Skewed variables (counts) => take log (e.g. number of friends)
Running backs in the NFL: Weight / (40 yard dash time) = Speed Score
Make categories out of continuous variables: Good performance: 1 if > 5% return in the last year, 0 otherwise.
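
Each of these transformations is a one-line computed column. The sketch below is illustrative only: the function names and sample values are assumptions, the log transform adds 1 before taking the log so that a count of zero is handled (a choice not stated on the slide), and the Speed Score uses the ratio exactly as written on the slide.

```python
import math


def squared(height):
    """Height => Height^2."""
    return height ** 2


def log_count(count):
    """Log-transform a skewed count; the +1 is an assumption to allow zeros."""
    return math.log(count + 1)


def speed_score(weight, dash_time):
    """Weight / (40 yard dash time), as written on the slide."""
    return weight / dash_time


def good_performance(annual_return):
    """1 if the return over the last year exceeds 5%, else 0."""
    return 1 if annual_return > 0.05 else 0


# Made-up example values.
friend_counts = [0, 3, 12, 150, 4800]
log_friends = [log_count(c) for c in friend_counts]
score = speed_score(weight=220, dash_time=4.4)
labels = [good_performance(r) for r in (0.08, 0.05, -0.02)]

print([round(v, 2) for v in log_friends])
print(round(score, 1), labels)
```

After the log transform the largest friend count is no longer hundreds of times bigger than the rest, which makes the variable much easier to plot on one axis.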

“BEAUTIFUL VISUALIZATION” “THE VISUAL DISPLAY OF QUANTITATIVE INFORMATION”

NEXT CLASS: SOFTWARE ENGINEERING Read “No Silver Bullet”