Data Preliminaries CSC 576: Data Mining.

Slides:



Advertisements
Similar presentations
Data Engineering.
Advertisements

Copyright © 2010, 2007, 2004 Pearson Education, Inc. Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by.
TYPES OF DATA. Qualitative vs. Quantitative Data A qualitative variable is one in which the “true” or naturally occurring levels or categories taken by.
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A. Hall.
Lecture Notes for Chapter 2 Introduction to Data Mining
Measurement of Variables What are the Four Types of Psychological Measures? What are the Four Types of Psychological Measures? What are the Four Types.
Chapter 3 Data Issues. What is a Data Set? Attributes (describe objects) Variable, field, characteristic, feature or observation Objects (have attributes)
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Measurement of Variables What are the Four Types of Psychological Measures? What are the Four Types of Psychological Measures? What are the Four Measurement.
Unit 1 Section 1.2.
Data Mining Lecture 2: data.
1 Statistics 202: Statistical Aspects of Data Mining Professor David Mease Tuesday, Thursday 9:00-10:15 AM Terman 156 Lecture 2 = Start chapter 2 Agenda:
Variables and Types of Data.   Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or.
Introduction to Statistics What is Statistics? : Statistics is the sciences of conducting studies to collect, organize, summarize, analyze, and draw conclusions.
Basic Geographic Concepts GEOG 370 Instructor: Christine Erlien.
Data Mining & Knowledge Discovery Lecture: 2 Dr. Mohammad Abu Yousuf IIT, JU.
Statistics 300: Introduction to Probability and Statistics Section 1-2.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining Basics: Data Remark: Discusses “basics concerning data sets (first half of Chapter.
1  Specific number numerical measurement determined by a set of data Example: Twenty-three percent of people polled believed that there are too many polls.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach,
Vocabulary of Statistics Part Two. Variable classifications Qualitative variables: can be placed into distinct categories, according to some characteristic.
What is Data? Attributes
Unit 1 Section : Variables and Types of Data  Variables can be classified in two ways:  Qualitative Variable – variables that can be placed.
CIS527: Data Warehousing, Filtering, and Mining
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach,
1 Data Mining: Data Lecture Notes for Chapter 2. 2 What is Data? l Collection of data objects and their attributes l An attribute is a property or characteristic.
Thematic Data & Spatial Symbology.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
INTRODUCTION TO STATISTICS CHAPTER 1: IMPORTANT TERMS & CONCEPTS.
1 PAUF 610 TA 1 st Discussion. 2 3 Population & Sample Population includes all members of a specified group. (total collection of objects/people studied)
1 What is Data? l An attribute is a property or characteristic of an object l Examples: eye color of a person, temperature, etc. l Attribute is also known.
Chapter 1: Section 2-4 Variables and types of Data.
Tallahassee, Florida, 2016 CIS4930 Introduction to Data Mining Getting to Know Your Data Peixiang Zhao.
1 Basic Geographic Concepts Real World  Digital Environment Data in a GIS represent a simplified view of physical entities or phenomena 1. Spatial location.
Do Now  47 TCNJ students were asked to complete a survey on campus clubs and activities. 87% of the students surveyed participate in campus clubs and.
CENG 770. Data mining (knowledge discovery from data) – Extraction of interesting ( non-trivial, implicit, previously unknown and potentially useful)
Statistics Vocabulary. 1. STATISTICS Definition The study of collecting, organizing, and interpreting data Example Statistics are used to determine car.
1 Data Mining Lecture 02a: Data Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors)
Statistics Terminology. What is statistics? The science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
Geog. 314 Working with tables.
Data Preliminaries CSC 600: Data Mining Class 1.
Variables and Types of Data
Unit 1 Section 1.2.
Chapter 1 Chapter 1 Introduction to Statistics Larson/Farber 6th ed.
Pharmaceutical Statistics
Elementary Statistics
What is Data? Attributes Objects An attribute is a property or
Data Mining Lecture 02a: Theses slides are based on the slides by Data
Chapter 2: Getting to Know Your Data
Lecture Notes for Chapter 2 Introduction to Data Mining
Chapter 1 Chapter 1 Introduction to Statistics Larson/Farber 6th ed.
Welcome to Statistics World
Lecture Notes for Chapter 2 Introduction to Data Mining
statistics Specific number
CISC 4631 Data Mining Lecture 02:
Lecture Notes for Chapter 2 Introduction to Data Mining
Basic Statistical Terms
Vocabulary of Statistics
statistics Specific number
Course code:- PGPPA2F007T STATISTICAL METHODS AND COMPUTER APPLICATIONS.
Data Quality Data Exploration
Chapter 1 Chapter 1 Introduction to Statistics Larson/Farber 6th ed.
Data Mining Lecture 02a: Theses slides are based on the slides by Data
Group 9 – Data Mining: Data
Measurement Measurement is the assignment of numbers (or other symbols) to characteristics (or objects) according to certain pre-specified rules. One-to-one.
Data Pre-processing Lecture Notes for Chapter 2
Chapter 1 Chapter 1 Introduction to Statistics Larson/Farber 6th ed.
Biostatistics Lecture (2).
Lecture Slides Essentials of Statistics 5th Edition
Business Statistics For Contemporary Decision Making 9th Edition
Presentation transcript:

Data Preliminaries CSC 576: Data Mining

Today Types of Data Data Quality Exploring Data: Summary Statistics, Visualizations

What is Data? Attributes Objects Collection of data objects and their attributes An attribute is a property or characteristic of an object Examples: eye color of a person, temperature, etc. Attribute is also known as variable, field, characteristic, or feature A collection of attributes describe an object Object is also known as record, point, case, sample, entity, or instance

Attribute Values Attribute values are numbers or symbols assigned to an attribute Distinction between attributes and attribute values Same attribute can be mapped to different attribute values Example: height can be measured in feet or meters Different attributes can be mapped to the same set of values Example: Attribute values type for both “ID” and “age” are integers But properties of attribute values can be different ID has no limit but age has a maximum and minimum value

Different Types of Attributes Nominal / Categorical Examples: ID numbers, eye color, zip codes Ordinal Examples: rankings, size in {small, medium, large} Interval Examples: calendar dates Ratio Examples: counts, time

Properties of Attribute Values The type of an attribute depends on which of the properties it possesses Nominal Eye color = ≠ Ordinal Size {small, medium, large} = ≠ < > ≤ ≥ Interval Calendar dates = ≠ < > ≤ ≥ + − Ratio Counts, time = ≠ < > ≤ ≥ + − × ÷

Ordinal vs. Interval vs. Ratio Order matters, but not the difference between values Difference between 7 and 5 may not be same as difference between 5 and 3 Interval Difference between two values is meaningful 100 degrees – 90 degrees is same difference as 90 degrees – 80 degrees Temperature of 100 degrees is not twice as hot as 50 degrees Ratio Clear definition of 0.0; none of a variable at 0.0 Weight of 8 grams is twice the weight of 4 grams (Temperature 100 Kelvin is twice as hot as 50 Kelvin; Kelvin is Ratio; 0.0 Kelvin means “no heat”)

Discrete and Continuous Attributes Has finite attribute values Often represented as integer variables Examples: zip codes, counts, {1,2,3,…} (Note: binary 0/1 attributes are special case of discrete.) Has real numbers as attribute values Often represented as doubles (floating-pt variables) Examples: height, temperature, 3.14159 (Practically, real values can only be measured and represented using a finite number of digits)

Ordered Data Temporal Data Sequential Data Each record has a time associated with it Example: retail transaction Sequential Data Dataset has sequence of individual entities (such as sequence of words or letters) Example: DNA sequence (ATGC possible letters)

Ordered Data Time Series Spatial Data Series of measurements taken over time Example: financial stock price data Temporal autocorrelation: if two measurements are close in time, then the value of those measurements are often very similar Spatial Data Each record has a position or area Example: geographical locations Spatial autocorrelation: objects that are physically close tend to be similar

References Fundamentals of Machine Learning for Predictive Data Analytics, 1st Edition, Kelleher et al. Introduction to Data Mining, 1st edition, Tan et al.