Published by Imogen Wade. Modified over 9 years ago.
1
Descriptive Exploratory Data Analysis III
Jagdish S. Gangolly
State University of New York at Albany
2
Trellis Graphics I
Syntax:
–Dependent variable ~ explanatory variable | conditioning variable
–Data set
Output:
  > trellis.device(motif)
  > dev.off() or > graphics.off()
3
Trellis Graphics II
Example: histogram(~height | voice.part, data=singer)
–No dependent variable for a histogram
–height is the explanatory variable
–voice.part is the conditioning variable
–Data set is singer
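The conditioning idea can be sketched outside S-PLUS. The Python below is only an analogue, not Trellis itself: it splits values by a conditioning variable and counts each group into bins, which is roughly what each histogram panel displays. The function name conditioned_histogram, the bin width, and the data rows are illustrative assumptions; the rows are made up and are not the singer data set.

```python
from collections import defaultdict


def conditioned_histogram(rows, value_key, cond_key, bin_width):
    """Count values into bins separately for each level of the conditioning variable."""
    panels = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # Lower edge of the bin this value falls into.
        bin_start = (row[value_key] // bin_width) * bin_width
        panels[row[cond_key]][bin_start] += 1
    # Convert to plain dicts with bins in ascending order.
    return {level: dict(sorted(bins.items())) for level, bins in panels.items()}


# Made-up rows for illustration only (not the real singer data).
rows = [
    {"height": 64, "voice_part": "Soprano"},
    {"height": 66, "voice_part": "Soprano"},
    {"height": 67, "voice_part": "Soprano"},
    {"height": 70, "voice_part": "Bass"},
    {"height": 72, "voice_part": "Bass"},
    {"height": 75, "voice_part": "Bass"},
]

panels = conditioned_histogram(rows, "height", "voice_part", bin_width=5)
for level, bins in panels.items():
    print(level, bins)
```

Each key of panels plays the role of one panel in the Trellis display: one histogram per level of the conditioning variable.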
4
Trellis Graphics III
Layout: the layout, skip, and aspect parameters (p.147).
Ordering of graphs: left to right, bottom to top. If as.table=T, left to right, top to bottom (p.149).
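The two panel orderings can be made concrete with a small sketch. This is Python rather than S-PLUS, and panel_grid is a hypothetical helper written for illustration, not a Trellis function. It only shows where panel numbers land on the page under each ordering.

```python
def panel_grid(n_panels, n_cols, n_rows, as_table=False):
    """Return the page as a list of rows (top row first) of panel numbers; None marks an empty cell."""
    if n_panels > n_cols * n_rows:
        raise ValueError("too many panels for this grid")
    grid = [[None] * n_cols for _ in range(n_rows)]
    for i in range(n_panels):
        row_from_start, col = divmod(i, n_cols)
        # Default ordering starts at the bottom row; as_table starts at the top row.
        row = row_from_start if as_table else n_rows - 1 - row_from_start
        grid[row][col] = i + 1
    return grid


default_order = panel_grid(5, n_cols=3, n_rows=2)
table_order = panel_grid(5, n_cols=3, n_rows=2, as_table=True)

for name, grid in (("default", default_order), ("as_table", table_order)):
    print(name)
    for row in grid:
        print(row)
```

With five panels on a 3-by-2 page, the default ordering puts panel 1 at the bottom left, while the table ordering puts it at the top left.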
5
Descriptive Data Exploration
summary: mean, median, quartiles (p.171)
stem: stem-and-leaf display (p.171)
quantile (p.172)
stdev (p.173)
tapply: splits data by group and applies a function to each group (p.174)
by (p.175)
mean works on vectors; other structures need to be converted to vectors before computing means (example on p.176-7).
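For readers without S-PLUS, the following is a rough Python analogue of summary, stdev, and tapply using only the standard library. The helper names summarize and group_apply and the data are made up for illustration. Quantile definitions differ between packages; method="inclusive" is an assumption here and may not match the S-PLUS output exactly.

```python
import statistics
from collections import defaultdict


def summarize(values):
    """Rough analogue of summary: minimum, quartiles, median, mean, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


def group_apply(values, groups, func):
    """Rough analogue of tapply: split values by group label, then apply func to each group."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: func(members) for group, members in buckets.items()}


# Made-up data for illustration only.
heights = [60, 62, 64, 66, 68, 70, 72, 74]
parts = ["Soprano"] * 4 + ["Bass"] * 4

overall = summarize(heights)
spread = statistics.stdev(heights)  # sample standard deviation
group_means = group_apply(heights, parts, statistics.mean)

print(overall)
print(spread)
print(group_means)
```

Note that statistics.mean, like mean in the slides, expects a flat sequence of numbers, so nested structures have to be flattened first.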
6
Data Preprocessing for Datamining I
Why
–Incomplete: attribute values not available, equipment malfunctions, not considered important
–Noisy (errors): instrument problems, human/computer errors, transmission errors
–Inconsistent: inconsistencies due to data definitions
7
Data Preprocessing for Datamining II
Data Cleaning
–Missing values: ignore the tuple, fill in values manually, use a global constant ("unknown"), use the attribute mean, use the attribute group mean, or use the most probable value
–Noisy data:
  Binning: partitioning into equi-sized bins, smoothing by bin means or bin boundaries
  Clustering
  Inspection: computer & human
  Regression
–Inconsistencies
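Two of the cleaning techniques above, filling missing values with the attribute mean and smoothing by binning, can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: missing values are represented as None, bins are equal-depth (same number of items per bin), and ties in boundary smoothing go to the lower boundary. All function names and data values are illustrative.

```python
import statistics


def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def equal_depth_bins(values, n_bins):
    """Sort the values and split them into bins holding the same number of items."""
    if len(values) % n_bins != 0:
        raise ValueError("this sketch needs len(values) divisible by n_bins")
    ordered = sorted(values)
    size = len(ordered) // n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[statistics.mean(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]
        # Assumption: a value equally close to both boundaries goes to the lower one.
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


# Made-up values for illustration only.
filled = fill_with_mean([10, None, 20, 30])
bins = equal_depth_bins([4, 8, 15, 21, 21, 24, 25, 28, 34], n_bins=3)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)

print(filled)
print(bins)
print(by_means)
print(by_boundaries)
```

Filling with the group mean works the same way, except that the mean is computed only over tuples belonging to the same group as the tuple with the missing value.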
8
Data Preprocessing for Datamining III
Data Integration: combining data from different sources into a coherent whole
–Schema integration: combining data models (entity identification problems)
–Redundancy (derived values, calculated fields, use of different key attributes): use of correlations to detect redundancies
–Resolution of data value conflicts (coding values in different measures)
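Using correlation to flag a redundant attribute can be illustrated with the Pearson coefficient. The sketch below is Python, with made-up columns in which one attribute is simply the other in different units. The 0.95 cut-off is an arbitrary assumption for illustration, not a value from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Made-up columns: the same price recorded in dollars and in cents.
price_dollars = [10, 20, 30, 40]
price_cents = [1000, 2000, 3000, 4000]

r = pearson(price_dollars, price_cents)
THRESHOLD = 0.95  # assumed cut-off for "probably redundant"
is_redundant = abs(r) >= THRESHOLD

print(r, is_redundant)
```

A coefficient near +1 or -1 suggests that one attribute can be derived from the other, so one of them is a candidate for removal after integration.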
9
Data Preprocessing for Datamining III
Transformation
–Smoothing
–Aggregation
–Generalisation
–Normalisation
–Attribute (or feature) construction
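Normalisation is the easiest of these transformations to show in code. The sketch below gives two common variants, min-max scaling and z-score scaling; the slides do not say which variant is intended, so both are assumptions. It is written in Python with made-up values, and z_score uses the population standard deviation.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("min-max scaling is undefined for a constant column")
    return [new_min + (v - low) / (high - low) * (new_max - new_min) for v in values]


def z_score(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("z-score scaling is undefined for a constant column")
    return [(v - mean) / sd for v in values]


# Made-up values for illustration only.
incomes = [10, 20, 30, 40, 50]
scaled = min_max(incomes)
standardized = z_score(incomes)

print(scaled)
print(standardized)
```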
10
Data Preprocessing for Datamining IV
Data Reduction & compression
–Data cube aggregation (p.117)
–Dimension reduction: minimise loss of information.
  Attribute selection
  Decision tree induction
  Principal components analysis
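Data cube aggregation amounts to summing a measure over the dimensions that are dropped. The Python sketch below rolls made-up quarterly sales up to yearly totals. The record layout, the "sales" measure, and the helper name roll_up are all hypothetical.

```python
from collections import defaultdict


def roll_up(records, keep):
    """Sum the 'sales' measure over every dimension that is not listed in keep."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[dim] for dim in keep)
        totals[key] += rec["sales"]
    return dict(totals)


# Made-up records for illustration only.
records = [
    {"year": 2023, "quarter": "Q1", "branch": "A", "sales": 100},
    {"year": 2023, "quarter": "Q2", "branch": "A", "sales": 150},
    {"year": 2023, "quarter": "Q1", "branch": "B", "sales": 80},
    {"year": 2024, "quarter": "Q1", "branch": "A", "sales": 120},
    {"year": 2024, "quarter": "Q2", "branch": "B", "sales": 90},
]

by_year = roll_up(records, keep=("year",))
by_year_branch = roll_up(records, keep=("year", "branch"))

print(by_year)
print(by_year_branch)
```

The aggregated result has fewer rows than the input but still answers any query posed at the yearly level.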
11
Data Preprocessing for Datamining IV
–Numerosity reduction
  Regression/log-linear regression
  Histograms
  Clustering
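Regression reduces numerosity by storing model parameters in place of the raw points. As a minimal Python sketch, assuming a simple straight-line model and made-up data that happen to be exactly linear, five (x, y) pairs are replaced by a slope and an intercept. With real data the fit would leave residuals, so the reduction is lossy.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up, exactly linear data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
# The two parameters now stand in for the five stored points.
reconstructed = [slope * x + intercept for x in xs]

print(slope, intercept)
print(reconstructed)
```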