Descriptive Exploratory Data Analysis III
Jagdish S. Gangolly
State University of New York at Albany

Trellis Graphics I
Syntax: dependent variable ~ explanatory variable | conditioning variable
Data set
Output:
>trellis.device(motif)
>dev.off() or >graphics.off()

Trellis Graphics II
Example: histogram(~height | voice.part, data=singer)
–No dependent variable for a histogram
–height is the explanatory variable
–voice.part is the conditioning variable
–Data set is singer

Trellis Graphics III
Layout: layout, skip, and aspect parameters (p.147).
Ordering of graphs: left to right, bottom to top. If as.table=T, left to right, top to bottom (p.149).
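
The ordering rule above can be modelled in a few lines. This is a minimal sketch in Python rather than S-PLUS, and it is not the Trellis implementation: `panel_position` and its parameters are hypothetical names that only mimic how the panel order maps to grid cells, with and without as.table.

```python
def panel_position(index, n_cols, n_rows, as_table=False):
    """Return (column, row_from_top), both 0-based, for the index-th panel.

    Hypothetical helper: panels are filled left to right; rows are filled
    bottom to top by default, or top to bottom when as_table is True.
    """
    if not 0 <= index < n_cols * n_rows:
        raise ValueError("panel index does not fit in the layout")
    column = index % n_cols
    fill_row = index // n_cols  # 0 for the first row that gets filled
    row_from_top = fill_row if as_table else n_rows - 1 - fill_row
    return column, row_from_top


# In a 2-column by 2-row layout the first panel lands bottom-left by
# default and top-left when as_table is True.
default_first = panel_position(0, 2, 2)
table_first = panel_position(0, 2, 2, as_table=True)
```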

Descriptive Data Exploration
–summary: mean, median, quartiles (p.171)
–stem: stem-and-leaf display (p.171)
–quantile (p.172)
–stdev (p.173)
–tapply: splits data (p.174)
–by (p.175)
–mean works on vectors; other structures need to be converted to vectors before computing means (example on pp.176-7)
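
For readers without S-PLUS at hand, the same quantities can be sketched with Python's standard library. The data values, `describe`, and `group_apply` below are illustrative assumptions, not part of the slides; `group_apply` only imitates the split-then-apply idea behind tapply.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev

# Hypothetical sample: (voice part, height in inches) pairs.
records = [("Bass", 72), ("Bass", 70), ("Bass", 74),
           ("Alto", 65), ("Alto", 63), ("Alto", 67)]
heights = [height for _, height in records]


def describe(values):
    """Five-number summary plus mean and sample standard deviation."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "q1": q1, "median": median(values),
        "mean": mean(values), "q3": q3, "max": max(values),
        "stdev": stdev(values),
    }


def group_apply(pairs, func):
    """Split values by key, then apply func to each group (tapply-like)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: func(values) for key, values in groups.items()}


summary = describe(heights)
group_means = group_apply(records, mean)
```

Note that quartile conventions differ between packages, so `q1` and `q3` here need not match the values another tool reports for the same data.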

Data Preprocessing for Datamining I
Why
–Incomplete: attribute values not available, equipment malfunctions, not considered important
–Noisy (errors): instrument problems, human/computer errors, transmission errors
–Inconsistent: inconsistencies due to data definitions

Data Preprocessing for Datamining II
Data Cleaning
–Missing values: ignore the tuple, fill in values manually, use a global constant (unknown), use the attribute mean, use the attribute group mean, use the most probable value
–Noisy data:
  Binning: partitioning into equi-sized bins, smoothing by bin means or bin boundaries
  Clustering
  Inspection: computer & human
  Regression
–Inconsistencies
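
Two of the cleaning steps listed above, filling missing values with the attribute mean and smoothing noisy values by binning, can be sketched as follows. This is an illustrative Python sketch: the function names and sample values are assumptions, missing values are represented as None, and the bin count is assumed not to exceed the number of values.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def equal_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def smooth_by_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the nearer of its bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


# Hypothetical attribute values.
filled = fill_missing_with_mean([1, None, 3])
bins = equal_depth_bins([4, 8, 15, 21, 21, 24, 25, 28, 34], 3)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)
```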

Data Preprocessing for Datamining III
Data Integration: combining data from different sources into a coherent whole
–Schema integration: combining data models (entity identification problems)
–Redundancy (derived values, calculated fields, use of different key attributes): use of correlations to detect redundancies
–Resolution of data value conflicts (coding values in different measures)
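
As an illustration of using correlations to detect redundancy, the sketch below flags pairs of numeric attributes whose Pearson correlation is close to 1 in absolute value. The column names, values, and the 0.95 threshold are assumptions chosen for the example; the code also assumes no attribute is constant, since a zero variance would cause a division by zero.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def redundant_pairs(columns, threshold=0.95):
    """Return attribute pairs whose |correlation| reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(columns[first], columns[second])) >= threshold:
                pairs.append((first, second))
    return pairs


# Hypothetical table: price_with_tax is derived from price, quantity is not.
columns = {
    "price": [10.0, 20.0, 30.0, 40.0],
    "price_with_tax": [10.8, 21.6, 32.4, 43.2],
    "quantity": [3.0, 1.0, 4.0, 2.0],
}
flagged = redundant_pairs(columns)
```

A high correlation only suggests that one attribute may be derivable from another; whether to drop it remains a judgement about the data.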

Data Preprocessing for Datamining III
Transformation
–Smoothing
–Aggregation
–Generalisation
–Normalisation
–Attribute (or feature) construction
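
The slide does not say which normalisation is meant; two common choices, min-max scaling and z-score standardisation, are sketched here as an assumption. The function names are illustrative, and both functions assume the values are not all identical.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they span [new_min, new_max]."""
    low, high = min(values), max(values)
    return [new_min + (v - low) / (high - low) * (new_max - new_min)
            for v in values]


def z_score(values):
    """Centre values on their mean and divide by the population std. dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical attribute values.
scaled = min_max([10, 20, 30])
standardised = z_score([10, 20, 30])
```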

Data Preprocessing for Datamining IV
Data Reduction & compression
–Data cube aggregation (p.117)
–Dimension reduction: minimise loss of information.
  Attribute selection
  Decision tree induction
  Principal components analysis

Data Preprocessing for Datamining IV
–Numerosity reduction
  Regression/log-linear regression
  Histograms
  Clustering
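
To show how a histogram reduces numerosity, the sketch below replaces a list of values with a few (lower bound, upper bound, count) buckets. Equal-width buckets are an assumption made for the example, as are the function name and the data; the code assumes the values are not all identical.

```python
def equal_width_histogram(values, n_buckets):
    """Summarise values as (lower, upper, count) equal-width buckets."""
    low, high = min(values), max(values)
    width = (high - low) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # The maximum value belongs to the last bucket, not a new one.
        index = min(int((v - low) / width), n_buckets - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, count)
            for i, count in enumerate(counts)]


# Hypothetical attribute values: ten numbers reduced to three buckets.
histogram = equal_width_histogram([0, 2, 5, 9, 10, 12, 18, 21, 25, 30], 3)
bucket_counts = [count for _, _, count in histogram]
```

Only the bucket boundaries and counts need to be stored, at the cost of losing the individual values inside each bucket.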
