Instance Construction via Likelihood-Based Data Squashing
Madigan D., et al. (Ch. 12, Instance Selection and Construction for Data Mining (2001), Kluwer Academic Publishers)
Summarized by: Jinsan Yang, SNU Biointelligence Lab

Abstract
- Data compression method: squashing
- LDS: likelihood-based data squashing

Keywords
- Instance construction, data squashing

Outline
- Introduction
- The LDS Algorithm
- Evaluation: Logistic Regression
- Evaluation: Neural Networks
- Iterative LDS
- Discussion

Introduction
- Massive data examples
  - Large-scale retailing
  - Telecommunications
  - Astronomy
  - Computational biology
  - Internet logging
- Some computational challenges
  - Multiple passes over the data are needed, and each access is 10^5 to 10^6 times slower than main memory
  - Current solution: scaling up existing algorithms
  - Here: scaling down the data
- Data squashing: 750,000 → 8,443 (DuMouchel et al., 1999)
  - Outperforms a random sample of size 7,543 by a factor of 500 in MSE

LDS Algorithm
- Motivation: Bayes' rule
  - Given three data points d1, d2, d3, estimate the parameter θ
  - Cluster the data points by likelihood profile
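The formula on this slide did not survive in the transcript. A plausible reading of the Bayes' rule motivation, with θ as the assumed symbol for the parameter and independent data points assumed, is:

```latex
P(\theta \mid d_1, d_2, d_3) \;\propto\; P(\theta)\,\prod_{i=1}^{3} P(d_i \mid \theta)

% If d_1 and d_2 have nearly the same likelihood profile, i.e.
% P(d_1 \mid \theta) \approx P(d_2 \mid \theta) over the \theta values of interest,
% they can be replaced by one pseudo-point d^{*} counted twice:
P(\theta \mid d_1, d_2, d_3) \;\approx\; c \, P(\theta)\, P(d^{*} \mid \theta)^{2}\, P(d_3 \mid \theta)
```

Here d* and the normalizing constant c are illustrative names, not notation taken from the slides. The point is that data entering the posterior only through their likelihoods can be merged when those likelihoods agree as functions of θ.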

LDS Algorithm
- Details of the LDS algorithm
  - [Select] Choose values of θ by a central composite design
  - Figure: central composite design for 3 factors
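As an illustration of the [Select] step, the sketch below builds a central composite design for k factors in coded units and maps it onto θ values around an initial estimate. The function names, the rotatable choice of the axial distance, and the use of per-coordinate scales (for example standard errors) are assumptions made for the example, not details given on the slides.

```python
from itertools import product

def central_composite_design(k, alpha=None):
    """Coded design points: 2^k factorial corners, 2k axial points, 1 center."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable axial distance (an assumption)
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for j in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[j] = sign * alpha
            axial.append(point)
    return corners + axial + [[0.0] * k]

def design_to_thetas(theta_hat, scale, design):
    """Map coded points to parameter values: theta_hat + scale * coded."""
    return [[t + s * c for t, s, c in zip(theta_hat, scale, point)]
            for point in design]

design = central_composite_design(3)
thetas = design_to_thetas([0.5, -1.0, 2.0], [0.1, 0.2, 0.3], design)
print(len(design))  # 15 = 8 corners + 6 axial points + 1 center
```

For 3 factors this gives 15 candidate θ values, so each data point's likelihood profile in the next step would be a vector of length 15.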

LDS Algorithm
- [Profile] Evaluate the likelihood profiles
- [Cluster] Cluster the mother data in a single pass
  - Select n' random samples as initial cluster centers
  - Assign each remaining data point to a cluster
- [Construct] Construct the pseudo data: cluster centers
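The Profile, Cluster, and Construct steps can be sketched for a logistic regression model as follows. This is a minimal illustration under several assumptions that the slides do not state: profiles are compared by squared Euclidean distance, the seed profiles stay fixed during the pass, each pseudo-point is the mean of its cluster members (so its response may be fractional), and its weight is the cluster size. All function names are hypothetical.

```python
import math
import random

def log_lik(x, y, theta):
    """Log-likelihood of one (x, y) pair under logistic regression; theta[0] is the intercept."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    # Numerically stable log(sigmoid(z)).
    log_p = -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))
    return y * log_p + (1 - y) * (log_p - z)  # log(1 - p) = log(p) - z

def profile(x, y, thetas):
    """[Profile] Likelihood profile: log-likelihood at every selected theta."""
    return [log_lik(x, y, th) for th in thetas]

def squash(data, thetas, n_clusters, seed=0):
    """[Cluster] + [Construct]: one pass over data, returns (x_mean, y_mean, weight) triples."""
    rng = random.Random(seed)
    seeds = rng.sample(range(len(data)), n_clusters)  # n' random initial centers
    seed_set = set(seeds)
    centers = [profile(*data[i], thetas) for i in seeds]
    members = [[data[i]] for i in seeds]
    for i, (x, y) in enumerate(data):
        if i in seed_set:
            continue
        prof = profile(x, y, thetas)
        nearest = min(range(n_clusters),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(prof, centers[c])))
        members[nearest].append((x, y))
    pseudo = []
    for group in members:
        w = len(group)
        dim = len(group[0][0])
        x_mean = [sum(x[j] for x, _ in group) / w for j in range(dim)]
        y_mean = sum(y for _, y in group) / w
        pseudo.append((x_mean, y_mean, w))
    return pseudo
```

Calling `squash(data, thetas, n_clusters=10)` on a list of `(x, y)` pairs returns 10 weighted pseudo-points whose weights sum to `len(data)`.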

Evaluation: Logistic Regression
- Small-scale simulations
  - Initial estimate of θ
  - Plot: log(error ratio)
  - Three methods of initial parameter estimation
  - 100 data points / 48 squashed data points
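To evaluate squashed data, the logistic regression has to be refit on the reduced set. The sketch below assumes each reduced point carries a weight (for example a cluster size) and maximizes the weighted log-likelihood by plain gradient ascent. The function name, step size, and iteration count are illustrative choices, not the fitting procedure used in the chapter.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def fit_weighted_logistic(rows, n_features, steps=5000, lr=0.5):
    """rows: list of (x, y, w). Maximizes sum_i w_i * loglik_i by gradient ascent."""
    theta = [0.0] * (n_features + 1)  # theta[0] is the intercept
    total_w = sum(w for _, _, w in rows)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y, w in rows:
            z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
            resid = w * (y - sigmoid(z))  # weighted score contribution
            grad[0] += resid
            for j, xi in enumerate(x):
                grad[j + 1] += resid * xi
        theta = [t + lr * g / total_w for t, g in zip(theta, grad)]
    return theta

# A point with weight w counts the same as w identical unit-weight points.
rows = [([0.0], 0, 3), ([0.0], 1, 1), ([1.0], 1, 2), ([1.0], 0, 1)]
theta = fit_weighted_logistic(rows, n_features=1)
```

Because the weights enter only through the sum of weighted score terms, fitting on weighted rows and fitting on the corresponding replicated rows give the same estimate.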

Evaluation: Logistic Regression
- Medium scale: 100,000 data points; baseline: 1% simple random sampling

Evaluation: Logistic Regression
- Large scale: 744,963 data points; baseline: 1% simple random sampling
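The 1% simple random sampling baseline and a log error ratio can be sketched as below. The slides do not define the error ratio, so the definition here (squared distance of each reduced-data estimate from the full-data estimate, squashed over sampled) is an assumption, and both function names are hypothetical.

```python
import math
import random

def simple_random_sample(data, fraction=0.01, seed=0):
    """Baseline: draw a simple random sample and give each point equal weight."""
    rng = random.Random(seed)
    n = max(1, round(len(data) * fraction))
    weight = len(data) / n  # each sampled point stands in for this many originals
    return [(x, y, weight) for x, y in rng.sample(data, n)]

def log_error_ratio(theta_full, theta_squashed, theta_sampled):
    """log(squashed error / sampled error); negative values favor squashing.

    Assumed definition: error = squared distance to the full-data estimate.
    """
    err_squashed = sum((a - b) ** 2 for a, b in zip(theta_squashed, theta_full))
    err_sampled = sum((a - b) ** 2 for a, b in zip(theta_sampled, theta_full))
    return math.log(err_squashed / err_sampled)

data = [([float(i)], i % 2) for i in range(1000)]
baseline = simple_random_sample(data)  # 10 points, each with weight 100.0
```

Under this definition, a log error ratio of about -6.2 would correspond to the factor of 500 in MSE quoted in the introduction, since log(1/500) ≈ -6.2.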

Evaluation: Neural Networks
- Feed-forward network: two input nodes, one hidden layer with 3 units, single binary output
- Mother data: 10,000; squashed data: 1,000; repetitions: 30
- Test data: 1,000 points generated from the same network
- Comparison: P(whole) - P(reduced)
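The simulation setup can be sketched as follows: a 2-3-1 feed-forward network generates both the mother data and the test data, and two fitted networks are compared by the difference in their predicted probabilities on the test inputs. The tanh hidden activation, the weight and input distributions, and all function names are assumptions, and training is omitted.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def make_network(rng, n_in=2, n_hidden=3):
    """Random weights; the last entry of each weight vector is the bias."""
    hidden = [[rng.gauss(0, 2) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [rng.gauss(0, 2) for _ in range(n_hidden + 1)]
    return hidden, output

def predict_prob(net, x):
    """P(y = 1 | x) for a one-hidden-layer network with a single binary output."""
    hidden, output = net
    h = [math.tanh(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in hidden]
    return sigmoid(output[-1] + sum(wi * hi for wi, hi in zip(output, h)))

def sample_data(net, n, rng):
    """Draw inputs uniformly on [-1, 1]^2 and binary labels from the network."""
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        rows.append((x, 1 if rng.random() < predict_prob(net, x) else 0))
    return rows

def prob_differences(net_whole, net_reduced, test_x):
    """P(whole) - P(reduced) for each test input."""
    return [predict_prob(net_whole, x) - predict_prob(net_reduced, x) for x in test_x]

rng = random.Random(0)
true_net = make_network(rng)
mother = sample_data(true_net, 10000, rng)
test = sample_data(true_net, 1000, rng)
```

In the experiment described above, `net_whole` would be trained on all 10,000 mother points and `net_reduced` on the 1,000 squashed (or randomly sampled) points, repeated 30 times.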

Evaluation: Neural Networks

Iterative LDS
- Used when the estimate of θ is not accurate
  1. Set θ from simple random sampling
  2. Squash by LDS
  3. Estimate θ
  4. Go to 2
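The loop above can be sketched as a small driver that takes the individual steps as functions. The slides give no stopping rule, so a fixed number of rounds is assumed here; `select_fn`, `squash_fn`, and `fit_fn` are hypothetical placeholders for the Select step, the LDS squashing step, and the model fit.

```python
def iterative_lds(data, theta0, select_fn, squash_fn, fit_fn, n_rounds=3):
    """Repeat: select thetas around the current estimate, squash, re-estimate.

    theta0 is the initial estimate, e.g. from a simple random sample.
    The fixed n_rounds stopping rule is an assumption.
    """
    theta = theta0
    history = [theta]
    for _ in range(n_rounds):
        thetas = select_fn(theta)          # design points around the current estimate
        pseudo = squash_fn(data, thetas)   # step 2: squash by LDS
        theta = fit_fn(pseudo)             # step 3: estimate theta on squashed data
        history.append(theta)
    return theta, history

# Toy stand-ins that only exercise the control flow, not a real model.
toy_data = [1.0, 2.0, 3.0, 6.0]
theta, history = iterative_lds(
    toy_data,
    theta0=0.0,
    select_fn=lambda t: [t - 1.0, t, t + 1.0],
    squash_fn=lambda d, ths: [(sum(d) / len(d), len(d))],
    fit_fn=lambda pseudo: pseudo[0][0],
)
```

Passing the steps in as functions keeps the loop independent of the model, so the same driver could wrap a logistic regression or a neural network fit.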
