Introduction to Data Mining
Engineering Group in ACL
Outline
Introduction
Learning Methods
Applications
Conclusion
Introduction
Massive data/information in the real world
Induced rules/procedures for management
  ◦ Decreasing human labor
  ◦ Increasing process efficiency
In computer science…
Machine Learning
Data Mining
Artificial Intelligence
Genetic Algorithms
…
Machine Learning I/III
Unsupervised learning
  ◦ Given unlabeled points that are drawn from a common distribution
  ◦ The goal is to find interesting structure in that distribution
  ◦ Clustering, Dimensionality Reduction, …
K-means, Neural Networks, …
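As an illustration of the clustering case, here is a minimal K-means sketch in plain Python. The sample points, the choice of the first k points as initial centroids, and the fixed iteration count are assumptions made for this example only; practical implementations usually use randomized initialization and a convergence test.

```python
import math

def kmeans(points, k, iterations=10):
    """Cluster points into k groups by alternating assignment and update steps."""
    # Assumption: the first k points act as initial centroids (deterministic, for illustration).
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

# Hypothetical unlabeled data: two visually separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1, 0, 1]
```

No labels are given to the algorithm; the grouping emerges only from the distances between points.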
Machine Learning II/III
Supervised learning
  ◦ Given points whose labels are known
  ◦ The goal is to find the mapping function from points to labels
  ◦ Classification, Handwriting Recognition, …
Support Vector Machines, Decision Trees, …
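To make "finding the mapping function" concrete, the sketch below learns a one-level decision tree (a decision stump) on a single numeric feature. This is a deliberately simplified stand-in for the decision trees named on the slide, and the training data and function names are hypothetical.

```python
def train_stump(xs, ys):
    """Pick the threshold and label orientation with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        # Try both orientations: (label at or below t, label above t).
        for below, above in ((0, 1), (1, 0)):
            errors = sum((below if x <= t else above) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    _, t, below, above = best
    # The learned mapping function from a feature value to a label.
    return lambda x: below if x <= t else above

# Hypothetical labeled points: feature value and class label.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
classify = train_stump(xs, ys)
print(classify(2.5), classify(7.5))  # 0 1
```

The labels drive the search: every candidate threshold is scored by how many known labels it gets wrong.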
Machine Learning III/III
Semi-supervised learning
  ◦ Given both labeled and unlabeled points
  ◦ The goal is to find the mapping function
    Unlabeled points are used to adjust it
  ◦ Useful when little labeled information is available, …
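One common way to let unlabeled points adjust a classifier is self-training. The sketch below is an assumption-laden illustration, not a method taken from the slides: it uses a nearest-centroid classifier on one-dimensional data, and in each round it labels the unlabeled point closest to any centroid and adds it to the training set. All names and data are hypothetical.

```python
def centroids_of(labeled):
    """Return the mean feature value for each label."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}

def predict(centroids, x):
    """Assign x to the label with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def self_train(labeled, unlabeled, rounds=5):
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        centroids = centroids_of(labeled)
        # Most confident point: the one closest to any current centroid.
        x = min(pool, key=lambda u: min(abs(u - c) for c in centroids.values()))
        labeled.append((x, predict(centroids, x)))
        pool.remove(x)
    return centroids_of(labeled)

# Hypothetical data: only two labeled points, four unlabeled ones.
labeled = [(1.0, "a"), (9.0, "b")]
unlabeled = [2.0, 3.0, 7.0, 8.0]
model = self_train(labeled, unlabeled)
print(predict(model, 4.0))  # a
```

With only the two labeled points the centroids sit at 1.0 and 9.0; after self-training they move to 2.0 and 8.0, which is the "adjustment" contributed by the unlabeled data.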
Association Rule Learning
Discovering interesting relations between variables in large databases
Classification, Bioinformatics, …
  ◦ Apriori, …
[Table: transactions over the items Milk, Bread, T-shirt, Shoes, with Y marking each item present]
Rule: Milk → Bread
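A rule such as Milk → Bread is usually judged by its support and confidence, the two measures that Apriori-style algorithms rely on. The sketch below computes both for a made-up set of transactions; the transactions are hypothetical and are not the contents of the slide's table, and this is not a full Apriori implementation.

```python
# Hypothetical market-basket transactions (not the table from the slide).
transactions = [
    {"Milk", "Bread"},
    {"Milk", "Bread", "T-shirt"},
    {"Bread", "Shoes"},
    {"Milk", "Bread", "Shoes"},
    {"T-shirt"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from transaction counts."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"Milk", "Bread"}, transactions)
rule_confidence = confidence({"Milk"}, {"Bread"}, transactions)
print(rule_support, rule_confidence)  # 0.6 1.0
```

In this toy data every basket with Milk also contains Bread, so the rule has confidence 1.0 while covering 60% of all transactions.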
Applications in the real world
Text Classification
Document Classification
Web Page Classification
…
Standard Classification Procedure
Preprocessing
  ◦ Content Retrieval
  ◦ Sentence Segmentation (in Chinese)
  ◦ Feature Selection
Classification
  ◦ Model Generation
  ◦ Prediction
Learning
  ◦ Active Learning
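The preprocessing and classification stages above can be sketched end to end for text classification. This is a minimal illustration under stated assumptions: whitespace tokenization stands in for segmentation (Chinese text would need a dedicated segmenter), feature selection keeps the most frequent terms, and the model is a multinomial naive Bayes classifier with add-one smoothing, chosen here for brevity and not named in the slides. The documents and labels are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Preprocessing: lowercase and split on whitespace (assumption).
    return text.lower().split()

def select_features(docs, top_n):
    # Feature selection: keep the top_n most frequent terms (assumption).
    counts = Counter(tok for text, _ in docs for tok in tokenize(text))
    return {tok for tok, _ in counts.most_common(top_n)}

def train(docs, vocab):
    # Model generation: per-class document and term counts.
    class_docs = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_docs}
    for text, label in docs:
        word_counts[label].update(t for t in tokenize(text) if t in vocab)
    return class_docs, word_counts, vocab

def predict(model, text):
    # Prediction: pick the class with the highest log-probability score.
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:
                # Add-one smoothing avoids log(0) for unseen terms.
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training documents with labels.
docs = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes", "ham"),
]
vocab = select_features(docs, top_n=10)
model = train(docs, vocab)
print(predict(model, "buy cheap pills"))  # spam
```

Each function corresponds to one stage of the procedure, so a different segmenter, feature selector, or model can be swapped in without touching the others.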
Lazy Learning
The inductive process is delayed until classification time
Takes advantage of related and qualitative evidence
Able to model complex decision spaces
  ◦ J. Han and M. Kamber, Data Mining: Concepts and Techniques, Elsevier
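A k-nearest-neighbor classifier is a standard example of a lazy learner: it stores the training points and does all of its work when a query arrives. The sketch below is illustrative only; the training points, labels, and the default k=3 are assumptions.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # No model is built ahead of time; distances are computed at query time.
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points in two dimensions.
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((8.0, 8.0), "B"), ((9.0, 8.5), "B"), ((8.5, 9.0), "B"),
]
print(knn_classify(training, (2.0, 2.0)))  # A
print(knn_classify(training, (7.5, 8.0)))  # B
```

Because the decision is made locally around each query, the classifier can follow irregular class boundaries, at the cost of scanning the stored data for every prediction.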
Conclusion
Machine Learning and Data Mining techniques make it possible to handle massive amounts of information
Issues specific to special domains still need to be observed