Data Mining, Neural Network and Genetic Programming

Presentation transcript:

Data Mining, Neural Network and Genetic Programming. COMP422 Week 2: DM Tasks and Algorithms. Mengjie Zhang, Yi Mei (mengjie.zhang@ecs.vuw.ac.nz, yi.mei@ecs.vuw.ac.nz)

Outline: DM development history; DM related fields and relationships; DM common tasks and examples; DM common algorithms; Bayesian classification approach; Support vector machine; Web mining; DM issues and challenges.

KDD/DM History
Time | Area | Contribution
Late 1700s | Stat | Bayes' theorem of probability
Early 1900s | | Regression analysis
Early 1920s | | Maximum likelihood estimation
Early 1940s | AI | Neural networks
Early 1950s | | Nearest neighbour
Early 1950s | | Single link
Late 1950s | | Perceptron
Late 1950s | | Resampling, bias reduction
Early 1960s | | Machine learning started
Early 1960s | DB | Batch report
Mid 1960s | | Decision tree
Mid 1960s | | Linear model for classification
Mid 1960s | IR | Similarity measure
Mid 1960s | | Clustering
Mid 1960s | Stat | Exploratory data analysis
Late 1960s | DB | Relational data model
Early 1970s | IR | SMART IR system
Early 1970s | AI | Expert system
Mid 1970s | | Genetic algorithms
Late 1970s | | Estimation with incomplete data
Late 1970s | | K-means clustering
Early 1980s | | Kohonen self-organising maps
Mid 1980s | | Decision tree algorithms
Early 1990s | DB? | Association rule algorithms
Early 1990s | | Web and search engine
Early 1990s | | Genetic programming
Late 1990s | AI/Stat | Support vector machines
2000s | AI/DB | Deep learning, Big data, …

KDD Related Fields: KDD/DM is multidisciplinary. Related fields: AI, Machine Learning, Statistics, Pattern Recognition; Databases (Parallel DBMS, Deductive Databases); Knowledge Acquisition, Expert Systems, Decision Support Systems; Data Visualisation; Fuzzy Set, Fuzzy Logic, Uncertainty; Machine Discovery and Causal Modeling; Data Warehousing; High Performance Computing; Image Analysis, Signal Processing; Web Search Engine, Information Extraction.

Artificial Intelligence: A machine mimics “cognitive” functions such as “learning” and “problem solving”. Examples: understanding human speech, playing games, autonomous driving, intelligent routing/delivery, interpreting complex data, … AI is related to DM, but it also covers many other areas: symbolic AI, planning and scheduling, search.

Machine Learning: Basis for many core DM research topics. Examine existing data (training set) to produce some “rules” (KDD objects). Apply these rules/KDD objects to unseen data (test set). Often used for classification and regression (prediction). Supervised learning vs unsupervised learning. Examples: neural networks, genetic algorithms/programming, decision trees (C4.5, C5.0, …).

Statistics: Probability distributions to describe domains for different data attributes; statistical inference; Bayesian approach; regression.

Fuzzy Set and Fuzzy Logic: A fuzzy set is a pair $(U, m(\cdot))$, where $U$ is the set and $m: U \to [0, 1]$ is the membership function. Example: the “tall” set, as a crisp set versus a fuzzy set.
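
The contrast between a crisp and a fuzzy “tall” set can be sketched as follows. The cutoff of 180 cm and the linear ramp between 160 cm and 190 cm are illustrative assumptions, not values from the slides.

```python
def crisp_tall(height_cm, cutoff=180.0):
    """Crisp set: membership is either 0 or 1 (cutoff is an assumed value)."""
    return 1.0 if height_cm >= cutoff else 0.0


def fuzzy_tall(height_cm, low=160.0, high=190.0):
    """Fuzzy set: membership m(x) in [0, 1], rising linearly from low to high.

    The linear shape and the two bounds are assumptions for illustration.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


for h in (155, 175, 185, 195):
    print(h, crisp_tall(h), round(fuzzy_tall(h), 2))
```

A person of 175 cm is simply “not tall” in the crisp set, but is tall to degree 0.5 in the fuzzy set.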

Fuzzy Set and Fuzzy Logic: A domain can have multiple fuzzy sets. Example: characterise temperature with three fuzzy sets: cold, warm, hot.

Fuzzy Set and Fuzzy Logic: Fuzzy logic is reasoning with uncertainty. With the three temperature sets (cold, warm, hot), it can express statements such as “fairly cold”, “slightly warm” and “not hot”.

Fuzzy Set and Fuzzy Logic: Fuzzy logic operators. Let $x$ and $y$ be two fuzzy logic statements. Then $m(\neg x) = 1 - m(x)$, $m(x \wedge y) = \min(m(x), m(y))$ and $m(x \vee y) = \max(m(x), m(y))$.
Boolean (True/False) | Fuzzy (Membership)
AND(x, y) | min(m(x), m(y))
OR(x, y) | max(m(x), m(y))
NOT(x) | 1 - m(x)
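
A minimal sketch of the three operators, applied to membership degrees. The degrees 0.7 and 0.4 are hypothetical inputs chosen for illustration.

```python
def fuzzy_not(mx):
    """m(NOT x) = 1 - m(x)."""
    return 1.0 - mx


def fuzzy_and(mx, my):
    """m(x AND y) = min(m(x), m(y))."""
    return min(mx, my)


def fuzzy_or(mx, my):
    """m(x OR y) = max(m(x), m(y))."""
    return max(mx, my)


# Hypothetical membership degrees for one temperature reading.
m_cold, m_warm = 0.7, 0.4

print("cold AND warm:", fuzzy_and(m_cold, m_warm))
print("cold OR warm:", fuzzy_or(m_cold, m_warm))
print("NOT cold:", round(fuzzy_not(m_cold), 2))
```

When the degrees are restricted to 0 and 1, these operators reduce to the Boolean AND, OR and NOT.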

Data Warehousing: Central repositories of integrated data from one or more disparate sources. They store current and historical data and are used to create analytical reports. OLAP (in the warehouse) vs OLTP (in operational databases).

Web Search Engine: Web search engines search online documents related to a particular topic, based on keyword search. Examples: Google, Yahoo, Baidu, …

Information Extraction: Get particular information of interest from the Web. Convert content that only humans can read into a form that computers can read. Typical example: natural language processing.

DB Management vs Machine Learning
DB Management | Machine Learning
DB is an active, evolving entity | DM is static
Records may contain erroneous or missing data | DBs are complete and noise-free
Typical field is numeric | Typical field is binary
DB contains millions of records | DM contains hundreds of instances
AI should get down to reality | All DB problems have been solved

DM Tasks: Classification; Regression; Prediction; Time Series Analysis; Clustering; Summarisation; Association Rules; Sequence Discovery.

Classification: Maps data into predefined groups or classes. Supervised learning. Classes are predefined, or determined from data attribute values, before examining the data. Examples: medical (cancer vs not cancer); bank (credit reliable vs unreliable); digit recognition (multi-class); weather (sunny or rainy); anomaly detection; airport security screening (terrorist/criminal or not).

Regression: Maps a data item to a real-valued prediction variable by learning a function. Assume a certain function type (e.g. linear, logistic, polynomial, …) and determine the best function of this type to fit the given data. Examples: financial prediction, saving prediction, ad cost vs sales.
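
As a sketch of “assume a function type and find the best fit”, the code below fits a straight line by ordinary least squares. Least squares as the fitting criterion and the ad cost/sales numbers are assumptions for illustration; the slides do not specify either.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: ad cost vs sales, generated from y = 2x + 1.
ad_cost = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_cost, sales)
predicted_sales = slope * 5.0 + intercept  # prediction for an unseen ad cost
print(slope, intercept, predicted_sales)
```

Choosing a different function type (logistic, polynomial) changes the family of candidate functions, while the idea of picking the best member of the family stays the same.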

Prediction: Similar to classification/regression. The difference is that prediction concerns a future state in time rather than a current state. Examples: flood prediction, weather forecasting.

Time Series Analysis: A special case in which the attribute to be examined varies over time. Example: stock market (prices for X, Y and Z in one month).

Clustering: Similar to classification, except that the groups are not predefined but are defined by the data itself. Unsupervised learning. Segments or partitions the data into groups that may or may not be disjoint. Done by determining the similarity among the data on predefined attributes. A domain expert is needed to interpret the meaning of the groups. Segmentation is a special type of clustering in which a DB is partitioned into disjoint groups of similar tuples called segments.

Clustering: The self-organising map is a special method for clustering and for visualising high-dimensional data.

Summarisation: Also called characterisation or generalisation. Maps data into subsets with associated simple descriptions. Examples: clustering, sampling, compression (e.g. image), histogram, …

Association Rules: Link analysis = association. Uncover relationships among data. An association rule is a model that identifies specific types of data associations; such rules are often used in the retail sales community to identify items that are frequently purchased together. Example: beer and nappies.
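
A minimal sketch of finding items that are frequently purchased together, by counting item pairs across baskets. The baskets, the `min_count` threshold and the function name are hypothetical; full association rule algorithms do more than this pair count.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_count):
    """Count how often each pair of items occurs in the same basket."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) gives each pair one canonical order per basket
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Hypothetical retail baskets.
baskets = [
    ["beer", "nappies", "crisps"],
    ["beer", "nappies"],
    ["milk", "bread"],
    ["beer", "nappies", "milk"],
]

pairs = frequent_pairs(baskets, min_count=3)
print(pairs)
```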

Sequence Discovery: Determine sequential patterns in data. Similar to associations in that data or events are found to be related, but the relationship is based on time. Example 1: most people who purchase a CD player may be found to buy CDs within one week. Example 2: a webmaster at a company wants to determine which sequences of web pages are frequently accessed, and finds that 70% of the users of Page A follow one of the following patterns: <A, B, C>, <A, D, B, C> or <A, E, B, C>.

Data Mining Algorithms: Statistical-based: regression, Bayesian classification. Distance-based: nearest neighbour, nearest centroid, K-nearest neighbour. Decision tree-based: ID3, C4.5/C5.0, … Neural network-based: perceptron learning, back propagation learning, … Evolutionary-based: genetic programming. Rule-based: generating rules from a decision tree, from a neural network, from a GP tree. Hybrid: use GA to train the weights of an NN, NEAT, …

Bayes Approach: Statistical inference. Given a data set $X = \{x_1, x_2, \ldots, x_n\}$ and a hypothesis space $H = \{h_1, h_2, \ldots, h_m\}$, assume that one and only one hypothesis must occur at a time, and that $x_i$ is an observable event. The Bayes rule estimates the likelihood of a hypothesis given the data as evidence or input: $P(h_j \mid x_i) = \frac{P(x_i \mid h_j)\,P(h_j)}{P(x_i)}$. Posterior probability: $P(h_j \mid x_i)$. Prior probability: $P(h_j)$. Unconditional probability: $P(x_i) = \sum_{j=1}^{m} P(x_i \mid h_j)\,P(h_j)$. Conditional probability: $P(x_i \mid h_j)$.

Bayes Classification Example: Credit loan authorisation. Four classes/hypotheses: $h_1$: authorise purchase; $h_2$: authorise after further identification; $h_3$: do not authorise; $h_4$: do not authorise and contact police.

Bayes Classification Example: Categorise income into range1 = [$0, $10,000), range2 = [$10,000, $50,000), range3 = [$50,000, $100,000), range4 = [$100,000, ∞). Credit records: {excellent, good, bad}. Combining the four income ranges with the three credit records gives 12 values in the data space. Given “Income = $130,000” and “Credit record = Excellent”, which class/hypothesis does it belong to?

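Bayes Classification Example: [Three tables, only partly recoverable. Prior $P(h_j)$: $h_1$ = 6/10, $h_2$ = 2/10, $h_3$ = 1/10, $h_4$ not recoverable. Unconditional $P(x_i)$ for $x_1, \ldots, x_{12}$: surviving values $x_2$ = 2/10, $x_4$ = 1/10. Conditional $P(x_i \mid h_j)$ for $x_1, \ldots, x_{12}$ against $h_1, \ldots, h_4$: surviving values 2/6 ($x_2$), 1/6 ($x_4$), 1 ($x_9$), 1/2 ($x_{10}$); the columns are not recoverable.] What is the problem here?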

Bayes Classification Example: Dealing with zero counts: start every count at 1 for all occurrences. Assumption of conditional independence between attributes.
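
The two ideas above (counts that start at 1, and conditional independence between attributes) can be sketched as a small naive Bayes classifier. The training examples, function names and the exact smoothing formula (add one to each count, add the number of attribute values to the denominator) are assumptions for illustration; only the four income ranges and three credit records come from the slides.

```python
from collections import Counter, defaultdict


def train(examples):
    """examples: list of ((income_range, credit_record), class_label)."""
    class_counts = Counter(label for _, label in examples)
    # value_counts[(attribute_index, label)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for attrs, label in examples:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def posterior(attrs, model, domain_sizes):
    """P(h | attrs), assuming attributes are independent given the class."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(h)
        for i, value in enumerate(attrs):
            # Add-one smoothing: no conditional probability is ever zero.
            count = value_counts[(i, label)][value]
            score *= (count + 1) / (n + domain_sizes[i])
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical training records (not the counts from the slides).
examples = [
    (("range4", "excellent"), "h1"),
    (("range3", "excellent"), "h1"),
    (("range2", "good"), "h1"),
    (("range1", "good"), "h2"),
    (("range1", "bad"), "h3"),
]
model = train(examples)
# 4 income ranges and 3 credit records, as in the example above.
probs = posterior(("range4", "excellent"), model, domain_sizes=(4, 3))
best = max(probs, key=probs.get)
print(best, probs)
```

Without smoothing, any attribute value never seen with a class would force that class's posterior to zero, regardless of the other evidence.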

Support Vector Machines (SVMs): Find the optimal hyperplane that classifies as many data points correctly as possible and separates the points as far as possible. [Figure: unseen data points relative to the hyperplane.]

SVM Main Idea: Given a set of data points, each belonging to one of two classes ($y = 1$ or $y = -1$), an SVM finds the hyperplane that (1) leaves the largest fraction of points of the same class on the same side, and (2) maximises the distance of either class from the hyperplane. Hard margin or soft margin (with a penalty for points that violate the margin).
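
A sketch of how a candidate hyperplane can be scored. It assumes the standard soft-margin formulation, $\frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i (w \cdot x_i + b))$, which the slides do not spell out; the points, weights and the penalty constant `C` are hypothetical.

```python
import math


def decision(w, b, x):
    """Signed score w . x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def geometric_margin(w, b, points, labels):
    """Smallest signed distance of any point from the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision(w, b, x) for x, y in zip(points, labels)) / norm


def soft_margin_objective(w, b, points, labels, C=1.0):
    """0.5 * ||w||^2 plus C times the total hinge penalty (assumed form)."""
    norm_sq = sum(wi * wi for wi in w)
    penalty = sum(max(0.0, 1.0 - y * decision(w, b, x))
                  for x, y in zip(points, labels))
    return 0.5 * norm_sq + C * penalty


# Hypothetical 2D data with labels y = 1 and y = -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

margin = geometric_margin((1.0, 1.0), -2.0, points, labels)
tight = soft_margin_objective((1.0, 1.0), -2.0, points, labels)
loose = soft_margin_objective((0.25, 0.25), -0.5, points, labels)
print(margin, tight, loose)
```

Both weight vectors describe the same separating line, but the second scaling leaves three points inside the margin band and is penalised for it.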

SVM for Non-linear Classification: What if the data is not linearly separable? Kernel machine: transform the data to a higher dimension so that it becomes linearly separable, for example $(x, y) \mapsto (x, y, x^2 + y^2)$.
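
The mapping $(x, y) \mapsto (x, y, x^2 + y^2)$ can be tried on points arranged as an inner cluster surrounded by an outer ring, which no straight line separates in 2D. The sample points and the separating height of 1.0 are hypothetical.

```python
def lift(point):
    """Map (x, y) to (x, y, x^2 + y^2)."""
    x, y = point
    return (x, y, x * x + y * y)


# Hypothetical data: class -1 near the origin, class +1 far from it.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]


def classify(point, height=1.0):
    """In the lifted space the plane z = height is a linear separator."""
    return 1 if lift(point)[2] > height else -1


print([classify(p) for p in inner], [classify(p) for p in outer])
```

The classifier is linear in the three lifted coordinates, yet its decision boundary in the original plane is a circle.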

SVM for Non-linear Classification: Everything remains the same, except that the dot products are replaced by a nonlinear kernel function. The resulting classifier is linear in the transformed high-dimensional space, but non-linear in the original space.
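
Why replacing dot products is enough can be checked numerically. The sketch below uses the degree-2 polynomial kernel $k(u, v) = (u \cdot v)^2$ and its explicit feature map $\phi(u) = (u_1^2, \sqrt{2}\,u_1 u_2, u_2^2)$; this particular kernel is an assumption chosen for illustration and is not named in the slides.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phi(u):
    """Explicit feature map for the degree-2 polynomial kernel in 2D."""
    u1, u2 = u
    return (u1 * u1, math.sqrt(2.0) * u1 * u2, u2 * u2)


def poly_kernel(u, v):
    """k(u, v) = (u . v)^2, computed without leaving the original space."""
    return dot(u, v) ** 2


u, v = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(u), phi(v))  # dot product in the transformed space
implicit = poly_kernel(u, v)    # same value, computed in the original space
print(explicit, implicit)
```

The kernel gives the dot product of the transformed points without ever constructing them, which matters when the transformed space is very high-dimensional.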

Vapnik-Chervonenkis (VC) Dimension: Measures the capacity (complexity, expressive power, richness, or flexibility) of a space of functions that can be learned by a statistical classification algorithm. Shattered: a set of functions $f(\theta)$, where $\theta$ is the parameter of the function, is said to shatter a set of data points $\{x_1, \ldots, x_n\}$ if, for every assignment of class labels to those points, there exists a value of $\theta$ such that $f(\theta)$ classifies the data points perfectly. The VC dimension of $f(\theta)$ is the maximal number of points that $f(\theta)$ can shatter.

VC Dimension Examples: If $f(\theta) = w_1 x_1 + w_2 x_2 - b$ is a linear classifier in 2D space, there exist sets of 3 points that are shattered by the model. However, no set of 4 points can be shattered, so its VC dimension is 3.

VC Dimension Examples: (1) $f$ is a constant classifier (with no parameters): it returns 1 if the input is larger than $c$, and 0 otherwise. What is its VC dimension? (2) $f$ is a single-parameter (threshold) classifier: it returns 1 if the input is larger than $\theta$, and 0 otherwise. What is its VC dimension? (3) $f$ is a single-parameter interval classifier: it returns 1 if the input is in the interval $[\theta, \theta + 4]$, and 0 otherwise. What is its VC dimension? (4) $f$ is a single-parameter sine classifier: it returns 1 if the input is larger than $\sin(\theta x)$, and 0 otherwise. What is its VC dimension?
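
Shattering can be checked by brute force for one-parameter classifiers on the real line, such as the threshold and interval classifiers above. The sketch tries one $\theta$ from every region in which the labelling cannot change; the point sets are hypothetical. A failure on one point set does not by itself prove a VC dimension, since the definition only requires that some set of that size be shattered.

```python
def candidate_thetas(critical):
    """One theta per region of the real line cut by the critical values."""
    pts = sorted(set(critical))
    cands = [pts[0] - 1.0, pts[-1] + 1.0] + pts
    cands += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return cands


def shatters(points, classify, critical):
    """True if every 0/1 labelling of points is produced by some theta."""
    achievable = {tuple(classify(x, t) for x in points)
                  for t in candidate_thetas(critical)}
    return len(achievable) == 2 ** len(points)


def threshold(x, theta):
    return 1 if x > theta else 0


def interval(x, theta):
    return 1 if theta <= x <= theta + 4 else 0


one, two, three = [0.0], [0.0, 1.0], [0.0, 1.0, 2.0]

# The threshold labelling changes only when theta crosses a point.
threshold_1 = shatters(one, threshold, one)
threshold_2 = shatters(two, threshold, two)
# The interval labelling changes only at theta = x or theta = x - 4.
interval_2 = shatters(two, interval, [x - 4 for x in two] + two)
interval_3 = shatters(three, interval, [x - 4 for x in three] + three)
print(threshold_1, threshold_2, interval_2, interval_3)
```

For the threshold classifier, the labelling “left point 1, right point 0” is never achievable, and for the interval classifier the labelling “1, 0, 1” is never achievable, whichever points are chosen.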

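Expected Risk: Given $N$ observations, each a pair $(\mathbf{x}_i, y_i)$. The data points are independently and identically distributed, following some unknown distribution $P(\mathbf{x}, y)$. A machine/function is to learn the mapping $f(\mathbf{x}, \alpha): \mathbf{X} \to y \in \{-1, 1\}$, where $\alpha$ is the parameter. Expected risk: $R(\alpha) = \int \frac{1}{2}\,|y - f(\mathbf{x}, \alpha)|\,dP(\mathbf{x}, y)$. Empirical risk: $R_{emp}(\alpha) = \frac{1}{2N} \sum_{i=1}^{N} |y_i - f(\mathbf{x}_i, \alpha)|$.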

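Bound for Expected Risk: According to the PAC model, the following bound for the expected risk holds with probability $1 - \eta$: $R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\,(\ln(2N/h) + 1) - \ln(\eta/4)}{N}}$, where $R(\alpha)$ is the expected risk, $R_{emp}(\alpha)$ is the empirical risk, $N$ is the number of observations and $h$ is the VC dimension of $f(\mathbf{x}, \alpha)$. The square-root term is the VC confidence, so: bound of expected risk = empirical risk + VC confidence. For a given empirical risk, a smaller VC dimension gives a smaller bound of expected risk.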
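
A numerical sketch of the two terms of the bound. It assumes the usual form of the VC confidence, $\sqrt{(h(\ln(2N/h) + 1) - \ln(\eta/4))/N}$, and the empirical risk $\frac{1}{2N}\sum_i |y_i - f(\mathbf{x}_i, \alpha)|$ for labels in $\{-1, 1\}$; the predictions, labels and the values of $h$, $N$ and $\eta$ are hypothetical.

```python
import math


def empirical_risk(predictions, labels):
    """Fraction of misclassified points for labels in {-1, 1}."""
    n = len(labels)
    return sum(abs(y - p) for p, y in zip(predictions, labels)) / (2 * n)


def vc_confidence(h, n, eta):
    """Assumed standard form of the VC confidence term."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


risk = empirical_risk([1, -1, 1, 1], [1, -1, -1, 1])  # one error in four
small_h = vc_confidence(h=5, n=1000, eta=0.05)
large_h = vc_confidence(h=50, n=1000, eta=0.05)
print(risk, round(small_h, 3), round(large_h, 3))
```

With the number of observations fixed, the confidence term grows as the VC dimension grows, which is why a richer function class loosens the bound.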

Structural Risk Minimisation: To make the expected risk small, minimise the empirical risk and minimise the VC dimension.

Structural Risk Minimisation: (1) Using a priori knowledge of the domain, choose a class of functions, such as polynomials of degree n, neural networks having n hidden layer neurons, a set of splines with n nodes or fuzzy logic models having n rules. (2) Divide the class of functions into a hierarchy of nested subsets in order of increasing complexity (VC dimension), for example polynomials of increasing degree. (3) Perform empirical risk minimisation on each subset (this is essentially parameter selection). (4) Select the model in the series whose sum of empirical risk and VC confidence is minimal.
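
The final selection step can be sketched as follows, assuming each nested subset has already been trained and its empirical risk measured. The candidate names, VC dimensions and empirical risks are hypothetical values, and the VC confidence uses the usual form $\sqrt{(h(\ln(2N/h) + 1) - \ln(\eta/4))/N}$, which is an assumption here.

```python
import math


def vc_confidence(h, n, eta=0.05):
    """Assumed standard form of the VC confidence term."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


def select_model(candidates, n, eta=0.05):
    """Pick the candidate minimising empirical risk + VC confidence.

    candidates: list of (name, vc_dimension, empirical_risk) tuples.
    """
    return min(candidates, key=lambda c: c[2] + vc_confidence(c[1], n, eta))


# Hypothetical nested subsets of increasing complexity.
candidates = [
    ("degree 1", 2, 0.30),
    ("degree 3", 4, 0.10),
    ("degree 9", 10, 0.08),
]

best = select_model(candidates, n=1000)
print(best[0])
```

The most complex candidate has the lowest empirical risk, but its larger VC confidence outweighs that gain, so the middle candidate is selected.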

Structural Risk Minimisation: Well founded mathematically, but difficult to implement. The VC dimension is hard to compute, and there are only a few models for which we know how to compute it. Even if we know how to calculate the VC dimension, the optimisation problem is not easy to solve: we need to minimise the empirical risk for each subset, and then choose the best one.

SVM for Structural Risk Minimisation: SVMs use the spirit of the SRM principle. In SVM, the model is a hyperplane, and the VC dimension of the set of functions is controlled through the margin. SVM tries to minimise the empirical risk. SVM tries to maximise the margin (smaller VC dimension).

SVM Issues/Problems: Multiple classes: one-against-the-rest or one-against-one. Feature selection. Need to select a good kernel function. Usually requires a lot of memory and CPU time.

Web Mining: Web mining is the mining of data related to the World Wide Web.

Web Mining: Web content mining is the mining, extraction and integration of useful data, information and knowledge from Web page content. Tasks: topic discovery, extracting association patterns, clustering/classifying Web pages. Related to: information retrieval; text mining and natural language processing; computer vision and image processing.

Web Mining: Web structure mining uses graph theory to analyse the node and connection structure of a web site. It extracts patterns from hyperlinks and mines the document structure (HTML/XML tags). Example: PageRank (Google), which ranks search results by estimating the quality of Web pages from hyperlinks: a page has a high rank if it is pointed to by many highly ranked pages.
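
The idea that a page ranks highly when highly ranked pages point to it can be sketched as a power iteration over a small link graph. The graph, the damping factor of 0.85 and the iteration count are assumptions for illustration; this is a textbook-style sketch, not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to.

    Every link target must also appear as a key of `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page without outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web site.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(links)
top_page = max(ranks, key=ranks.get)
print(top_page, {p: round(r, 3) for p, r in ranks.items()})
```

Page C is linked from three pages and ends up with the highest rank, while page D, which nothing links to, keeps only the baseline share.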

Web Mining: Web usage mining discovers interesting usage patterns: the identity or origin of Web users, and their browsing behaviours. Applications: personalised marketing, identifying terrorism threats. Ethical issues: invasion of privacy.

Issues and Challenges: High dimensionality: hundreds of fields and tables, millions of records (Big Data). Overfitting. Randomness: if the search is over many models, some models will fit well by chance. Dynamism: data and knowledge change over time. Missing and noisy data. Complex relationships between fields. Understandability of patterns. User interaction and prior knowledge. Integration with other systems. Blind use of methods by incompetent users leads to meaningless and invalid patterns.

Summary: DM related fields and relationships; DM tasks and examples; DM algorithms; Bayes method; support vector machine; structural risk minimisation; Web mining; DM issues and challenges.