Semi-Supervised Time Series Classification & DTW-D REPORTED BY WANG YAWEN.

Outline
Semi-Supervised Time Series Classification (KDD 2006)
◦ Background
◦ Time Series Classification
◦ Semi-Supervised Time Series Classification
◦ Empirical Evaluation
DTW-D: Time Series Semi-Supervised Learning from a Single Example (KDD 2013)
◦ Introduction
◦ DTW-D
◦ Two Key Assumptions
◦ Algorithm
◦ Experiment

Semi-Supervised Time Series Classification
Li Wei and Eamonn Keogh, Department of Computer Science and Engineering, University of California, Riverside

Semi-Supervised Time Series Classification: Background
◦ Labeled training data is difficult or expensive to obtain
◦ Exploit the value of abundant unlabeled data

Semi-Supervised Time Series Classification: Background
◦ Learning from both labeled and unlabeled data → semi-supervised learning (SSL)
◦ A semi-supervised technique for building time series classifiers that takes advantage of large collections of unlabeled data
◦ Only a handful of labeled examples is needed to construct accurate classifiers
◦ Note: the usefulness of unlabeled data depends on the critical assumption that the underlying models / features / kernels / similarity functions match well with the problem at hand

Semi-Supervised Time Series Classification: Time Series Classification

Semi-Supervised Time Series Classification: Time Series Classification

Semi-Supervised Time Series Classification: Time Series Classification under Realistic Conditions
◦ It is typically not the case that we have two or more well-defined classes
◦ The positive class has some structure
◦ Negative examples have little or no common structure
◦ We cannot in general assume that subsequences not belonging to the positive class look similar to each other
◦ It is typically the case that positive labeled examples are rare, but unlabeled data is abundant
◦ Goal: build binary time series classifiers for extremely imbalanced class distributions, with only a small number of labeled examples from the positive class

Semi-Supervised Time Series Classification
◦ The base classifier: one-nearest-neighbor with Euclidean distance
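To make the base classifier concrete, here is a minimal sketch of one-nearest-neighbor classification under Euclidean distance. It is an illustration, not the authors' code; the function names are mine.

```python
import numpy as np

def euclidean(x, y):
    # Euclidean distance between two equal-length time series.
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sqrt(np.sum((x - y) ** 2))

def one_nn_classify(query, train_series, train_labels):
    # Return the label of the training series nearest to the query.
    dists = [euclidean(query, s) for s in train_series]
    return train_labels[int(np.argmin(dists))]
```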

Semi-Supervised Time Series Classification: Training the Classifier (the classifier trains itself)
◦ Step 1: train on the initial training set, where all labeled instances are positive and all unlabeled instances are regarded as negative
◦ Note: the size of the training set never changes during training, but the labeled set is augmented
◦ Step 2: classify the unlabeled data in the training set
◦ Step 3: among all the unlabeled instances, the one we can most confidently classify as positive is the one closest to the labeled positive examples; add it to the positive set and go back to Step 1
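A minimal sketch of this self-training loop, with a generic distance function and a fixed iteration budget standing in for the stopping criterion discussed below; the names are illustrative, not from the paper.

```python
import numpy as np

def self_train(positive, unlabeled, distance, n_iterations):
    # Steps 1-3 above: repeatedly move the unlabeled series closest to
    # the labeled positive set into that set. The training set never
    # changes size; only the labeling is augmented.
    positive, unlabeled = list(positive), list(unlabeled)
    for _ in range(n_iterations):
        if not unlabeled:
            break
        # Confidence = distance to the nearest labeled positive example.
        dists = [min(distance(u, p) for p in positive) for u in unlabeled]
        best = int(np.argmin(dists))
        positive.append(unlabeled.pop(best))
    return positive, unlabeled
```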

Semi-Supervised Time Series Classification

◦ Stopping criterion
◦ Precision-recall breakeven point
◦ Track the distance between the closest pair in the labeled positive set:
◦ Gradual decrease: the labeled space gets denser
◦ Stabilizing phase: the closest pair has already been incorporated
◦ Sudden drop: a negative example has been added
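The quantity this heuristic tracks is simple to compute; a sketch, assuming any pairwise distance function:

```python
def closest_pair_distance(positive, distance):
    # Minimum pairwise distance within the labeled positive set
    # (assumes at least two labeled positives). A sudden drop in this
    # value across iterations suggests a negative example has just
    # been absorbed, so training should stop.
    n = len(positive)
    return min(distance(positive[i], positive[j])
               for i in range(n) for j in range(i + 1, n))
```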

Semi-Supervised Time Series Classification: Empirical Evaluation
◦ Precision-recall breakeven point: the value at which precision and recall are equal
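One standard way to compute the breakeven point is to rank all instances by classifier score and cut off at k = the number of true positives; at that cutoff precision and recall coincide. A sketch (mine, not the paper's code):

```python
import numpy as np

def pr_breakeven(scores, labels):
    # With cutoff k = number of positives, precision = TP / k and
    # recall = TP / (number of positives) = TP / k, so they are equal.
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    k = int(labels.sum())
    top_k = np.argsort(-scores)[:k]  # the k highest-scoring instances
    return float(labels[top_k].mean())
```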

Semi-Supervised Time Series Classification: Empirical Evaluation ◦ ECG dataset

Semi-Supervised Time Series Classification: Empirical Evaluation ◦ Word Spotting dataset

Semi-Supervised Time Series Classification: Empirical Evaluation ◦ Gun dataset

Semi-Supervised Time Series Classification: Empirical Evaluation ◦ Wafer dataset

Semi-Supervised Time Series Classification: Empirical Evaluation ◦ Yoga dataset

DTW-D: Time Series Semi-Supervised Learning from a Single Example
Yanping Chen, Bing Hu, Eamonn Keogh (University of California, Riverside) and Gustavo E. A. P. A. Batista (Universidade de São Paulo, USP)

DTW-D Introduction
◦ Unlabeled members of a circumscribed positive class may be closer to some unlabeled members of a diverse negative class than to the labeled positive class

DTW-D: ED is an upper bound on DTW
◦ For equal-length series, the rigid one-to-one (diagonal) alignment used by ED is one of the warping paths DTW may choose, so DTW(x, y) ≤ ED(x, y)

DTW-D

Why did DTW not solve the problem?
◦ There are other differences between P1 and U2, including the fact that the first and last peaks have different heights; DTW cannot mitigate this
◦ Simple shapes tend to be close to everything
◦ Smooth, flat, or at least very slowly changing time series tend to be surprisingly close to other objects
◦ Observation: if a class is characterized by the existence of intra-class warping (possibly among other distortions), then we should expect that moving from ED to DTW reduces distances more for intra-class comparisons than for inter-class comparisons

DTW-D
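This slide presents DTW-D itself. Assuming the paper's ratio formulation, DTW-D(x, y) = DTW(x, y) / (ED(x, y) + ε), here is a minimal sketch with an unconstrained DTW (ε guards against division by zero):

```python
import numpy as np

def ed(x, y):
    # Euclidean distance: the rigid diagonal alignment.
    return np.sqrt(np.sum((x - y) ** 2))

def dtw(x, y):
    # Classic O(n*m) dynamic-programming DTW with squared local costs.
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

def dtw_d(x, y, eps=1e-10):
    # DTW normalized by its ED upper bound: a small ratio means warping
    # explains away most of the apparent difference between x and y.
    x, y = np.asarray(x, float), np.asarray(y, float)
    return dtw(x, y) / (ed(x, y) + eps)
```

Because the diagonal path is one admissible warping, dtw(x, y) ≤ ed(x, y) for equal-length series, so the ratio lies in [0, 1] and directly encodes the observation above.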

Two Key Assumptions
◦ Assumption 1: the positive class (the target concept) contains time-warped versions of some platonic ideal (some prototypical shape), possibly with other types of noise/distortions
◦ Assumption 2: the negative class may be very diverse, and occasionally, by chance, produces objects close to a member of the positive class, even under DTW

DTW-D: Assumption 1 is mitigated by large amounts of labeled data
◦ Our noted weakness of semi-supervised learning happens when the nearest instance to a labeled positive exemplar is a negative instance. With more labeled positive instances this becomes less and less likely to happen.

DTW-D: Assumption 2 is compounded by a large negative dataset
◦ If the negative class is random and/or diverse, then the larger the negative class is, the more likely it is to produce an instance that just happens to be close to a labeled positive item

DTW-D: Assumption 2 is compounded by low-complexity negative data
◦ A complex time series is one that is not well approximated by a few DFT coefficients or by a low-degree polynomial
◦ Low-complexity data tend to be close to everything
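The slide's notion of complexity can be made concrete. One illustrative measure (mine, not necessarily the paper's exact definition) is the number of DFT coefficients needed to retain most of a series' spectral energy:

```python
import numpy as np

def dft_complexity(x, energy=0.95):
    # Number of DFT coefficients (largest-magnitude first) needed to
    # retain the given fraction of spectral energy; smooth or flat
    # series need very few, i.e. they have low complexity.
    power = np.abs(np.fft.rfft(np.asarray(x, float) - np.mean(x))) ** 2
    total = power.sum()
    if total == 0:
        return 0  # a constant series: zero complexity
    ranked = np.sort(power)[::-1]
    return int(np.searchsorted(np.cumsum(ranked), energy * total) + 1)
```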

DTW-D Algorithm Details
◦ A one-class classifier with no training examples from the negative class
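A hedged sketch of how the one-class learner can proceed from a single example: the earlier self-training loop, with DTW-D as the (pseudo-)distance used to pick the next instance to label. Names are illustrative.

```python
def dtw_d_self_train(seed, unlabeled, dtw_d, n_to_label):
    # Start from a single labeled positive example and repeatedly
    # absorb the unlabeled series with the smallest DTW-D value to
    # the positive set; no negative examples are ever needed.
    positive, unlabeled = [seed], list(unlabeled)
    for _ in range(n_to_label):
        if not unlabeled:
            break
        scores = [min(dtw_d(u, p) for p in positive) for u in unlabeled]
        best = min(range(len(scores)), key=scores.__getitem__)
        positive.append(unlabeled.pop(best))
    return positive
```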

DTW-D: Why is DTW-D better?
◦ Training process (choosing): DTW-D selects better labeled objects than DTW
◦ Evaluation process (classification): DTW-D is better at selecting the top K nearest neighbors

DTW-D: Comparison to rival methods
◦ We favor the rival approaches by giving them fifty additional initial labeled examples
◦ We start with a single labeled example

DTW-D Experiment
◦ Learning dataset
◦ Labeled dataset P: a single positive example
◦ Unlabeled dataset U: the rest of the objects in the learning dataset
◦ Holdout dataset: used to test the accuracy of the learned classifier

DTW-D Experiment ◦ Insect Wingbeat Sound Detection

DTW-D Experiment ◦ Historical Manuscript Mining

DTW-D Experiment ◦ Activity Recognition

Conclusion
◦ A semi-supervised learning framework for when only a small set of labeled examples is available
◦ A simple idea that dramatically improves the quality of SSL in the time series domain
Future work
◦ Revisit the stopping-criterion issue in light of DTW-D
◦ Consider other avenues where DTW-D may be useful