Semi-Supervised Time Series Classification

Semi-Supervised Time Series Classification Mojdeh Jalali Heravi

Introduction: Time series are of interest to many communities: medicine, aerospace, finance, business, meteorology, entertainment, and more.

Introduction: Current methods for time series classification require large amounts of labeled training data, which are difficult or expensive to collect in terms of both time and expertise.

Introduction: On the other hand, copious amounts of unlabeled data are available. For example, the PhysioBank archive contains more than 40 GB of ECG data, freely available, and in hospitals there is even more. Semi-supervised classification takes advantage of such large collections of unlabeled data.

The paper: Li Wei and Eamonn Keogh, "Semi-supervised time series classification", in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.

Outline: Applications, Value of unlabeled data, Semi-supervised learning, Time series classification, Semi-supervised time series classification, Empirical Evaluation.

Applications: Indexing of handwritten documents. We are interested in making large archives of handwritten text searchable; for indexing, the words must first be classified. Treating the words as time series is a competitive approach.

Applications: A classifier trained for George Washington's handwriting will not generalize to Isaac Newton's, and obtaining labeled data for each word is expensive. Having few training examples and using a semi-supervised approach would be great! [Figure: a sample of text written by George Washington]

Applications: Heartbeat classification. PhysioBank offers more than 40 GB of freely available medical data, a potential goldmine for a researcher. Again, having few training examples and using a semi-supervised approach would be great!

Outline: Applications, Value of unlabeled data, Semi-supervised learning, Time series classification, Semi-supervised time series classification, Empirical Evaluation.

Value of unlabeled data

Outline: Applications, Value of unlabeled data, Semi-supervised learning, Time series classification, Semi-supervised time series classification, Empirical Evaluation.

Semi-supervised Learning: Classification → supervised learning; clustering → unsupervised learning. Learning from both labeled and unlabeled data is called semi-supervised learning: less human effort, higher accuracy.

Semi-supervised Learning, five classes of SSL: 1. Generative models: the oldest methods. Assumption: the data are drawn from a mixture distribution that can be identified from large amounts of unlabeled data. Knowledge of the structure of the data can be naturally incorporated into the model. There has been no discussion of the mixture-distribution assumption for time series data so far.

Semi-supervised Learning, five classes of SSL: 2. Low-density separation approaches: "the decision boundary should lie in a low-density region" → push the decision boundary away from the unlabeled data. To achieve this goal, margin-maximization algorithms are used (e.g., the transductive SVM, TSVM). But, as Keogh et al. [1] observe, "(abnormal time series) do not necessarily live in sparse areas of n-dimensional space" and "repeated patterns do not necessarily live in dense parts".

Semi-supervised Learning, five classes of SSL: 3. Graph-based semi-supervised learning: "the (high-dimensional) data lie (roughly) on a low-dimensional manifold". Data points → nodes; distances between the nodes → edge weights. Examples: graph mincut [2], Tikhonov regularization [3], manifold regularization [4]. The graph encodes prior knowledge, so its construction must be hand-crafted for each domain; but we are looking for a general semi-supervised classification framework.
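To make the graph-based idea concrete, here is a minimal sketch using scikit-learn's LabelPropagation over a kNN graph. This is purely illustrative and not the paper's method; the synthetic two-cluster data, the kernel choice, and n_neighbors=7 are assumptions for the demo.

```python
# A minimal graph-based SSL sketch: labels diffuse over a kNN graph.
# Illustrative only (not the paper's method); data and parameters are
# assumptions for the demo. Unlabeled points are marked with -1.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.full(100, -1)            # -1 marks unlabeled points
y[0], y[50] = 0, 1              # one labeled example per class

model = LabelPropagation(kernel="knn", n_neighbors=7)
model.fit(X, y)                 # labels propagate along graph edges
print(model.transduction_[:5])  # inferred labels for the first points
```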

Semi-supervised Learning, five classes of SSL: 4. Co-training: split the features into two disjoint sets (e.g., shape and color). Assumptions: the feature sets are independent, and each set is sufficient to train a good classifier. Two classifiers are trained, one on each feature subset, and the predictions of one classifier are used to enlarge the training set of the other. Time series have very high feature correlation, so this split is not available.
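As a concrete illustration of the mechanism (again, not something used in the paper), a compact co-training variant is sketched below. The two views X1/X2, GaussianNB as the base learner, binary 0/1 labels with -1 for unlabeled, and the per-round budget are all assumptions for the demo.

```python
# A compact co-training sketch (illustrative only, not the paper's method).
# Assumes two feature views X1/X2 and binary labels 0/1, with -1 = unlabeled.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, rounds=5, per_round=5):
    y = y.copy()
    c1, c2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        lab = y != -1
        c1.fit(X1[lab], y[lab])
        c2.fit(X2[lab], y[lab])
        # each classifier labels its most confident unlabeled examples,
        # enlarging the training set the other sees in the next round
        for clf, X in ((c1, X1), (c2, X2)):
            unlab = np.where(y == -1)[0]
            if len(unlab) == 0:
                return c1, c2, y
            proba = clf.predict_proba(X[unlab])
            top = np.argsort(proba.max(axis=1))[-per_round:]
            # with classes {0, 1}, the argmax column index is the label
            y[unlab[top]] = proba[top].argmax(axis=1)
    return c1, c2, y
```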

Semi-supervised Learning, five classes of SSL: 5. Self-training: train on a small amount of labeled data; classify the unlabeled data; add the most confidently classified examples, together with their predicted labels, into the training set. This procedure repeats, and the classifier refines gradually. The classifier uses its own predictions to teach itself → it is general, with few assumptions.
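A generic self-training loop might look like the sketch below. This is a minimal illustration, not yet the time-series-specific algorithm described later; the base classifier, the confidence threshold, and the iteration budget are assumptions.

```python
# A minimal generic self-training loop (a sketch): any probabilistic base
# classifier can be plugged in; thresholds here are illustrative choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, n_iter=10, conf_thresh=0.95):
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    clf = LogisticRegression()
    for _ in range(n_iter):
        if len(X_unlab) == 0:
            break
        clf.fit(X_lab, y_lab)
        proba = clf.predict_proba(X_unlab)
        conf = proba.max(axis=1)            # confidence per example
        keep = conf >= conf_thresh          # most confident predictions
        if not keep.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
        X_unlab = X_unlab[~keep]            # remove newly labeled points
    return clf.fit(X_lab, y_lab)
```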

Outline: Applications, Value of unlabeled data, Semi-supervised learning, Time series classification, Semi-supervised time series classification, Empirical Evaluation.

Time Series. Definition 1 (Time Series): a time series T = t1, …, tm is an ordered set of m real-valued variables. Long time series; short time series → subsequences of long time series. Definition 2 (Euclidean Distance): given two time series Q = q1, …, qm and C = c1, …, cm of the same length m, D(Q, C) = sqrt( Σ i=1..m (qi − ci)² ).
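A direct implementation of Definition 2, as a minimal sketch using NumPy (the function name is an illustrative choice):

```python
# Euclidean distance between two equal-length time series (Definition 2).
import numpy as np

def euclidean_distance(Q: np.ndarray, C: np.ndarray) -> float:
    assert len(Q) == len(C), "time series must have equal length"
    return float(np.sqrt(np.sum((Q - C) ** 2)))

# Example: distance between two short series
print(euclidean_distance(np.array([1.0, 2.0, 3.0]),
                         np.array([1.5, 2.5, 2.0])))
```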

Time Series Classification: we focus on binary time series classifiers. Positive class: some common structure; positively labeled examples are rare, but unlabeled data is abundant; only a small number of ways to be in the class. Negative class: little or no common structure; an essentially infinite number of ways to be in this class.

Outline: Applications, Value of unlabeled data, Semi-supervised learning, Time series classification, Semi-supervised time series classification, Empirical Evaluation.

Semi-supervised Time Series Classification: the base classifier is 1-nearest-neighbor with Euclidean distance, demonstrated on the Control Chart dataset.

Semi-supervised Time Series Classification Training the classifier (example)

Semi-supervised Time Series Classification: training the classifier (algorithm). P = the set of positively labeled examples; U = the set of unlabeled examples. A sketch of the loop follows below.
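The algorithm itself appeared as a figure on the slide. Below is a minimal sketch of the training loop as the paper describes it: the unlabeled example closest to the positive set is repeatedly moved into P with a positive label. The helper names and the fixed iteration budget n_iters are illustrative; the real stopping criterion is the heuristic discussed on the next slides.

```python
# A minimal sketch of the self-training loop of Wei & Keogh (2006): move the
# unlabeled series closest to the positive set into P, one per iteration.
import numpy as np

def euclidean_distance(Q, C):
    return float(np.sqrt(np.sum((np.asarray(Q) - np.asarray(C)) ** 2)))

def train(P, U, n_iters):
    """P: list of positively labeled series; U: list of unlabeled series."""
    P, U = list(P), list(U)
    for _ in range(n_iters):
        if not U:
            break
        # distance from each unlabeled series to its nearest positive example
        d = [min(euclidean_distance(u, p) for p in P) for u in U]
        best = int(np.argmin(d))   # the most confidently positive example
        P.append(U.pop(best))      # add it, with a positive label, to P
    return P, U
```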

Semi-supervised Time Series Classification Stopping criterion (example)

Semi-supervised Time Series Classification Stopping criterion

Semi-supervised Time Series Classification: using the classifier. For each instance to be classified, check whether its nearest neighbor in the training set is labeled or not. But the training set is huge: comparing each instance in the testing set to every example in the training set is untenable in practice.

Semi-supervised Time Series Classification: using the classifier. A modification of the classification scheme: a 1NN classifier using only the labeled positive examples in the training set. To classify an instance: within distance r of any labeled positive example → positive; otherwise → negative. Here r is the average distance from a positive example to its nearest neighbor.
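A sketch of this thresholded classifier, under the same illustrative naming conventions as the training loop above; r is computed as the average nearest-neighbor distance within the positive set (which needs at least two positive examples):

```python
# Classify by distance threshold: positive iff within r of some labeled
# positive example, where r is the average nearest-neighbor distance in P.
import numpy as np

def euclidean_distance(Q, C):
    return float(np.sqrt(np.sum((np.asarray(Q) - np.asarray(C)) ** 2)))

def threshold_r(P):
    # average distance from each positive example to its nearest other one
    return float(np.mean([
        min(euclidean_distance(p, q) for j, q in enumerate(P) if j != i)
        for i, p in enumerate(P)
    ]))

def classify(x, P, r):
    return any(euclidean_distance(x, p) <= r for p in P)
```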

Outline: Applications, Value of unlabeled data, Semi-supervised learning, Time series classification, Semi-supervised time series classification, Empirical Evaluation.

Empirical Evaluation: the semi-supervised approach is compared to a naive kNN approach: the k nearest neighbors of the positive example → positive; all others → negative; the best k is found. A sketch follows below.
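A minimal sketch of this baseline (function names are illustrative): rank instances by distance to the positive seed(s) and call the k closest ones positive.

```python
# Naive kNN baseline: the k instances nearest to the positive seed example(s)
# are labeled positive, everything else negative.
import numpy as np

def euclidean_distance(Q, C):
    return float(np.sqrt(np.sum((np.asarray(Q) - np.asarray(C)) ** 2)))

def naive_knn_labels(P, U, k):
    d = [min(euclidean_distance(u, p) for p in P) for u in U]
    order = np.argsort(d)               # closest to the positives first
    labels = np.zeros(len(U), dtype=bool)
    labels[order[:k]] = True            # k nearest → positive
    return labels
```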

Empirical Evaluation: performance measure. The class distribution is skewed, so accuracy is not a good measure: with 96% negative and 4% positive, simply classifying everything as negative already gives 96% accuracy. Instead, the precision-recall breakeven point is used: the point at which precision = recall.
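One common way to compute the breakeven point is sketched below, under the assumption that exactly k instances are predicted positive, where k is the number of true positives; at that cutoff precision and recall are forced to be equal.

```python
# Precision-recall breakeven: predict the k lowest-distance instances as
# positive, with k = number of true positives, so precision == recall.
import numpy as np

def breakeven(distances, true_pos_mask):
    """distances: score per instance (smaller = more positive);
    true_pos_mask: boolean array of ground-truth positives."""
    k = int(true_pos_mask.sum())        # predict exactly k positives
    order = np.argsort(distances)
    hits = true_pos_mask[order[:k]].sum()
    return hits / k                     # precision == recall at this point
```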

Empirical Evaluation: stopping heuristic. In the experiments it differs from what was described before: training continues until the classifier achieves its highest precision-recall breakeven, plus a few more iterations. Test and training sets: for most experiments they are distinct; for small datasets the same set is used. This is still non-trivial, since most data in the training set are unlabeled.

ECG Dataset: ECG data from the MIT-BIH Arrhythmia Database. Number of initial positive examples = 10; run 200 times (blue line → average; gray lines → 1 SD intervals). Precision-recall breakeven: semi-supervised 94.97% vs. kNN (k = 312) 81.29%.

Word Spotting Dataset: handwritten documents. Number of initial positive examples = 10; run 25 times (blue line → average; gray lines → 1 SD intervals). Precision-recall breakeven: semi-supervised 86.2% vs. kNN (k = 109) 79.52%.

Word Spotting Dataset: distance from the positive class → rank → probability of being in the positive class.

Gun Dataset: 2D time series extracted from video. Class A: actor 1 with gun; Class B: actor 1 without gun; Class C: actor 2 with gun; Class D: actor 2 without gun. Number of initial positive examples = 1; run 27 times. Precision-recall breakeven: semi-supervised 65.19% vs. kNN (k = 27) 55.93%.

Wafer Dataset: a collection of time series containing a sequence of measurements recorded by one vacuum-chamber sensor during the etch process of silicon wafers for semiconductor fabrication. Number of initial positive examples = 1. Precision-recall breakeven: semi-supervised 73.17% vs. kNN (k = 381) 46.87%.

Yoga Dataset: number of initial positive examples = 1. Precision-recall breakeven: semi-supervised 89.04% vs. kNN (k = 156) 82.95%.

Conclusion: an accurate semi-supervised learning framework for time series classification with a small set of labeled examples. The reduction in the number of labeled training examples needed is dramatic.

References
[1] Keogh, E., Lin, J., & Fu, A. (2005). HOT SAX: Efficiently finding the most unusual time series subsequence. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 226-233.
[2] Blum, A. & Chawla, S. (2001). Learning from labeled and unlabeled data using graph mincuts. In Proceedings of the 18th International Conference on Machine Learning.
[3] Belkin, M., Matveeva, I., & Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. COLT 2004.
[4] Belkin, M., Niyogi, P., & Sindhwani, V. (2004). Manifold regularization: A geometric framework for learning from examples. Technical Report TR-2004-06, University of Chicago.

Thanks! "Thanks for your patience." Any questions?