Introduction to String Kernels Blaz Fortuna JSI, Slovenija.

What is a Kernel?
- An inner product between documents mapped into some higher-dimensional feature space
- Serves as a similarity measure between documents
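To make this concrete, here is a minimal Python sketch (the function and vocabulary names are illustrative, not from the slides): the kernel value between two documents is simply the inner product of their images under an explicit feature map.

```python
def phi(doc, vocab):
    """Explicit feature map: term-frequency vector over a fixed vocabulary."""
    return [doc.split().count(w) for w in vocab]

def kernel(s, t, vocab):
    """Kernel value = inner product of the mapped documents."""
    fs, ft = phi(s, vocab), phi(t, vocab)
    return sum(a * b for a, b in zip(fs, ft))

vocab = ["string", "kernel", "text"]
print(kernel("string kernel", "kernel methods for text", vocab))  # prints 1
```

Here the feature map is computed explicitly; the point of the later slides is that for richer feature spaces this explicit step can be skipped.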

Why use Kernels?
- The mapped documents are never explicitly calculated
- Linear algorithms can be applied to the mapped documents
- Input documents can be anything (not necessarily vectors)!

Algorithms using Kernels
- Support Vector Machine (classification, regression, …)
- Kernel Principal Component Analysis
- Kernel Canonical Correlation Analysis
- Nearest Neighbour
- …

Representation of text
- Vector-space model (bag of words) is the most commonly used representation
- Each document is encoded as a feature vector with word frequencies as elements
- IDF weighting, normalized
- Similarity is the inner product (cosine similarity)
- Can be viewed as a kernel
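A hedged sketch of this baseline (helper names are my own): bag-of-words vectors with IDF weighting and L2 normalization, whose inner product is then the cosine similarity.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weighted, L2-normalized bag-of-words vectors (as sparse dicts)."""
    n = len(docs)
    toks = [d.lower().split() for d in docs]
    df = Counter(w for t in toks for w in set(t))  # document frequency
    vecs = []
    for t in toks:
        tf = Counter(t)
        v = {w: tf[w] * math.log(n / df[w]) for w in tf}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vecs.append({w: x / norm for w, x in v.items()})
    return vecs

def bow_kernel(u, v):
    """Inner product of normalized vectors = cosine similarity."""
    return sum(x * v.get(w, 0.0) for w, x in u.items())
```

Since the vectors are normalized, `bow_kernel(u, u)` is 1 and unrelated documents score 0, which is the kernel the string-kernel experiments later compare against.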

Basic Idea of String Kernels
- Words -> Substrings
- Each document is encoded as a feature vector with substring frequencies as elements
- More contiguous substrings receive higher weighting (through a decay factor λ < 1)
- Example: the length-2 substrings of 'car', 'bar' and 'cap' give the features ca, ar, cr, ba, br, ap, cp
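The weighting scheme can be sketched with an explicit feature map (a naive enumeration, feasible only for short strings, for illustration): each length-k subsequence is a feature, and every occurrence contributes λ raised to the length of the span it covers, so occurrences with more gaps count less.

```python
from itertools import combinations
from collections import defaultdict

def subseq_features(s, k, lam):
    """Explicit feature map of the gap-weighted subsequence kernel:
    each length-k subsequence u gets weight sum(lam ** span) over its
    occurrences, where span is the stretch of text the occurrence covers."""
    phi = defaultdict(float)
    for idx in combinations(range(len(s)), k):
        u = "".join(s[i] for i in idx)
        span = idx[-1] - idx[0] + 1
        phi[u] += lam ** span
    return dict(phi)

print(subseq_features("car", 2, 0.5))
# {'ca': 0.25, 'cr': 0.125, 'ar': 0.25}  -- the gapped 'cr' is penalized
```

For 'car' the contiguous pairs ca and ar get weight λ² = 0.25, while the gapped pair cr spans three characters and gets only λ³ = 0.125.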

Kernel Trick
- Computing the feature vectors explicitly is very expensive
- Algorithms that use kernels need only the inner product
- The inner product can be computed efficiently without explicit feature vectors (dynamic programming)
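A straightforward memoized rendering of the standard subsequence-kernel recursion (Lodhi et al.). This is a sketch for clarity, not the fully optimized dynamic program the slides later bound by O(n|s||t|); the inner loop over match positions costs an extra factor of |t|.

```python
from functools import lru_cache

def ssk(s, t, k, lam):
    """Gap-weighted subsequence kernel of order k via the DP recursion,
    computed without ever building the feature vectors."""

    @lru_cache(maxsize=None)
    def Kp(i, m, n):
        # K'_i on prefixes s[:m], t[:n]: weight of matching i symbols,
        # charging lam out to the ends of both prefixes.
        if i == 0:
            return 1.0
        if min(m, n) < i:
            return 0.0
        x = s[m - 1]
        total = lam * Kp(i, m - 1, n)
        for j in range(n):          # positions in t where x matches
            if t[j] == x:
                total += Kp(i - 1, m - 1, j) * lam ** (n - j + 1)
        return total

    @lru_cache(maxsize=None)
    def K(m, n):
        # K_k on prefixes s[:m], t[:n]: the kernel itself.
        if min(m, n) < k:
            return 0.0
        total = K(m - 1, n)
        x = s[m - 1]
        for j in range(n):
            if t[j] == x:
                total += Kp(k - 1, m - 1, j) * lam ** 2
        return total

    return K(len(s), len(t))
```

As a sanity check, ssk('car', 'cat', 2, 0.5) = 0.0625, exactly the inner product of the explicit feature vectors of 'car' and 'cat' (their only shared pair is ca, with weight λ² on each side).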

Advantage of String Kernels
- Detects words that share a stem but have different suffixes or prefixes
- Example: 'microcomputer', 'computers', 'computerbased'

Extensions 1/2
- Use of syllables or words: documents are viewed as sequences of syllables or words instead of characters
  - Reduces the length of documents
  - Syllables still eliminate the need for a stemmer
- Convex combinations of kernels: use of substrings with different lengths, at no extra computational cost

Extensions 2/2
- Different weighting for symbols: introduces a weighting similar to IDF, at low computational cost
- Soft matching: similar symbols are matched
  - Use of WordNet for matching synonyms
  - The computational cost comes from the matching

Speed performance
- The string kernel is much slower and more memory-consuming than the BOW text representation
- The DP implementation runs in O(n|s||t|) time
  - n – length of substrings; |s|, |t| – lengths of documents s and t
- Memory consumption is O(|s||t|)

How to be Faster
- TRIE – count only the more contiguous substrings
- Dimensionality reduction – documents are projected onto the subspace spanned by the most frequent contiguous substrings
- Incomplete Cholesky Decomposition – approximation of the kernel matrix
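Restricting the features to fully contiguous substrings, which a trie can enumerate cheaply, is the simplest of these speed-ups. A minimal sketch of the resulting contiguous k-gram kernel (often called a spectrum kernel; here implemented with Counters rather than an actual trie):

```python
from collections import Counter

def kgram_kernel(s, t, k):
    """Kernel over contiguous k-grams only: inner product of k-gram counts.
    Linear-time feature extraction, no gap weighting needed."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[g] * ct[g] for g in cs.keys() & ct.keys())
```

Dropping gapped occurrences trades some of the string kernel's flexibility for a kernel that is as cheap as BOW to evaluate.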

Experiments
- Subset of the Reuters dataset
- BOW vs. string kernel, with 300 and 600 training documents
- Approximation techniques

BOW vs. String kernel
- Methods compared: String Kernel, Syllable Kernel, Word Kernel, BOW (TF only), BOW (TF-IDF)
- Measures reported for both training-set sizes (300 and 600 documents): CE, F1, NSV, and running time
- CE – classification error, NSV – number of support vectors

Approximations
- Methods compared: TFIDF baseline; dimensionality reduction DR (1500, 2500, 3500 dimensions); incomplete Cholesky decomposition ICD (200, 450, 750)
- Measures: precision [%], recall [%], time [sec]