Introduction to String Kernels Blaz Fortuna JSI, Slovenija.


What is a Kernel? An inner product. A measure of similarity between documents. Documents are mapped into some higher-dimensional feature space.

Why Use Kernels? Mapped documents are not explicitly calculated. Linear algorithms can be applied to the mapped documents. Input documents can be anything (not necessarily vectors)!

Algorithms using Kernels: Support Vector Machine (classification, regression, …), Kernel Principal Component Analysis, Kernel Canonical Correlation Analysis, Nearest Neighbour, …

Representation of text. Vector-space model (bag of words): the most commonly used representation. Each document is encoded as a feature vector with word frequencies as elements, IDF-weighted and normalized. Similarity is the inner product (cosine similarity). This can be viewed as a kernel.
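To make the bag-of-words kernel concrete, here is a minimal sketch. The function names `bow_vector` and `bow_kernel` are hypothetical, tokenisation by whitespace is an assumption, and IDF weighting is left out for brevity, so this is only the normalized term-frequency variant.

```python
import math
from collections import Counter


def bow_vector(text):
    """Term-frequency vector of a lower-cased, whitespace-tokenised document.

    Whitespace tokenisation is an assumption made for this sketch.
    """
    return Counter(text.lower().split())


def bow_kernel(a, b):
    """Inner product of the normalized term-frequency vectors (cosine similarity)."""
    va, vb = bow_vector(a), bow_vector(b)
    dot = sum(count * vb[word] for word, count in va.items() if word in vb)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Because the vectors are normalized, a document compared with itself scores 1 and documents with no words in common score 0.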

Basic Idea of String Kernels. Words -> substrings: each document is encoded as a feature vector with substring frequencies as elements. More contiguous substrings receive higher weighting (through a decay factor λ < 1).
       ca    ar    cr    ba    br    ap    cp
car    λ^2   λ^2   λ^3   0     0     0     0
bar    0     λ^2   0     λ^2   λ^3   0     0
cap    λ^2   0     0     0     0     λ^2   λ^3
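The feature map for short words such as 'car' and 'bar' can be written out by brute force. The sketch below assumes the usual gap-weighted subsequence definition: every length-n subsequence contributes the decay factor raised to the span it covers in the string. The names `subsequence_features`, `explicit_kernel` and the parameter `lam` are hypothetical, and the enumeration is exponential in n, so it is only meant for toy inputs.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(s, n, lam):
    """Explicit feature vector: one entry per length-n subsequence of s.

    Each occurrence is weighted by lam ** span, where span is the distance
    from its first to its last character (inclusive), so gappy
    occurrences count less when lam < 1.
    """
    feats = defaultdict(float)
    for idx in combinations(range(len(s)), n):
        u = "".join(s[i] for i in idx)
        span = idx[-1] - idx[0] + 1
        feats[u] += lam ** span
    return dict(feats)


def explicit_kernel(s, t, n, lam):
    """Inner product of the two explicit feature vectors."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    return sum(v * ft[u] for u, v in fs.items() if u in ft)
```

With n = 2, 'car' and 'bar' share only the feature 'ar', so `explicit_kernel("car", "bar", 2, lam)` equals `lam ** 4`.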

Kernel Trick. Computing the feature vectors explicitly is very expensive. Algorithms that use kernels need only the inner product. This can be computed efficiently without explicit use of feature vectors (dynamic programming).
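One way to compute this inner product without building feature vectors is the standard dynamic-programming recursion for the gap-weighted subsequence kernel. The sketch below is an illustration under that assumption, not necessarily the implementation used for these slides; `ssk` and `lam` are hypothetical names. It keeps only two layers of the table, giving O(n|s||t|) time and O(|s||t|) memory.

```python
def ssk(s, t, n, lam):
    """Gap-weighted subsequence kernel of order n with decay factor lam."""
    ls, lt = len(s), len(t)
    if n < 1 or min(ls, lt) < n:
        return 0.0
    # prev[a][b] holds the auxiliary value K'_{i-1}(s[:a], t[:b]); K'_0 is 1.
    prev = [[1.0] * (lt + 1) for _ in range(ls + 1)]
    for _ in range(1, n):
        cur = [[0.0] * (lt + 1) for _ in range(ls + 1)]
        for a in range(1, ls + 1):
            kpp = 0.0  # running sum K''_i(s[:a], t[:b]) along t
            for b in range(1, lt + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * prev[a - 1][b - 1]
                cur[a][b] = lam * cur[a - 1][b] + kpp
        prev = cur
    # Close every subsequence with a final matching pair of characters.
    total = 0.0
    for a in range(1, ls + 1):
        for b in range(1, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * prev[a - 1][b - 1]
    return total
```

For example, `ssk("car", "bar", 2, 0.5)` returns 0.0625, which is 0.5 ** 4, the contribution of the single shared subsequence 'ar'.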

Advantage of String Kernel. Detection of words with different suffixes or prefixes. Example: ‘microcomputer’, ‘computers’, ‘computerbased’.

Extensions 1/2. Use of syllables or words: documents are viewed as a sequence of syllables or words instead of characters; this reduces the length of documents; syllables still eliminate the need for a stemmer. Convex combinations of kernels: use of substrings with different lengths; no extra computational cost.
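A convex combination of kernels is a weighted sum with non-negative weights that add up to one. The sketch below shows the idea with a simple contiguous n-gram count kernel as a stand-in for string kernels of different substring lengths; `ngram_kernel` and `convex_combination` are hypothetical names, and the stand-in kernel is an assumption chosen to keep the example short.

```python
from collections import Counter


def ngram_kernel(n):
    """Stand-in kernel: inner product of contiguous n-gram counts."""
    def kernel(s, t):
        cs = Counter(s[i:i + n] for i in range(len(s) - n + 1))
        ct = Counter(t[i:i + n] for i in range(len(t) - n + 1))
        return float(sum(c * ct[g] for g, c in cs.items() if g in ct))
    return kernel


def convex_combination(kernels, weights):
    """Return the kernel sum_i weights[i] * kernels[i](s, t)."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    def combined(s, t):
        return sum(w * k(s, t) for w, k in zip(weights, kernels))
    return combined
```

A combined kernel over lengths 1 and 2 would be built as `convex_combination([ngram_kernel(1), ngram_kernel(2)], [0.5, 0.5])`.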

Extensions 2/2. Different weighting for symbols: introduction of weighting similar to IDF; low computational cost. Soft matching: similar symbols are matched; use of WordNet for matching synonyms; the computational cost comes from the matching.

Speed performance. The string kernel is much slower and more memory-consuming than the BOW text representation. The DP implementation is O(n|s||t|), where n is the length of the substrings and |s|, |t| are the lengths of documents s and t. Memory consumption is O(|s||t|).

How to be Faster. Trie: only count the more contiguous substrings. Dimension reduction: documents are projected into the subspace spanned by the most frequent contiguous substrings. Incomplete Cholesky decomposition: approximation of the kernel matrix.
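The incomplete Cholesky idea can be sketched as a pivoted Cholesky factorisation that stops after a fixed number of columns, giving a low-rank approximation of the kernel matrix. This is a generic textbook version, not necessarily the variant used in the experiments; `incomplete_cholesky`, `reconstruct`, `rank` and `tol` are hypothetical names. For simplicity it takes the full matrix as input, although each step only reads the diagonal and one column of it.

```python
import math


def incomplete_cholesky(K, rank, tol=1e-10):
    """Return up to `rank` columns g such that K is approximately sum of g g^T.

    K is a symmetric positive semi-definite matrix given as a list of rows.
    """
    n = len(K)
    d = [K[i][i] for i in range(n)]  # diagonal of the current residual
    G = []
    for _ in range(min(rank, n)):
        p = max(range(n), key=lambda i: d[i])  # pivot: largest residual
        if d[p] <= tol:
            break  # residual is negligible, approximation is exact enough
        root = math.sqrt(d[p])
        col = [(K[i][p] - sum(g[i] * g[p] for g in G)) / root
               for i in range(n)]
        G.append(col)
        for i in range(n):
            d[i] -= col[i] ** 2
    return G


def reconstruct(G, n):
    """Rebuild the approximated n x n matrix from the columns in G."""
    return [[sum(g[i] * g[j] for g in G) for j in range(n)]
            for i in range(n)]
```

With fewer columns than documents, only the selected pivot columns of the kernel matrix have to be evaluated, which is where the saving comes from.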

Experiments. Subset of the Reuters-21578 dataset. BOW vs. string kernel: 300 train + 700 test; 600 train + 400 test. Approximation techniques.

BOW vs. String kernel
                  CE*   F1   NSV*   CE*   F1   NSV*   Time
String Kernel     15    87   184    3     97   305    208
Syllable Kernel   12    89   218    4     97   379    29
Word Kernel       18    85   157    3     97   281    9
BOW (TF only)     18    84   150    3     97   287    1/6
BOW (TFIDF)       8     93   252    4     97   443    1/6
CE: classification error, NSV: number of support vectors

Approximations
             Prec [%]   Rec [%]   Time [sec]
TFIDF        95         97        24
DR (1500)    87         90        24
DR (2500)    87         91        48
DR (3500)    89         91        64
ICD (200)    86         92        49
ICD (450)    88         92        114
ICD (750)    90         94        244
