Using CTW as a language modeler in Dasher
Martijn van Veen
Signal Processing Group, Department of Electrical Engineering, Eindhoven University of Technology
2/21 Overview
What is Dasher
–And what is a language model
What is CTW
–And how to implement it in Dasher
Decreasing the model costs
Conclusions and future work
3/21 Dasher
Text input method
Continuous gestures
Language model
Let’s give it a try!
4/21 Dasher: Language model
Conditional probability for each alphabet symbol, given the previous symbols
Similar to compression methods
Requirements:
–Sequential
–Fast
–Adaptive
Model is trained
Better compression -> faster text input
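A minimal sketch of the interface such a model has to offer, assuming hypothetical class and method names (Dasher's actual API is C++ and differs): return a conditional probability for every alphabet symbol given the context, and adapt sequentially after each entered symbol.

```python
# Hypothetical interface sketch, not Dasher's real API.
class LanguageModel:
    def predict(self, context: str) -> dict[str, float]:
        """Return P(symbol | context) for every symbol in the alphabet."""
        raise NotImplementedError

    def update(self, context: str, symbol: str) -> None:
        """Adapt the model after the user enters `symbol` (sequential, online)."""
        raise NotImplementedError
```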
5/21 Dasher: Language model
PPM: Prediction by Partial Match
Predictions by models of different order
Weight factor for each model
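As a rough illustration of the idea (a simplified blend, not PPM's actual escape mechanism), each order-k context model predicts from the last k symbols and the predictions are combined with one weight per order:

```python
# Simplified order-blending sketch; names and smoothing are illustrative only.
from collections import Counter, defaultdict

class BlendedContextModel:
    def __init__(self, max_order=3):
        self.max_order = max_order
        # counts[k][context] counts symbols seen after that order-k context
        self.counts = [defaultdict(Counter) for _ in range(max_order + 1)]

    def predict(self, context, alphabet, weights):
        probs = {s: 0.0 for s in alphabet}
        for k in range(self.max_order + 1):
            ctx = context[-k:] if k else ""
            c = self.counts[k][ctx]
            total = sum(c.values()) + len(alphabet)  # add-one smoothing
            for s in alphabet:
                probs[s] += weights[k] * (c[s] + 1) / total
        return probs

    def update(self, context, symbol):
        for k in range(self.max_order + 1):
            ctx = context[-k:] if k else ""
            self.counts[k][ctx][symbol] += 1
```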
6/21 Dasher: Language model
Asymptotically, PPM reduces to a fixed-order context model
But the incomplete model works better!
7/21 CTW: Tree model
Source structure is captured by the tree model; the parameters are memoryless
KT estimator: a = number of zeros, b = number of ones
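The estimator itself is not written out on the slide; the standard Krichevsky-Trofimov form, in the notation above, is:

```latex
P(\text{next bit} = 0 \mid a, b) = \frac{a + \tfrac{1}{2}}{a + b + 1},
\qquad
P_e(a+1, b) = \frac{a + \tfrac{1}{2}}{a + b + 1}\, P_e(a, b),
\quad P_e(0, 0) = 1 .
```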
8/21 CTW: Context tree
Context-Tree Weighting: combine all possible tree models up to a maximum depth
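The combination is the usual CTW recursion over the context tree of maximum depth D: every node mixes its own KT estimate with the product of its children's weighted probabilities, so every pruning of the full tree is weighted in.

```latex
P_w^{s} =
\begin{cases}
P_e^{s} & \text{if node } s \text{ is at depth } D,\\[4pt]
\tfrac{1}{2}\, P_e^{s} + \tfrac{1}{2}\, P_w^{0s}\, P_w^{1s} & \text{otherwise.}
\end{cases}
```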
9/21 CTW: tree update
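The update figure is not recoverable here; the sketch below shows the general shape of a tree update under assumed structure and names (not the thesis implementation): walk down the context path, then refresh the KT estimate and the weighted probability from the deepest node back to the root.

```python
# Minimal CTW update sketch for one observed bit; assumed names, floating point
# for readability (the thesis implementation uses integer arithmetic instead).
class Node:
    def __init__(self):
        self.a = 0          # zeros seen in this context
        self.b = 0          # ones seen in this context
        self.pe = 1.0       # KT block probability P_e
        self.pw = 1.0       # weighted block probability P_w
        self.child = {}     # children indexed by the next context bit

def ctw_update(root, context_bits, bit, depth):
    # context_bits: past bits, most recent first; only the first `depth` are used.
    path = [root]
    node = root
    for c in context_bits[:depth]:
        node = node.child.setdefault(c, Node())
        path.append(node)
    # Update from the deepest node on the path back up to the root.
    for node in reversed(path):
        # Sequential KT update: P(bit) = (count + 1/2) / (a + b + 1)
        count = node.b if bit == 1 else node.a
        node.pe *= (count + 0.5) / (node.a + node.b + 1)
        if bit == 1:
            node.b += 1
        else:
            node.a += 1
        if node.child:  # internal node: mix KT estimate with the children's product
            prod = 1.0
            for c in (0, 1):
                prod *= node.child[c].pw if c in node.child else 1.0
            node.pw = 0.5 * node.pe + 0.5 * prod
        else:           # deepest node on the path: weighted probability is the KT estimate
            node.pw = node.pe
```

The conditional probability of the new bit then follows as the ratio of the root's weighted probability after and before the update.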
10/21 CTW: Implementation
Current implementation
–Ratio of block probabilities stored in each node
–Efficient, but patented
Develop a new implementation
–Use only integer arithmetic, avoid divisions
–Represent both block probabilities as fractions
–Make the denominators equal by cross-multiplication
–Store the numerators, scale if necessary
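A loose sketch of that idea under my own assumptions (function name, word size, and scaling rule are illustrative, not the thesis scheme): after cross-multiplying, both fractions share one denominator, so only the two integer numerators need to be kept per node, and both are scaled down together when they grow too large.

```python
# Illustrative only: shared-denominator representation of P_e and P_w.
def common_numerators(pe_num, pe_den, pw_num, pw_den, max_bits=32):
    # Cross-multiply so both fractions share the denominator pe_den * pw_den.
    ne = pe_num * pw_den
    nw = pw_num * pe_den
    # Scale both numerators down by the same power of two so they fit in
    # max_bits; the common denominator cancels whenever ratios are formed.
    while max(ne, nw) >= (1 << max_bits):
        ne >>= 1
        nw >>= 1
    return ne, nw
```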
11/21 CTW for Text
Binary decomposition
Adjust the zero-order estimator
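A sketch of what a binary decomposition means in practice, assuming a plain 8-bit split (the thesis uses its own decomposition tree) and a hypothetical `predict_bit` helper: each bit position gets its own context tree, and the symbol probability is the product of the per-bit predictions conditioned on the bits already decided.

```python
# Illustrative 8-bit decomposition; `trees` and `predict_bit` are hypothetical.
def symbol_probability(trees, predict_bit, char, context_bits):
    bits = [(ord(char) >> (7 - i)) & 1 for i in range(8)]  # MSB first
    p = 1.0
    decided = []
    for i, bit in enumerate(bits):
        # Tree i predicts bit i given the text context plus the bits decided so far.
        p_one = predict_bit(trees[i], context_bits + decided)
        p *= p_one if bit == 1 else (1.0 - p_one)
        decided.append(bit)
    return p
```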
12/21 Results
Comparing PPM and CTW language models
–Single file
–Model trained with English text
–Model trained with English text and user input
[Tables: compression per input file (Book, NL, GB) with columns CTW, PPM, and Difference (%); the numeric values were lost in extraction]
13/21 CTW: Model costs
What are model costs?
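For reference, the standard CTW redundancy bound from the original CTW literature (not stated on the slide) quantifies the model cost as the extra bits paid because the actual tree model S is unknown and only receives part of the weight in the mixture:

```latex
\log_2 \frac{P_e\!\left(x_1^T \mid S\right)}{P_w\!\left(x_1^T\right)}
\;\le\; \Gamma_D(S) \;\le\; 2\,|S| - 1 ,
```

where |S| is the number of leaves of the actual model and Γ_D(S) is the number of bits the weighting implicitly spends on describing that model.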
14/21 CTW: Model costs
Actual model and alphabet size are fixed -> optimize the weight factor alpha
–Per tree -> not enough parameters
–Per node -> not enough adaptivity
–Optimize alpha per depth of the tree
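In the per-depth variant, the fixed 1/2 in the weighting recursion is replaced by a depth-dependent weight (notation assumed here):

```latex
P_w^{s} \;=\; \alpha_{d}\, P_e^{s} \;+\; \left(1 - \alpha_{d}\right) P_w^{0s}\, P_w^{1s},
\qquad d = \text{depth of node } s .
```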
15/21 CTW: Model costs
Exclusion: only use the betas of the actual model
Iterative process
–Convergent?
Approximation: to find the actual model, use alpha = 0.5
16/21 CTW: Model costs
Compression of an input sequence
–Model costs are significant, especially for short sequences
–No decrease by optimizing alpha per depth?
[Plot: compression vs. number of symbols, for alpha = 0.5, alpha after exclusion, and without model costs]
17/21 CTW: Model costs
[Plot: compression vs. number of symbols, for alpha = 0.5, alpha after exclusion, maximum probability in the root, and without model costs]
Maximize the probability in the root, instead of the probability per depth
–Exclusion based on alpha = 0.5 is almost optimal
18/21 CTW: Model costs
Results in the Dasher scenario:
Trained model
–Negative effect if no user text is available
[Table: results per language (GB, NL) for alpha = 0.5 and alpha after exclusion; values not recoverable]
Trained with concatenated user text
–Small positive effect if user text is added to the training text and is very similar to it
[Table: results per language (GB, NL) for alpha = 0.5 and alpha after exclusion; values not recoverable]
19/21 Conclusions
New CTW implementation
–Only integer arithmetic
–Avoids patented techniques
–New decomposition tree structure
Dasher language model based on CTW
–6 percent more accurate predictions than PPM-D
Decreasing the model costs
–Only an insignificant decrease is possible with our method
20/21 Future work
Make CTW suitable for MobileDasher
–Decrease memory usage
–Decrease the number of computations
Combine language models
–Select the locally best model, or weight the models together
Combine languages in one model
–Do the models differ in structure or in parameters?
21/21 Thank you for your attention
Ask away!
22/21 CTW: Implementation
Store the numerators of the block probabilities