K.U. Leuven Leuven 2008-05-08 Morphological Normalization and Collocation Extraction Jan Šnajder, Bojana Dalbelo Bašić, Marko Tadić University of Zagreb.

Slides:



Advertisements
Similar presentations
10/01/2014 DMI - Università di Catania 1 Combinatorial Landscapes & Evolutionary Algorithms Prof. Giuseppe Nicosia University of Catania Department of.
Advertisements

CS6800 Advanced Theory of Computation
Exact and heuristics algorithms
Student : Mateja Saković 3015/2011.  Genetic algorithms are based on evolution and natural selection  Evolution is any change across successive generations.
Genetic Algorithms Contents 1. Basic Concepts 2. Algorithm
On the Genetic Evolution of a Perfect Tic-Tac-Toe Strategy
Biologically Inspired AI (mostly GAs). Some Examples of Biologically Inspired Computation Neural networks Evolutionary computation (e.g., genetic algorithms)
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Non-Linear Problems General approach. Non-linear Optimization Many objective functions, tend to be non-linear. Design problems for which the objective.
1 Lecture 8: Genetic Algorithms Contents : Miming nature The steps of the algorithm –Coosing parents –Reproduction –Mutation Deeper in GA –Stochastic Universal.
Learning to Advertise. Introduction Advertising on the Internet = $$$ –Especially search advertising and web page advertising Problem: –Selecting ads.
Evolutionary Algorithms Simon M. Lucas. The basic idea Initialise a random population of individuals repeat { evaluate select vary (e.g. mutate or crossover)
Bruxelles, Computer Aided Document Indexing System (CADIS) with Eurovoc Bojana Dalbelo Bašić Faculty of Electrical Engineering and Computing.
Khaled Rasheed Computer Science Dept. University of Georgia
Image Registration of Very Large Images via Genetic Programming Sarit Chicotay Omid E. David Nathan S. Netanyahu CVPR ‘14 Workshop on Registration of Very.
Brandon Andrews.  What are genetic algorithms?  3 steps  Applications to Bioinformatics.
Christoph F. Eick: Applying EC to TSP(n) Example: Applying EC to the TSP Problem  Given: n cities including the cost of getting from on city to the other.
Genetic Programming.
1 Reasons for parallelization Can we make GA faster? One of the most promising choices is to use parallel implementations. The reasons for parallelization.
Leuven, Computer Aided Document Indexing System for Accessing Legislation A Joint Venture of Flanders and Croatia Bojana Dalbelo Bašić Faculty.
Genetic Algorithm.
Problems Premature Convergence Lack of genetic diversity Selection noise or variance Destructive effects of genetic operators Cloning Introns and Bloat.
K.U. Leuven Leuven Morphological Normalization and Collocation Extraction Jan Šnajder, Bojana Dalbelo Bašić, Marko Tadić University of Zagreb.
Using Genetic Programming to Learn Probability Distributions as Mutation Operators with Evolutionary Programming Libin Hong, John Woodward, Ender Ozcan,
SOFT COMPUTING (Optimization Techniques using GA) Dr. N.Uma Maheswari Professor/CSE PSNA CET.
Evolution Strategies Evolutionary Programming Genetic Programming Michael J. Watts
Improved Gene Expression Programming to Solve the Inverse Problem for Ordinary Differential Equations Kangshun Li Professor, Ph.D Professor, Ph.D College.
Lecture 8: 24/5/1435 Genetic Algorithms Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
Ch.12 Machine Learning Genetic Algorithm Dr. Bernard Chen Ph.D. University of Central Arkansas Spring 2011.
Zorica Stanimirović Faculty of Mathematics, University of Belgrade
What is Genetic Programming? Genetic programming is a model of programming which uses the ideas (and some of the terminology) of biological evolution to.
Dr.Abeer Mahmoud ARTIFICIAL INTELLIGENCE (CS 461D) Dr. Abeer Mahmoud Computer science Department Princess Nora University Faculty of Computer & Information.
Introduction to Evolutionary Algorithms Session 4 Jim Smith University of the West of England, UK May/June 2012.
GENETIC ALGORITHMS FOR THE UNSUPERVISED CLASSIFICATION OF SATELLITE IMAGES Ankush Khandelwal( ) Vaibhav Kedia( )
FF & FER INFuture2009: Digital Resources and Knowledge Sharing, 4-7 November 2009 Comparative Analysis of Automatic Term and Collocation Extraction Sanja.
Genetic Algorithms Siddhartha K. Shakya School of Computing. The Robert Gordon University Aberdeen, UK
Efficient Language Model Look-ahead Probabilities Generation Using Lower Order LM Look-ahead Information Langzhou Chen and K. K. Chin Toshiba Research.
Methods for Automatic Evaluation of Sentence Extract Summaries * G.Ravindra +, N.Balakrishnan +, K.R.Ramakrishnan * Supercomputer Education & Research.
Chapter 9 Genetic Algorithms.  Based upon biological evolution  Generate successor hypothesis based upon repeated mutations  Acts as a randomized parallel.
Introduction to Genetic Algorithms. Genetic Algorithms We’ve covered enough material that we can write programs that use genetic algorithms! –More advanced.
EE749 I ntroduction to Artificial I ntelligence Genetic Algorithms The Simple GA.
Machine Learning A Quick look Sources: Artificial Intelligence – Russell & Norvig Artifical Intelligence - Luger By: Héctor Muñoz-Avila.
Automated discovery in math Machine learning techniques (GP, ILP, etc.) have been successfully applied in science Machine learning techniques (GP, ILP,
Local Search. Systematic versus local search u Systematic search  Breadth-first, depth-first, IDDFS, A*, IDA*, etc  Keep one or more paths in memory.
Genetic Algorithms MITM613 (Intelligent Systems).
An evolutionary approach for improving the quality of automatic summaries Constantin Orasan Research Group in Computational Linguistics School of Humanities,
The Standard Genetic Algorithm Start with a “population” of “individuals” Rank these individuals according to their “fitness” Select pairs of individuals.
1 Contents 1. Basic Concepts 2. Algorithm 3. Practical considerations Genetic Algorithm (GA)
Agenda  INTRODUCTION  GENETIC ALGORITHMS  GENETIC ALGORITHMS FOR EXPLORING QUERY SPACE  SYSTEM ARCHITECTURE  THE EFFECT OF DIFFERENT MUTATION RATES.
Artificial Intelligence By Mr. Ejaz CIIT Sahiwal Evolutionary Computation.
Overview Last two weeks we looked at evolutionary algorithms.
Genetic Algorithms An Evolutionary Approach to Problem Solving.
An Evolutionary Algorithm for Neural Network Learning using Direct Encoding Paul Batchis Department of Computer Science Rutgers University.
Genetic Algorithm(GA)
Genetic Algorithm. Outline Motivation Genetic algorithms An illustrative example Hypothesis space search.
 Presented By: Abdul Aziz Ghazi  Roll No:  Presented to: Sir Harris.
Hirophysics.com The Genetic Algorithm vs. Simulated Annealing Charles Barnes PHY 327.
Genetic (Evolutionary) Algorithms CEE 6410 David Rosenberg “Natural Selection or the Survival of the Fittest.” -- Charles Darwin.
GP FOR ADAPTIVE MARKETS Jake Pacheco 6/11/2010. The Goal  Produce a system that can create novel quantitative trading strategies for the stock market.
1 Genetic Algorithms Contents 1. Basic Concepts 2. Algorithm 3. Practical considerations.
Genetic Algorithm (Knapsack Problem)
Evolutionary Algorithms Jim Whitehead
Evolution Strategies Evolutionary Programming
USING MICROBIAL GENETIC ALGORITHM TO SOLVE CARD SPLITTING PROBLEM.
Evolving the goal priorities of autonomous agents
CADIAL search engine at INEX
Example: Applying EC to the TSP Problem

Huffman Coding Greedy Algorithm
Beyond Classical Search
Presentation transcript:

K.U. Leuven Leuven Morphological Normalization and Collocation Extraction Jan Šnajder, Bojana Dalbelo Bašić, Marko Tadić University of Zagreb Faculty of Electrical Engineering and Computing / Faculty of Humanities and Social Sciences fer.hr, fer.hr, ffzg.hr Seminar at the K. U. Leuven, Department of Computing Science Leuven

K.U. Leuven Leuven Collocation Extraction Using Genetic Programming Bojana Dalbelo Bašić University of Zagreb Faculty of Electrical Engineering and Computing fer.hr, Seminar at the K. U. Leuven, Department of Computing Science Leuven

K.U. Leuven Leuven Outine  Collocations  Genetic programming  Results  Conclusion

K.U. Leuven Leuven  (Manning and Schütze 1999) “...an expression consisting of two or more words that correspond to some conventional way of saying things.”  Many different deffinitions...  An uninterrupted sequence of words that generally functions as a single constituent in a sentence (e.g., stock market, Republic of Croatia). Collocation

K.U. Leuven Leuven Collocation Applications:  improving indexing in information retrieval (Vechtomova, Robertson, and Jones 2003)  automatic language generation (Smadja and McKeown 1990)  word sense disambiguation (Wu and Chang 2004),  terminology extraction (Goldman and Wehrli 2001)  improving text categorization systems (Scott and Matwin 1999)

K.U. Leuven Leuven More general term - n-gram of words – any sequence of n words (digram, trigram, tetragram) Collocation extraction is usually done by assigning each candidate n-gram a value indicating how strongly the words within the n-gram are associated with each other. Collocation

K.U. Leuven Leuven Association measures More general term - n-gram of words – any sequence of n words (digram, trigram, tetragram) Collocation extraction is usually done by assigning each candidate n-gram a value indicating how strongly the words within the n-gram are associated with each other. Collocation extraction

K.U. Leuven Leuven Association measures Examples:  MI (Mutual Information):  DICE coefficient:

K.U. Leuven Leuven Based on hypothesis testing:   2 :  log-likelihood: Association measures

K.U. Leuven Leuven Collocation extraction Example: digramAssoc.measure value stock market20.1 machine learning30.7 town Slavonski10.0 New York25.2 big dog7.2 new house7.4 White house16.2

K.U. Leuven Leuven Collocation extraction Example: digramAssoc.measure value machine learning30.7 New York25.2 stock market20.1 White house16.2 town Slavonski10.0 new house7.4 big dog7.2

K.U. Leuven Leuven Association measures extensions Extensions:

K.U. Leuven Leuven Evaluation of AMs  Needed: sample of collocations and non-collocations  F 1 measure:

K.U. Leuven Leuven Our approach based on genetic programming  Similar to genetic algorithm  Population  Selection  Fittness function  Crossover  Mutation  GP: Evolution of programs in the forms of trees

K.U. Leuven Leuven Fitness function  Used to evaluate the “goodness” of a particular solution  Solution with a higher fitness function has a higher probability that it will be copied into the next generation during the selection process

K.U. Leuven Leuven Crossover  Combines two (parent) solutions into a new (child) solution  Used to exchange genetic material (in our case – portions of code) between solutions

K.U. Leuven Leuven Mutation  Introduces new genetic material into a population by randomly changing a solution  It can help adding genetic material that was not initially in the population (consider a population where no solution contains the addition operator – this operator can never be “invented” solely through crossover)  Also used to randomly “jump” through the space of possible solutions, thereby avoiding local maxima

K.U. Leuven Leuven Genetic programming  Idea – evolution of association measures  Fitness function – F 1

K.U. Leuven Leuven Genetic programming  Idea – evolution of association measures  Fitness function – F 1  Specifics:  Parsimony pressure  Stopping conditions – maximal generalisations  Inclusion of known AMs in the initial population

K.U. Leuven Leuven Nodes and leaves OperatorsOperands +, -const *, /f(.) ln(|x|)N IF(cond, a, b) POS(W)

K.U. Leuven Leuven Examples MI:DICE coefficient:

K.U. Leuven Leuven One solution Heuristics H:

K.U. Leuven Leuven Recombination (crossover) parents children  Exchange of subtrees

K.U. Leuven Leuven Mutation Node insertion: Node removal:

K.U. Leuven Leuven Experiment  Collection of 7008 legislative documents  Trigram extraction – 1.6 million  Two samples of classified trigrams:  Each sample 100 positive negative examples

K.U. Leuven Leuven Generalisation Stopping conditions – maximal generalisations maximal generalisations

K.U. Leuven Leuven Experimental settings  We used three-tounament selection  We varied the following parameters:  probability of mutation [0.0001, 0.3]  parsimony factor [0, 0.5]  maximum number of nodes [20, 1000]  number of iterations before stopping [10 4, 10 7 ]  In total, 800 runs of the algorithm (with different combinations of mentioned parameters)

K.U. Leuven Leuven Results  About 20% of evolved AMs reach F 1 over 80% Figure shows F1 score and number of nodes

K.U. Leuven Leuven Results

K.U. Leuven Leuven Results Interpretation of evolved measures in not easy (M205): f(abc) f(a) f(c) * / f(abc) f(ab) f(c) - f(c) f(bc) f(b) -f(abc) + / + / N * f(b) + * ln f(c) f(b) * * N f(a) * f(abc) f(a) f(abc) f(a) f(c) * / f(bc) * f(bc) f(b) + / f(a) N AKO(vr(b)={X}) * ( ) f(b) + / N * f(bc) f(b) -( ) * ln ln / f(a) f(c) * ( ) * ln ln / N * ln * / f(bc) * f(bc) f(b) + / N * ( ) f(b) + / N * f(abc) N f(a) * f(a) f(abc) f(a) f(c) * / f(bc) * f(abc) f(b) + / N * ( ) f(b) + / N * f(b) f(c) * ln ln / f(abc) f(a) f(c) * / f(c) * ln ln ( ) * ln ln / N * / N * / N * ln f(c) * / f(a) f(b) + * ln ln f(abc) f(abc) f(a) f(a) N AKO(vr(b)={X}) ( ) f(b) + * / N * / N * ln f(c) * / f(a) f(b) + * ln ln * ln ln / f(abc) f(a) f(c) * / f(a) f(b) + * ln ln ( ) * ln ln / N * ln ln AKO(vr(c)={X}) N * AKO(vr(b)={X})  Verification on other collections

K.U. Leuven Leuven Results Some results are more easily interpretable (M13): ( ) f(c) * f(abc) / f(a) * f(abc) f(b) - AKO(POS(b)={X}) f(abc) /

K.U. Leuven Leuven Results  96% of measures with F 1 over 82% contain operator IF with condition “second word is a stopword”.

K.U. Leuven Leuven Conclusion  Standard measures are imitated by evolution  Genetic programming can be used to boost collocation extraction results for a particular corpus and to “invent” new AMs  Futher reasearch is needed:  Other test collections (domains, languages)  Extraction of digrams, tetragrams...

K.U. Leuven Leuven  Jan Šnajder, Bojana Dalbelo Bašić, Saša Petrović, Ivan Sikirić, Evolving new lexical association measures using genetic programming, The 46th Annual Meeting of the Association of Computational Linguistic: Human Language Technologies, Columbus, Ohio, June 15-20, 2008.

K.U. Leuven Leuven Thank you