CS 6961: Structured Prediction (Fall 2014)
Lecture 1: Introduction. What is structured prediction?


Our goal today: to define a structure and structured prediction.

What are structures?

What are some examples of structured data?
– Database tables and spreadsheets
– HTML documents
– JSON objects
– Wikipedia info-boxes
– Computer programs
… we will see more examples

What are some examples of unstructured data?
– Images and photographs
– Videos
– Text documents
– PDF files
– Books
– …
What makes these unstructured? How are they different from the previous list?

Structured representations are useful because…
We know how to process them
– Algorithms for managing symbolic data
– Computational complexity is well understood
They abstract away unnecessary complexities
– Why deal with text/images/etc. when you can process a database with the same information?

Have we made the problem easier?
What does the splitting of water lead to? A: Light absorption B: Transfer of ions
Water is split, providing a source of electrons and protons (hydrogen ions, H+) and giving off O2 as a by-product. Light absorbed by chlorophyll drives a transfer of the electrons and hydrogen ions from water to an acceptor called NADP+.

Reading comprehension can be hard!
Water is split, providing a source of electrons and protons (hydrogen ions, H+) and giving off O2 as a by-product. Light absorbed by chlorophyll drives a transfer of the electrons and hydrogen ions from water to an acceptor called NADP+. (On the slide, the passage is annotated with Enable and Cause relations.)
What does the splitting of water lead to? A: Light absorption B: Transfer of ions
To answer the question, we need a structured representation.

Machine learning to the rescue
Techniques from statistical learning can help build these representations. In fact, machine learning is necessary to scale up and generalize this process.

A detour about classification

Classification
We know how to train classifiers
– Given an email, spam or not spam?
– Is a review positive or negative?
– Which folder should an email automatically be placed into?
– “Predict if a car purchased at an auction is a lemon,” and other such questions from Kaggle

Standard classification setting
Notation
– X: a feature representation of the input
– Y: one of a set of labels (spam, not-spam)
The goal: to learn a function X → Y that maps examples into a category
The standard recipe
1. Collect labeled examples {(x1, y1), (x2, y2), …}
2. Train a function f: X → Y that
   a. is consistent with the observed examples, and
   b. can hopefully be correct on new unseen examples
Based on slides of Dan Roth
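The recipe above can be sketched with a toy mistake-driven (perceptron-style) learner. The feature names and data below are invented for illustration; they are not from the lecture.

```python
# A minimal sketch of the standard classification recipe: collect labeled
# examples, then train f: X -> Y to be consistent with them.

def train_perceptron(examples, epochs=10):
    """Learn a linear classifier from examples (x, y) with y in {-1, +1}."""
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # mistake: nudge f toward consistency with (x, y)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy "spam" features: [contains 'offer', contains 'meeting']
data = [([1, 0], 1), ([1, 1], 1), ([0, 1], -1), ([0, 0], -1)]
w, b = train_perceptron(data)
```

Note the single output: each x is mapped to exactly one label, which is the property that will break down for structured outputs later in the lecture.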

Classification is generally well understood
Theoretically: generalization bounds
– We know how many examples one needs to see to guarantee good behavior on unseen examples
Algorithmically: good learning algorithms for linear representations
– Efficient, and can deal with high dimensionality (millions of features)
Some open questions
– What is a good feature representation?
– Learning protocols: how to minimize supervision; efficient semi-supervised learning; active learning
Based on slides of Dan Roth
Is this sufficient for solving problems like the reading comprehension one? No!
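The slides do not spell out such a bound, but one classical example, for a finite hypothesis class H and a learner that outputs a hypothesis consistent with the training data, is the PAC sample-complexity bound: with probability at least 1 - δ, the true error is at most ε whenever

```latex
% Realizable PAC bound for a finite hypothesis class H:
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```

i.e., the number of examples needed grows only logarithmically in the size of the hypothesis class.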

Examples where standard classification is not enough

Semantic Role Labeling
X: John saw the dog chasing the ball.
Y:
– Predicate: see; Viewer: John; Thing viewed: the dog chasing the ball
– Predicate: chase; Chaser: the dog; Thing chased: the ball
Or equivalently, predicate-argument representations:
See(Viewer: John, Viewed: the dog chasing the ball)
Chase(Chaser: the dog, Chased: the ball)
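To see how this output differs from a single label, here is one way the predicate-argument structure could be written down as data. The representation (a list of nested dictionaries) and the field names are illustrative choices, not a standard from the lecture.

```python
# The output Y of semantic role labeling is not one label but a structure:
# a set of predicates, each with its own labeled arguments.

sentence = "John saw the dog chasing the ball."

y = [
    {"predicate": "see",
     "arguments": {"Viewer": "John",
                   "Thing viewed": "the dog chasing the ball"}},
    {"predicate": "chase",
     "arguments": {"Chaser": "the dog",
                   "Thing chased": "the ball"}},
]

# A classifier returns one y in a fixed label set; here the output has a
# variable number of parts, and the parts refer to spans of the input.
num_predicates = len(y)
```

A single f: X → Y over a fixed label set cannot emit this: the number of predicates and arguments depends on the sentence.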

Semantic Parsing
X: “A python function that takes a name and prints the string Hello followed by the name and exits.”
Y:
X: “Find the largest state in the US.”
Y: SELECT name FROM us_states WHERE size = (SELECT MAX(size) FROM us_states)
In all these cases, the output Y is a structure.

What is a structure? One definition
“By … linguistic structure, we refer to symbolic representations of language posited by some theory of language.”
From the book Linguistic Structure Prediction, by Noah Smith, 2011.

What is in this picture?
Photo by Andrew Dressel, own work. Licensed under Creative Commons Attribution-Share Alike 3.0.

Object detection
Right-facing bicycle: left wheel, right wheel, handle bar, saddle/seat

The output: a schematic showing the parts and their relative layout
Right-facing bicycle: left wheel, right wheel, handle bar, saddle/seat
Once again, a structure.

A working definition of a structure
A structure is a concept that can be applied to any complex thing, whether it be a bicycle, a commercial company, or a carbon molecule. By complex, we mean:
1. It is divisible into parts,
2. There are different kinds of parts,
3. The parts are arranged in a specifiable way, and
4. Each part has a specifiable function in the structure of the thing as a whole.
From the book Analysing Sentences: An Introduction to English Syntax by Noel Burton-Roberts, 1986.

What is structured prediction?

Standard classification tools can’t predict structures
X: “Find the largest state in the US.”
Y: SELECT name FROM us_states WHERE size = (SELECT MAX(size) FROM us_states)
Classification is about making one decision
– Spam or not spam, predict one label, etc.
We need to make multiple decisions
– Each part needs a label
  Should “US” be mapped to us_states or utah_counties?
  Should “Find” be mapped to SELECT or FROM or WHERE?
– The decisions interact with each other
  If the outer FROM clause talks about the table us_states, then the inner FROM clause should not talk about utah_counties
– How to compose the fragments together to create the whole structure?
  Should the output consist of a WHERE clause? What should go in it?

Structured prediction: machine learning of interdependent variables
Unlike standard classification problems, many problems have structured outputs with
– multiple interdependent output variables, and
– both local and global decisions to be made
Mutual dependencies necessitate a joint assignment to all the output variables
– Joint inference, or global inference, or simply inference
These problems are called structured output problems.
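Joint inference can be sketched in miniature using the two interacting table choices from the semantic parsing example. The local scores below are made-up numbers standing in for a trained classifier's preferences; only the structure of the computation (score every complete assignment, pick the best) is the point.

```python
# Brute-force joint inference: enumerate every complete assignment to the
# output variables and choose the highest-scoring one, so that interactions
# between decisions are respected.
from itertools import product

variables = ["outer_table", "inner_table"]
domain = ["us_states", "utah_counties"]

def local_score(var, val):
    # Each part's individual preference (illustrative numbers).
    prefer = {("outer_table", "us_states"): 2.0,
              ("outer_table", "utah_counties"): 1.5,
              ("inner_table", "us_states"): 1.0,
              ("inner_table", "utah_counties"): 1.2}
    return prefer[(var, val)]

def joint_score(assignment):
    score = sum(local_score(v, a) for v, a in zip(variables, assignment))
    # Global term: the outer and inner FROM clauses must use the same table.
    if assignment[0] != assignment[1]:
        score -= 10.0  # heavy penalty for inconsistent tables
    return score

best = max(product(domain, repeat=len(variables)), key=joint_score)
```

With these numbers, deciding each variable independently would pick us_states for the outer table but utah_counties for the inner one, an inconsistent query; the joint assignment picks us_states for both. Real structured prediction replaces this brute-force enumeration with inference algorithms that exploit the structure, since the assignment space grows exponentially.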

Computational issues
– Model definition: what are the parts of the output? What are the inter-dependencies?
– How to train the model?
– How to do inference?
– Data annotation difficulty
– Background knowledge about the domain
– Semi-supervised/indirectly supervised learning?

Another look at the important issues
Availability of supervision
– Supervised algorithms are well studied; supervision is hard (or expensive) to obtain
Complexity of model
– More complex models encode complex dependencies between parts; complex models make learning and inference harder
Features
– Most of the time we will assume that we have a good feature set to model our problem. But do we?
Domain knowledge
– Incorporating background knowledge into learning and inference in a mathematically sound way