Methods in Computational Linguistics II Queens College Lecture 1: Introduction.

Presentation transcript:

Methods in Computational Linguistics II
2nd semester of a two-semester course providing instruction in
– The basics of computer science and programming (via Python)
– An introduction to techniques in computational linguistics

My background
Research
– Speech Synthesis and Recognition
– Prosody (Intonation)
– Speech Segmentation
– Non-native speech
– Political speech, and other paralinguistics
Computer Science professor at Queens and CUNY GC.
Worked at IBM and Google

Your Background
Name.
What are your research interests in linguistics?
How do you expect computational linguistics to fit into your work?
– Are there techniques or applications that you are particularly looking to learn?
Programming background?
– 1 semester? More?
Are you simultaneously taking Language Technologies?

Outline
NLTK
– Overview
– Major Capabilities
Searching and Sorting
– Linear (Sequential) search
– Binary Search
– Insertion sort
– MergeSort
Course Policies
Syllabus Review

NLTK
Natural Language Toolkit.
A set of utilities in Python that facilitate the processing of text.

NLTK Functionality
Accessing corpora
String processing
Collocation discovery
Part of speech tagging
Classification and Clustering
Evaluation Metrics
Chunking
Parsing

NLTK Functionality (continued)
Semantic interpretation
– first order logic, lambda calculus, model checking
Probability and estimation
WordNet Browsing
Chatbots

NLTK as a resource
This range of functionality is quite broad, and not necessarily cohesive.
However, there are resources and tools (functions and objects) that underpin most major computational linguistics tasks.

Major Computational Linguistics Tasks
Syntax
– Tagging
– Parsing
Semantics
– Information Extraction
– Semantic Role Labeling
Phonology
Sentence Processing
Segmentation
Summarization
Speech Recognition
Speech Synthesis
Information Retrieval
Sentiment Analysis
Authorship studies
Co-reference resolution

NLTK Resources
NLTK also contains lexical material
– Project Gutenberg
– WordNet
– Penn Treebank (subset)
– Named Entity Recognition data
– Inaugural addresses
– Sentiment data
– Names corpus
– Switchboard (subset)
– TIMIT
– Webtext

Quick Assignment
Methods I used NLTK.
Homework 0
– Make sure that NLTK is installed and working correctly
– Install matplotlib to use NLTK's graphing functions.
"Due" asap.
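One way to check Homework 0 is a short script that asks Python whether it can locate each package. This is only a sketch: the helper name is_installed is made up for illustration, and finding a package does not prove it works correctly, so importing nltk and loading a corpus is still worth doing by hand.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module.

    Hypothetical helper for illustration; it only checks that the
    module can be found, not that it runs without errors.
    """
    return importlib.util.find_spec(module_name) is not None


for name in ("nltk", "matplotlib"):
    status = "found" if is_installed(name) else "missing"
    print(name + ": " + status)
```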

One Question Pop Quiz
Solve for p

Math
Computational Linguistics requires a not-quite-trivial amount of math.
Statistics and probabilistic modeling form the pillars underlying these computational techniques.
This involves counting and algebra.
Machine learning governs the classification and clustering techniques that CL makes heavy use of.
– Requires calculus, statistics, linear algebra.

Math in this course
Overview of probability.
– Next class
Algebra for evaluation, some common features
Statistics for Naïve Bayes classification
Entropy in Decision Trees

Data Structures, Algorithms, etc.
In computer science, there is a tight relationship between data structures and algorithms.
In general, the more complex the data structure
– the more general or flexible the data and relationships that can be represented
– the faster algorithms can run

Searching and Sorting
Searching and sorting are a frequent example of the relationship between algorithm runtimes and data structuring.
Search: identify the location of a value, x, in a list, A.
Sort: rearrange a list A so that its values are in non-decreasing order, i.e. A[i] <= A[i+1] for every adjacent pair.
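The sorting condition can be written directly as a check over adjacent pairs. The function name is_sorted is an illustrative choice, not something from the slides.

```python
def is_sorted(A):
    """Return True if A[i] <= A[i + 1] holds for every adjacent pair."""
    return all(A[i] <= A[i + 1] for i in range(len(A) - 1))


print(is_sorted([1, 2, 2, 5]))        # repeated values are allowed
print(is_sorted([5, 2, 4, 6, 1, 3]))  # the unsorted example list
```

An empty or one-element list has no adjacent pairs, so it counts as sorted.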

Sequential Search
    def search(A, x):
        # Check each position in turn; return the first index holding x.
        for i in range(len(A)):
            if A[i] == x:
                return i
        return -1  # x does not occur in A

How long does sequential search take to run?
Best case?
Worst case?
Average case?
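One way to explore these questions is to count how many elements the search looks at. The sketch below is an instrumented variant written for this purpose (the name search_counting and the test list are assumptions, not part of the lecture).

```python
def search_counting(A, x):
    """Sequential search that also reports how many elements were examined."""
    comparisons = 0
    for i in range(len(A)):
        comparisons += 1
        if A[i] == x:
            return i, comparisons
    return -1, comparisons


A = list(range(1000))
print(search_counting(A, 0))    # target is the first element
print(search_counting(A, 499))  # target is near the middle
print(search_counting(A, -5))   # target is absent: every element is examined
```

Under these assumptions the count ranges from 1 to len(A), which suggests growth proportional to n in the worst case.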

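Binary Search
If the list A is in increasing order, large chunks of the list can be ignored.
    def search(A, x):
        # If x is in A, its index always lies in the range [bottom, top).
        top = len(A)
        bottom = 0
        while bottom < top:
            mid = (top + bottom) // 2  # integer midpoint
            if A[mid] < x:
                bottom = mid + 1  # discard the lower half
            elif A[mid] > x:
                top = mid  # discard the upper half
            else:
                return mid
        return -1  # x does not occur in A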

How long does binary search take to run?
Best Case?
Worst Case?
Average Case?
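Counting loop iterations gives a feel for the answer. The sketch below is an instrumented copy of binary search (the name binary_search_counting and the list size are illustrative assumptions); it compares the largest observed step count with floor(log2 n) + 1.

```python
import math


def binary_search_counting(A, x):
    """Binary search on a sorted list, also returning the number of loop steps."""
    bottom, top = 0, len(A)
    steps = 0
    while bottom < top:
        steps += 1
        mid = (top + bottom) // 2
        if A[mid] < x:
            bottom = mid + 1
        elif A[mid] > x:
            top = mid
        else:
            return mid, steps
    return -1, steps


n = 1024
A = list(range(n))
# Try every present value plus one absent value on each side.
max_steps = max(binary_search_counting(A, x)[1] for x in range(-1, n + 1))
bound = math.floor(math.log2(n)) + 1
print("max steps:", max_steps, "bound:", bound)
```

Each step at least halves the remaining range, which is why the step count stays within that logarithmic bound.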

Improvement of Binary Search
Binary search is a significant improvement
– log n < n
However, binary search requires that A is sorted.
How long does it take to sort an array, and how does this impact the total runtime?

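Insertion Sort
Sort the list [5, 2, 4, 6, 1, 3]
    def insertionSort(A):
        # A[0..j-1] is already sorted; insert A[j] into its place.
        for j in range(1, len(A)):
            key = A[j]
            i = j - 1
            # Shift larger elements one position to the right.
            while i > -1 and A[i] > key:
                A[i + 1] = A[i]
                i = i - 1
            A[i + 1] = key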

How long does insertion sort take to run?
Best Case?
Worst Case?
Average Case?
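To experiment with these cases, the sketch below counts how many element shifts insertion sort performs. It works on a copy of the input, and the name insertion_sort_counting is an assumption made for illustration.

```python
def insertion_sort_counting(A):
    """Sort a copy of A; return (sorted list, number of element shifts)."""
    A = list(A)
    shifts = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i > -1 and A[i] > key:
            A[i + 1] = A[i]
            shifts += 1
            i -= 1
        A[i + 1] = key
    return A, shifts


n = 10
print(insertion_sort_counting(range(n))[1])         # already sorted input
print(insertion_sort_counting(range(n, 0, -1))[1])  # reversed input
print(insertion_sort_counting([5, 2, 4, 6, 1, 3]))  # the example list
```

Already sorted input needs no shifts, while reversed input needs n(n-1)/2 of them, so the work ranges from linear to quadratic in n.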

Can we sort faster?
Yes. This requires recursion.
We'll come back to this, but here is a first example.

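Merge Sort
    def mergeSort(A):
        # A list of zero or one elements is already sorted.
        if len(A) <= 1:
            return A
        mid = len(A) // 2
        # Python slices are 0-based and exclude the end index,
        # so A[:mid] and A[mid:] together cover every element.
        Abottom = mergeSort(A[:mid])
        Atop = mergeSort(A[mid:])
        return merge(Abottom, Atop)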

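Merge
    def merge(A, B):
        C = []
        i = 0
        j = 0
        n = len(A) + len(B)  # count the real elements before adding sentinels
        # Infinite sentinels avoid bounds checks (numeric values only).
        A.append(float('inf'))
        B.append(float('inf'))
        for k in range(n):
            if A[i] < B[j]:
                C.append(A[i])
                i = i + 1
            else:
                C.append(B[j])
                j = j + 1
        return C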
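The float('inf') sentinel only compares cleanly with numbers, which is limiting when sorting words. The following is a sentinel-free variant, offered as an alternative sketch and not as the version from the slides; the names merge_sort and the sample word list are assumptions.

```python
def merge(A, B):
    """Merge two sorted lists into a new sorted list without sentinels."""
    C = []
    i = j = 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:  # <= keeps equal items in their original order
            C.append(A[i])
            i += 1
        else:
            C.append(B[j])
            j += 1
    # One list is exhausted; the rest of the other is already sorted.
    C.extend(A[i:])
    C.extend(B[j:])
    return C


def merge_sort(A):
    """Return a new sorted list; the input is left unchanged."""
    if len(A) <= 1:
        return list(A)
    mid = len(A) // 2
    return merge(merge_sort(A[:mid]), merge_sort(A[mid:]))


words = ["parsing", "tagging", "chunking", "corpus"]
print(merge_sort(words))
print(merge_sort([5, 2, 4, 6, 1, 3]))
```

Because it never appends to its arguments, this variant also leaves the caller's lists untouched.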

How long does Merge Sort take to run?
Hint: This is a (much) harder question.
Best Case?
Worst Case?
Average Case?
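One standard way to approach the question is a recurrence. The sketch below assumes that n is a power of 2, that merging n elements costs c·n for some constant c, and that a one-element list costs c; these simplifications are assumptions made here, not statements from the slides.

```latex
\begin{align*}
T(1) &= c \\
T(n) &= 2\,T(n/2) + c\,n \\
     &= 4\,T(n/4) + 2\,c\,n \\
     &= 2^{k}\,T(n/2^{k}) + k\,c\,n \\
     &= n\,T(1) + c\,n\log_2 n && \text{(taking } k = \log_2 n\text{)} \\
     &= c\,n + c\,n\log_2 n = O(n \log n)
\end{align*}
```

Splitting and merging do the same amount of work whatever the values are, so under this model the best, worst, and average cases all grow as n log n.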

Comparison of run times
    Sorting       Searching
    0             n
    n*log(n)      log(n)
How much searching do you need to do to make it worth sorting?
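The question can be explored with a rough cost model built from the run times above: k sequential searches cost about k*n, while sorting once and then running k binary searches costs about n*log(n) + k*log(n). The sketch below ignores constant factors, and all function names are assumptions made for illustration.

```python
import math


def cost_unsorted(n, k):
    """k sequential searches on an unsorted list: about k * n steps."""
    return k * n


def cost_sorted(n, k):
    """Sort once (about n*log2 n), then k binary searches (about log2 n each)."""
    return n * math.log2(n) + k * math.log2(n)


def break_even_searches(n):
    """Smallest k for which sorting first is cheaper under this rough model."""
    k = 1
    while cost_sorted(n, k) >= cost_unsorted(n, k):
        k += 1
    return k


for n in (16, 1024, 1048576):
    print(n, break_even_searches(n))
```

Under this model the break-even point is roughly log2(n) searches, so sorting pays off quickly when a list is searched repeatedly.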

Class Structure and Policies
Course website:
– Email list
– Banner does not have an email function
– Put your email address on the sign-up sheet.