Regular Expressions: grep. LING 5200 Computational Corpus Linguistics, Martha Palmer.

LING 5200, 2006 BASED on Kevin Cohen’s LING
Homework 2
Bytes
Read path names
~ not necessary in home directory
Display results of commands if they’re just a few lines.

Switches
-c  list a count of matching lines only (like adding | wc -l)
-i  ignore the case of the letters in the pattern
-n  include the line numbers
-v  show lines that do NOT match the pattern
grep -i lemma README.english
grep -ic lemma README.english
grep -in lemma README.english
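The four switches above can be tried without the corpus files. This is a minimal sketch: the sample text is made up and only stands in for README.english.

```shell
# Hypothetical sample text standing in for README.english.
sample='The lemma field
Each Lemma has an ID
No match here
lemma counts'

# -i: ignore case, so both "lemma" and "Lemma" match (three lines).
printf '%s\n' "$sample" | grep -i lemma

# -c with -i: print only the number of matching lines.
count=$(printf '%s\n' "$sample" | grep -ic lemma)

# -n with -i: prefix each matching line with its line number.
numbered=$(printf '%s\n' "$sample" | grep -in lemma)

# -v with -i: print the lines that do NOT match.
inverse=$(printf '%s\n' "$sample" | grep -iv lemma)

echo "count: $count"
echo "$numbered"
echo "not matching: $inverse"
```

Here -c reports 3, and -v leaves only the line "No match here".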

The Chomsky Grammar Hierarchy
Regular grammars, aabbbb: S → aS | nil | bS
Context free grammars, aaabbb: S → aSb | nil
Context sensitive grammars, aaabbbccc: xSy → xby
Transformational grammars - Turing Machines

Movement
What did John give to Mary?
*Where did John give to Mary?
John gave cookies to Mary.
John gave to Mary.

Nested Dependencies and Crossing Dependencies
John, Mary and Bill ate peaches, pears and apples, respectively. (crossing, CS)
The dog chased the cat that bit the mouse that ran.
The mouse the cat the dog chased bit ran. (nested, CF)

Most parsers are Turing Machines
To give a more natural and comprehensible treatment of movement
For a more efficient treatment of features
Not because of “respectively”: most parsers can’t handle it.

b*c matches the first character in the string cabbbcde (zero b's, then c).
b*cd matches the third to seventh characters in the string cabbbcdebbbbbbcdbc (bbbcd).
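The two matches on this slide can be checked with grep -o, which prints each matched substring on its own line. The -o switch is not on the switches slide and is not required by POSIX, although GNU, BSD and BusyBox grep all provide it, so treat this as a sketch for those versions. head -n 1 keeps only the leftmost match.

```shell
# Leftmost match of b*c in cabbbcde: zero b's followed by the first c.
first=$(printf '%s\n' 'cabbbcde' | grep -o 'b*c' | head -n 1)

# Leftmost match of b*cd: the initial c is not followed by d,
# so the match starts at the third character.
second=$(printf '%s\n' 'cabbbcdebbbbbbcdbc' | grep -o 'b*cd' | head -n 1)

echo "$first"
echo "$second"
```

The first command prints c and the second prints bbbcd, which is characters three to seven of the longer string.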

Character classes: ranges
[A-Z]     all upper-case
[a-z]     all lower-case
[A-Za-z]  all letters
[0-9]     any digit from zero to 9
Practice!

Character classes: complements
Any character that's not a vowel
[^aeiouAEIOU]
In this context, ^ means "not"
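A small sketch of ranges and complements on a made-up word list, one token per line. Setting LC_ALL=C is an assumption added here so that [A-Z] covers exactly the ASCII upper-case letters; in some locales a range can behave differently.

```shell
# Use the C locale so that ranges such as [A-Z] mean plain ASCII ranges.
export LC_ALL=C

# Hypothetical tokens, one per line.
words='Boulder
corpus
2006
rhythm
NP'

# Lines that begin with an upper-case letter: Boulder and NP.
upper_start=$(printf '%s\n' "$words" | grep -c '^[A-Z]')

# Lines made up of digits only: 2006.
digits_only=$(printf '%s\n' "$words" | grep '^[0-9][0-9]*$')

# Lines with no a, e, i, o or u in either case: 2006, rhythm and NP.
no_vowel=$(printf '%s\n' "$words" | grep '^[^aeiouAEIOU]*$')

echo "$upper_start"
echo "$digits_only"
echo "$no_vowel"
```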

Anchors
Any line that begins with…
Any line that ends with…
^T    line that begins with T
VBZ$  line that ends with VBZ
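The two anchors can be tried on a few made-up tagged lines. The word/TAG format below is only an illustration and is not taken from the course corpora.

```shell
# Hypothetical POS-tagged lines.
tagged='The/DT dog/NN barks/VBZ
She/PRP runs/VBZ
Tom/NNP sleeps/VBZ soundly/RB'

# ^T: lines that begin with T (the first and third lines).
starts_T=$(printf '%s\n' "$tagged" | grep -c '^T')

# VBZ$: lines that end with VBZ (the first and second lines).
ends_VBZ=$(printf '%s\n' "$tagged" | grep -c 'VBZ$')

# Both anchors in one pattern: begins with T and ends with VBZ (first line only).
both=$(printf '%s\n' "$tagged" | grep '^T.*VBZ$')

echo "$starts_T $ends_VBZ"
echo "$both"
```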

Quantifiers
One or more…
Zero or more…
One or zero…
a+  one or more “a's”
a*  zero or more “a's”
a?  one “a”, or nothing
And more…
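A sketch comparing the three quantifiers on a made-up list of strings. It uses grep -E (the extended syntax that egrep provides), because + and ? are not special in basic grep patterns. The ^ and $ anchors force the whole line to match.

```shell
# Hypothetical test strings, one per line.
strings='b
ab
aab
aaab'

# a+ : one or more a's before b (ab, aab, aaab).
plus=$(printf '%s\n' "$strings" | grep -Ec '^a+b$')

# a* : zero or more a's before b (all four lines).
star=$(printf '%s\n' "$strings" | grep -Ec '^a*b$')

# a? : one a, or nothing, before b (b and ab).
opt=$(printf '%s\n' "$strings" | grep -Ec '^a?b$')

echo "$plus $star $opt"
```

The three counts come out as 3, 4 and 2.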

grep/egrep
x+ instead of xx*
(xxx|yyy) matches either xxx or yyy
? matches a single character (as a shell wildcard in file names; in an egrep pattern, ? means one or zero of the preceding item)
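A sketch of the grep versus egrep difference on made-up input. grep -E is used in place of egrep; the two accept the same pattern syntax.

```shell
# Hypothetical input, one string per line.
text='x
xxx
yyy
xy
z'

# Basic grep: "one or more x" has to be spelled xx*.
basic=$(printf '%s\n' "$text" | grep '^xx*$')

# Extended syntax: x+ says the same thing.
extended=$(printf '%s\n' "$text" | grep -E '^x+$')

# Alternation: lines that are exactly xxx or exactly yyy.
either=$(printf '%s\n' "$text" | grep -Ec '^(xxx|yyy)$')

echo "$basic"
echo "$extended"
echo "$either"
```

Both of the first two commands select the lines x and xxx, and the alternation matches two lines.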

Searching the treebank
cat ??/* | egrep -i '(push|pull)[a-z]*'
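To see what the treebank pattern picks out, it can be run on a single made-up sentence instead of the corpus files. The -o switch (available in GNU, BSD and BusyBox grep, but not on the switches slide) is added here so that only the matched words are printed, and LC_ALL=C is set as an assumption to keep [a-z] an ASCII range.

```shell
export LC_ALL=C

# Hypothetical sentence standing in for the treebank files.
sent='He pushed the cart, she PULLED it back, and pulling is hard.'

# push or pull followed by any further letters; -i ignores case.
matches=$(printf '%s\n' "$sent" | grep -Eio '(push|pull)[a-z]*')

echo "$matches"
```

This prints pushed, PULLED and pulling, one per line.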

grep/egrep
grep '^[^a-z]*epl' README.english
grep ' epl' README.english
egrep '^[^a-z]*(epl|epw)' README.english
egrep ' (epl|epw)' README.english
Nice when you have tokenized strings…

More grepping
But when you don’t….
/corpora/celex/english/epw/epw.cd
Find all capitalized words
grep '^[0-9][0-9]*.[A-Z]' epw.cd | wc -l

Exercises – pick a directory
How many 5-letter words?
head -10 wsj_0564 | grep -i ' [a-z][a-z][a-z][a-z][a-z] ' | wc
grep -i ' [a-z][a-z][a-z][a-z][a-z] ' * | wc
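The commands on this slide count lines that contain a five-letter word with a space on each side, not the words themselves. The sketch below runs the slide's pattern on made-up text, then shows one possible way to count the words directly by putting each token on its own line with tr. The tr step is an addition and is not part of the slides.

```shell
export LC_ALL=C

# Hypothetical text standing in for wsj_0564.
text='The quick brown foxes jumped
over seven lazy dogs today
ok'

# The slide's pattern: lines containing a space-delimited five-letter word.
line_count=$(printf '%s\n' "$text" | grep -ic ' [a-z][a-z][a-z][a-z][a-z] ')

# One token per line, then count the lines that are exactly five letters long.
word_count=$(printf '%s\n' "$text" | tr -s ' ' '\n' | grep -ic '^[a-z][a-z][a-z][a-z][a-z]$')

echo "$line_count lines, $word_count words"
```

Here two lines match, but there are five five-letter words (quick, brown, foxes, seven, today). The word "today" is missed by the space-delimited pattern because it ends its line.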

Lab (cont.)
Are there any words with no vowels?
grep -i ' [^aeiou][^aeiou]* ' wsj_0564 | wc
grep -i ' [^aeiouy][^aeiouy.]* ' wsj_0564 | wc
grep -i ' [^aeiouy"][^aeiouy."]* ' wsj_ %?

Lab (cont.)
Find “1-syllable” words (words with exactly one vowel).
grep -i ' [^aeiouy]*[aeiouy][^aeiouy]* '
Find “2-syllable” words (words with exactly two vowels).
Delete words ending with a silent “e” from the “2-syllable” list.
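One possible approach to this lab, sketched on a made-up word list with one word per line. Anchoring with ^ and $ replaces the surrounding spaces of the slide's pattern, which assumes the text has already been split into words. Dropping every word that ends in e is only a rough stand-in for "silent e".

```shell
# Hypothetical word list, one word per line.
words='cat
strength
table
mouse
by
hmm
pencil
happy'

# Exactly one vowel (y counted as a vowel): cat, strength, by.
one=$(printf '%s\n' "$words" | grep -i '^[^aeiouy]*[aeiouy][^aeiouy]*$')

# Exactly two vowels: table, pencil, happy.
two=$(printf '%s\n' "$words" | grep -i '^[^aeiouy]*[aeiouy][^aeiouy]*[aeiouy][^aeiouy]*$')

# Remove the words ending in e from the two-vowel list: pencil, happy.
two_no_e=$(printf '%s\n' "$two" | grep -iv 'e$')

echo "$one"
echo "$two"
echo "$two_no_e"
```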

Emacs
emacs -nw
Control x, control c – exit
Control x, control s – save
Control x, control v – visit
Apropos