LING/C SC/PSYC 438/538 Lecture 12 Sandiway Fong.

Administrivia: Homework 9 covers Perl regex and Python re (import re). String handling in Python is slightly complicated, so use raw strings (r'...'). https://docs.python.org/3/library/re.html
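A minimal Python sketch of why raw strings matter for re. In an ordinary string literal, "\b" is the backspace character, so a word-boundary pattern only reaches the regex engine intact when it is written as r"\b..." (or with doubled backslashes). The sample text is made up for illustration.

```python
import re

text = "cat concatenate cat"

# In a normal string literal "\b" is a backspace (chr(8)), not a word boundary.
print("\b" == chr(8))                  # True
print(re.findall("\bcat\b", text))     # [] : the pattern contains backspaces
print(re.findall(r"\bcat\b", text))    # ['cat', 'cat'] : raw string keeps \b
print(re.findall("\\bcat\\b", text))   # same pattern, escaped by hand
```

The "cat" inside "concatenate" is not matched because it is preceded by another word character.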

File I/O Summary
Common to Perl and Python: open; filehandle (the concept comes from the underlying OS); close.
Standard streams: STDIN, STDOUT, STDERR (Perl); sys.stdin, sys.stdout, sys.stderr (Python).
Perl (https://perldoc.perl.org/perlopentut.html): <filehandle> reads one line in scalar context, or all remaining lines in list context; print filehandle String.
Python (https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files): methods .read(), .readline(), .readlines(), .write(String) (adds no newline); function print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False).
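A small Python sketch of the methods listed above, assuming a scratch file in the system temp directory (the file name demo.txt is arbitrary). It shows that .write() adds no newline while print() adds end='\n', and what .readline(), .readlines() and .read() return.

```python
import os
import sys
import tempfile

# Scratch file in a fresh temp directory (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as fh:
    fh.write("line one\n")          # .write() adds no newline by itself
    print("line two", file=fh)      # print() appends end='\n' by default

with open(path) as fh:
    first = fh.readline()           # one line, newline included
    rest = fh.readlines()           # remaining lines as a list

with open(path) as fh:
    whole = fh.read()               # entire file as one string

print(repr(first), rest, repr(whole))
print("diagnostics go here", file=sys.stderr)   # standard error stream
```

The with statement closes each file automatically, which plays the role of an explicit close.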

Regular Expressions to the rescue https://xkcd.com/208/

Regular Expressions from Hell
Email validation (RFC 5322):
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Homework 9: File hw9.txt (56 lines). Each line has 3 fields: name of state or US territory (in alphabetical order), population, and area (sq. miles). Fields are separated by a tab (\t). Source: Wikipedia
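A minimal Python sketch of splitting one line in this format, assuming exactly one tab between fields and integer values (as the int() hint for Question 2 suggests). The sample lines are invented placeholders, not rows from hw9.txt. Splitting on the tab, rather than on spaces, keeps multi-word names intact.

```python
def parse_line(line):
    """Split a 'name<TAB>population<TAB>area' line into (str, int, int)."""
    name, pop, area = line.strip().split("\t")        # exactly 3 tab-separated fields
    return name, int(pop.replace(",", "")), int(area.replace(",", ""))

# Hypothetical sample lines (not real data): multi-word name, commas in numbers.
samples = ["New Example\t1,234,567\t12,345\n", "Sampleland\t89,000\t678\n"]
for s in samples:
    print(parse_line(s))
```

If some area values turn out to contain decimals, float() would be needed instead of int().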

Homework 9, Question 1: Using Perl, supply the file hw9.txt on the command line (DO NOT MODIFY hw9.txt). Read the file, use regex to extract the information, and create hash table(s) indexed by name containing population and land area. Print a table of states/territories inversely ranked by land area. Print a table of states/territories ranked by population (i.e. 1st is highest population). Compute the density (population per sq. mile). Print a table of states/territories ranked by density (i.e. 1st is highest density).
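For comparison with the Python version asked for in Question 2, here is a minimal sketch of ranking with a dictionary, assuming the data has already been parsed into name -> (population, area). The three entries are hypothetical, not real figures. sorted() with key= and reverse=True plays the role of a Perl sort with a custom comparison.

```python
# Hypothetical data: name -> (population, area in sq. miles); not real figures.
data = {
    "Alpha": (1000, 50),
    "Beta Gamma": (5000, 1000),
    "Delta": (3000, 20),
}

# Rank by population, 1st = highest (reverse=True gives descending order).
by_pop = sorted(data, key=lambda n: data[n][0], reverse=True)

# Density = population per square mile.
density = {n: pop / area for n, (pop, area) in data.items()}
by_density = sorted(density, key=density.get, reverse=True)

for rank, name in enumerate(by_density, start=1):
    print(f"{rank:>2}. {name:<12} {density[name]:10.2f}")
```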

Homework 9, Question 1 hints: note that some state/territory names consist of more than one word. Note that numeric values may have commas. Read about @ARGV. Read about split. Read about tr: $num =~ tr/,//d deletes the pesky commas in $num. Revisit sort parameters: https://perldoc.perl.org/functions/sort.html. If you need to trim whitespace from the ends: $line =~ s/^\s+|\s+$//g; For nicely formatted lists, read about printf FORMAT at http://perldoc.perl.org/functions/sprintf.html
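A sketch of Python counterparts to three of the Perl hints, using made-up strings: str.translate with a deletion table behaves like tr/,//d, the same trimming regex works with re.sub (str.strip() does the same job), and Python keeps printf-style % formatting.

```python
import re

num = "1,234,567"
# Perl: $num =~ tr/,//d  (delete every comma)
no_commas = num.translate(str.maketrans("", "", ","))

line = "   Some Name\t42  \n"
# Perl: $line =~ s/^\s+|\s+$//g  (trim both ends); the same regex in Python:
trimmed = re.sub(r"^\s+|\s+$", "", line)
print(trimmed == line.strip())        # True: str.strip() gives the same result

# Perl: printf("%-15s %10d\n", ...); Python supports printf-style formatting too.
row = "%-15s %10d" % ("Some Name", int(no_commas))
print(no_commas, repr(trimmed), row, sep="\n")
```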

Homework 9, Question 2 (538 only; optional for 438): Do the same exercise as Question 1 in Python3 using a dictionary or dictionaries. In your opinion, which code is simpler? These may prove useful: str.strip(), str.replace(), str.split(), sys.argv, int()

Homework 9: Usual submission rule: ONE PDF file. Submit code, sample runs and comments. Email subject heading: 438/538 Homework 9 Your Name. Due date: by midnight next Monday (reviewed in class on Tuesday).

regex: Read textbook Chapter 2, Section 1 on Regular Expressions.

Perl regex: Read up on the syntax of Perl regular expressions. Online tutorials: http://perldoc.perl.org/perlrequick.html and http://perldoc.perl.org/perlretut.html

Perl regex
Matching: $s =~ /foo/ (/…/ contains a regex). It can be used in a conditional, e.g. if ($s =~ /foo/) …, which evaluates to true/false depending on what's in $s. It can also be used as a statement, e.g. $s =~ /foo/; the global variable $& then contains the match.
Match and substitute: $s =~ s/foo/bar/. s/…match…/…substitute…/ contains two expressions; it modifies $s by finding a single (the first) occurrence of match and replacing it with substitute. s/…match…/…substitute…/g performs a global substitution (every occurrence).
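The same operations in Python, as a minimal sketch with a made-up string. One design difference: Python strings are immutable, so re.sub returns a new string instead of modifying the variable in place the way s/// modifies $s.

```python
import re

s = "foo and foo again"

m = re.search(r"foo", s)            # like: if ($s =~ /foo/) ...
if m:
    print("matched:", m.group(0))   # m.group(0) plays the role of Perl's $&

one = re.sub(r"foo", "bar", s, count=1)   # like s/foo/bar/  (first occurrence only)
every = re.sub(r"foo", "bar", s)          # like s/foo/bar/g (all occurrences)
print(one)      # bar and foo again
print(every)    # bar and bar again
```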

Perl regex: most useful with the code template for reading in a file line by line:
    open(my $fh, '<', $ARGV[0]) or die "$ARGV[0] not found!\n";
    while (my $line = <$fh>) {
        # do RE stuff with $line
    }
    close($fh);
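A Python analogue of this template, offered as a sketch: the function names matching_lines and main and the default pattern r"ing\b" are hypothetical choices for illustration. Iterating over a file object yields one line at a time, like while (<$fh>), and sys.argv[1] corresponds to Perl's $ARGV[0].

```python
import re
import sys

def matching_lines(lines, pattern):
    """Return the lines (newline removed) in which `pattern` matches."""
    hits = []
    for line in lines:                  # like: while (my $line = <$fh>) { ... }
        if re.search(pattern, line):    # "do RE stuff with $line"
            hits.append(line.rstrip("\n"))
    return hits

def main(path, pattern=r"ing\b"):       # hypothetical default pattern
    try:
        fh = open(path)                 # like: open(my $fh, '<', $ARGV[0])
    except OSError:
        sys.exit(f"{path} not found!")  # like: or die "... not found!\n"
    with fh:                            # closes the file, like close($fh)
        for hit in matching_lines(fh, pattern):
            print(hit)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])                   # sys.argv[1] ~ Perl's $ARGV[0]
```

Keeping the matching logic in a function that accepts any iterable of lines makes it easy to try out on a plain list before pointing it at a file.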

Chapter 2: JM. Spaces matter! Character class: Perl lingo for a set of characters in square brackets, e.g. [wW].

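Chapter 2: JM. Range, e.g. [A-Z]: defined by order in the ASCII table. A backslash plus a lowercase letter names a class (e.g. \d, \s, \w); the uppercase variant (\D, \S, \W) matches everything but that class.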
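A quick Python check of ranges, classes and their uppercase complements on made-up strings. For plain ASCII text like this, \d behaves like [0-9] (Python's \d also accepts other Unicode digits).

```python
import re

text = "Room 101, floor B2: call 555-0199!"

print(re.findall(r"[0-9]+", text))        # ['101', '2', '555', '0199'] : range by ASCII order
print(re.findall(r"\d+", text))           # same result for this ASCII text
print(re.findall(r"[A-Z]", text))         # ['R', 'B'] : uppercase range
print(re.findall(r"\D+", "a1b22c"))       # ['a', 'b', 'c'] : \D = all but digits
print(re.findall(r"\S+", "two  words"))   # ['two', 'words'] : \S = non-whitespace
```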

Chapter 2: JM

Chapter 2: JM. Can use (…) to group when the unit being repeated is more than one character. Sheeptalk.
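A minimal Python sketch of the sheeptalk pattern (baa!, baaa!, baaaa!, …) written as baa+!, plus a demonstration that a quantifier applies only to the previous character unless parentheses group a longer unit. The test strings are made up.

```python
import re

sheep = re.compile(r"baa+!")   # b, a, then one or more a, then !
for s in ["baa!", "baaaa!", "ba!", "baa"]:
    print(s, bool(sheep.fullmatch(s)))   # True, True, False, False

# A quantifier applies to the single previous char unless (...) groups more.
print(bool(re.fullmatch(r"ha+", "hahaha")))    # False: + repeats only 'a'
print(bool(re.fullmatch(r"(ha)+", "hahaha")))  # True: + repeats 'ha'
```

The equivalent pattern baaa*! (two a's, then zero or more) accepts the same strings.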

Perl regex: \S+ing\b. \s is a whitespace character, so \S is a non-whitespace character; + is repetition (1 or more); \b is a word boundary (words are made up of \w characters).
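Trying \S+ing\b on a made-up sentence in Python shows one consequence of using \S: punctuation counts as non-whitespace, so an opening parenthesis ends up inside the match. Using \w+ instead keeps only word characters.

```python
import re

text = "Stop (singing) and start walking."

print(re.findall(r"\S+ing\b", text))   # ['(singing', 'walking']
print(re.findall(r"\w+ing\b", text))   # ['singing', 'walking']
```

In "walking." the greedy \S+ first grabs "walking." and then backtracks until "ing" followed by a boundary (before the period) can match.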

Perl regex: global variables. Word boundary: \b or \b{wb} (Unicode word boundary). Other boundary metacharacters: ^ (beginning of line), $ (end of line).

Perl regex: Unicode and \b, \b{wb}. Note: global match in a while-loop. Note: .*? is the non-greedy version of .*
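A small Python sketch of the anchors on made-up text. As in Perl without /m, ^ and $ anchor to the whole string by default; re.MULTILINE (Perl's /m) makes them match at the start and end of every line.

```python
import re

text = "cat food\nthe cat\ncat"

print(re.findall(r"^cat", text))                 # ['cat'] : start of string only
print(re.findall(r"^cat", text, re.MULTILINE))   # ['cat', 'cat'] : start of each line
print(re.findall(r"cat$", text, re.MULTILINE))   # ['cat', 'cat'] : end of each line
print(len(re.findall(r"\bcat\b", "cat concat catalog")))   # 1 : whole word only
```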
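A Python sketch of both notes, on made-up strings: greedy .* runs to the last possible >, while .*? stops at the first one; and re.finditer is a rough counterpart of Perl's while ($s =~ /…/g) { … } loop, yielding one match object per iteration.

```python
import re

html = "<b>bold</b> and <i>it</i>"

print(re.findall(r"<.*>", html))    # greedy: ['<b>bold</b> and <i>it</i>']
print(re.findall(r"<.*?>", html))   # non-greedy: ['<b>', '</b>', '<i>', '</i>']

# Perl: while ($s =~ /(\w+)/g) { ... }  ~  Python: for m in re.finditer(...)
for m in re.finditer(r"\w+", "one two three"):
    print(m.start(), m.group(0))    # 0 one / 4 two / 8 three
```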

Perl regex: Unicode and \w. In ASCII, \w is [0-9A-Za-z_]; the definition is expanded for Unicode:
    use utf8;                     # pragma: the source code is UTF-8
    use open qw(:std :utf8);      # pragma: standard streams use UTF-8, https://perldoc.perl.org/open.html
    my $str = "school école École šola trường स्कूल škole โรงเรียน";
    my @words = ($str =~ /(\w+)/g);    # list context: returns every match
    foreach my $word (@words) { print "$word\n" }
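In Python 3, \w in a str pattern is Unicode-aware by default, and the re.ASCII flag restricts it to [a-zA-Z0-9_]. This sketch sticks to the Latin-script words from the slide, written with \u escapes so the characters are unambiguous; Python's Unicode \w is not guaranteed to agree with Perl's for every script (for example, scripts that use combining marks), so other scripts are worth checking separately.

```python
import re

# school école École šola  (precomposed characters via escapes)
s = "school \u00e9cole \u00c9cole \u0161ola"

print(re.findall(r"\w+", s))             # Unicode \w: all four words intact
print(re.findall(r"\w+", s, re.ASCII))   # ASCII \w: ['school', 'cole', 'cole', 'ola']
```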

Chapter 2: JM. Why? * means zero or more repetitions of the previous char/expr; . means any single character; ? means the previous char/expr is optional.
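Quick Python checks of the three operators, using made-up words. The last line shows a classic surprise with *: since zero repetitions are allowed, a* succeeds immediately with an empty match at position 0 of "baa".

```python
import re

print(bool(re.fullmatch(r"colou?r", "color")))    # True: u is optional
print(bool(re.fullmatch(r"colou?r", "colour")))   # True
print(bool(re.fullmatch(r"beg.n", "begun")))      # True: . is any single char
print(bool(re.fullmatch(r"beg.n", "beggin")))     # False: exactly one char between g and n
print(repr(re.search(r"a*", "baa").group(0)))     # '' : * also allows zero repetitions
```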

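Chapter 2: JM. Precedence of operators.
Precedence hierarchy (highest first): parentheses ( ); counters * + ? {}; sequences and anchors; disjunction |.
Example: Column 1 Column 2 Column 3 …
/Column [0-9]+ */ matches just one column: the * applies only to the space before it.
/(Column [0-9]+ *)*/ matches the whole sequence of columns.
/house(cat(s|)|)/ (| = disjunction; (s|) works like s?, i.e. optional)
Perl: in a regular expression, the substring matched by the pattern within a pair of parentheses is stored in the global variables $1 (and $2, and so on). (?: … ) groups but excludes the group from storage.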
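The same patterns in Python, as a sketch: m.group(1), m.group(2) correspond to Perl's $1, $2, and (?:…) groups without storing. One detail worth knowing: when a capturing group is repeated, Python keeps only the text from its last iteration.

```python
import re

cols = "Column 1 Column 2 Column 3"
print(repr(re.match(r"Column [0-9]+ *", cols).group(0)))   # 'Column 1 ' : * binds to the space
m = re.match(r"(Column [0-9]+ *)*", cols)
print(m.group(0))     # the whole string
print(m.group(1))     # 'Column 3' : last iteration of the repeated group

m = re.fullmatch(r"house(cat(s|)|)", "housecats")
print(m.group(1), m.group(2))   # cats s   ($1 and $2 in Perl)

m = re.fullmatch(r"house(?:cat(s|)|)", "housecats")
print(m.groups())               # ('s',) : (?:...) is not stored
```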

Online regex tester https://regex101.com

Perl regex (http://perldoc.perl.org/perlretut.html): in scalar context, a match returns 1 (true) or "" (the empty string, if false). A shortcut: in list context, matching returns a list (e.g. the captured substrings $1, $2, …).
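For comparison, a Python sketch on a made-up date string: re.search returns a match object (truthy) or None instead of 1 or "", and unpacking m.groups() mirrors Perl's list-context assignment of the captures.

```python
import re

s = "2024-05-17"
print(bool(re.search(r"\d+", s)))   # True  (Perl: 1)
print(re.search(r"xyz", s))         # None  (Perl: "")

# Perl: my ($y, $m, $d) = ($s =~ /(\d+)-(\d+)-(\d+)/);   # list context
y, mo, d = re.fullmatch(r"(\d+)-(\d+)-(\d+)", s).groups()
print(y, mo, d)                     # 2024 05 17
print(re.findall(r"\d+", s))        # ['2024', '05', '17'], like /g in list context
```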