LING/C SC/PSYC 438/538 Lecture 11 Sandiway Fong
Administrivia Homework 4 Review: graded, email from me. Continuing with Perl regex. Ungraded Homework Exercises for next time.
Homework 4 Review File: population.txt (Source: Wikipedia). Contents: rank, name, continent, population (2016), population (2017); fields are separated by a tab (\t).
Homework 4: Question 1 Review Using Perl: read the file; create hash table(s) indexed by country name containing the following information: continent, 2016 population, 2017 population. Compute and print the country that decreased in population. Compute and print the country with the smallest positive increase in population. Print a table of countries in Asia and 2016 population, ranked by 2016 population. Print a table of countries in Africa and 2016 population, ranked inversely by 2016 population.
Homework 4: Question 1 Review perl hw4.perl population.txt
Japan decreased in population
Only one country, Japan, decreased in population!
Russia increased in population the least!
Asian countries ranked by 2016 population
    China 1403500365
    India 1324171354
    Indonesia 261115456
    Pakistan 193203476
    Bangladesh 162951560
    Japan 127748513
    Philippines 103320222
    Vietnam 94569072
    Iran 80277428
    Turkey 79512426
    Thailand 68863514
African countries ranked inversely by 2016 population
    Democratic Republic of the Congo 78736153
    Egypt 95688681
    Ethiopia 102403196
    Nigeria 185989640
Homework 4: Question 1 Review Rank: #1, #2, #3, …
Homework 4: Question 1 Review Separate hash table for each field (continent, p2016, p2017, diff); this avoids references. For each line: chomp, trim, split, remove commas, assign.
Homework 4: Question 1 Review $neggrowth = country with negative population growth $count = counts countries with negative population growth
Homework 4: Question 2 Review Do the same exercise in Python3 using a dictionary or dictionaries
Homework 4: Question 2 Review python3 hw4.py population.txt
Only one country, Japan , lost population
Country with minimum positive population growth is Russia
Country 2016 population
    China 1,403,500,365
    India 1,324,171,354
    Indonesia 261,115,456
    Pakistan 193,203,476
    Bangladesh 162,951,560
    Japan 127,748,513
    Philippines 103,320,222
    Vietnam 94,569,072
    Iran 80,277,428
    Turkey 79,512,426
    Thailand 68,863,514
Country 2016 population
    Democratic Republic of the Congo 78,736,153
    Egypt 95,688,681
    Ethiopia 102,403,196
    Nigeria 185,989,640
Homework 4: Question 2 Review with … as f: automatically closes the filehandle f. for … in f: iterates over all the lines. .strip().replace() are applied in left-to-right sequence order. fields[0] = rank, fields[1] = name, fields[2] = continent, fields[3] = 2016 population, fields[4] = 2017 population. fields[2:] is a slice, here the same as fields[2:5]; e.g. for 'China' it is ['Asia', '1403500365', '1409517397'].
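The reading step described above can be sketched roughly as follows. This is a minimal sketch, not the code from the slide: the function name load_table is hypothetical, the input is an in-memory string rather than population.txt, and it assumes the file has no header line. The single sample row uses the China figures quoted on the slide.

```python
import io

# One line in the tab-separated format: rank, name, continent, 2016, 2017.
# Assumption: numbers in the file contain commas, which is why they are removed.
SAMPLE = "1\tChina\tAsia\t1,403,500,365\t1,409,517,397\n"

def load_table(f):
    """Build a dict indexed by country name: name -> [continent, p2016, p2017]."""
    table = {}
    for line in f:                       # iterates over all the lines
        # strip() runs first, then replace(): left-to-right order
        fields = line.strip().replace(',', '').split('\t')
        table[fields[1]] = fields[2:]    # slice: continent and both populations
    return table

# "with ... as f" closes f automatically, as with a real file from open()
with io.StringIO(SAMPLE) as f:
    table = load_table(f)

print(table)
```

With a real file, io.StringIO(SAMPLE) would be replaced by open() on the filename given on the command line.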
Homework 4: Question 2 Review The list comprehension grabs all country names c for countries where the 2016 population (table[c][1]) > 2017 population (table[c][2]), e.g. table['China'] = ['Asia', '1403500365', '1409517397']; table['China'][0] = 'Asia'; table['China'][1] = '1403500365'; table['China'][2] = '1409517397'.
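A minimal sketch of such a list comprehension, using a made-up toy table (the country names and numbers are illustrative only, not the Wikipedia data). Since the table stores populations as strings, the sketch converts them with int() before comparing; comparing the strings directly would be lexicographic.

```python
# Toy data, invented for illustration: name -> [continent, p2016, p2017]
table = {
    'A': ['X', '100', '110'],
    'B': ['X', '200', '190'],   # the only entry that decreased
    'C': ['Y', '50', '51'],
}

# All country names c whose 2016 population exceeds the 2017 population
decreased = [c for c in table if int(table[c][1]) > int(table[c][2])]

print(decreased)
```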
Homework 4: Question 2 Review .pop() removes 'Japan' from the table; the value is stored in the variable saved. The function min() computes the smallest value in the table and returns the key (country name) associated with that value. key= tells min() to compare the values given by a function (lambda) that, when supplied with a country (k), returns the expression 2017 population - 2016 population. The last line restores 'Japan' to the table.
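The pop / min / restore pattern might look like the sketch below, again on a made-up toy table (names and numbers are illustrative only). It relies on the same assumption as the slide: exactly one entry has negative growth, so removing it leaves only positive increases.

```python
# Toy data, invented for illustration: name -> [continent, p2016, p2017]
table = {
    'A': ['X', '100', '110'],   # growth +10
    'B': ['X', '200', '190'],   # growth -10 (plays the role of 'Japan')
    'C': ['Y', '50', '51'],     # growth +1
}

saved = table.pop('B')          # temporarily remove the shrinking entry

# min() iterates over the keys and compares them by 2017 minus 2016 population
smallest = min(table, key=lambda k: int(table[k][2]) - int(table[k][1]))

table['B'] = saved              # restore the removed entry

print(smallest)
```

If more than one country could have shrunk, filtering for positive growth before calling min() would be a safer design than popping a single known key.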
Homework 4: Question 2 Review The list comprehension finds all the countries in Asia. The function sorted() reverse sorts that list with parameter key=lambda k: int(table[k][1]). The format string has the basic form '{:s} {:d}'.format(X,Y), where s = string and d = (decimal) integer. Options are < = left align, > = right align, and , = thousands comma.
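A minimal sketch of the filter, sort, and format steps together. The table is made-up toy data (not the Wikipedia figures), and the column widths 10 and 12 are arbitrary choices for illustration.

```python
# Toy data, invented for illustration: name -> [continent, p2016, p2017]
table = {
    'Aland': ['Asia', '1500000', '1510000'],
    'Bland': ['Africa', '900000', '950000'],
    'Cland': ['Asia', '25000000', '25100000'],
}

# Countries in Asia
asia = [c for c in table if table[c][0] == 'Asia']

# Largest 2016 population first
ranked = sorted(asia, key=lambda k: int(table[k][1]), reverse=True)

# < left-aligns the name in 10 columns; > right-aligns the number in 12
# columns; the comma inserts thousands separators
rows = ['{:<10s} {:>12,d}'.format(c, int(table[c][1])) for c in ranked]

for row in rows:
    print(row)
```

Leaving out reverse=True gives ascending order, which is what the inversely ranked table needs.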
Homework 4: Question 2 Review Very similar code to that of the previous slide: no reverse=True
Homework 4: Question 3 Review Most of you preferred Python; 3 of you preferred Perl. Some cited % @ $ as making Perl hard (to write/read). Some used pandas (https://pandas.pydata.org).
Reading Homework Read up on the syntax of Perl Regular Expressions Online tutorials http://perldoc.perl.org/perlrequick.html http://perldoc.perl.org/perlretut.html Practice (ungraded): do regex exercises 2.1 in JM (pg. 42) I will review some of them on Thursday
Today's Topic More Perl Regex: Variables: $&, $`, $', $1, $2, $3, … Backreferences Greedy and non-greedy matching
Online regex tester https://regex101.com
Chapter 2: JM Precedence of operators. Precedence Hierarchy: /house(cat(s|)|)/ (| = disjunction; ? = optional). Perl: in a regular expression, the string matched by the pattern within a pair of parentheses is stored in the global variables $1, $2, and so on.
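How the nested parentheses in /house(cat(s|)|)/ are numbered can be checked with a short sketch. It uses Python's re module as a stand-in for Perl, so group 1 and group 2 play the role of $1 and $2; the helper name captures is hypothetical. Groups are numbered by the position of their opening parenthesis.

```python
import re

# Same pattern as on the slide; group 1 is (cat(s|)|), group 2 is (s|)
pattern = re.compile(r'house(cat(s|)|)')

def captures(text):
    """Return the captured groups if the whole text matches, else None."""
    m = pattern.fullmatch(text)
    return m.groups() if m else None

for word in ['house', 'housecat', 'housecats', 'housedog']:
    print(word, captures(word))
```

For 'house' the inner group never takes part in the match, so Python reports None for it (in Perl, $2 would be undefined), while the outer group matches the empty string.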
Perl regex http://perldoc.perl.org/perlretut.html A match returns 1 (true) or "" (empty string, if false). A shortcut: in list context, a match returns a list of the captured groups ($1, $2, …).
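A rough Python analogue of the two behaviours, as a sketch only: Python's re.search() returns a match object or None rather than 1 or "", and m.groups() stands in for Perl's list-context result. The helper name hms and the sample strings are made up for illustration.

```python
import re

TIME = re.compile(r'(\d\d):(\d\d):(\d\d)')

def hms(text):
    """Return the captured groups as a tuple, or an empty tuple on failure."""
    m = TIME.search(text)
    return m.groups() if m else ()

# Boolean use: did the pattern match at all?
matched = bool(TIME.search('time is 11:45:07'))

# List-style use: unpack all captured groups in one step
hours, minutes, seconds = hms('time is 11:45:07')

print(matched, hours, minutes, seconds)
print(hms('no time here'))
```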
Chapter 2: JM s/([0-9]+)/<\1>/ What does this do? Backreferences give Perl regexes more expressive power than Finite State Automata (FSA).
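One way to explore the substitution is the Python sketch below, which mirrors it with re.sub(). count=1 is used because a Perl s/// without the g flag replaces only the first match. The second part shows a backreference inside the pattern itself, which is the use that goes beyond what an FSA can recognize. The sample sentences are made up for illustration.

```python
import re

text = 'the 35 boxes and 7 bags'

# Analogue of s/([0-9]+)/<\1>/ : wrap the first run of digits in angle brackets
wrapped = re.sub(r'([0-9]+)', r'<\1>', text, count=1)
print(wrapped)

# Backreference in the pattern: \1 must repeat whatever group 1 matched,
# so this finds a doubled word
doubled = re.search(r'\b(\w+) \1\b', 'it was the the best')
print(doubled.group(0))
```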
Shortest vs. Greedy Matching Default behavior in Perl RE match: take the longest possible matching string, aka greedy matching. This behavior can be changed; see next slide.
Shortest vs. Greedy Matching (from http://www.perl.com/doc/manual/html/pod/perlre.html)
Example:
    $_ = "The food is under the bar in the barn.";
    if ( /foo(.*?)bar/ ) { print "matched <$1>\n"; }
Output:
    greedy (.*): matched <d is under the bar in the >
    shortest (.*?): matched <d is under the >
Notes: ? immediately following a repetition operator like * (or +) makes the operator work in non-greedy mode.
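The same comparison can be reproduced with Python's re module, which uses the same *? notation for non-greedy repetition. This is a sketch of the slide's example in Python, not the Perl code itself; the variable names are arbitrary.

```python
import re

s = 'The food is under the bar in the barn.'

# Greedy: .* takes as much as possible, so the match runs to the last "bar"
greedy = re.search(r'foo(.*)bar', s).group(1)

# Non-greedy: .*? takes as little as possible, stopping at the first "bar"
shortest = re.search(r'foo(.*?)bar', s).group(1)

print('greedy   (.*):  matched <' + greedy + '>')
print('shortest (.*?): matched <' + shortest + '>')
```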