LING 408/508: Computational Techniques for Linguists


1 LING 408/508: Computational Techniques for Linguists
Lecture 9

2 Administrivia Out of town: no class on October 23rd
Quick Homework 4 on awk out today

3 awk

4 awk Why awk?
A command-line tool for extracting textual data
A mini programming language
Can be very fast: … 009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language/

5 awk Powerful pattern-matching is at the heart of awk:

6 awk Manpage: Regular expressions (regex):
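To make the pattern-matching idea concrete, here is a minimal sketch that assumes a POSIX shell and any POSIX awk (for example mawk or gawk). The function sample_tokens and its "word TAG" lines are hypothetical, made-up data used only for illustration. The first command uses a bare /regex/ pattern, which selects whole lines. The second uses the ~ operator to test a single field.

```shell
# Hypothetical helper: made-up "word TAG" lines, one token per line
sample_tokens() {
  printf '%s\n' 'walked VBD' 'dogs NNS' 'ran VBD' 'jumped VBD'
}

# A bare /regex/ pattern is matched against the whole line ($0);
# with no { action } awk prints each matching line
sample_tokens | awk '/VBD/'

# field ~ /regex/ tests one field only: print words (field 1) ending in "ed"
sample_tokens | awk '$1 ~ /ed$/ { print $1 }'
```

The first command prints the three VBD lines. The second prints only walked and jumped, because ran does not end in "ed".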

7 awk Manpage:

8 awk
awk is a very useful command; see man awk for examples.
It allows you to process files line by line and extract matching information.
Words (fields) on a line:
$1 is word #1 in a line
$2 is word #2 in a line (separated from #1 by space(s))
etc.
Some simple awk code:
print $3 means print word #3 in a line
vname=0 sets variable vname to 0 (note: no $); arithmetic expressions are OK on the right side of the =, e.g. vname=vname+2
if (…) { … } else { … } is a conditional, e.g. $1 >= 3
; separates statements
Syntax: awk 'BEGIN { } { } END { }' data.txt
means: execute the BEGIN code block { } once at the beginning, then process each line of data.txt using the middle code block { }, then execute the END code block { } once at the end.
BEGIN { } is optional; END { } is also optional.
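The pieces listed above (fields, variables, arithmetic, if/else, ; between statements, and BEGIN/END blocks) can be combined in one small program. This is a minimal sketch that assumes a POSIX shell and a POSIX awk. The names sample_data and run_demo are hypothetical helpers, and the input lines are made up.

```shell
# Hypothetical input: a number and a word on each line (made-up data)
sample_data() {
  printf '%s\n' '5 apple' '2 pear' '3 plum'
}

run_demo() {
  sample_data | awk '
    BEGIN { total = 0; big = 0 }   # runs once, before the first line
    {                              # runs once per input line
      total = total + $1           # arithmetic on the right side of =
      if ($1 >= 3) { big = big + 1; print $2 } else { print "small:", $2 }
    }
    END { print "total", total; print "lines with $1 >= 3:", big }   # runs once, after the last line
  '
}

run_demo
```

This prints apple, small: pear, and plum (one per input line), followed by total 10 and lines with $1 >= 3: 2 from the END block. Setting the variables in BEGIN is optional, because awk variables start out as 0 or the empty string. The explicit form simply mirrors the vname=0 note above.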

9 awk Manpage:

10 awk awk is locale sensitive: Ubuntu LTS supports UTF-8 by default

11 awk Example: Top 30 surnames and percentages in the Canary Islands according to Wikipedia
Filename: surnames.txt (3 fields: rank, name, and percentage of population)
Run the following awk code to figure out what the code does:
awk '{ print $2; }' surnames.txt
awk '{ if ($3>=1) {print $2;} }' surnames.txt
Note the accent marks: the file is UTF-8.
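Because surnames.txt itself is not reproduced here, this sketch runs the same two awk programs on a hypothetical stand-in. The helper sample_data, the placeholder names SurnameA to SurnameC, and their percentages are all made up and are not the Wikipedia figures. It assumes a POSIX shell and awk. Input is piped in rather than given as a filename, and awk treats the two the same way.

```shell
# Hypothetical stand-in for surnames.txt: rank, surname, percentage.
# Names and numbers are made up, not the real Canary Islands data.
sample_data() {
  printf '%s\n' '1 SurnameA 4.5' '2 SurnameB 2.0' '3 SurnameC 0.8'
}

# Same program as: awk '{ print $2; }' surnames.txt
# prints field 2 (the surname) of every line
sample_data | awk '{ print $2; }'

# Same program as: awk '{ if ($3>=1) {print $2;} }' surnames.txt
# prints the surname only when field 3 (the percentage) is at least 1
sample_data | awk '{ if ($3 >= 1) { print $2; } }'
```

With this data, the first command prints all three placeholder names. The second prints only SurnameA and SurnameB, since 0.8 is below 1. awk compares $3 numerically here because the field looks like a number.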

12 Quick Homework 4 Write awk code to:
read and print out the table, with table headings aligned with the field values (use printf)
calculate the total percentage of population for the top 10, 20 and 30 surnames

13 Quick Homework 4 for printf documentation, read: #Printf
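One way to keep headings aligned with values is to give the header and every row the same printf field widths. This is a minimal sketch, not a homework solution. It assumes a POSIX shell and awk, and sample_data, print_table, the placeholder names, and the percentages are hypothetical and made up. The widths 6, 10 and 8 are arbitrary choices.

```shell
# Hypothetical rows in the surnames.txt layout (made-up names and numbers)
sample_data() {
  printf '%s\n' '1 SurnameA 4.5' '2 SurnameB 2.25' '3 SurnameC 0.75'
}

# Hypothetical helper: header and rows share widths 6, 10 and 8,
# so the columns line up; "-" left-aligns, no "-" right-aligns
print_table() {
  awk '
    BEGIN { printf "%-6s %-10s %8s\n", "Rank", "Surname", "Percent" }
    { printf "%-6d %-10s %8.2f\n", $1, $2, $3; total += $3 }
    END { printf "%-17s %8.2f\n", "Total", total }   # 17 = 6 + 1 + 10
  '
}

sample_data | print_table
```

Every output line is 26 characters wide, and the Total line's number sits under the Percent column. For real data you would run print_table < surnames.txt and widen the surname column if needed. Because awk is locale sensitive, printf widths may be counted in bytes rather than characters depending on the awk implementation. Accented surnames could therefore misalign, which is worth checking on your own system.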

14 Quick Homework 4 Due Wednesday by midnight
Usual rules: email to me, one PDF file, subject: Homework 4 408/508 Your name

