LING 408/508: Computational Techniques for Linguists

LING 408/508: Computational Techniques for Linguists Lecture 9

Administrivia
Out of town: no class on October 23rd
Quick Homework 4 on awk out today

awk

awk
Why awk?
  a command-line tool for extracting textual data
  a mini-programming language
  can be very fast:
  https://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language/

awk Powerful pattern-matching is at the heart of awk: https://www.gnu.org/software/gawk/manual/html_node/Regexp.html
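As a minimal sketch of pattern matching, the commands below select lines with a regular expression, first against the whole line and then against a single field. The three input lines and the variable names by_line and by_field are invented for illustration; awk reads standard input here because no file name is given.

```shell
# Hypothetical three-line input, invented for this example.
input='apple 3
banana 12
cherry 7'

# /an/ selects lines that contain "an" anywhere.
by_line=$(printf '%s\n' "$input" | awk '/an/ { print $1 }')

# $2 ~ /^[0-9]$/ selects lines whose second field is a single digit.
by_field=$(printf '%s\n' "$input" | awk '$2 ~ /^[0-9]$/ { print $1 }')

echo "$by_line"
echo "$by_field"
```

A pattern placed before a { ... } block restricts that block to the matching lines, so no explicit if is needed.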

awk Manpage: http://manpages.ubuntu.com/manpages/bionic/en/man1/awk.1plan9.html Regular expressions (regex): http://manpages.ubuntu.com/manpages/bionic/en/man7/regex.7.html

awk
awk is a very useful command (see man awk for examples).
It allows you to process files line by line and extract matching information.
Words on a line:
  $1 is word #1 in a line
  $2 is word #2 in a line (separated from #1 by space(s))
  etc.
Some simple awk code:
  print $3                   print word #3 in a line
  vname=0                    set variable vname to 0 (note: no $)
                             (arithmetic expressions are ok on the right side of the =, e.g. vname=vname+2)
  if (…) { … } else { … }    conditional, e.g. with the test $1 >= 3
  ;                          separates statements
Syntax:
  awk 'BEGIN { ..1.. } { ..2.. } END { ..3.. }' data.txt
means:
  execute awk code block { ..1.. } at the beginning,
  then process each line of data.txt using awk code block { ..2.. },
  then at the end execute awk code block { ..3.. }.
BEGIN { ..1.. } is optional; END { ..3.. } is also optional.
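The BEGIN / per-line / END structure can be sketched as follows. This is only an illustration: the input lines and the variable names total and big are made up, and the data is piped in on standard input rather than read from data.txt.

```shell
# Hypothetical input: one word and one number per line (invented values).
data='a 1
b 5
c 3'

# BEGIN runs once before any input, the middle block runs for every line,
# and END runs once after the last line.
out=$(printf '%s\n' "$data" | awk '
BEGIN { total = 0; big = 0 }
{
    total = total + $2    # arithmetic on the right side of the =
    if ($2 >= 3) { big = big + 1 } else { print "small:", $1 }
}
END { print "total:", total; print "big:", big }')

echo "$out"
```

Only the first line has a second field below 3, so the per-line block prints once; the END block then reports the sum of the second fields (9) and the count of lines with a second field of at least 3 (2).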

awk awk is locale sensitive: Ubuntu 18.04 LTS supports UTF-8 by default

awk
Example: Top 30 surnames and percentages in the Canary Islands, according to Wikipedia:
https://en.wikipedia.org/wiki/List_of_the_most_common_surnames_in_Europe
Filename: surnames.txt (3 fields: rank, name, and percentage of population)
Run the following awk code to figure out what it does:
  awk '{ print $2; }' surnames.txt
  awk '{ if ($3>=1) {print $2;} }' surnames.txt
Note the accent marks: UTF-8
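To try the two commands without the real file, here is a sketch that feeds them three stand-in rows in the same rank/name/percentage layout. The percentages are invented for illustration and are not the Wikipedia figures; the variable names rows, names and common are also assumptions.

```shell
# Invented rows in the rank/name/percentage layout; NOT the Wikipedia figures.
rows='1 García 2.0
2 Rodríguez 1.0
3 Pérez 0.5'

# Same awk programs as on the slide, reading standard input instead of surnames.txt.
names=$(printf '%s\n' "$rows" | awk '{ print $2; }')
common=$(printf '%s\n' "$rows" | awk '{ if ($3>=1) {print $2;} }')

echo "$names"
echo "---"
echo "$common"
```

The first command prints the second field of every row; the second prints it only for rows whose third field is at least 1. The accented characters pass through unchanged in a UTF-8 terminal.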

Quick Homework 4
Write awk code to:
  calculate and print a table of the total percentage of population for the top 10, 20 and 30 surnames
  read and print out the table with the table headings aligned with the field values (use printf)

Quick Homework 4
For printf documentation, read:
https://www.gnu.org/software/gawk/manual/html_node/Printf.html#Printf
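As a rough sketch of how printf widths line up columns (not a solution to the homework), the example below prints a heading row and two data rows using the same field widths. The rows, the heading names and the widths 4, 8 and 6 are all assumptions chosen for illustration.

```shell
# Invented rows; the heading names and column widths are assumptions.
out=$(printf '1 alpha 2.5\n10 beta 12.25\n' | awk '
BEGIN { printf "%-4s %-8s %6s\n", "Rank", "Name", "Pct" }
{ printf "%-4d %-8s %6.2f\n", $1, $2, $3 }')

echo "$out"
```

A minus sign in the format (as in %-8s) left-justifies the value within its width, while a plain width (as in %6.2f) right-justifies it. Using the same widths in the heading and the data formats keeps the columns aligned.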

Quick Homework 4
Due Wednesday by midnight
Usual rules:
  email me one PDF file
  subject: Homework 4 408/508 Your name