1
CSC 4630 Meeting 9 February 14, 2007 Valentine’s Day; Snow Day
2
Last of awk
Quick review of scripting languages and, more generally, programming languages
–Built-in variables
–Variable typing
–Implicit control structure of program
–Assignment statements and operations
–Control structures
3
Next Week and Next Next Week
Exam 1: Monday, February 26
Project 2: Wednesday, February 28
4
Last of awk (2)
Control structures
Arrays
Formatted printing
Subtleties and intricacies
5
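Control Structures
if (<condition>) <statement1> else <statement2>
<condition> can be any expression; true is defined to be non-zero or non-null.
<statement1> and <statement2> can be any group of statements (a group is enclosed in braces).
Note the critical parentheses that separate the conditional expression from <statement1>.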
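A minimal sketch of the if-else form, wrapped in a shell function so it can be run directly. The function name `classify` and the sample input are illustrative only, not part of the course material.

```shell
# classify: label each input line by whether its first field is true
# in the awk sense (hypothetical helper for illustration).
classify() {
  awk '{
    # $1 + 0 forces a numeric test: non-zero is true, zero is false.
    if ($1 + 0) {
      print $1, "nonzero"
    } else {
      print $1, "zero"
    }
  }'
}

printf '3\n0\n-2\n' | classify
```

The parentheses around `$1 + 0` are required; the braces are only needed when a branch holds more than one statement, but using them everywhere avoids surprises.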
6
Control Structures (2)
while (<condition>) <statement>
Same rules as for if-then-else
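A small sketch of a while loop, assuming each input line holds one non-negative integer. The name `count_down` is made up for this example.

```shell
# count_down: for each input number n, print n, n-1, ..., 1 on one line.
count_down() {
  awk '{
    n = $1 + 0
    line = ""
    while (n > 0) {
      # Put a space between numbers, but not before the first one.
      line = line (line == "" ? "" : " ") n
      n--
    }
    print line
  }'
}

printf '3\n' | count_down
```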
7
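Control Structures (3)
for (<initialization>; <condition>; <increment>) <statement>
is equivalent to
<initialization>; while (<condition>) { <statement>; <increment> }
<initialization> initializes the loop variable
<condition> checks the loop variable for termination
<increment> changes the value of the loop variable
for (k in <array>) <statement> loops over the subscripts of an array, but the order of the subscripts is unspecified and depends on the awk implementation.
Careful: awk allows general subscripting. Strings can be used as subscripts.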
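The two loop forms can be sketched as follows. `sum_fields` writes the same sum once with for and once with the equivalent while loop; `count_words` uses strings as array subscripts and pipes the result through sort, since the for-in order cannot be relied on. Both function names are hypothetical.

```shell
# sum_fields: add up the fields of each line with a for loop, then repeat
# the sum with the equivalent while loop to show that they agree.
sum_fields() {
  awk '{
    total = 0
    for (i = 1; i <= NF; i++) { total += $i }

    check = 0
    j = 1
    while (j <= NF) { check += $j; j++ }

    print total, check
  }'
}

# count_words: tally words in an array subscripted by strings.
count_words() {
  awk '
    { for (i = 1; i <= NF; i++) { count[$i]++ } }
    END { for (w in count) { print w, count[w] } }
  ' | LC_ALL=C sort
}

printf '1 2 3\n10 20\n' | sum_fields
printf 'b a b\n' | count_words
```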
8
Control Structures (4)
“Go to” structures
break: when executed within a for or while statement, causes an immediate exit from the loop
continue: when executed within a for or while statement, causes immediate execution of the next iteration
next: causes the next line (record) of the input file to be read and the sequence of pattern {action} statements executed on it
exit: causes the program to jump to the END pattern, execute it, and stop
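One way to see all four statements at work in a single program. The input conventions here (comment lines starting with #, a STOP line, "-" as a placeholder field) are invented for the illustration, as is the name `scan`.

```shell
# scan: sum the fields of the input under some made-up rules.
scan() {
  awk '
    /^#/    { next }              # skip comment lines; later rules do not run
    /^STOP/ { exit }              # jump straight to END
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "-") continue   # ignore placeholder fields
        if ($i < 0) break         # abandon the line at the first negative field
        sum += $i
      }
    }
    END { print "sum", sum + 0 }
  '
}

printf '# header\n1 - 2\n3 -1 100\nSTOP\n50\n' | scan
```

The line after STOP is never read, and the 100 after -1 is never added, so the printed total covers only 1, 2 and 3.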
9
Practice Time
We’ll use pair programming
–Pair up by twos
–One person is in control of the keyboard
–Sketch the features of the program
–Test as you go
10
awk Practice: Example 1
Input: A file containing syntactically correct North American telephone numbers in the form XXX-XXX-XXXX
Output: A file containing the numbers from the input file formatted as international numbers, namely +1.XXX.XXX.XXXX
Test file: Create your own
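One possible starting point for Example 1, relying on the stated guarantee that every input line is already a correct XXX-XXX-XXXX number. The function name `to_intl` and the sample number are made up.

```shell
# to_intl: split each line on "-" and rejoin the three parts with dots
# behind a +1 prefix. No validation is done.
to_intl() {
  awk -F'-' '{ print "+1." $1 "." $2 "." $3 }'
}

printf '404-555-0123\n' | to_intl
```

Setting the field separator to "-" lets awk do the parsing; the action only has to reassemble the fields.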
11
awk Practice: Example 2
Input: A file, each line of which supposedly contains a North American style telephone number
Output: The input file cleaned of bad numbers, inappropriate lines, and empty lines. Each correct number formatted as XXX-XXX-XXXX
Test Input: /mnt/a/beck/samples/phonenumbers
Notes:
–Program must handle arbitrary input files
–Start simple, add features as you investigate
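A rough first pass at Example 2, in the "start simple" spirit. The slide does not define what counts as a bad number, so the rules below are assumptions: any line containing a letter is rejected, all punctuation and spaces are ignored, a leading country code 1 is dropped, and exactly ten digits must remain. The name `clean_phones` and the test lines are invented; the real sample file may need different rules.

```shell
# clean_phones: keep lines that reduce to a 10-digit number and print
# them as XXX-XXX-XXXX. The validity rules are assumptions.
clean_phones() {
  awk '{
    if ($0 ~ /[A-Za-z]/) { next }          # assumed: letters make a line inappropriate
    digits = $0
    gsub(/[^0-9]/, "", digits)             # keep digits only
    if (length(digits) == 11 && substr(digits, 1, 1) == "1") {
      digits = substr(digits, 2)           # assumed: drop a leading country code
    }
    if (length(digits) != 10) { next }     # also discards empty lines
    print substr(digits, 1, 3) "-" substr(digits, 4, 3) "-" substr(digits, 7, 4)
  }'
}

printf '(404) 555-0123\n\nhello\n555-0123\n1 404 555 0199\n404.555.0100\n' | clean_phones
```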
12
awk Practice: Example 3
Input: A file in the same form as for Example 2.
Output: The input file cleaned and correct numbers formatted in international format, +1.XXX.XXX.XXXX
13
awk Practice: Example 4
The website flightaware.com gives the departure and arrival history of commercial airline flights, among other things. You can easily extract the history to a text file by cutting and pasting. But then the file needs to be cleaned and reformatted to be useful.
Input: A flight history file from flightaware.com, e.g. /mnt/a/beck/samples/flight1931
14
Example 4 (2)
Output: Data from the input file involving one leg of the flight (use PHL to ATL), one line per day, fields separated by ::. Fields are date, departure time, arrival time, elapsed time.
Include a header line that contains the flight number (1931 for the sample), origin (PHL), and destination (ATL).
Include a second header line that labels the data columns.
15
awk Practice: Example 5
Computations involving flight data.
Input: Cleaned flight data file (the output file from Example 4)
Output: Earliest and latest departure, earliest and latest arrival, shortest elapsed time, longest elapsed time, average elapsed time.
Notes: Programs from Examples 4 and 5 should work with any set of flight data.
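A sketch of the Example 5 computations under several assumptions that the slides do not state: the cleaned file has exactly two header lines, departure and arrival are 24-hour HH:MM, elapsed time is H:MM, and no flight crosses midnight. The function name `flight_stats` and the sample rows in the usage line are invented, not real flight data.

```shell
# flight_stats: min/max departure and arrival, and min/max/average elapsed
# time, from a "::"-separated file. Time formats are assumptions.
flight_stats() {
  awk -F'::' '
    # Convert "H:MM" or "HH:MM" to minutes; p is a local array.
    function mins(t,   p) { split(t, p, ":"); return p[1] * 60 + p[2] }
    # Convert minutes back to HH:MM.
    function hhmm(m) { return sprintf("%02d:%02d", int(m / 60), m % 60) }

    NR <= 2 { next }                       # skip the two header lines
    {
      d = mins($2); a = mins($3); e = mins($4)
      if (n == 0) { dmin = dmax = d; amin = amax = a; emin = emax = e }
      if (d < dmin) { dmin = d }
      if (d > dmax) { dmax = d }
      if (a < amin) { amin = a }
      if (a > amax) { amax = a }
      if (e < emin) { emin = e }
      if (e > emax) { emax = e }
      total += e
      n++
    }
    END {
      if (n == 0) { exit }
      print "departure", hhmm(dmin), hhmm(dmax)
      print "arrival", hhmm(amin), hhmm(amax)
      print "elapsed", hhmm(emin), hhmm(emax), hhmm(total / n)
    }
  '
}

printf '1931::PHL::ATL\ndate::departure::arrival::elapsed\n02/10::06:15::08:20::2:05\n02/11::06:40::08:35::1:55\n02/12::06:05::08:05::2:00\n' | flight_stats
```

Working in minutes keeps every comparison numeric; the average is truncated to whole minutes when printed.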
16
awk Practice: Example 6
DNA to protein translation
–In the computational biology world it is well known that each triple of bases along a DNA segment translates to one of the 20 amino acids, which are the building blocks for proteins.
Input: A DNA sequence
Output: The corresponding amino acid sequence
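A compact sketch of the translation, assuming the standard genetic code, a coding-strand DNA sequence (T rather than U) on each line, and a reading frame that starts at the first base. Amino acids are written as one-letter codes, stop codons as `*`, and any codon with a letter outside TCAG as `X`; these output conventions and the name `translate` are choices made for this example. A trailing partial codon is dropped.

```shell
# translate: map each base triple to a one-letter amino acid code.
translate() {
  awk '
    BEGIN {
      bases = "TCAG"
      # Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
      aa = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    }
    {
      seq = toupper($0)
      protein = ""
      for (i = 1; i + 2 <= length(seq); i += 3) {
        b1 = index(bases, substr(seq, i, 1))
        b2 = index(bases, substr(seq, i + 1, 1))
        b3 = index(bases, substr(seq, i + 2, 1))
        if (b1 == 0 || b2 == 0 || b3 == 0) {
          protein = protein "X"            # unknown base in this codon
          continue
        }
        # Position of the codon in the 64-character table (1-based).
        protein = protein substr(aa, (b1 - 1) * 16 + (b2 - 1) * 4 + b3, 1)
      }
      print protein
    }
  '
}

printf 'ATGGCCTAA\n' | translate
```

An alternative is an array subscripted by codon strings, such as `code["ATG"] = "M"`, which is longer to type but easier to read.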
17
Project 2
Due Wednesday, February 28
Part 1
–Implement an improved version of mobylex entirely in awk. The program should take a file containing a chapter of the text and return the lexicon with frequency counts sorted in decreasing order of frequency.
18
Project 2 (2)
–Notes on Part 1
  Include one title line giving chapter number and title
  All trailing punctuation should be removed
  All initial capitalization should be removed
  No numbers in lexicon
  Compound words should be retained
–Desirable features
  Remove contractions and spell them out
  Remove possessive constructions. The ’s should not be counted as a different word.
  Retain capitalized proper names
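A bare-bones sketch of the counting core for Part 1, not a full solution. It lowercases every word (so it does not retain proper names), strips punctuation from both ends, drops any token containing a digit, and keeps hyphenated compounds. It does not print the title line or handle contractions and possessives. The function name `lexicon` and the sample sentence are made up, and the final ordering is delegated to the external sort command rather than done inside awk.

```shell
# lexicon: print "count word" lines, most frequent first,
# ties broken alphabetically.
lexicon() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        w = tolower($i)
        sub(/^[^a-z0-9]+/, "", w)                    # leading punctuation
        sub(/[^a-z0-9]+$/, "", w)                    # trailing punctuation
        if (w == "" || w ~ /[0-9]/) { continue }     # no empty tokens, no numbers
        count[w]++
      }
    }
    END { for (w in count) { print count[w], w } }
  ' | LC_ALL=C sort -k1,1nr -k2,2
}

printf 'The whale, the WHALE! A sea-going ship in 1851.\n' | lexicon
```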
19
Project 2 (3)
Part 2
–Add summary statistics to the mobylex program that give
  Total number of words in chapter
  Number of different words in chapter
  Average word length (number of characters) (taken over distinct words)
  Maximum word length
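The four statistics can be accumulated in a single pass, roughly as below. The word cleanup (lowercase, strip punctuation at both ends, skip tokens containing digits) is an assumed simplification of the Part 1 rules; the name `lex_stats`, the output labels, and the sample sentence are invented for this sketch.

```shell
# lex_stats: total words, distinct words, average length over distinct
# words, and maximum word length.
lex_stats() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        w = tolower($i)
        sub(/^[^a-z0-9]+/, "", w)
        sub(/[^a-z0-9]+$/, "", w)
        if (w == "" || w ~ /[0-9]/) { continue }
        total++
        if (!(w in seen)) {                  # first time this word appears
          seen[w] = 1
          distinct++
          chars += length(w)
          if (length(w) > maxlen) { maxlen = length(w) }
        }
      }
    }
    END {
      printf "total words: %d\n", total
      printf "different words: %d\n", distinct
      printf "average word length: %.2f\n", (distinct ? chars / distinct : 0)
      printf "maximum word length: %d\n", maxlen
    }
  '
}

printf 'The whale, the WHALE! A sea-going ship in 1851.\n' | lex_stats
```

Testing `w in seen` before touching the array avoids creating empty entries, which keeps the distinct count honest.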