Reading and writing files Practical Computing for Biologists Chapter 10 (2nd half) Peter Brooks Jan 13 2012.

Slides:



Advertisements
Similar presentations
String and Lists Dr. Benito Mendoza. 2 Outline What is a string String operations Traversing strings String slices What is a list Traversing a list List.
Advertisements

Introduction to C Programming
Working with JavaScript. 2 Objectives Introducing JavaScript Inserting JavaScript into a Web Page File Writing Output to the Web Page Working with Variables.
XP 1 Working with JavaScript Creating a Programmable Web Page for North Pole Novelties Tutorial 10.
Basic Elements of C++ Chapter 2.
 2004 Prentice Hall, Inc. All rights reserved. Chapter 25 – Perl and CGI (Common Gateway Interface) Outline 25.1 Introduction 25.2 Perl 25.3 String Processing.
Group practice in problem design and problem solving
Shell Scripting Awk (part1) Awk Programming Language standard unix language that is geared for text processing and creating formatted reports but it.
XP Tutorial 14 New Perspectives on HTML, XHTML, and DHTML, Comprehensive 1 Working with Forms and Regular Expressions Validating a Web Form with JavaScript.
Introduction to Programming Prof. Rommel Anthony Palomino Department of Computer Science and Information Technology Spring 2011.
Python Programming Fundamentals
Introduction to Python Lecture 1. CS 484 – Artificial Intelligence2 Big Picture Language Features Python is interpreted Not compiled Object-oriented language.
Computer Programming for Biologists Class 5 Nov 20 st, 2014 Karsten Hokamp
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 2 Input, Processing, and Output.
1 CSC 221: Introduction to Programming Fall 2012 Functions & Modules  standard modules: math, random  Python documentation, help  user-defined functions,
CS190/295 Programming in Python for Life Sciences: Lecture 3 Instructor: Xiaohui Xie University of California, Irvine.
XP Tutorial 10New Perspectives on Creating Web Pages with HTML, XHTML, and XML 1 Working with JavaScript Creating a Programmable Web Page for North Pole.
Strings The Basics. Strings can refer to a string variable as one variable or as many different components (characters) string values are delimited by.
What does C store? >>A = [1 2 3] >>B = [1 1] >>[C,D]=meshgrid(A,B) c) a) d) b)
Data TypestMyn1 Data Types The type of a variable is not set by the programmer; rather, it is decided at runtime by PHP depending on the context in which.
Working with Forms and Regular Expressions Validating a Web Form with JavaScript.
Time to talk about your class projects!. Shell Scripting Awk (lecture 2)
CSC 110 Using Python [Reading: chapter 1] CSC 110 B 1.
Introducing Python CS 4320, SPRING Lexical Structure Two aspects of Python syntax may be challenging to Java programmers Indenting ◦Indenting is.
XP New Perspectives on XML, 2 nd Edition Tutorial 7 1 TUTORIAL 7 CREATING A COMPUTATIONAL STYLESHEET.
Trinity College Dublin, The University of Dublin GE3M25: Computer Programming for Biologists Python, Class 2 Karsten Hokamp, PhD Genetics TCD, 17/11/2015.
XP Tutorial 7 New Perspectives on JavaScript, Comprehensive 1 Working with Forms and Regular Expressions Validating a Web Form with JavaScript.
Dr. Abdullah Almutairi Spring PHP is a server scripting language, and a powerful tool for making dynamic and interactive Web pages. PHP is a widely-used,
Today… Style, Cont. – Naming Things! Methods and Functions Aside - Python Help System Punctuation Winter 2016CISC101 - Prof. McLeod1.
XP Tutorial 10New Perspectives on HTML, XHTML, and DHTML, Comprehensive 1 Working with JavaScript Creating a Programmable Web Page for North Pole Novelties.
String and Lists Dr. José M. Reyes Álamo. 2 Outline What is a string String operations Traversing strings String slices What is a list Traversing a list.
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
Filters and Utilities. Notes: This is a simple overview of the filtering capability Some of these commands are very powerful ▫Only showing some of the.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 5 & 6 By Ravi Mandal.
CSI 32 Lecture 20 Chapter 8 Input, Output, and Files 8.1 Standard Input and Output (review) 8.2 Formatted Strings 8.3 Working with Files 8.5 Case Studies.
ENGINEERING 1D04 Tutorial 2. What we’re doing today More on Strings String input Strings as lists String indexing Slice Concatenation and Repetition len()
Chapter 13 Debugging Strategies Learning by debugging.
String and Lists Dr. José M. Reyes Álamo.
Reading and writing files
Chapter Topics The Basics of a C++ Program Data Types
Topics Designing a Program Input, Processing, and Output
Topics Introduction to Functions Defining and Calling a Void Function
Introduction to Python
Basic Elements of C++.
Miscellaneous Items Loop control, block labels, unless/until, backwards syntax for “if” statements, split, join, substring, length, logical operators,
Chapter 19 PHP Part II Credits: Parts of the slides are based on slides created by textbook authors, P.J. Deitel and H. M. Deitel by Prentice Hall ©
Arrays and files BIS1523 – Lecture 15.
Functions CIS 40 – Introduction to Programming in Python
Scripts & Functions Scripts and functions are contained in .m-files
Strings Part 1 Taken from notes by Dr. Neil Moore
Intro to PHP & Variables
MATLAB: Structures and File I/O
Basic Elements of C++ Chapter 2.
Variables In programming, we often need to have places to store data. These receptacles are called variables. They are called that because they can change.
Topics Introduction to File Input and Output
Number and String Operations
PHP.
Creating your first C program
String and Lists Dr. José M. Reyes Álamo.
CS190/295 Programming in Python for Life Sciences: Lecture 3
Topics Designing a Program Input, Processing, and Output
Topics Introduction to File Input and Output
Topics Designing a Program Input, Processing, and Output
Introduction to Computer Science
Topics Introduction to File Input and Output
Strings Taken from notes by Dr. Neil Moore & Dr. Debby Keen
Introduction to Computer Science
PYTHON - VARIABLES AND OPERATORS
Presentation transcript:

Reading and writing files Practical Computing for Biologists Chapter 10 (2nd half) Peter Brooks Jan

Chapter 10 topics – illustrated by transforming a text file into a Google Earth.kml file DiveDateLatLonDepthNotes Tiburon Jul N W1190holotype JSL II Sep N W518paratype JSL II Aug N W686Youngbluth (1989) Ventana Mar N W767 Ventana Jun N W934 Ventana Sep N W1001 Tiburon Nov N W1156 Tiburon Mar N W1144 Tiburon Mar N W1126 JSL II Sep N W862Francesc Pages (pers.comm) Flat text file rows and columns (fields) Tiburon 596 Tiburon Jul N W1190 holotype absolute , , JSL II 1411 JSL II Sep N W518 paratype absolute , , -518 (…) ML file Markup Language Each row parsed between series of hierarchical « tags ». XML A lingua franca usable by many programs. Goog Earth: Keyhole ML - kml

Chapter 10 topics – illustrated by transforming a text file into a Google Earth.kml file ● First part Ch 10Jan. 6 file parsing strategies open(FileName, 'r') and read lines with a for loop.strip('\n')remove trailing characters.split()parse line elements into lists [….] and access list elements open(NewFile, 'w') and OutFile.write() ● Last part Ch 10 Jan. 13 Reg expr search using re.search().group() to access re.search() results create custom functions with def my_function: generate XML and KML files multiple line strings bounded by triple quotes If time: raw strings; re.sub for search and replace

Review previous Open latlon_3.py #!/usr/bin/env python (...) LineNumber = 0 # Open the output file for writing -Do this *before* the loop, not inside it OutFileName=InFileName + ".kml" OutFile=open(OutFileName,'w') # You can append instead with 'a' # Loop through each line in the file for Line in InFile: # Skip the header, line # 0 if LineNumber > 0: # Remove the line ending characters Line=Line.strip('\n') ElementList=Line.split('\t') # Use the % operator to generate a string # We can use this for output both to the screen and to a file OutputString = "Depth: %s\tLat: %s\t Lon:%s" % \ (ElementList[4], ElementList[2], ElementList[3]) # Can still print to the screen then write to a file print OutputString # Unlike print statements,.write needs a linefeed OutFile.write(OutputString+"\n") # Index the counter used to keep track of line numbers LineNumber = LineNumber + 1 # After the loop is completed, close the files InFile.close() OutFile.close() Files – 2 variables 1. file name - “string.txt” InFileName = 'Marrus_claudanielis.txt' 2. file “handle” points to file InFile = open(InFileName, 'r') == == = = = Counter variable, header For loop to read each line Strip line returns. Split on \t – make list of fields. Format output and assign output to a variable. == === = Create a string for output file name. Define the output file handle for writing. Write output. to screen to file – add \n Close all files. Run program to see output. 12

Parsing with Regular Expressions in Python Convert Lat/Long fr degrees-minutes to decimal degrees ● Ch 2 and 3:Regex in text editor ● Python: Regex commands are in module “re” import re# import command – follows shebang early in code In interactive mode, >>> dir(re) shows available re tools ● Result = re.search(regexPattern, targetString) ● Extract latitude elements from string using regex See latlon_3.py output “ N”example of ElementList[2] Regex: 1 or more digits, space, 1 or more digits or decimal pts, space, a word character ( i.e. a letter) \d+ [\d\.]+ \w But want each as an independent element: (\d+) ([\d\.]+) (\w) => only elements in (…...) can be retrieved Code lines for program: patternLatLon = '(\d+) ([\d\.]+) (\w)'# the pattern itself is a quoted string Result = re.search(patternLatLon, ElementList[2])

re.search output: an object containing the results of the pattern search and methods to work with results ● Result = re.search(regexPattern, targetString) The “Match.Object” Result has methods invoked by the dot operator. ● Result.group() Result.group(0) is entire string DegreeString = Result.group(1) # this is equivalent to \1 in Text wrangler MinuteString = Result.group(2) Compass = Result.group (3) ● These values will be used in a custom function that converts to decimal degrees. ● Other useful methods are available for the search output: MinPosition = Result.start(2) # the index of the start of match of 2nd group DegMinEnds = Result.end(1,2) # one past the end of match of groups 1 and 2 ● Any alternate method to get degrees, minutes and compass letter? (without using re)

Defining custom functions ● Need to convert both latitude and longitude to decimal degrees. Easy to copy code for latitude, and modify to treat longitude. ● Function calls (subroutines) enhance readability, decrease chances of coding errors, and facilitate debugging. ● DRY – Don't Repeat Yourself Avoid code containing repetitive lines or blocks of lines. Do not copy / paste code to do something similar. When corrections or modifications are needed, they are done only in one place. ● Built-in and custom functions are used in the same way. F unction float(X) operates on a value of a parameter; returns decimal value. A custom function processes parameter(s) and returns result(s). The function is placed before the main block of code and is called from the main block or from within other functions. ● def decimalat(DegString): This line gives the name of the function and parameters. Indented lines following the colon contain the “definition” of the function. ● Functions typically end with a “return” of values, but not always. Always leave the call of the function with a return statement.

Degree-minutes to Decimal Degrees Function def decimalat(DegString): # The call can use other variable names. # This function requires that the re module is loaded # Take a string in the format " N" and return decimal deg. SearchStr= '(\d+) ([\d\.]+) (\w)' Result = re.search(SearchStr, DegString) # Get the captured character groups, as defined by the parenth. # in the regular expression, convert the numbers to floats, and # assign them to variables with meaningful names Degrees = float(Result.group(1)) Minutes = float(Result.group(2)) Compass = Result.group(3).upper() # make sure it is capital too # Calculate the decimal degrees DecimalDegree = Degrees + Minutes/60 # If the compass direction indicates the coordinate is South or # West, make the sign of the coordinate negative. if Compass == 'S' or Compass == 'W': DecimalDegree = -DecimalDegree return DecimalDegree # The output sent back to line of call. # End of the function definition ● Function takes one parameter: DegString ● Strings from search groups are converted to decimal numbers. ● Good habit: do not assume all records are uniform; as group(3) is a string, can use string methods – upper().

Calling a function from the main block of code 1.Simple, descriptive names. 2.New variable names get the value of the function's “return”. Names of sent variables are different from function's parameter name. 3.Use of backstroke character to continue long lines. Gives good readability. 4.WriteOutFile is a boolean, set as True or False earlier in the code. for Line in InFile: if LineNumber > 0: # print line # uncomment for debugging Line=Line.strip('\n') ElementList = Line.split('\t') # Returns a list in this format: # ['Tiburon 596', '19-Jul-03', …....] # print "ElementList:", ElementList # uncomment for debug Dive = ElementList[0] Date = ElementList[1] Depth = ElementList[4] Comment = ElementList[5] LatDegrees = decimalat(ElementList[2]) LonDegrees = decimalat(ElementList[3]) # Create string to 5 decimal places, padded to 10 total char. # (using line continuation character \) OutString = "%s\t%4s\t%10.5f\t%10.5f\t%9s\t%s" % \ (Dive,Depth,LatDegrees,LonDegrees,Date,Comment) print OutString if WriteOutFile: OutFile.write(OutString + '\n') # remember the line feed! LineNumber += 1 # this is outside the if, but inside the for loop # Close the files from latlon_4.py

Converting flat text to XML format ● ML: Markup Language Each data element annotated with opening and closing tags. Element Content Marrus claudanielis -Si , Sets of tags are always nested. ● ML file headers and footers – also use opening and closing tags (Use triple quotes for multiline strings or comments.) HeadString=' ' ' (no closing tag) ' ' ' Footer: # Note that tags are nested. ● Placemark string to build KML records. See latlon_5.py for multiline example. To preserve data not needed by program (Google Earth), include a Description field that has the entire original data line.

A.kml file can be visualized with Google Earth or Google Map

Raw strings s=" c: \ \ " Python will escape the backslash resulting string s will contain only one backstroke value is "c: \" raw string –precede string with letter r s=r" c: \ \ " suppresses Python's character escaping, value is "c : \ \ " Use raw string even if uncertain it is necessary. ===== = = = = = = = Substitutions – Search and Replace re.sub() >>> import re >>> OrigString = "Haeckel, Ernst" # The string you wish to search >>> SubFind = r"( \ w+), ( \ w+)" # The regular expression to search for >>> Result = re. search (SubFind, OrigString)# Perform the search, store the results >>> print Result.groups() # See what you found ('Haeckel', 'Ernst') # The two captured substrings >>> SubReplace = r" \ 2 \ t \ 1" # Set up a string for replacement using raw string style >>> NewString = re.sub(SubFind, SubReplace, OrigString) # Format for re.sub( ) >>> print NewString # See the substitUted string 'Ernst tab Haeckel'