PC204 Lecture 5 Conrad Huang Genentech Hall, N453A 476-0415.



Topics
Homework review
Case study code
Implementation of Markov analysis
Writing files
Writing modules

Homework Review
4.1 – Check for duplicates in a list
4.2 – Puzzler: longest reducible word
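One possible approach to 4.1 is sketched below in Python 3 syntax. The function name has_duplicates is only illustrative, and the sketch assumes the list items are hashable so that they can be stored in a set.

```python
def has_duplicates(items):
    """Return True if any value appears more than once in items."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


print(has_duplicates([1, 2, 3, 2]))
print(has_duplicates(["a", "b"]))
```

Using a set keeps each membership check fast, at the cost of extra memory for the values already seen.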

Case Study Code
The “data structure selection” case study is about trade-offs
– How fast will this run?
– How much memory will it use?
– How hard is it to implement?
The case study code uses
– design patterns
  – Reading the contents of a file
  – Sorting items by associated data
– optional arguments
histogram.py
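The two design patterns and the optional argument can be sketched as follows. This is not the contents of histogram.py. The function names, the skip_lines and count parameters, and the sample text are all assumptions made for illustration, and the code uses Python 3 syntax.

```python
import os
import string
import tempfile


def word_histogram(path, skip_lines=0):
    """Count words in a text file; skip_lines is an optional argument."""
    hist = {}
    with open(path) as f:
        for line_number, line in enumerate(f):
            if line_number < skip_lines:
                continue
            for word in line.split():
                word = word.strip(string.punctuation).lower()
                if word:
                    hist[word] = hist.get(word, 0) + 1
    return hist


def most_common(hist, count=3):
    """Sort words by their associated counts, most frequent first."""
    pairs = sorted(hist.items(), key=lambda pair: pair[1], reverse=True)
    return pairs[:count]


# Build a small throwaway file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Header line\nthe cat and the dog\nThe end.\n")
    sample_path = tmp.name

hist = word_histogram(sample_path, skip_lines=1)
top_words = most_common(hist, count=1)
print(top_words)
os.remove(sample_path)
```

A dictionary keeps counting simple and fast. The sorting step pays a one-time cost to order the words by count.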

Markov Analysis
The case study also mentions using Markov analysis to generate random (nonsense) sentences using statistics from some text
– When generating a sentence, if we have already generated some words, Markov analysis tells us which words are (somewhat) likely to follow
markov.py
– This implementation does not take into account ends of sentences or multi-word phrases or …
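To make the idea concrete, here is a minimal sketch that uses a single-word prefix. It is not the code in markov.py. The function names and the sample text are assumptions, and the code uses Python 3 syntax.

```python
import random


def build_chain(words):
    """Map each word to the list of words that follow it in the text."""
    chain = {}
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    return chain


def generate(chain, start, length):
    """Walk the chain, picking each next word at random from the followers."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:      # dead end: no word ever followed this one
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)


text = "the cat sat on the mat and the cat ran"
chain = build_chain(text.split())
print(chain["the"])
print(generate(chain, "the", 6))
```

Because a follower is stored once for every time it occurs, words that follow more often are proportionally more likely to be chosen.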

Writing Files
Python provides many ways of writing data to files
How will your data be used?
– For reading by humans?
– For reading by computers?
– Both?

Human-readable Data Files
Text files, usually formatted with words
– Usually, output to files is more strictly structured than output to the screen, so the print statement may not be appropriate since it adds white space and line terminators at its discretion
– The string formatting operator provides greater control over how strings are constructed

    col1 = 10
    col2 = 20
    output = "%d\t%d" % (col1, col2)

Format Operator
“%” is the format operator when the operand on the left is a string, aka the “format string”
– The format string may have zero or more “format sequences”, which are introduced by “%”
  – “%d\t%d” is the format string in the example
  – “%d” is the format sequence that specifies what part of the format string should be replaced with an integer
  – Other format sequences include “%s” (string), “%f” (floating point), “%g” (compact floating point), “%x” (base-16 integer), …

Format Operator (cont.)
The operand to the right of the format operator is either a tuple of values or a single value
– If there is only one format sequence in the format string, then a single value may be used in place of a tuple
– Each value in the right operand replaces a corresponding format sequence in the format string
– If the number of values does not match the number of format sequences, or a value type does not match the corresponding format sequence type, an exception will be raised
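The rules above can be tried out with a few small cases. The variable names are only illustrative, and the final print call uses Python 3 syntax. The format operator itself behaves the same way in Python 2 and Python 3.

```python
col1, col2 = 10, 20

# Two format sequences, so the right operand is a tuple of two values.
row = "%d\t%d" % (col1, col2)

# One format sequence, so a single value may replace the tuple.
label = "%s" % "total"

# Other format sequences.
as_float = "%f" % 3.5       # fixed-point floating point
as_compact = "%g" % 3.5     # compact floating point
as_hex = "%x" % 255         # base-16 integer

print(row, label, as_float, as_compact, as_hex)
```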

Format Operator (cont.)
Here are some examples of format exceptions
Exceptions are not always errors (more on this later)

    >>> data = "this is not an integer"
    >>> "%d" % data
    Traceback (most recent call last):
      File " ", line 1, in ?
    TypeError: int argument required
    >>> "%d %d" % (1, 2, 3)
    Traceback (most recent call last):
      File " ", line 1, in ?
    TypeError: not all arguments converted during string formatting

Opening Files for Writing
Files may be opened for reading, writing or appending
– The optional second argument to the “open” function specifies the “mode”, which may be “r”ead, “w”rite, or “a”ppend. “r” is the default.
Once open for writing or appending, there are two ways to send data to the file:
– via the “write” method of the file
– using the “print” statement
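The three modes can be compared in a short sketch. The temporary folder and the file contents are assumptions chosen so that the example does not touch real files. The code uses Python 3 syntax.

```python
import os
import tempfile

# A throwaway path so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

f = open(path, "w")      # "w" creates the file, or truncates an existing one
f.write("first line\n")
f.close()

f = open(path, "a")      # "a" keeps existing contents and adds to the end
f.write("second line\n")
f.close()

f = open(path)           # no mode given, so the default "r" is used
contents = f.read()
f.close()

print(contents)
```

Opening the same path with "w" a second time would have discarded the first line, which is why "a" is used for the second write.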

Writing to Files
The “write” method takes a single string argument and sends the data to the file.
– Note that we need to explicitly include “\n” in the data string because “write” is very simpleminded
The “print” statement may be used to send data to a file instead of the screen using the “>>” operator
– Note that we do not need to include “\n” in the print statement because “print” automatically terminates the output line

    f = open('output.txt', 'w')
    f.write("Hello world\n")
    print >> f, "Hello world"
    f.close()

Closing Files
The last operation on a file should be “close”, which tells Python that there will be no more output to the file
– This allows Python to release any data associated with the open file
– Most operating systems have a limit on the number of open files, so it is good practice to clean up when files are no longer in use

    f = open('output.txt', 'w')
    f.write("Hello world\n")
    print >> f, "Hello world"
    f.close()

with statement
The with statement may be used with file objects (as well as many other types) for automatic disposal of resources
You still need to handle errors that occur in the “open” call, e.g., a file opened for reading does not exist
You do not need to worry about closing the file once it has been opened

    with open('output.txt', 'w') as f:
        f.write("Hello world\n")
        print >> f, "Hello world"
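The sketch below shows one way to combine the with statement with error handling around the “open” call. It uses Python 3 syntax, where print(..., file=f) replaces the print >> f form. The temporary paths are assumptions made so that the example is self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")

# The try block catches OSError from open() and from the writes.
try:
    with open(path, "w") as f:
        f.write("Hello world\n")          # write() needs an explicit "\n"
        print("Hello world", file=f)      # print adds the line terminator
except OSError as err:
    print("could not write %s: %s" % (path, err))

# The file is closed here even though close() was never called.
was_closed = f.closed

# Opening a file in a folder that does not exist fails inside open() itself.
missing = os.path.join(path + ".d", "output.txt")
try:
    with open(missing, "w") as g:
        g.write("never reached\n")
    open_failed = False
except OSError:
    open_failed = True

print(was_closed, open_failed)
```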

Computer-readable Data Files
Are your data files meant for “internal” use, i.e., only by your code?
– There are simple solutions for reading and writing data from and to files if you assume that no one else will try to read your data files (pickle, anydbm, shelve, …)
– There are more complex solutions when accommodating others (tab-separated values, comma-separated values, Extensible Markup Language [XML], …)
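As an illustration of the “internal use” case, here is a minimal pickle round trip in Python 3 syntax. The dictionary contents and the temporary file path are assumptions.

```python
import os
import pickle
import tempfile

counts = {"the": 3, "cat": 2}
path = os.path.join(tempfile.mkdtemp(), "counts.pickle")

# Pickle files are binary, so open with "wb" and "rb".
with open(path, "wb") as f:
    pickle.dump(counts, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == counts)
```

The resulting file is convenient for your own code but is not meant to be read by people or by programs written in other languages.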

shelve Module
Sometimes, you want to split a computation process into multiple steps
– Preprocess once and reuse results multiple times
– Checkpoint data in long calculations
The “shelve” module lets you treat a file like a dictionary, except that fetching and setting values actually read from and write to the file
– Saving and restoring data looks just like assignment operations
– Example after next little detour
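A minimal sketch of the dictionary-like interface, in Python 3 syntax, is given here. The key name "squares" and the temporary path are assumptions.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results")

# Saving looks like dictionary assignment.
db = shelve.open(path)
db["squares"] = [n * n for n in range(5)]
db.close()

# Restoring looks like dictionary lookup, possibly in a later run.
db = shelve.open(path)
squares = db["squares"]
db.close()

print(squares)
```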

File Names and Paths
Python provides functions for manipulating file names and paths
– Python's abstraction for the file system is that files are stored in folders (aka directories)
– Folders may also be stored in folders
– The name of a file can be uniquely specified by joining a series of folder names followed by the file name itself, e.g., /Users/conrad/.cshrc
  – The joined sequence of names is called the “path”
  – Note that Python accepts “/” as the joining character, even on Windows, which actually uses “\”

File Names and Paths (cont.)
The “os” and “os.path” modules provide functions for querying and manipulating file names and paths
– “os.getcwd” – get current working directory
– “os.listdir” – list names of items in a folder
– “os.path.exists” – check if file or folder exists
– “os.path.isdir” – check if name is a folder
– “os.path.join” – combine names into path

File Names and Paths (cont.)
Examples of using file and path functions

    >>> import os
    >>> cwd = os.getcwd()
    >>> print cwd
    /var/tmp/conrad
    >>> os.listdir(cwd)
    ['x', 'chimera-build']
    >>> cb = os.path.join(cwd, "chimera-build")
    >>> print cb
    /var/tmp/conrad/chimera-build
    >>> os.path.isdir(cb)
    True
    >>> os.listdir(cb)
    ['build', 'foreign', 'install']

Example using shelve
In the Markov analysis example above, we built the Markov data structures and then used them to generate (nonsensical) sentences
Suppose we want to generate a sentence once a day from an arbitrary text, but do not want to rebuild the data structures each time (perhaps it takes too long)
How do we do this?
– Split Markov analysis and sentence generation into two separate programs
– Use shelve to store data in analysis, and restore data in generation
– markov1_prep.py
– markov1_use.py
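The split into two steps might look roughly like the sketch below. It is not the code in markov1_prep.py or markov1_use.py. The function names prep and use, the key "followers", and the sample text are assumptions, and both steps are shown in one file only so that the example runs on its own. The code uses Python 3 syntax.

```python
import os
import random
import shelve
import tempfile


def prep(text, shelf_path):
    """Analysis step: build the follower table once and store it."""
    words = text.split()
    followers = {}
    for current, following in zip(words, words[1:]):
        followers.setdefault(current, []).append(following)
    shelf = shelve.open(shelf_path)
    shelf["followers"] = followers
    shelf.close()


def use(shelf_path, start, length):
    """Generation step: restore the table and produce one sentence."""
    shelf = shelve.open(shelf_path)
    followers = shelf["followers"]
    shelf.close()
    words = [start]
    while len(words) < length and words[-1] in followers:
        words.append(random.choice(followers[words[-1]]))
    return " ".join(words)


shelf_path = os.path.join(tempfile.mkdtemp(), "markov")
prep("the cat sat on the mat and the cat ran", shelf_path)   # run once
print(use(shelf_path, "the", 5))                              # run as often as needed
```

Only the path to the shelf file is shared between the two steps, so the generation step never needs to see the original text.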

Interchangeable Data Files
Saving and restoring data using shelve is very convenient, but not very useful for collaborators unless they also use your code
Common interchange formats include tabular formats with comma- or tab-separated values (CSV/TSV), and Extensible Markup Language (XML)
– CSV/TSV files are easy to parse and (marginally) human readable
– XML files are more flexible but (generally) not human friendly

Example using TSV Format
The Markov example may be rewritten to use TSV instead of shelve
– markov2_prep.py
– markov2_use.py
Note that if we were to change the representations for the Markov data, we would need to modify both source code files to read and write the new representations
For the shelve version, we would not need to modify the saving/restoring part of the code
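One way to store a follower table as TSV is sketched below using the standard csv module. It is not the code in markov2_prep.py or markov2_use.py. The row layout (a word followed by its followers), the function names, and the sample data are assumptions. The code uses Python 3 syntax.

```python
import csv
import os
import tempfile


def save_tsv(followers, path):
    """Write one row per word: the word, then each word that follows it."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for word in sorted(followers):
            writer.writerow([word] + followers[word])


def load_tsv(path):
    """Rebuild the dictionary from the rows written by save_tsv."""
    followers = {}
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            followers[row[0]] = row[1:]
    return followers


followers = {"the": ["cat", "mat", "cat"], "cat": ["sat", "ran"]}
path = os.path.join(tempfile.mkdtemp(), "markov.tsv")
save_tsv(followers, path)
restored = load_tsv(path)
print(restored == followers)
```

The writer and the reader must agree on the row layout, which is the coupling described above: changing the representation means changing both functions.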

Debugging
Reading
– Read your code critically, just as you do when you edit a paper
Running
– Gather information about bugs by adding print statements
– Do not debug by “random walk”
– Do not make a dozen changes and hope that one works
Ruminating
– We do not do enough of this!
Retreating
– The last refuge, but do not be too quick to take it
– After all, if this version of the code is your best effort, what makes you think the next version will work better?

Homework
5.1 – Use os.walk to count files in a directory tree
5.2 – Retrieve data from RCSB web service