Using a Simple Python Script to Download Data
Rob Letzler, Goldman School of Public Policy, July 2005

Overview
- Explain the problem
- Talk about the solution strategy
- Walk through the code line by line, explaining the tools and ideas in the solution

What's not here that we might want to discuss in the future
- High-speed numerical Python: a slow language with fast libraries
- Writing your own objects
- Good program structure
- Functional programming: the map, filter, lambda, and reduce commands (Stata's generate / replace commands are roughly map, and Stata's drop if ~X is roughly filter). Good short overview at:
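As a small illustration of the functional-programming commands listed above, here is a Python 3 sketch; the price list is made-up data for illustration only. map transforms every element, much like Stata's generate / replace, and filter keeps only the elements that pass a test, which is what Stata's drop if ~X leaves behind.

```python
from functools import reduce  # in Python 3, reduce lives in functools

# Made-up hourly prices, for illustration only
prices = [25.0, 31.5, -2.0, 48.0]

# map: apply a function to every element (like Stata's generate / replace)
prices_in_cents = list(map(lambda p: round(p * 100), prices))

# filter: keep the elements where the test is true (like Stata's "drop if ~X")
positive_prices = list(filter(lambda p: p > 0, prices))

# reduce: combine the elements pairwise into a single value
total = reduce(lambda running, p: running + p, positive_prices, 0.0)

print(prices_in_cents)   # [2500, 3150, -200, 4800]
print(positive_prices)   # [25.0, 31.5, 48.0]
print(total)             # 104.5
```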

The Challenge
Download more than 1,000 daily and monthly electricity market database files from the California Independent System Operator (CAISO) website.

Solution Strategy
- Research the location (URL) of each database
- Write Python code that executes once for each month t in the sample period:
  - Generate strings for the locations of the web page and the local disk file for month t
  - Open the web page
  - Create a local disk file
  - Read the web page and save it in the local disk file

Disclaimer
This is my first Python program, and I fear that I've reinvented a lot of wheels. It uses many basic Python functions rather than tapping into libraries and extensions that would make the program shorter. Its structure, a main loop that is not inside a function or object, is fine for a simple program but dangerous for large, complex programs.

Python Syntax We'll Need
- Loops
- Conditional statements
- Functions
- File / web reading and writing
- Exception handling

For Loops in Python
Python loops over the elements of a list, not by updating an integer counter. Python requires a colon (:) at the end of a conditional, loop, or function declaration, before the indented block of statements it controls:
    for item in list:
        do stuff
Other programming languages would approach this as:
    for integer i = start to stop { do stuff }
Python's range(start, stop+1) produces the same sequence as other languages' start to stop.
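A short Python 3 sketch of both points; the list of fuel names is purely illustrative.

```python
# A for loop walks over the elements of a sequence; no counter is updated
for fuel in ["gas", "hydro", "nuclear"]:
    print(fuel)

# range(start, stop) includes start but stops *before* stop,
# so range(start, stop + 1) matches "for i = start to stop" in other languages
start, stop = 2001, 2004
years = list(range(start, stop + 1))
print(years)  # [2001, 2002, 2003, 2004]
```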

The Main Loop, Part I
    month_length = [31,28,31,30,31,30,31,31,30,31,30,31]  # list of number of days in each month
                                                         # (ignores leap years: February 2004 had 29 days)
    for year in range(2001, 2005):  # years 2001 to 2004: notice that ranges include the
                                    # first number but are strictly less than the last number
        for month in range(1, 13):
            if ((year in range(2002, 2004)) or (year == 2001 and month > 3)
                    or (year == 2004 and month < 10)):
                # only begins executing the main block if we are in the sample period
Points to notice:
- The logical operators are the words and and or, not & and |.
- To test whether a and b are the same, use a == b with two equal signs; to put b in a, use a = b with one equal sign.
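To check which months the condition actually selects, here is a Python 3 sketch that collects them into a list. The second function is an alternative I'm suggesting, not part of the original program: it expresses the same April 2001 to September 2004 window with Python's element-by-element tuple comparison.

```python
def sample_months_original():
    """Months kept by the slide's condition (April 2001 through September 2004)."""
    months = []
    for year in range(2001, 2005):
        for month in range(1, 13):
            if ((year in range(2002, 2004)) or (year == 2001 and month > 3)
                    or (year == 2004 and month < 10)):
                months.append((year, month))
    return months


def sample_months_tuples(first=(2001, 4), last=(2004, 9)):
    """Same sample: tuples compare year first, then month."""
    return [(year, month)
            for year in range(first[0], last[0] + 1)
            for month in range(1, 13)
            if first <= (year, month) <= last]


months = sample_months_original()
print(len(months), months[0], months[-1])  # 42 (2001, 4) (2004, 9)
print(months == sample_months_tuples())    # True
```

Writing the window as a pair of (year, month) endpoints makes it easier to change the sample period without rewriting three separate conditions.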

Functions
Functions are groups of statements that other parts of the code can call:
    def FunctionName(parameters):
        statements
        return optional_return_value
Functions may return a value. If the function returns a value, you can call it in an assignment statement, like result = FunctionName(inputs).
Functions and objects are crucial tools for designing large programs that are modular, flexible, and reliable. See McConnell, Code Complete, for more detail.
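A minimal Python 3 sketch of a function that returns a value and one that does not; days_in_month and report are hypothetical helpers, not functions from the original script.

```python
def days_in_month(month, month_length=(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)):
    """Return the number of days in a month (1-12), ignoring leap years like the slide's list."""
    return month_length[month - 1]


def report(month):
    """A function without a return statement returns None."""
    print("month", month, "has", days_in_month(month), "days")


result = days_in_month(4)   # call it in an assignment statement
print(result)               # 30
print(report(4))            # prints the report line, then None
```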

Python passes every argument as a reference to an object. Numbers and strings are immutable, so for them this behaves like passing by value; lists and most other objects are mutable, so a function can change them through the reference. Each function works on its own copy of the reference, which protects the caller's variables from being accidentally rebound.
- If you create a new object in the function, the original is unaffected: list_var = list_var + ["C", "D"]
- If you modify the original object without changing its memory address, the original changes: list_var.extend(["C", "D"]) or list_var[1] = "C"
Any variable defined outside of a function or object is global: any part of the code can read it, and any part can change a mutable global in place (rebinding it inside a function requires a global statement). Avoid global variables, because errors involving changes to them can be difficult to find and fix.
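The difference between creating a new object and modifying the original can be seen directly in Python 3. rebind and mutate are illustrative names; the is operator checks whether two names refer to the same object.

```python
def rebind(list_var):
    list_var = list_var + ["C", "D"]   # builds a NEW list; only the local name points to it
    return list_var


def mutate(list_var):
    list_var.extend(["C", "D"])        # changes the caller's list in place
    list_var[1] = "X"                  # so does item assignment


original = ["A", "B"]
new_list = rebind(original)
print(original, new_list)        # ['A', 'B'] ['A', 'B', 'C', 'D']
print(new_list is original)      # False: a different object

mutate(original)
print(original)                  # ['A', 'X', 'C', 'D']
```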

Passing by Value and Reference
    import fpformat  # Python 2 module that formats numbers as strings

    def python_copies_numbers_but_shares_lists_and_objects(list_input, integer_input):
        integer_input = integer_input * 1000   # rebinds the local name only
        list_input.extend(["C", "D"])          # modifies the caller's list in place
        return integer_input

    def main():
        test_list = ["A", "B"]
        test_integer = 5
        updated_integer = python_copies_numbers_but_shares_lists_and_objects(test_list, test_integer)
        print "notice that test_list has changed to "
        print test_list
        print "but that test_integer is still " + fpformat.fix(test_integer, 0) + " but the copy we returned has changed to " + fpformat.fix(updated_integer, 0)
        return

    main()
Output:
    notice that test_list has changed to
    ['A', 'B', 'C', 'D']
    but that test_integer is still 5 but the copy we returned has changed to 5000

The Main Loop Then Calls a Function
                month_string = make_two_dgt_string(month)

    import fpformat  # fpformat formats floating point numbers into strings

    def make_two_dgt_string(n):
        # takes a number and adds a leading zero if the number is less than 10
        # assumes that the input number is < 100
        if n > 9:  # check whether we need to pad the date with a leading zero
            n_string = fpformat.fix(n, 0)  # if we don't need to pad, convert the number directly to a string
        else:  # pad low numbers with a leading zero
            n_string = "0" + fpformat.fix(n, 0)  # convert to a string and add a leading zero
        return n_string  # either way, return the result
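String formatting can do the padding in one step. This is a sketch of an alternative, not the original code: the "%02d" format (zero-padded, at least two digits) works in both Python 2 and Python 3, so fpformat, which was removed in Python 3, is not needed.

```python
def make_two_dgt_string(n):
    """Return n as a string padded with a leading zero to at least two digits."""
    return "%02d" % n   # equivalent to f"{n:02d}" or format(n, "02d") in Python 3


print([make_two_dgt_string(m) for m in (1, 9, 10, 12)])  # ['01', '09', '10', '12']
```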

The Main Loop Then Creates Strings and Calls More Functions
                # now, for each month in the sample, request a price data file
                # generate CAISO URL
                load_url = " ear,0)+month_string…
                # generate file name for my hard disk
                load_file_name = "caiso_price_" + fpformat.fix(year, 0) + "-" + month_string + "-" + "1-" + fpformat.fix(end_date, 0) + ".zip"
                # download and save the requested files
                get_save_file(load_url, load_file_name)
                # continue looping until we go through every month in the sample...
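The file-name string can also be built with one format string. This Python 3 sketch assumes that end_date is the last day of the month (the slide does not show where end_date comes from) and uses calendar.monthrange, which returns the weekday of the first day and the number of days in the month, so leap years are handled automatically.

```python
import calendar


def load_file_name(year, month):
    """Build a local file name of the form caiso_price_YYYY-MM-1-<end_date>.zip.

    Assumption: end_date is the last day of the month.
    """
    end_date = calendar.monthrange(year, month)[1]  # (weekday of the 1st, days in month)
    return "caiso_price_%d-%02d-1-%d.zip" % (year, month, end_date)


print(load_file_name(2001, 4))   # caiso_price_2001-04-1-30.zip
print(load_file_name(2004, 2))   # caiso_price_2004-02-1-29.zip
```

A fixed list with 28 days for February would give the wrong end date for February 2004.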

Solution Strategy
We have:
- Researched the location (URL) of each database
- Written Python code that executes once for each month t in the sample period
- Generated strings for the web page and local disk file for month t
We've called, but not yet seen, the code that:
- Opens the web page
- Creates a local disk file
- Reads the web page and saves it in the local file

Connect to the Web Page
    import urllib

    def get_save_file(url, file_name):
        # this function gets the file specified in URL from the web and then saves it in
        # location FILE_NAME
        # designates the location in which to save the file
        path = "C:\\rjl\\ca_amp\\download\\price\\" + file_name
        try:
            web_data = urllib.urlopen(url)  # attempt to create a shortcut / handle to the desired web page / web file
        except IOError, msg:
            print "didn't open URL %s: %s" % (url, str(msg))
            return  # nothing to save if the page could not be opened

Creating and Using Objects
Many Python libraries are object oriented. An object bundles a kind of data with "member functions" for manipulating that data. Steps:
1) Create ("instantiate") objects
2) Use their functions
    objectName = libName.constructor(initial values)
    objectName.doSomething(parameters)
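The standard library's datetime module follows the same two steps. This Python 3 sketch only illustrates the pattern; it was not part of the download script.

```python
import datetime

# 1) create ("instantiate") an object: libName.constructor(initial values)
first_day = datetime.date(2004, 2, 1)

# 2) use its member functions (methods): objectName.doSomething(parameters)
label = first_day.strftime("%Y-%m")        # '2004-02'
last_day = first_day.replace(day=29)       # returns a new date object

print(label, last_day.isoformat())         # 2004-02 2004-02-29
```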

Exceptions
try/except sequences handle routine problems like file-not-found errors ("exceptions") gracefully rather than ending the whole program.
    try:
        SomethingThatMightNotWork  # this will either work, or fail and raise an exception of type failureType
    except failureType1:
        ...  # if we get failure type 1, do this and continue from here
Dividing by zero or inverting a singular matrix might raise exceptions.
An exception works like a limited goto statement: the program stops executing the try block and jumps immediately to the first except clause that handles that error.
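A Python 3 sketch of both cases mentioned on the slide; safe_divide and read_or_default are illustrative helpers, not part of the original program.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # control jumps here as soon as the division fails; the program keeps running
        print("can't divide %s by zero" % numerator)
        return None


def read_or_default(path, default=""):
    """Return a file's contents, or default if the file cannot be opened or read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as err:          # in Python 3, IOError is an alias of OSError
        print("couldn't read %s: %s" % (path, err))
        return default


print(safe_divide(10, 4))       # 2.5
print(safe_divide(10, 0))       # prints a message, then None
```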

Create a Local File and Save the Downloaded Page
        try:
            f = open(path, "wb")  # create a handle to a new file; "wb" means w = writing, b = binary
            f.write(web_data.read())  # write into the new file the results from downloading the web page
            f.close()  # complete the writing process
            print "saved %s" % path
        except IOError, msg:
            print "didn't save %s: %s" % (path, str(msg))
        return  # end the routine
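For comparison, here is a minimal Python 3 sketch of the whole get_save_file routine using urllib.request, Python 3's replacement for urllib.urlopen. The save_dir parameter and the True/False return value are my additions, replacing the hard-coded C:\ path; treat this as an illustration rather than a drop-in replacement.

```python
import os
import shutil
import urllib.request


def get_save_file(url, file_name, save_dir="."):
    """Download url and save it as save_dir/file_name; return True on success."""
    path = os.path.join(save_dir, file_name)
    try:
        # "with" closes both the web response and the local file, even on errors
        with urllib.request.urlopen(url) as web_data, open(path, "wb") as f:
            shutil.copyfileobj(web_data, f)  # copy in chunks instead of one big read()
    except OSError as msg:  # urllib.error.URLError and HTTPError are subclasses of OSError
        print("didn't download %s to %s: %s" % (url, path, msg))
        return False
    print("saved %s" % path)
    return True
```

Because urlopen also accepts file:// URLs, the function can be tried offline on a local file before pointing it at the real site.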

File Manipulation in Python
Details on files: Python Tutorial, Section 7.2
- Start: construct a file object using the open command: file_object_name = open(filename, mode)
- Read / write: string_data = file_object_name.read() and file_object_name.write(data_to_write)
- Finish using the file: file_object_name.close()
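A short Python 3 sketch of that open / read / write / close cycle, using a throwaway file in the system temporary directory (the file name is arbitrary). The with statement at the end is not on the slide, but it is the usual modern way to make sure close() runs.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "file_demo.txt")

# Start: construct a file object with open(filename, mode)
f = open(path, "w")
f.write("first line\n")     # write data
f.close()                   # finish using the file

f = open(path, "r")
data = f.read()             # read the whole file into one string
f.close()

# A "with" block closes the file automatically, even if an exception occurs
with open(path, "a") as f:
    f.write("second line\n")
with open(path) as f:
    lines = f.readlines()

print(repr(data))   # 'first line\n'
print(lines)        # ['first line\n', 'second line\n']
os.remove(path)     # clean up the demo file
```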

Possible Extensions
- Unzip the files that we downloaded (easy?):
      import os
      os.system('unzip ' + file_name)
  (See …)
- Test that the downloaded data have expected characteristics (e.g. four fields per line) using regular expressions
- Read in and manipulate the XML databases (harder?)
- Enter these file names into SAS or Stata import / analysis code and run SAS / Stata
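The standard library can handle the first two extensions without calling an external unzip program. This Python 3 sketch is hypothetical: it assumes the archives contain plain-text files with comma-separated fields and no commas inside fields, which may not match the actual CAISO files (the slides also mention XML databases).

```python
import re
import zipfile
from pathlib import Path

# Hypothetical check: exactly four comma-separated fields, no commas inside fields
FOUR_FIELDS = re.compile(r"^[^,]*(?:,[^,]*){3}$")


def unzip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir; return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


def lines_with_wrong_field_count(text_path):
    """Return (line number, line) pairs that do not have exactly four fields."""
    bad = []
    with open(text_path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not FOUR_FIELDS.match(line):
                bad.append((number, line))
    return bad
```

Using zipfile instead of os.system avoids depending on an unzip command being installed and reports failures as Python exceptions.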

Python Can Do Far More with Web Pages
Details on the web:
Python's sample programs include:
- Webchecker.py (checks for broken links on a website)
- Websucker.py (downloads a whole website)
I found their code a bit hard to follow, but I used snippets of those programs as examples for this program.