Python Modules and Basic File Parsing

Python Modules and Basic File Parsing BCHB524 Lecture 12 BCHB524 - Edwards

Outline
  Python library (modules)
  Basic stuff: os, os.path, sys
  Special files: zip, gzip, tar, bz2
  Math: math, random
  Web stuff: urllib, cgi, html
  Formats: xml, .ini, csv
  Databases: SQL, DBM

Python Library & Modules
  The Python library contains lots and lots and lots of extremely useful modules
  “Batteries included”
  Many things you want to do have already been done for you!
  http://xkcd.com/353/

Basic modules: sys
  Use in just about every program!
  sys.argv list provides the “command-line” arguments to your script
  sys.stdin, sys.stdout, sys.stderr provide "standard" input, output, and error file handles
  sys.exit() ends the program, now!

Basic modules: sys

    import sys

    data = sys.stdin.read()
    if len(sys.argv) < 2:
        print >>sys.stderr, "There is a problem!"
        sys.exit()
    filename = sys.argv[1]
    more_data = open(filename,'r').read()
    results = compute(data,more_data)
    print >>sys.stdout, results

    c:\> test.py cmd-line-arg1 < stdin.txt > stdout.txt
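For readers on Python 3, where the `print >>sys.stderr, ...` form is no longer valid, the same argument check might look like the sketch below. The helper name `get_filename` and the usage message are illustrative assumptions, not part of the lecture, and a nonzero exit status is used here to signal the error to the shell.

```python
import sys


def get_filename(argv):
    """Return the first command-line argument, or exit with an error.

    get_filename is a hypothetical helper, not from the slides.
    """
    if len(argv) < 2:
        # print(..., file=sys.stderr) replaces Python 2's "print >>sys.stderr"
        print("Usage: %s <filename>" % argv[0], file=sys.stderr)
        sys.exit(1)  # nonzero status tells the shell something went wrong
    return argv[1]


# An explicit list is passed so the sketch runs without real arguments;
# a script would call get_filename(sys.argv) instead.
filename = get_filename(["test.py", "cmd-line-arg1"])
print(filename)
```

Taking the argument list as a parameter, rather than reading sys.argv inside the function, keeps the check easy to try out interactively.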

Basic modules: os, os.path
  os.getcwd()
    Gets the current working directory
  os.path.abspath(filename)
    Full pathname for filename
  os.path.exists(filename)
    Does a file with filename exist?
  os.path.join(path1,path2,path3)
    Join partial paths
  os.path.split(path)
    Get the directory and filename for a path

Basic modules: os, os.path

    # Import important modules
    import os
    import os.path
    import sys

    # Check for command-line argument
    if len(sys.argv) < 2:
        print >>sys.stderr, "There is a problem!"
        sys.exit()

    # Get the filename
    filename = sys.argv[1]

    # Get the current working directory
    cwd = os.getcwd()
    print cwd

    # Turn a filename into a full path
    abspath = os.path.abspath(filename)
    print abspath

Basic modules: os, os.path

    # Make the home directory path
    homedir = '/home/student'
    print homedir

    # Check if the file is there
    if os.path.exists(filename):
        print filename,"is there"
    else:
        print filename,"does not exist"

    # Check if the file is in the current working directory
    new_filename = os.path.join(cwd,filename)
    if os.path.exists(new_filename):
        print new_filename,"is there"
    else:
        print new_filename, "does not exist"

    # Check if the file is in home directory
    new_filename = os.path.join(homedir,filename)
    if os.path.exists(new_filename):
        print new_filename,"is there"
    else:
        print new_filename, "does not exist"
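os.path.split() is listed among the basic functions but is not used in the slide code. A minimal Python 3 sketch of how it pairs with os.path.join() follows; the path components are made up for illustration.

```python
import os.path

# Hypothetical path, built with join() so the separator matches the platform
path = os.path.join("home", "student", "data.csv")

# split() returns a (directory, filename) pair
directory, name = os.path.split(path)
print(directory)
print(name)

# join() puts the two pieces back together
rejoined = os.path.join(directory, name)
print(rejoined == path)

# abspath() always yields an absolute path, based on the current directory
print(os.path.isabs(os.path.abspath(name)))
```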

Special files: zip
  You can use the appropriate module to open various types of compressed and archival file-formats

    import zipfile
    import sys

    zipfilename = sys.argv[1]
    zf = zipfile.ZipFile(zipfilename)
    for filename in zf.namelist():
        if filename.startswith("A2"):
            print filename
    ncore = 'M3.txt'
    thedata = zf.read(ncore)
    print thedata
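To try zipfile without an archive on disk, one option is to build a small archive in memory first. The sketch below is written for Python 3; the member names reuse those on the slide, but their contents are invented.

```python
import io
import zipfile

# Build a small archive in memory; the contents are made up
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("A2.txt", "first member\n")
    zf.writestr("M3.txt", "second member\n")

# Reopen it for reading, as the slide does with a file on disk
buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    # read() returns bytes in Python 3, so decode to get text
    thedata = zf.read("M3.txt").decode("utf-8")

print(names)
print(thedata)
```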

Special files: gz
  gzip format is very common for bioinformatics files (extension is .gz)
  Use the gzip module to read and write as if a normal file (not an archive format like zip)

    import gzip

    zf = gzip.open('sprot_chunk.dat.gz')
    for i,line in enumerate(zf):
        print line.rstrip()
        if i > 10:
            break
    zf.close()
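The slide code only reads a gzip file. A round trip that also writes one could be sketched as below, in Python 3, where the "wt" and "rt" modes select text rather than bytes. The file name and its contents are assumptions made for the example.

```python
import gzip
import os
import tempfile

# Temporary location; "example.txt.gz" is a made-up name for this sketch
path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")

# Write compressed text exactly as if writing to a normal file
with gzip.open(path, "wt") as out:
    for i in range(20):
        out.write("line %d\n" % i)

# Read it back line by line, stopping after the first three lines
first_lines = []
with gzip.open(path, "rt") as zf:
    for i, line in enumerate(zf):
        if i >= 3:
            break
        first_lines.append(line.rstrip())

print(first_lines)
```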

Math: math, random
  math.floor(), math.ceil()
    Round down and up
  random.random()
    Random float between 0 and 1 (1 itself is never returned)
  random.randint(a,b)
    Random int between a and b, inclusive

    import math
    print math.floor(2.5)
    print math.ceil(2.5)

    import random
    print random.random()
    print random.randint(0,10)
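The ranges of these functions are easy to misremember, so a short Python 3 sketch may help. In Python 3, math.floor() and math.ceil() return integers, whereas the Python 2 used on the slides returns floats. The seed value is arbitrary and only makes repeated runs give the same numbers.

```python
import math
import random

print(math.floor(2.5))   # rounds down
print(math.ceil(2.5))    # rounds up
print(math.floor(-2.5))  # "down" means toward negative infinity

random.seed(524)           # arbitrary seed, chosen for this sketch
x = random.random()        # 0.0 <= x < 1.0
n = random.randint(0, 10)  # 0 <= n <= 10, both ends included
print(x, n)
```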

Web stuff: urllib
  Open a url just like a file

    import urllib

    url = 'http://edwardslab.bmcb.georgetown.edu/' + \
          'teaching/bchb524/2016/data/standard.code'
    print "The URL:",url
    handle = urllib.urlopen(url)
    for line in handle:
        print line.rstrip()
    handle.close()

    filename = 'standard.code'
    print "The File:",filename
    handle = open(filename)
    for line in handle:
        print line.rstrip()
    handle.close()
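In Python 3, urlopen lives in urllib.request rather than urllib, and it yields bytes rather than text. The sketch below shows the "just like a file" idea without needing a network connection by using a file:// URL for a temporary file. The file name and contents are invented for the example.

```python
import pathlib
import tempfile
import urllib.request

# Write a small local file; name and contents are made up for the sketch
path = pathlib.Path(tempfile.mkdtemp()) / "example.txt"
path.write_text("first line\nsecond line\n")

# as_uri() gives a file:// URL, so no network access is needed
url = path.as_uri()

# Iterate over the URL handle line by line; each line is bytes
handle = urllib.request.urlopen(url)
url_lines = [line.decode("utf-8").rstrip() for line in handle]
handle.close()

# Iterate over the same file opened the ordinary way
handle = open(path)
file_lines = [line.rstrip() for line in handle]
handle.close()

print(url_lines == file_lines)
```

The same loop works unchanged with an http:// URL, provided the machine can reach the server.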

File formats: CSV
  Comma separated values
  Can be read (and written) by lots of different tools
  Easy way to format data for Excel
  First row is (sometimes) "headings" or names
  Other rows list the values in each column

    import csv

    handle = open('data.csv')
    rows = csv.reader(handle)  # No headers

    # Iterate through the rows
    for r in rows:
        # access r as a list of values
        print r[0],r[1],r[2]
    handle.close()
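When the first row does hold headings, csv.reader returns it like any other row, so it has to be consumed separately. A small Python 3 sketch follows; the data is made up and stands in for a real file, and io.StringIO is used only so the example runs without one.

```python
import csv
import io

# Made-up data standing in for the contents of a CSV file
text = "gene,sample1,sample2\nABC1,1.5,2.5\nXYZ2,3.0,4.0\n"
handle = io.StringIO(text)

rows = csv.reader(handle)
header = next(rows)  # consume the heading row before the loop

values = {}
for r in rows:
    # every field arrives as a string, so convert numbers explicitly
    values[r[0]] = [float(v) for v in r[1:]]

print(header)
print(values)
```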

File formats: CSV
  Most powerful with headings

    import csv

    file = open('data.txt')
    # Headers, and tab-separated-values
    rows = csv.DictReader(file,dialect='excel-tab')

    # Iterate through the rows
    for r in rows:
        # access r as a dictionary - headers are keys
        print r['TUMOUR'],r['R00884']
    file.close()
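A self-contained Python 3 version of the DictReader pattern might look like this. The column names echo the slide, but the values, and the guess that R00884 holds numbers, are assumptions made for the example.

```python
import csv
import io

# Invented tab-separated data; only the column names come from the slide
text = "TUMOUR\tR00884\nyes\t0.25\nno\t1.75\n"

# dialect="excel-tab" tells the reader that fields are separated by tabs
rows = csv.DictReader(io.StringIO(text), dialect="excel-tab")

# Each row is a dictionary keyed by the headings in the first line
records = [(r["TUMOUR"], float(r["R00884"])) for r in rows]
print(records)
```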

Exercise 1
  Write a program that reads the microarray data in “data.csv” and computes the mean and standard deviation of the expression values of a specific gene overall, and within each sample category.
  Get the name of the microarray data file from the command-line.
  Get the name of the gene from the command-line.