Important modules: Biopython, SQL & COM. Information sources  python.org  tutor list (for beginners), the Python Package index, on-line help, tutorials,

Slides:



Advertisements
Similar presentations
Important modules: Biopython, SQL & COM. Information sources python.org tutor list (for beginners), the Python Package index, on-line help, tutorials,
Advertisements

Introduction to Computing Using Python CSC Winter 2013 Week 8: WWW and Search  World Wide Web  Python Modules for WWW  Web Crawling  Thursday:
BioPython Tutorial Joe Steele Ishwor Thapa. BioPython home page ial.html.
10/6/2014BCHB Edwards Sequence File Parsing using Biopython BCHB Lecture 11.
Web development  World Wide Web (web) is the Internet system for hypertext linking.  A hypertext document (web page) is an online document. It contains.
J4www/jea Week 3 Version Slide edits: nas1 Format of lecture: Assignment context: CRUD - “update details” JSP models.
1 Parsing XML sequence? We have i2xml filter (exercise) – we want xml2i also Don’t have to write XML parser, Python provides one Thus, algorithm: – Open.
The Poor Beginners’ Guide to Bioinformatics. What we have – and don’t have... a computer connected to the Internet (incl. Web browser) a text editor (Notepad.
Creating your website Using Plain HTML. What is HTML? ► Web pages are authored in HyperText Markup Language (HTML) ► Plain text is marked up with tags,
Common Services in a network Server : provide services Type of Services (= type of servers) –file servers –print servers –application servers –domain servers.
BioPerl. cpan Open a terminal and type /bin/su - start "cpan", accept all defaults install Bio::Graphics.
Why Worry About the WWW? Intranets -- with lots of HR applications »policies/procedures »job postings »benefits & other transactions »hiring & other workflows.
1 Spidering the Web in Python CSC 161: The Art of Programming Prof. Henry Kautz 11/23/2009.
Public Resources (II) – Analysis tools  Web-based analysis tools – easy to use, but often with less customization options.  Stand-alone analysis tools.
Computer Concepts 2014 Chapter 7 The Web and .
“Everything Else”. Find all substrings We’ve learned how to find the first location of a string in another string with find. What about finding all matches?
Lecture#2 on Internet and World Wide Web. Internet Applications Electronic Mail ( ) Electronic Mail ( ) Domain mail server collects incoming mail.
Postacademic Interuniversity Course in Information Technology – Module C1p1 Contents Data Communications Applications –File & print serving –Mail –Domain.
BioPython Workshop Gershon Celniker Tel Aviv University.
Introduction to Python for Biologists Lecture 3: Biopython This Lecture Stuart Brown Associate Professor NYU School of Medicine.
Public Resources for Bioinformatics Databases : how to find relevant information. Analysis Tools.
WEB API: WHY THEY MATTER ECOL 453/ Nirav Merchant
ITIS 1210 Introduction to Web-Based Information Systems Chapter 23 How Web Host Servers Work.
Python and REST Kevin Hibma. What is REST? Why REST? REST stands for Representational State Transfer. (It is sometimes spelled "ReST".) It relies on a.
Microsoft Internet Explorer and the Internet Using Microsoft Explorer 5.
Supporting High- Performance Data Processing on Flat-Files Xuan Zhang Gagan Agrawal Ohio State University.
Cross Site Integration “mashups” cross site scripting.
Putting it all together Dynamic Data Base Access Norman White Stern School of Business.
9 1 DBM Databases CGI/Perl Programming By Diane Zak.
Identifying the ortholog of TNF (Tumor necrosis factor) in mosquito genomes Pet Projects:
Web Page Design Introduction. The ________________ is a large collection of pages stored on computers, or ______________ around the world. Hypertext ________.
>gi| |gb|AAB | ADP-glucose pyrophosphorylase large subunit [Oryza sativa] 02-AUG-1996 Gene accession U66041 Plant Physiol. 112, 1399 (1996)
1 WWW. 2 World Wide Web Major application protocol used on the Internet Simple interface Two concepts –Point –Click.
BioPerl Ketan Mane SLIS, IU. BioPerl Perl and now BioPerl -- Why ??? Availability Advantages for Bioinformatics.
Information Retrieval and Web Search Crawling in practice Instructor: Rada Mihalcea.
Module: Software Engineering of Web Applications Chapter 2: Technologies 1.
GE3M25: Computer Programming for Biologists Python, Class 5
Chapter 16 Web Pages And CGI Scripts Department of Biomedical Informatics University of Pittsburgh School of Medicine
Practice – file types (Cont.) Load the “Mysequence.doc” file to Webcutter using “Choose file” and then “Upload sequence file”. -Notice that the “sequence”
Stand-alone tools 2. 1.Download the zip file to the GMS6014 folder. 2.Unzip the files to a folder named “clustalx”. 3.Edit the MDM2_isoforms_5.fasta file.
COSC 2328 – Web Programming.  PHP is a server scripting language  It’s widely-used and free  It’s an alternative to Microsoft’s ASP and Ruby  PHP.
MARC: Developing Bioinformatics Programs Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Essential BioPython Manipulating Sequences with Seq 1.
Biopython 1. What is Biopython? tools for computational molecular biology to program in python and want to make it as easy as possible to use python for.
Python: Programming the Google Search (Crawling) Damian Gordon.
Web Page Design The Basics. The Web Page A document (file) created using the HTML scripting language. A document (file) created using the HTML scripting.
MARC: Developing Bioinformatics Programs Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Essential BioPython: Overview 1.
Sequence File Parsing using Biopython
4.01 How Web Pages Work.
4.01 How Web Pages Work.
Modules and BioPerl.
BioPython Download & Installation Documentation
Lesson 4: Web Browsing.
Introduction to Computers
Address Verification Using SQL, TextPad and Web Link Validator.
Some bits on how it works
Essential BioPython Retrieving Sequences from the Web
BioPython Download & Installation Documentation
Sequence File Parsing using Biopython
Building Web Applications
Lesson 4: Web Browsing.
Python.
A Short Course on Geant4 Simulation Toolkit How to learn more?
A Short Course on Geant4 Simulation Toolkit How to learn more?
MVC Controllers.
Sequence File Parsing using Biopython
“Everything Else”.
Supporting High-Performance Data Processing on Flat-Files
4.01 How Web Pages Work.
How to search NCBI.
Desktop Mapping: Building Map Books
Presentation transcript:

Important modules: Biopython, SQL & COM

Information sources  python.org  tutor list (for beginners), the Python Package index, on-line help, tutorials, links to other documentation, and more.  biopython.org (and mailing list)‏  newsgroup comp.lang.python

Biopython Collection of many bioinformatics modules Some well tested, some experimental Check with biopython.org before writing new software. It may already exist. Contribute your code (even useful scripts) to them.

The Seq object >>> from Bio import Seq >>> seq = Seq.Seq("ATGCATGCATGATGATCG")‏ >>> print seq Seq('ATGCATGCATGATGATCG', Alphabet())‏ >>> Alphabet? What’s that? Python doesn’t know that you gave it DNA. (It could be a strange protein.)‏

Alphabets >>> from Bio import Seq >>> from Bio.Alphabet import IUPAC >>> protein = Seq.Seq("ATGCATGCATGC", IUPAC.protein)‏ >>> dna = Seq.Seq("ATGCATGCATGC", IUPAC.unambiguous_dna)‏ >>> protein[:10] Seq('ATGCATGCAT', IUPACProtein())‏ >>> protein[:10] + protein[::-1] Seq('ATGCATGCATCGTACGTACGTA', IUPACProtein())‏ >>> dna[:6] Seq('ATGCAT', IUPACUnambiguousDNA())‏ >>> dna[0] 'A' >>> protein[:10] + dna[:6] Traceback (most recent call last): File " ", line 1, in ? File "/usr/local/lib/python2.3/site-packages/Bio/Seq.py", line 45, in __add__ raise TypeError, ("incompatable alphabets", str(self.alphabet), TypeError: ('incompatable alphabets', 'IUPACProtein()', 'IUPACUnambiguousDNA()')‏ >>>

Translation >>> from Bio import Seq >>> from Bio.Alphabet import IUPAC >>> from Bio import Translate >>> >>> standard_translator = Translate.unambiguous_dna_by_id[1] >>> seq = Seq.Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC",... IUPAC.unambiguous_dna)‏ >>> standard_translator.translate(seq)‏ Seq('DRWAYIGSKI', HasStopCodon(IUPACProtein(), '*'))‏ >>>

Reading sequence files We’ve put a lot of work into reading common bioinformatics file formats. As the formats change, we update our parsers. There’s (almost) no reason for you to write your own GenBank, SWISS-PROT,... parser!

Reading a FASTA file >>> from Bio import Fasta >>> parser = Fasta.RecordParser()‏ >>> infile = open("ls_orchid.fasta")‏ >>> iterator = Fasta.Iterator(infile, parser)‏ >>> record = iterator.next()‏ >>> record.title 'gi| |emb|Z |CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA' >>> record.sequence 'CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACGATCGAGT GAATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTGATTTGTTGTTGG GCCGCCTCGGGAGCGTCCATGGCGGGTTTGAACCTCTAGCCCGGCGCAGTTTGGGCGCCAAGCCATATGA AAGCATCACCGGCGAATGGCATTGTCTTCCCCAAAACCCGGAGCGGCGGCGTGCTGTCGCGTGCCCAATG AATTTTGATGACTCTCGCAAACGGGAATCTTGGCTCTTTGCATCGGATGGAAGGACGCAGCGAAA TGCGATA AGTGGTGTGAATTGCAAGATCCCGTGAACCATCGAGTCTTTTGAACGCAAGTTGCGCCCGAGGCCATCAGGCTAAGGGCACGCCTGCTTG GGCGTCGCGCTTCGTCTCTCTCCTGCCAATGCTTGCCCGGCATACAGCCAGGCCGGCGTGGTGCGGATGTGAAAGATTGGCCCCTTGTGC CTAGGTGCGGCGGGTCCAAGAGCTGGTGTTTTGATGGCCCGGAACCCGGCAAGAGGTGGACGGATGCTGGCAGCAGCTGCCGTGCGAATC CCCCATGTTGTCGTGCTTGTCGGACAGGCAGGAGAACCCTTCCGAACCCCAATGGAGGGCGGTTGACCGCCATTCGGATGTGACCCCAGG TCAGGCGGGGGCACCCGCTGAGTTTACGC'

Reading all records >>> from Bio import Fasta >>> parser = Fasta.RecordParser()‏ >>> infile = open("ls_orchid.fasta")‏ >>> iterator = Fasta.Iterator(infile, parser)‏ >>> while 1:... record = iterator.next()‏... if not record:... break... print record.title[record.title.find(" ")+1:-1]... C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DN C.californicum 5.8S rRNA gene and ITS1 and ITS2 DN C.fasciculatum 5.8S rRNA gene and ITS1 and ITS2 DN C.margaritaceum 5.8S rRNA gene and ITS1 and ITS2 DN C.lichiangense 5.8S rRNA gene and ITS1 and ITS2 DN C.yatabeanum 5.8S rRNA gene and ITS1 and ITS2 DN.... additional lines removed....

Reading a GenBank file >>> from Bio import GenBank >>> parser = GenBank.RecordParser()‏ >>> infile = open("input.gb")‏ >>> iterator = GenBank.Iterator(infile, parser)‏ >>> record = iterator.next()‏ >>> record.locus '10A19I' >>> record.organism 'Oryza sativa (japonica cultivar-group)' >>> len(record.features)‏ 31 >>> record.features[0].key 'source' >>> record.features[0].location ' ' >>> record.taxonomy ['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'Liliopsida', 'Poales', 'Poaceae', 'Ehrhartoideae', 'Oryzeae', 'Oryza'] >>> Only changed the format name

Get data over the web Python includes the ‘urllib2’ (successor to urllib) to fetch data given a URL. It can handle GET and POST requests for HTTP and HTTPS, do ftp, and read local files. Several web servers have an interface for programs to get data directly from the server instead of going through the HTML page meant for humans. But most of the time we have to do some screen-scraping.