Relational Databases: Object Relational Mappers – SQLObject II BCHB524 Lecture 23 BCHB524 - Edwards
Relational Databases Store information in a table Rows represent items Columns represent items' properties or attributes Name Continent Region Surface Area Population GNP Brazil South America 8547403 170115000 776739 Indonesia Asia Southeast Asia 1904569 212107000 84982 India Southern and Central Asia 3287263 1013662000 447114 China Eastern Asia 9572900 1277558000 982268 Pakistan 796095 156483000 61289 United States North America 9363520 278357000 8510700 BCHB524 - Edwards
... as Objects Objects have data members or attributes. Store objects in a list or iterable. Abstract away details of underlying RDBMS c1 = Country() c1.name = 'Brazil' c1.continent = 'South America' c1.region = 'South America' c1.surfaceArea = 8547403 c1.population = 170115000 c1.gnp = 776739 # initialize c2, ..., c6 countryTable = [ c1, c2, c3, c4, c5, c6 ] for cnty in countryTable: if cnty.population > 100000000: print cnty.name, cnty.population BCHB524 - Edwards
Taxonomy Database, from scratch Specify the model Tables: Taxonomy and Name Populate basic data-values in the Taxonomy table from “small_nodes.dmp” Populate the Names table from “small_names.dmp” Insert basic data-values Insert relationship with Taxonomy table Fix Taxonomy parent relationship Fix Taxonomy derived information Use in a program… BCHB524 - Edwards
Taxonomy Database: model.py from sqlobject import * import os.path, sys dbfile = 'small_taxa.db3' def init(new=False): # Magic formatting for database URI conn_str = os.path.abspath(dbfile) conn_str = 'sqlite:'+ conn_str # Connect to database sqlhub.processConnection = connectionForURI(conn_str) if new: # Create new tables (remove old ones if they exist) Taxonomy.dropTable(ifExists=True) Name.dropTable(ifExists=True) Taxonomy.createTable() Name.createTable() BCHB524 - Edwards
Taxonomy Database: model.py # model.py continued… class Taxonomy(SQLObject): taxid = IntCol(alternateID=True) scientific_name = StringCol() rank = StringCol() parent = ForeignKey("Taxonomy") class Name(SQLObject): taxonomy = ForeignKey("Taxonomy") name = StringCol() name_class = StringCol() BCHB524 - Edwards
Taxonomy Database structure Name Taxonomy 1 2 3 4 5 6 1 2 3 4 5 parent taxonomy taxonomy: 4 parent: 2 taxonomy: 4 parent: 2 taxonomy: 4 Foreign Key: id number of some other row BCHB524 - Edwards
Populate Taxonomy table: load_taxa.py import sys from model import * init(new=True) # Read in the taxonomy nodes, populate taxid and rank h = open(sys.argv[1]) for l in h: l = l.strip('\t|\n') sl = l.split('\t|\t') taxid = int(sl[0]) rank = sl[2] t = Taxonomy(taxid=taxid, rank=rank, scientific_name=None, parent=None) h.close() import sys from model import * init(new=True) # Read in the taxonomy nodes, populate taxid and rank h = open(sys.argv[1]) for l in h: l = l.strip('\t|\n') sl = l.split('\t|\t') taxid = int(sl[0]) rank = sl[2] t = Taxonomy(taxid=taxid, rank=rank, scientific_name=None, parent=None) h.close() BCHB524 - Edwards
Populate Name table: load_names.py import sys from model import * init() # Read in the names, populate name, class, and id of # taxonomy row h = open(sys.argv[1]) for l in h: l = l.strip('\t|\n') sl = l.split('\t|\t') taxid = int(sl[0]) name_class = sl[3] name = sl[1] t = Taxonomy.byTaxid(taxid) n = Name(name=name, name_class=name_class, taxonomy=t) h.close() import sys from model import * init() # Read in the names, populate name, class, and id of taxonomy row h = open(sys.argv[1]) for l in h: l = l.strip('\t|\n') sl = l.split('\t|\t') taxid = int(sl[0]) name_class = sl[3] name = sl[1] t = Taxonomy.byTaxid(taxid) n = Name(name=name, name_class=name_class, taxonomy=t) h.close() BCHB524 - Edwards
Fix up the Taxonomy table: fix_taxa.py import sys from model import * init() # Read in the taxonomy nodes, get self and parent taxonomy objects, # and fix the parent field appropriately h = open(sys.argv[1]) for l in h: l = l.strip('\t|\n') sl = l.split('\t|\t') taxid = int(sl[0]) parent_taxid = int(sl[1]) t = Taxonomy.byTaxid(taxid) p = Taxonomy.byTaxid(parent_taxid) t.parent = p h.close() # Find all scientific names and fix their taxonomy objects' scientific # name files appropriately for n in Name.select(Name.q.name_class == 'scientific name'): n.taxonomy.scientific_name = n.name import sys from model import * init() # Read in the taxonomy nodes, get self and parent taxonomy objects, # and fix the parent field appropriately h = open(sys.argv[1]) for l in h: l = l.strip('\t|\n') sl = l.split('\t|\t') taxid = int(sl[0]) parent_taxid = int(sl[1]) t = Taxonomy.byTaxid(taxid) p = Taxonomy.byTaxid(parent_taxid) t.parent = p h.close() # Find all scientific names and fix their taxonomy objects' scientific # name files appropriately for n in Name.select(Name.q.name_class == 'scientific name'): n.taxonomy.scientific_name = n.name BCHB524 - Edwards
Back to the Taxonomy example Each taxonomy entry can have multiple names Many names can point (ForeignKey) to a single taxonomy entry name → taxonomy is easy... taxonomy → list of names requires a select statement from model import * init() hs = Taxonomy.byTaxid(9606) for n in Name.select(Name.q.taxonomy==hs): print n.name BCHB524 - Edwards
Taxonomy Database structure Name Taxonomy 1 2 3 4 5 6 1 2 3 4 5 parent taxonomy taxonomy: 4 parent: 2 taxonomy: 4 parent: 2 taxonomy: 4 Foreign Key: id number of some other row BCHB524 - Edwards
Taxonomy table relationships This relationship (one-to-many) is called a multiple join. Related joins (many-to-many) too... class Taxonomy(SQLObject): # other data members names = MultipleJoin("Name") children = MultipleJoin("Taxonomy",joinColumn='parent_id') from model import * init() hs = Taxonomy.byTaxid(9606) for n in hs.names: print n.name for c in hs.children: print c.scientific_name BCHB524 - Edwards
SQLObject Exceptions What happens when the row isn't in the table? from model import * try: hs = Taxonomy.get(7921) hs = Taxonomy.byTaxid(9606) except SQLObjectNotFound: # if row id 7921 / Tax id 9606 is not in table... results = Taxonomy.selectBy(taxid=9606) if results.count() == 0: # No rows satisfy the constraint! try: first_item = results[0] except IndexError: # No first item in the results BCHB524 - Edwards
Example Program import sys from model import * init() try: taxid = int(sys.argv[1]) except IndexError: print >>sys.stderr, "Need a taxonomy id argument" sys.exit(1) except ValueError: print >>sys.stderr, "Taxonomy id should be an intenger" sys.exit(1) #Get taxonomy row try: t = Taxonomy.byTaxid(taxid) except SQLObjectNotFound: print >>sys.stderr, "Taxonomy id",taxid,"does not exist" sys.exit(1) for n in t.names: print "Organism",t.scientific_name,"has name",n.name for c in t.children: print "Organism",t.scientific_name,"has child",c.scientific_name,c.taxid print "Organism",t.scientific_name,"has parent",t.parent.scientific_name,t.parent.taxid BCHB524 - Edwards
Example Program # Continued... # Iterate up through the taxonomy tree from t, to find its genus r = t g = None while r != r.parent: if r.rank == 'genus': g = r break r = r.parent if g == None: print "Organism",t.scientific_name,"has no genus" else: print "Organism",t.scientific_name,"has genus",g.scientific_name BCHB524 - Edwards
Exercises Write a python program using SQLObject to find the taxonomic lineage of a user-supplied organism name. Make sure you use the small_taxa.db3 file from the course data-folder Note that the supplied model.py files from Lectures 23 and 24 are different! Lecture 23 uses taxa.db3, Lecture 24 uses small_taxa.db3 BCHB524 - Edwards