Requirements of a Taxonomy Database: Tcl-DB, a Prototype

Outline
1. Requirements
   - Hierarchy
   - Alternative search terms: synonyms and vernaculars
   - Alternative spellings
   - Alternative classifications
2. Tcl-DB Prototype System
   - Tcl-DB structure
   - 2NF
3. Extensible: adding a new data source, e.g. NCBI
4. Tcl-DB: UID tracking
5. Tcl-DB: stats
6. Utility and further work

1. Hierarchy
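A minimal Python sketch of the hierarchy requirement, assuming the simplest representation: each taxon stores a pointer to its parent (an adjacency list, like a parent_name_id column). The PARENT mapping and both function names are illustrative, and intermediate ranks are omitted.

```python
from typing import Dict, List, Optional, Set

# Illustrative child -> parent pointers (intermediate ranks omitted);
# None marks the root. Toy data, not taken from any real source.
PARENT: Dict[str, Optional[str]] = {
    "Eukaryota": None,
    "Metazoa": "Eukaryota",
    "Nematoda": "Metazoa",
    "Rhabditidae": "Nematoda",
    "Caenorhabditis": "Rhabditidae",
    "Caenorhabditis elegans": "Caenorhabditis",
}


def lineage(name: str) -> List[str]:
    """Walk parent pointers up to the root; return the path root-first."""
    path: List[str] = []
    current: Optional[str] = name
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return list(reversed(path))


def descendants(name: str) -> Set[str]:
    """Collect every taxon below `name` (excluding `name` itself)."""
    found: Set[str] = set()
    frontier = [name]
    while frontier:
        node = frontier.pop()
        children = [c for c, p in PARENT.items() if p == node]
        found.update(children)
        frontier.extend(children)
    return found


print(" > ".join(lineage("Caenorhabditis elegans")))
```

Walking up is cheap, but `descendants` rescans the whole mapping at every step, which is why indexed hierarchical-query techniques matter once the tree holds hundreds of thousands of names.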

2. Alternative Search Terms: Synonyms and Vernaculars
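One way to honour alternative search terms is to resolve every query to an accepted name before searching. The Python sketch below assumes three in-memory lookups (ACCEPTED, SYNONYMS, VERNACULARS) standing in for the name, synonym and vernacular tables; the entries and the `resolve` function are illustrative only.

```python
from typing import Dict, Optional, Set

# Illustrative lookup tables (toy entries, not loaded from any source).
ACCEPTED: Set[str] = {"Caenorhabditis elegans", "Homo sapiens"}
SYNONYMS: Dict[str, str] = {"Rhabditis elegans": "Caenorhabditis elegans"}
VERNACULARS: Dict[str, str] = {"human": "Homo sapiens"}


def resolve(term: str) -> Optional[str]:
    """Map a search term to an accepted scientific name, or None."""
    term = term.strip()
    if term in ACCEPTED:
        return term
    if term in SYNONYMS:
        return SYNONYMS[term]
    # Vernaculars are matched case-insensitively.
    return VERNACULARS.get(term.lower())


for query in ("Homo sapiens", "Rhabditis elegans", "Human", "unicorn"):
    print(query, "->", resolve(query))
```

Common names are often ambiguous, so a fuller version would likely return a list of candidate taxa for a vernacular rather than a single name.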

3. Alternative Spellings: Caenorabditis elegans, C elegans and Caenorhabditis elegans
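Misspellings such as "Caenorabditis" can be caught with an edit-distance match. The sketch below uses plain Levenshtein distance as one possible technique (an assumption, not necessarily what Tcl-DB does); `edit_distance`, `fuzzy_matches` and the default threshold of 3 are illustrative choices.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query: str, names: List[str], max_distance: int = 3) -> List[str]:
    """Return known names within `max_distance` edits of `query`, closest first."""
    scored = sorted((edit_distance(query.lower(), n.lower()), n) for n in names)
    return [n for d, n in scored if d <= max_distance]


KNOWN = ["Caenorhabditis elegans", "Caenorhabditis briggsae"]
print(fuzzy_matches("Caenorabditis elegans", KNOWN))
print(fuzzy_matches("C elegans", KNOWN))
```

The missing "h" costs one edit, so the first query matches. An abbreviation like "C elegans" is more than a dozen edits away, so it would probably need its own rule, such as expanding a genus initial, rather than a looser threshold.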

4. Alternative Classifications:
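Alternative classifications can be kept side by side if every parent edge records which source asserts it, so that two sources may disagree without either overwriting the other. A minimal Python sketch under that assumption; PLACEMENTS, the source labels and the taxa are placeholders, not real data.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical (source, taxon, parent) edges with placeholder names; the
# point is only that sources may place the same taxon differently.
PLACEMENTS: List[Tuple[str, str, str]] = [
    ("source_A", "Taxon X", "Order P"),
    ("source_B", "Taxon X", "Order Q"),
    ("source_A", "Taxon Y", "Order P"),
    ("source_B", "Taxon Y", "Order P"),
]


def parents_by_source(placements: List[Tuple[str, str, str]], taxon: str) -> Dict[str, str]:
    """Return {source: parent} for one taxon, one entry per classification."""
    return {src: parent for src, child, parent in placements if child == taxon}


def conflicting_taxa(placements: List[Tuple[str, str, str]]) -> List[str]:
    """Return taxa that different sources place under different parents."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for _src, child, parent in placements:
        parents[child].add(parent)
    return sorted(t for t, ps in parents.items() if len(ps) > 1)


print(parents_by_source(PLACEMENTS, "Taxon X"))
print(conflicting_taxa(PLACEMENTS))
```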

Tcl-DB Prototype System: Proposed Architecture

Tcl-DB: Logical Structure

Tcl-DB Physical Database Structure

Assertion: Resolving the many-to-many (M:M) relationship with an association entity
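The slide does not say which two entities the assertion links. The SQLite sketch below assumes the M:M is between names and data sources: each assertion row records that one source lists one name, optionally with the parent it gives. The source and assertion tables, the `sources_for` helper and all rows are hypothetical toy data.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name (
    name_id   INTEGER PRIMARY KEY,
    name_text TEXT NOT NULL UNIQUE
);
CREATE TABLE source (
    source_id   INTEGER PRIMARY KEY,
    source_name TEXT NOT NULL UNIQUE
);
-- Association entity: one row per (name, source) pair, turning the
-- M:M between name and source into two one-to-many links.
CREATE TABLE assertion (
    name_id        INTEGER NOT NULL REFERENCES name(name_id),
    source_id      INTEGER NOT NULL REFERENCES source(source_id),
    parent_name_id INTEGER REFERENCES name(name_id),
    PRIMARY KEY (name_id, source_id)
);
-- Toy rows for illustration only.
INSERT INTO name VALUES (1, 'Caenorhabditis'), (2, 'Caenorhabditis elegans');
INSERT INTO source VALUES (1, 'ITIS'), (2, 'NCBI');
INSERT INTO assertion VALUES (2, 1, 1), (2, 2, 1), (1, 2, NULL);
""")


def sources_for(name_text: str) -> List[str]:
    """List the sources that assert a given name."""
    rows = conn.execute("""
        SELECT s.source_name
        FROM name n
        JOIN assertion a ON a.name_id = n.name_id
        JOIN source s ON s.source_id = a.source_id
        WHERE n.name_text = ?
        ORDER BY s.source_name
    """, (name_text,))
    return [r[0] for r in rows]


print(sources_for("Caenorhabditis elegans"))
```

The composite primary key stops the same source asserting the same name twice, while any number of sources can share a name.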

Node: Hierarchical Queries (Nested Set, Path and CONNECT BY)

    select count(name_id) from node start with name_id = '100891' connect by prior name_id = parent_name_id;
    select count(name_id) from node where path like '/%';
    select count(name_id) from node where left_id between 1 and 9290;
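The three node queries can be reproduced side by side on a toy tree. The sketch below uses SQLite, which has no CONNECT BY, so a recursive CTE stands in for it; the node columns follow the queries (name_id, parent_name_id, path, left_id, right_id), while EDGES and the helper functions are hypothetical.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Toy tree as (name_id, parent_name_id) pairs; the ids are hypothetical.
EDGES: List[Tuple[int, Optional[int]]] = [
    (1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3),
]


def build_node_table(conn: sqlite3.Connection) -> None:
    """Store each node with its parent, a materialised path and nested-set bounds."""
    children: Dict[Optional[int], List[int]] = {}
    for nid, parent in EDGES:
        children.setdefault(parent, []).append(nid)
    rows = []
    counter = 0

    def visit(nid: int, parent: Optional[int], path: str) -> None:
        nonlocal counter
        counter += 1
        left = counter                 # nested-set left bound
        my_path = f"{path}{nid}/"      # e.g. "/1/2/"
        for child in children.get(nid, []):
            visit(child, nid, my_path)
        counter += 1                   # nested-set right bound
        rows.append((nid, parent, my_path, left, counter))

    for root in children[None]:
        visit(root, None, "/")
    conn.execute("CREATE TABLE node (name_id INTEGER PRIMARY KEY, parent_name_id INTEGER,"
                 " path TEXT, left_id INTEGER, right_id INTEGER)")
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", rows)


def count_recursive(conn: sqlite3.Connection, root_id: int) -> int:
    """Adjacency list: a recursive CTE, SQLite's analogue of CONNECT BY."""
    sql = """
        WITH RECURSIVE sub(name_id) AS (
            SELECT name_id FROM node WHERE name_id = ?
            UNION ALL
            SELECT n.name_id FROM node n JOIN sub s ON n.parent_name_id = s.name_id
        )
        SELECT count(name_id) FROM sub
    """
    return conn.execute(sql, (root_id,)).fetchone()[0]


def count_by_path(conn: sqlite3.Connection, root_id: int) -> int:
    """Materialised path: a single prefix match on the path column."""
    (path,) = conn.execute("SELECT path FROM node WHERE name_id = ?", (root_id,)).fetchone()
    return conn.execute("SELECT count(name_id) FROM node WHERE path LIKE ?",
                        (path + "%",)).fetchone()[0]


def count_by_nested_set(conn: sqlite3.Connection, root_id: int) -> int:
    """Nested set: descendants have left_id inside the root's [left, right] range."""
    left, right = conn.execute("SELECT left_id, right_id FROM node WHERE name_id = ?",
                               (root_id,)).fetchone()
    return conn.execute("SELECT count(name_id) FROM node WHERE left_id BETWEEN ? AND ?",
                        (left, right)).fetchone()[0]


conn = sqlite3.connect(":memory:")
build_node_table(conn)
for fn in (count_recursive, count_by_path, count_by_nested_set):
    print(fn.__name__, fn(conn, 2))  # each prints 3: nodes 2, 4 and 5
```

All three count the subtree including its root, as START WITH does. The trailing slash in each path keeps '/1/2/%' from also matching a sibling such as '/1/21/'. Broadly, path and nested-set columns make reads cheap but must be recomputed when the tree changes, whereas the recursive form needs only parent_name_id.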

synonym_name and vernacular: subtypes, multi-valued attributes or weak entities
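Of the three options, the weak-entity reading is easy to sketch in SQLite: a vernacular row is identified by its owner's name_id plus its own text, and it disappears with the owning name. The column names beyond name_id, the `language` column and both helper functions are assumptions for illustration.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE name (
    name_id   INTEGER PRIMARY KEY,
    name_text TEXT NOT NULL
);
-- Weak entity: identified by the owner's key plus its own text,
-- and deleted together with the owning name.
CREATE TABLE vernacular (
    name_id         INTEGER NOT NULL REFERENCES name(name_id) ON DELETE CASCADE,
    vernacular_text TEXT    NOT NULL,
    language        TEXT,
    PRIMARY KEY (name_id, vernacular_text)
);
INSERT INTO name VALUES (1, 'Homo sapiens');
INSERT INTO vernacular VALUES (1, 'human', 'English'), (1, 'man', 'English');
""")


def vernaculars(name_id: int) -> List[str]:
    """Return the vernaculars owned by one name, alphabetically."""
    rows = conn.execute(
        "SELECT vernacular_text FROM vernacular WHERE name_id = ? ORDER BY vernacular_text",
        (name_id,))
    return [r[0] for r in rows]


def add_vernacular(name_id: int, text: str, language: Optional[str] = None) -> bool:
    """Insert a vernacular; return False if a key constraint rejects it
    (duplicate (name_id, text) pair or unknown owning name_id)."""
    try:
        conn.execute("INSERT INTO vernacular VALUES (?, ?, ?)", (name_id, text, language))
        return True
    except sqlite3.IntegrityError:
        return False


print(vernaculars(1))
```

Deleting name 1 would cascade to its vernacular rows. Modelled as a multi-valued attribute instead, the same data would sit inside the name row and could not be indexed or constrained in this way.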

Tcl-DB: 2NF

Adding a New Data Source, e.g. NCBI: Tcl-DB Procedures, Packages and Functions

Step 1: Build views to find which names are already in the database

Step 2: Move names from the view to the Tcl schema

Step 3: Fill the nodes table in the Tcl schema

Step 4: Fill the synonym_name table in the Tcl schema
Step 5: Fill the vernacular table in the Tcl schema
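The five load steps can be sketched as plain SQL run through Python's sqlite3. The original uses database procedures and packages; this is only a stand-in under stated assumptions: a hypothetical `staging` table with one row per (taxon, name) and a `name_class` label, and `tcl_`-prefixed tables playing the Tcl schema. All rows are toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical staging table: one row per (taxon, name) with a name class.
CREATE TABLE staging (tax_id INTEGER, parent_tax_id INTEGER,
                      name_text TEXT, name_class TEXT);
CREATE TABLE tcl_name (name_id INTEGER PRIMARY KEY, name_text TEXT NOT NULL UNIQUE);
CREATE TABLE tcl_node (name_id INTEGER PRIMARY KEY, parent_name_id INTEGER);
CREATE TABLE tcl_synonym_name (name_id INTEGER, synonym_id INTEGER,
                               PRIMARY KEY (name_id, synonym_id));
CREATE TABLE tcl_vernacular (name_id INTEGER, vernacular_text TEXT,
                             PRIMARY KEY (name_id, vernacular_text));

-- Step 1: views. new_names = staged names not yet in the Tcl schema;
-- sci_ids maps each staged taxon to the name_id of its scientific name.
CREATE VIEW new_names AS
  SELECT DISTINCT s.name_text FROM staging s
  LEFT JOIN tcl_name n ON n.name_text = s.name_text
  WHERE n.name_id IS NULL AND s.name_class IN ('scientific name', 'synonym');
CREATE VIEW sci_ids AS
  SELECT s.tax_id, n.name_id FROM staging s
  JOIN tcl_name n ON n.name_text = s.name_text
  WHERE s.name_class = 'scientific name';

-- Toy data: one name is already loaded, the rest arrive via staging.
INSERT INTO tcl_name (name_text) VALUES ('Caenorhabditis');
INSERT INTO staging VALUES
  (10, NULL, 'Caenorhabditis', 'scientific name'),
  (11, 10, 'Caenorhabditis elegans', 'scientific name'),
  (11, 10, 'Rhabditis elegans', 'synonym'),
  (11, 10, 'nematode', 'common name');
""")

LOAD_STEPS = """
-- Step 2: move new names from the view into the Tcl schema.
INSERT INTO tcl_name (name_text) SELECT name_text FROM new_names;
-- Step 3: fill the node table (child name_id -> parent name_id).
INSERT OR IGNORE INTO tcl_node (name_id, parent_name_id)
  SELECT c.name_id, p.name_id FROM staging s
  JOIN sci_ids c ON c.tax_id = s.tax_id
  LEFT JOIN sci_ids p ON p.tax_id = s.parent_tax_id
  WHERE s.name_class = 'scientific name';
-- Step 4: fill synonym_name (accepted name_id, synonym name_id).
INSERT OR IGNORE INTO tcl_synonym_name (name_id, synonym_id)
  SELECT c.name_id, n.name_id FROM staging s
  JOIN sci_ids c ON c.tax_id = s.tax_id
  JOIN tcl_name n ON n.name_text = s.name_text
  WHERE s.name_class = 'synonym';
-- Step 5: fill vernacular (accepted name_id, common name text).
INSERT OR IGNORE INTO tcl_vernacular (name_id, vernacular_text)
  SELECT c.name_id, s.name_text FROM staging s
  JOIN sci_ids c ON c.tax_id = s.tax_id
  WHERE s.name_class = 'common name';
"""

conn.executescript(LOAD_STEPS)
```

Because the views exclude names that are already present and the later inserts use OR IGNORE, rerunning the load is harmless. That property seems useful when a source such as NCBI is refreshed.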

Tcl-DB: UID Tracking after a name data load
1. Run two joins on name and nids_mv:
   - NIDs: name_ids whose name_text already exists
   - Null: name_ids whose name_text does not exist
2. Update name and give all new names a NID.
3. Update name and give all names their original NID.
4. Refresh the NID_view.
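The UID-tracking steps can be sketched in SQLite, assuming nids_mv is a snapshot of (nid, name_text) pairs issued before the reload and that the reloaded name table has new internal name_ids but no NIDs yet. The `assign_nids` function, the max-plus-one numbering and the toy rows are hypothetical; refreshing nids_mv here stands in for refreshing NID_view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snapshot of NIDs issued before the reload (hypothetical toy rows).
CREATE TABLE nids_mv (nid INTEGER PRIMARY KEY, name_text TEXT UNIQUE NOT NULL);
INSERT INTO nids_mv VALUES (1, 'Caenorhabditis'), (2, 'Caenorhabditis elegans');
-- Freshly reloaded names: internal name_ids changed, nid not yet assigned.
CREATE TABLE name (name_id INTEGER PRIMARY KEY, name_text TEXT UNIQUE NOT NULL,
                   nid INTEGER);
INSERT INTO name (name_id, name_text) VALUES
  (501, 'Caenorhabditis elegans'), (502, 'Caenorhabditis'),
  (503, 'Caenorhabditis briggsae');
""")


def assign_nids(conn: sqlite3.Connection) -> None:
    # 1. Join name to nids_mv: known names get their NID, new names get NULL.
    pairs = conn.execute("""
        SELECT n.name_id, m.nid
        FROM name n LEFT JOIN nids_mv m ON m.name_text = n.name_text
        ORDER BY n.name_id
    """).fetchall()
    next_nid = conn.execute("SELECT COALESCE(MAX(nid), 0) + 1 FROM nids_mv").fetchone()[0]
    for name_id, nid in pairs:
        if nid is None:
            # 2. A name never seen before gets a brand-new NID.
            nid = next_nid
            next_nid += 1
        # 3. Every name is written back with its (original or new) NID.
        conn.execute("UPDATE name SET nid = ? WHERE name_id = ?", (nid, name_id))
    # 4. Refresh the snapshot so the next reload sees the new NIDs too.
    conn.execute("INSERT OR IGNORE INTO nids_mv (nid, name_text) "
                 "SELECT nid, name_text FROM name")
    conn.commit()


assign_nids(conn)
print(sorted(conn.execute("SELECT name_text, nid FROM name")))
```

The key idea is that the NID follows the name text rather than the internal name_id, so external references survive a full reload.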

Tcl-DB: Utility and Further Work
- Computing interesting stats:
  - How much overlap is there between ITIS and NCBI?
  - How many names are unique to NCBI?
  - How many of these are binomials vs entries like 'environmental sample 256'?
  - How many of these names can be matched allowing for 1 to 3 letter mismatches?
- NCBI taxonomy: data quality, integrity and usability?
- Transitively closing the synonyms table and vernacular table
- Building an interface
- Spell checkers
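Two of the further-work items are sketched below in Python on toy data: set operations for the ITIS/NCBI overlap questions, using a deliberately rough regex as the binomial test, and a union-find pass as one way to transitively close synonym pairs. The name sets, the regex and both function names are assumptions. Real figures would come from queries against the loaded tables.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Toy name sets standing in for names loaded from each source.
ITIS: Set[str] = {"Caenorhabditis elegans", "Homo sapiens", "Mus musculus"}
NCBI: Set[str] = {"Caenorhabditis elegans", "Homo sapiens",
                  "environmental sample 256", "Escherichia coli"}

# Rough binomial test: capitalised genus, one space, lower-case epithet.
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z]+")


def overlap_stats(a: Set[str], b: Set[str]) -> Dict[str, int]:
    """Count shared names, names only in b, and how many of those look binomial."""
    only_b = b - a
    return {
        "overlap": len(a & b),
        "unique_to_b": len(only_b),
        "binomials_unique_to_b": sum(1 for n in only_b if BINOMIAL.fullmatch(n)),
    }


def synonym_groups(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Transitively close synonym pairs with union-find:
    if A~B and B~C, then A, B and C end up in one group."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two groups
    groups: Dict[str, Set[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=sorted)


print(overlap_stats(ITIS, NCBI))
print(synonym_groups([("A", "B"), ("B", "C"), ("D", "E")]))
```

The regex will misclassify some real names (hybrids, subspecies, names with diacritics), so it is only a first pass for counting.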

Lots of Questions?
- How do we use this to build taxonomically aware databases?
- How do we handle updates to the data?
- Database links, web services, simple DB cross-references?
- Use the GenBank model?
- Open to suggestions/ideas!
- Do we need to think about PhyloCode? Type specimens?