November 18, 2004 David A. Gaitros Department of Computer Science

Presentation transcript:

Representation of Ontology Annotation Information in Grid Computing
Prospectus, November 18, 2004
David A. Gaitros, Department of Computer Science, Florida State University
Welcome. I am here today to present my dissertation topic in annotation technology.

Overview
- Background on Annotation
- The Problem
- Research Goals
- Research Objectives
- Projected Accomplishments
- Projected Activities
- Graphical Annotation
- Generic Annotation
- Biological Database Problems
- Proposed Web Services Implementation
- Morphbank Database Schema
- Expected Challenges
- Masters Thesis/Projects
- Conclusion
These are the topics I am covering.

Background on Annotation
Scientific Annotation Middleware (SAM)
“We are creating a Scientific Annotation Middleware (SAM) system that will provide researchers and developers with the capabilities necessary to manage the complexity resulting from the collaborative, cross-disciplinary, compute-intensive research. SAM will include components and services that enable researchers, applications, problem solving environments (PSE) and software agents to create metadata and annotations about data objects and the semantic relationships between them. Human access to the middleware will be through a researcher’s notebook interface available via desktop computers and PDA devices.”
Source: http://collaboratory.emsl.pnl.gov/docs/collab/sam/samprojoverview.html
I want to go over a few topics in annotation technology to give everyone a very brief background on the current state of the field. SAM (Scientific Annotation Middleware) is under development at the US Department of Energy by Al Geist and Jim Myers. They are developing a middleware product that combines desktop scientific notebooks with data grid technologies in what they describe as a heterogeneous environment. The project started in 2000, and the SAM software is now at version 1.2, in limited release.

Background on Annotation

Background on Annotation
Garlic (IBM)
“Garlic is a project being developed by members of the database group in Computer Science. The goal of Garlic is to enable large-scale multimedia information systems: large scale in that they involve lots of data with multimedia taken as broadly as possible to mean data of many types. We are particularly concerned about situations in which there is enough data of sufficiently specialized types that users have already made decisions about how to manage it, and have stored it in separate repositories that are specifically adapted to data of that type.”
Source: http://www.almaden.ibm.com/cs/garlic/

Background on Annotation
The Garlic Approach
[Architecture diagram. Components shown: query tool and C++ API; Garlic object-oriented middleware with its schema and metadata; image, relational, and document wrappers; RDBMS, document store, and image store.]
Source: http://www.almaden.ibm.com/cs/garlic/

Background on Annotation
Data Annotation in Collaborative Research Environments
Michael Gertz, Department of Computer Science, University of California at Davis: concept-based data annotation techniques for scientific databases
“It is well accepted that the creation, management, and utilization of different forms of metadata play a major role in realizing information systems infrastructure that are able to provide a rich data query, sharing, and management techniques.”
“We claim there is still a major gap between the creation of such semantic rich structures and the usage of these structures to actually enrich various forms of data.”

Background on Annotation
Concept-Based Data Annotations
[Diagram. Elements shown: concepts (base concepts and relationship-type concepts); data annotation; web-accessible data; scientific data at Site A; scientific data at Site B.]
Source: Dr. Michael Gertz, UC Davis

The Problem
- The discovery of information relies on the ability of scientists to find and access the correct data.
- As such, grids and grid computing have emerged as an increasingly common means of sharing large amounts of information among collaborating organizations.
- Searches conducted on annotation metadata are still limited because most database and grid applications still use ad hoc data storage and retrieval techniques.
- Searches on information still rely on a scientist's intimate knowledge of data location, format, and how to use specific applications.
- An annotation tool capable of satisfying the requirements of the biological community does not currently exist.

Research Goals
- Improve the ability of biological researchers to search annotated databases for the information they need to support their research or findings.
- Suggest that such improvements can be applied to other scientific applications.

Research Objectives
- Examine current methods of annotation.
- Categorize general features of annotation.
- Define systematic techniques that can be applied to current ad hoc annotation methods.

Projected Accomplishments
- Identify the functional areas within the data grid community.
- Define a relationship model that applies to all scientific annotation.
- Develop a transformation model whereby any annotation can be expressed as an object (data + operations).
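To make the "annotation as an object (data + operations)" idea concrete, here is a minimal sketch in Python. The class name, its fields, and both methods are illustrative assumptions, not part of the prospectus; the point is only that the annotation's data and the operations on it (here, search and export) travel together.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Annotation:
    """Hypothetical annotation object: data plus the operations on it."""
    target_id: str      # identifier of the annotated object (assumed format)
    author: str
    created: date
    text: str
    keywords: list = field(default_factory=list)

    def matches(self, term: str) -> bool:
        """Operation: case-insensitive search over the text and keywords."""
        needle = term.lower()
        return needle in self.text.lower() or any(
            needle == k.lower() for k in self.keywords
        )

    def to_record(self) -> dict:
        """Operation: flatten to a plain dict for storage or exchange."""
        return {
            "target_id": self.target_id,
            "author": self.author,
            "created": self.created.isoformat(),
            "text": self.text,
            "keywords": list(self.keywords),
        }
```

A caller would search a collection of such objects through `matches` rather than through a query written against one particular storage layout, which is the sense in which the operations belong to the annotation.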

Projected Activities
- The initial plan is to use the new MorphBank database to prove the concept.
- Develop a new MorphBank schema.
- Develop a new MorphBank website.
- Develop a reliable and more capable multi-annotation software tool to replace the I-Note 1.0 annotation package.
- Develop the methods and schemas that will allow scientists to extract different annotations from biological images and other objects.

Graphical Annotation
- There may be more than one image or object associated with a specimen.
- No practical upper limit can be defined.
- Standards are still being defined.
- Each image or object may hold several pieces of information.
- Automating annotation is still in the early stages.
- Searching the images themselves for data is not feasible in large database systems.
- Searching large strings of free-entry text is also inefficient.

Graphical Annotation (cont.)
- Initially used the I-NOTE software to define the requirements for a new piece of software that works with Morphbank and runs on Windows XP/Linux.
- The new tool will, at a minimum, be able to annotate any addressable object in Morphbank, to show that annotations can be mixed.

Morphology Publication Example Riccardi, Annotation Nov 5, 2004

Example of Extensible Annotation Riccardi, Annotation Nov 5, 2004

Example of Extensible Annotation Riccardi, Annotation Nov 5, 2004

Example of Extensible Annotation Riccardi, Annotation Nov 5, 2004

Source: http://www.iath.virginia.edu/inote/

Source: http://www.iath.virginia.edu/inote/

Limitations of I-Note Software
- Currently not supported.
- University of Virginia has cut funding for the project.
- Used University programmers for development.
- Works only on a Windows 95 platform.
- Code is not maintainable; development was accomplished in a Java Development Environment.
- Development project was not documented.
- Could not attach other objects or documents.
- Only worked on certain graphic images.
- Annotations were not scalable with the image.
- Annotations were overlay images and had to be stored as full images.
- Cannot address multiple objects.

Generic Annotation
- Need to develop a method to store different annotations as objects.
- Need to develop a method to search different annotations for similar or associated information.
- Replace ad hoc queries with more systematic methods.
- Higher level of ontology for annotations.
- Need to determine the minimum amount of information needed to represent and access this object.

Generic Annotation: General Requirements
- Platform and architecture independent.
- Stand-alone application that can also function as a web service.
- Looking at both server-side and client-side applications.
- Exchange of information must be done using web service features such as XML documents.
- Annotation on images must:
  - allow multiple annotations per image/object;
  - not alter the original image/object;
  - include references to points and areas;
  - include text, graphics, and voice;
  - include the ability to make general annotation remarks;
  - be able to associate multiple objects with an annotation, including other annotations.
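Several of these requirements can be sketched together in a few lines of Python. Everything here is an assumption made for illustration: the class names, the element and attribute names in the XML, and the choice of a circle (a point is a circle of radius 0) to represent a region. The sketch stores annotations separately from the image and refers to targets by identifier, so the original object is never altered; one annotation can point at several targets, including another annotation; and exchange happens as an XML document. Graphics and voice content are not covered.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Region:
    """A point (radius 0) or circular area on the target, in pixels."""
    x: int
    y: int
    radius: int = 0


@dataclass
class Annotation:
    annotation_id: str
    targets: list          # ids of annotated objects; may include annotation ids
    remark: str = ""       # general annotation remark
    regions: list = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialize to a small XML document (hypothetical element names)."""
        root = ET.Element("annotation", id=self.annotation_id)
        for target in self.targets:
            ET.SubElement(root, "target", ref=target)
        for r in self.regions:
            ET.SubElement(root, "region", x=str(r.x), y=str(r.y), radius=str(r.radius))
        ET.SubElement(root, "remark").text = self.remark
        return ET.tostring(root, encoding="unicode")


# Two annotations on the same image; the second also targets the first.
first = Annotation("a1", ["image:7"], "Seta present", [Region(120, 80, 15)])
reply = Annotation("a2", ["image:7", "a1"], "Agree; see also dorsal view")
```

Because the image is referenced only by `image:7`, any number of annotations can accumulate against it without touching the image file itself.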

MorphBank Annotation
[Diagram. Components shown: Morphbank XML; Morphbank Viewer; Morphbank Browser Applet; XML; MorphBank RDBMS; image files.]

Biological Database Problems
- Taxonomy terms and definitions are not universally defined.
- Any database system would have to accommodate different taxonomic structures.
- The Darwin Core standard is not sufficient to solve this problem.
- Each biological study group develops its own character codes and states.
- There is no standardization.
- Any database system would have to accommodate different character codes and states.
- There is currently not enough justification for the different biological communities to develop tight integration standards.

Proposed MorphBank Web Services
[Layered architecture diagram, listed top to bottom]
- High-level web services: World Browse, Insertion and Update, Biological Data Analysis, Search & Discovery, Administration, Data Display
- Core web services: Annotation Discovery, Metadata Annotation, User Validation & Security, Annotation Query, Biological Query, Annotation Aggregation, Data Validation, Annotation Data Display, Biological Data Display, Bio Data Discovery
- Web services access (update, insert, delete, query)
- Service translation library
- Metadata holdings
- Other Bio DB, Character State Catalog, MorphBank XML files, image XML files, MorphBank DB, image files
Based upon the Earth System Grid (ESG) model.

MorphBank Website
[Site map diagram. Screens and functions shown: Intro Screen; Info/Help; Web/DB Administration; Login; Restricted User; World Browse; Add, Update, Delete, Annotate; RU/Browse; Browse. Data set labels shown: DS1, DS2, DS3; Working Data Set; Under Review; World Read.]

Specimen Table
#
# Table structure for table 'specimen'
#
CREATE TABLE specimen (
  MorphBankSpecimenID int(32) NOT NULL auto_increment,
  CatalogNumber varchar(128) NOT NULL,
  DateLastModified date NOT NULL default '0000-00-00',
  InstitutionCode varchar(128),
  CollectionCode varchar(128),
  ScientificName varchar(128),
  BasisOfRecord char(1),
  TSN int(32),
  CollectionNumber varchar(128),
  FieldNumber varchar(128),
  CollectorName varchar(128),
  DateCollected date NOT NULL default '0000-00-00',
  TimeofDate time,
  ContinentOcean varchar(128),

Specimen Table (cont.)
# Continued from previous page.
  Country varchar(56),
  StateProvince varchar(56),
  County varchar(56),
  Locality varchar(56),
  Latitude double,
  Longitude double,
  CoordinatePrecision int(8),
  MinimumElevation int(32),
  MaximumElevation int(32),
  MinimumDepth int(32),
  MaximumDepth int(32),
  Sex varchar(8),
  PreparationType varchar(255),
  IndividualCount int(32),
  PreviousCatalogNumber varchar(128),
  RelationshipType varchar(128),
  RelatedCatalogItem varchar(128),
  DevelopmentalStage varchar(128),
  Notes varchar(255),
  PRIMARY KEY (MorphBankSpecimenID)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1

Image Table
#
# Table structure for table 'image'
#
CREATE TABLE image (
  ImageID int(32) NOT NULL auto_increment,
  MorphBankSpecimenID int(32),
  ViewNumber int(32),
  ImageScale varchar(64),
  XDimensionPixels int(32) NOT NULL,
  YDimensionPixels int(32) NOT NULL,
  ResolutionInPixelsPerInch int(32) NOT NULL,
  OriginalFileName varchar(255) NOT NULL,
  Magnification varchar(128),
  ImageFileType varchar(128),
  PRIMARY KEY (ImageID)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1

Viewtable
#
# Table structure for table 'viewtable'
#
CREATE TABLE viewtable (
  ViewNumber int(32) NOT NULL,
  ImagingTechnique varchar(128),
  ImagingPreparationTechnique varchar(128),
  SpecimenPart varchar(128),
  ViewAngle varchar(128),
  Sex varchar(8),
  DevelopmentalStage varchar(128),
  PRIMARY KEY (ViewNumber)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1
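The specimen, image, and viewtable definitions are linked through MorphBankSpecimenID and ViewNumber, so a search such as "all wing images for a given taxon name" becomes a join rather than a scan of free text. The sketch below shows this under loudly stated assumptions: it uses Python's built-in SQLite instead of MySQL so that it runs without a server, it keeps only a handful of the columns, and the sample rows and the helper name `images_of` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of the three tables (subset of columns, SQLite types).
CREATE TABLE specimen (
    MorphBankSpecimenID INTEGER PRIMARY KEY,
    ScientificName TEXT
);
CREATE TABLE viewtable (
    ViewNumber INTEGER PRIMARY KEY,
    SpecimenPart TEXT,
    ViewAngle TEXT
);
CREATE TABLE image (
    ImageID INTEGER PRIMARY KEY,
    MorphBankSpecimenID INTEGER,
    ViewNumber INTEGER,
    OriginalFileName TEXT NOT NULL
);
-- Made-up sample rows.
INSERT INTO specimen VALUES (1, 'Apis mellifera');
INSERT INTO viewtable VALUES (10, 'wing', 'dorsal'), (11, 'head', 'frontal');
INSERT INTO image VALUES (100, 1, 10, 'wing.jpg'), (101, 1, 11, 'head.jpg');
""")


def images_of(conn, scientific_name, specimen_part):
    """Return file names of images showing one body part for one taxon name."""
    rows = conn.execute(
        """
        SELECT i.OriginalFileName
        FROM image AS i
        JOIN specimen AS s ON s.MorphBankSpecimenID = i.MorphBankSpecimenID
        JOIN viewtable AS v ON v.ViewNumber = i.ViewNumber
        WHERE s.ScientificName = ? AND v.SpecimenPart = ?
        ORDER BY i.ImageID
        """,
        (scientific_name, specimen_part),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `images_of(conn, "Apis mellifera", "wing")` returns only the file names whose view record names the wing.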

Objectannotation Table
#
# Table structure for table 'imageannotation'
#
CREATE TABLE imageannotation (
  AcessionNumber int(32),
  ImageAnnotationSeqNo int(32) NOT NULL auto_increment,
  CatalogNumber varchar(128) NOT NULL,
  AnnotationLocX int(32),
  AnnotationLocy int(32),
  AnnotationRadius int(32),
  AnnotationTypeid int(32),
  PhylogeneticCharacterID int(32),
  PhylogeneticCharacterStateID int(32),
  AnnotationAuthor varchar(128),
  AnnotationDate date DEFAULT '0000-00-00',
  ImageID int(32),
  AnnotationObject varchar(255),
  PRIMARY KEY (ImageAnnotationSeqNo)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1

AnnotationType Table
#
# Table structure for table 'annotationtype'
#
CREATE TABLE annotationtype (
  annotationtypeID int(32) NOT NULL auto_increment,
  annotationtitle varchar(25),
  keywords varchar(255),
  description varchar(128),
  PRIMARY KEY (annotationtypeID)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1
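Because each annotation row carries a location and a radius, "which annotations cover this pixel?" can be answered in SQL without opening the image. The sketch below is illustrative only. It assumes that AnnotationLocX, AnnotationLocy, and AnnotationRadius describe a circle in pixel coordinates, which the slides do not state; it uses Python's built-in SQLite rather than MySQL; it keeps a subset of the columns; and the sample rows and the helper name `annotations_at` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of the two tables (subset of columns, SQLite types).
CREATE TABLE annotationtype (
    annotationtypeID INTEGER PRIMARY KEY,
    annotationtitle TEXT
);
CREATE TABLE imageannotation (
    ImageAnnotationSeqNo INTEGER PRIMARY KEY,
    ImageID INTEGER,
    AnnotationLocX INTEGER,
    AnnotationLocy INTEGER,
    AnnotationRadius INTEGER,
    AnnotationTypeid INTEGER,
    AnnotationAuthor TEXT
);
-- Made-up sample rows.
INSERT INTO annotationtype VALUES (1, 'character state'), (2, 'general remark');
INSERT INTO imageannotation VALUES
    (1, 100, 50, 50, 10, 1, 'author-a'),
    (2, 100, 200, 80, 5, 2, 'author-b'),
    (3, 101, 50, 50, 10, 1, 'author-a');
""")


def annotations_at(conn, image_id, x, y):
    """List (seq_no, type title) for annotations whose circle covers (x, y)."""
    return conn.execute(
        """
        SELECT a.ImageAnnotationSeqNo, t.annotationtitle
        FROM imageannotation AS a
        JOIN annotationtype AS t ON t.annotationtypeID = a.AnnotationTypeid
        WHERE a.ImageID = ?
          AND (a.AnnotationLocX - ?) * (a.AnnotationLocX - ?)
            + (a.AnnotationLocy - ?) * (a.AnnotationLocy - ?)
            <= a.AnnotationRadius * a.AnnotationRadius
        ORDER BY a.ImageAnnotationSeqNo
        """,
        (image_id, x, x, y, y),
    ).fetchall()
```

The squared-distance comparison avoids a square root, and the join to annotationtype returns the human-readable type alongside each hit.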

PhylogeneticCode Table
#
# Table structure for table 'phylogeneticcharacter'
#
CREATE TABLE phylogeneticcharacter (
  PhylogeneticCharID int(32) NOT NULL auto_increment,
  CharacterNumber int(32),
  PublicationID int(32),
  TSN int(32),
  CharacterDescription varchar(128),
  ViewID int(32),
  Sex varchar(8),
  Stage varchar(128),
  SimilarEntries varchar(128),
  RelatedCharacterID int(32),
  RelationType varchar(128),
  SuggestedTaxonRange varchar(128),
  PRIMARY KEY (PhylogeneticCharID)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1

Phylogeneticstate Table
#
# Table structure for table 'phylogeneticstate'
#
CREATE TABLE phylogeneticstate (
  StateID int(32) NOT NULL auto_increment,
  phylogeneticcharID int(32) NOT NULL,
  Description varchar(128),
  ImageID int(32),
  AnnotationSequenceNumber int(32),
  PRIMARY KEY (StateID)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1

SpecimenPhyChar Table
#
# Table structure for table 'SpecimenPhyChar'
#
CREATE TABLE SpecimenPhyChar (
  SpecimenPhyCharID int(32) NOT NULL auto_increment,
  SpecimenID int(32) NOT NULL,
  PhylogeneticCharID int(32),
  ImageID int(32),
  ImageAnnotationSeqNo int(32),
  PRIMARY KEY (SpecimenPhyCharID)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1
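SpecimenPhyChar is the link table between specimens and phylogenetic characters, with an optional image as evidence. A hedged sketch of how it might be queried follows. As before, this is an illustration rather than the MorphBank implementation: it runs on Python's built-in SQLite, keeps only the columns needed for the join, and the sample rows and the helper name `specimens_with_character` are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of the two tables (subset of columns, SQLite types).
CREATE TABLE phylogeneticcharacter (
    PhylogeneticCharID INTEGER PRIMARY KEY,
    CharacterDescription TEXT
);
CREATE TABLE SpecimenPhyChar (
    SpecimenPhyCharID INTEGER PRIMARY KEY,
    SpecimenID INTEGER NOT NULL,
    PhylogeneticCharID INTEGER,
    ImageID INTEGER
);
-- Made-up sample rows.
INSERT INTO phylogeneticcharacter VALUES
    (1, 'forewing vein pattern'),
    (2, 'antenna segment count');
INSERT INTO SpecimenPhyChar VALUES
    (1, 7, 1, 100),
    (2, 8, 1, 102),
    (3, 8, 2, 103);
""")


def specimens_with_character(conn, description):
    """Return (SpecimenID, ImageID) pairs recorded for one character."""
    return conn.execute(
        """
        SELECT spc.SpecimenID, spc.ImageID
        FROM SpecimenPhyChar AS spc
        JOIN phylogeneticcharacter AS c
          ON c.PhylogeneticCharID = spc.PhylogeneticCharID
        WHERE c.CharacterDescription = ?
        ORDER BY spc.SpecimenID
        """,
        (description,),
    ).fetchall()
```

Each result pairs a specimen with the image that documents the character, so a viewer could jump straight from a character search to the supporting image.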

Publication Table
#
# Table structure for table 'publicationtable'
#
CREATE TABLE publicationtable (
  PublicationID int(32) NOT NULL auto_increment,
  PublicationAuthor varchar(128),
  PublicationYear char(4),
  PublicationJournal varchar(128),
  PublicationTitle varchar(128),
  PublicationPagesFrom int(32),
  PublicationPagesto int(32),
  PRIMARY KEY (PublicationID)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1

UserTable
#
# Table structure for table 'usertable'
#
CREATE TABLE usertable (
  UserID int(32) NOT NULL auto_increment,
  Level int(8),
  UIN int(8),
  PIN int(16),
  Name varchar(128),
  Email varchar(128),
  Affiliation varchar(128),
  Address varchar(255),
  Country varchar(128),
  GroupID int(32),
  PRIMARY KEY (UserID)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1

GroupTable
#
# Table structure for table 'grouptable'
#
CREATE TABLE grouptable (
  GroupID int(32) NOT NULL,
  GroupName varchar(128) NOT NULL,
  User int(32),
  PRIMARY KEY (GroupID)
) ENGINE=MyISAM;  # DEFAULT CHARSET=latin1

Expected Challenges
- The effort is contingent upon development of a reliable annotation toolset.
- Development of a generic biological schema.
- Integration of web services with the new MorphBank system and other biological database systems.
- Obtaining consensus among the different participants on basic biology ontology issues.
- Possible use of a general biological thesaurus.

Masters Thesis/Projects
- MorphBank Requirements Analysis (Thesis/Project)
- MorphBank Module Implementation (Project)
- MorphBank Security (Thesis/Project)
- MorphBank Mirror Site Implementation (Thesis/Project)
- MorphBank Operational Site Procedures (Project)

Masters Thesis/Projects
- Biological Image eXchangE System (BIXES): a method and associated software to allow heterogeneous biological image database systems to exchange images and metadata (Project/Thesis)
- Biological image search technique (School of Computation Sciences research Project/Thesis)

Conclusion
- More efficient search on large scientific data systems.
- Demonstrate that this application works for biological databases.
- Show that this is feasible for any scientific application.
- Provide a new and supported annotation tool set that can be used across the web.