November 18, 2004 David A. Gaitros Department of Computer Science


1 Representation of Ontology Annotation Information in Grid Computing Prospectus
November 18, 2004
David A. Gaitros
Department of Computer Science
Florida State University
Welcome. I am here today to present my dissertation topic in annotation technology.

2 Overview
Background on Annotation
The Problem
Research Goals
Research Objectives
Projected Accomplishments
Projected Activities
Graphical Annotation
Generic Annotation
Biological Database Problems
Proposed Web Services Implementation
Morphbank Database Schema
Expected Challenges
Masters Thesis/Projects
Conclusion
These are the topics I am covering.

3 Background on Annotation
Scientific Annotation Middleware (SAM)
“We are creating a Scientific Annotation Middleware (SAM) system that will provide researchers and developers with the capabilities necessary to manage the complexity resulting from the collaborative, cross-disciplinary, compute-intensive research. SAM will include components and services that enable researchers, applications, problem solving environments (PSE) and software agents to create metadata and annotations about data objects and the semantic relationships between them. Human access to the middleware will be through a researcher’s notebook interface available via desktop computers and PDA devices.” Source:
I want to go over a few topics in annotation technology to give everyone a very brief background on the current state. SAM (Scientific Annotation Middleware) is under development at the US Department of Energy by Al Geist and Jim Myers. They are developing a middleware product that combines desktop scientific notebooks with data grid technologies in what they describe as a heterogeneous environment. The project started in 2000 and is now at version 1.2 of the SAM software, in limited release.

4 Background on Annotation

5 Background on Annotation
Garlic – IBM
“Garlic is a project being developed by members of the database group in Computer Science. The goal of Garlic is to enable large-scale multimedia information systems: large scale in that they involve lots of data with multimedia taken as broadly as possible to mean data of many types. We are particularly concerned about situations in which there is enough data of sufficiently specialized types that users have already made decisions about how to manage it, and have stored it in separate repositories that are specifically adapted to data of that type.” Source:

6 Background on Annotation
The Garlic Approach
[Architecture diagram. Labels: Query tool; C++ API; Garlic Schema; Object Oriented Middleware; Metadata; Image Wrapper; Relational Wrapper; Document Wrapper; RDBMS; Document Store; Image Store] Source:

7 Background on Annotation
Data Annotation in Collaborative Research Environments
Michael Gertz, Department of Computer Science, University of California at Davis
Concept-based data annotation techniques for scientific databases
“It is well accepted that the creation, management, and utilization of different forms of metadata play a major role in realizing information systems infrastructure that are able to provide a rich data query, sharing, and management techniques.”
“We claim there is still a major gap between the creation of such semantic rich structures and the usage of these structures to actually enrich various forms of data.”

8 Background on Annotation
Concept Based Data Annotations
[Diagram. Labels: Concepts (base concepts and relationship type concepts); Data Annotation; Web accessible data; Scientific Data at Site A; Scientific Data at Site B] Source: Dr. Michael Gertz, UC Davis

9 The Problem
The discovery of information relies on the ability of scientists to find and access the correct data.
As such, grids and grid computing have emerged as an increasingly common means of sharing large amounts of information among collaborating organizations.
Searches conducted on annotation of metadata are still limited because most database and grid applications still use ad hoc data storage and retrieval techniques.
Searches on information still rely on a scientist's intimate knowledge of data location, format, and how to use specific applications.
An annotation tool capable of satisfying the requirements of the biological community does not currently exist.

10 Research Goals
Improve the ability of biological researchers to search annotated databases for the information they need to support their research or findings
Suggest that such improvements can be applied to other scientific applications

11 Research Objectives
Examine current methods of annotation
Categorize general features of annotation
Define systematic techniques that can be applied to current ad hoc annotation methods

12 Projected Accomplishments
Identification of the functional areas within the data grid community
Define a relationship model that applies to all scientific annotation
Develop a transformation model whereby any annotation can be expressed as an object (data + operations)
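
The idea of expressing an annotation as an object (data + operations) can be illustrated with a minimal Python sketch. The class name, fields, and methods below are hypothetical and are not taken from the prospectus; they only show one way that data and the operations on it might travel together.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """Hypothetical annotation object: data fields plus operations on them."""
    target_id: str                 # identifier of the annotated object (assumed)
    author: str
    text: str
    keywords: List[str] = field(default_factory=list)

    def matches(self, term: str) -> bool:
        """Operation: case-insensitive search over the text and the keywords."""
        needle = term.lower()
        return needle in self.text.lower() or any(
            needle == k.lower() for k in self.keywords
        )

    def summary(self) -> str:
        """Operation: short human-readable description."""
        return f"{self.author} on {self.target_id}: {self.text}"


def search(annotations, term):
    """Apply the same operation uniformly to any collection of annotations."""
    return [a for a in annotations if a.matches(term)]


# Made-up sample data, for illustration only.
notes = [
    Annotation("image-1", "A. Researcher", "Wing venation reduced", ["wing"]),
    Annotation("image-2", "B. Researcher", "Antenna with 13 segments", ["antenna"]),
]
hits = search(notes, "wing")
print([a.summary() for a in hits])
```

Because every annotation exposes the same operations, a search routine does not need to know how a particular annotation stores its data, which is the property a systematic replacement for ad hoc queries would rely on.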

13 Projected Activities
The initial plan is to use the new MorphBank database to prove the concept
Develop a new MorphBank schema
Develop a new MorphBank website
Develop a reliable and more capable multi-annotation software tool to replace the I-Note 1.0 annotation package
Develop the methods and schemas that will allow scientists to extract different annotations from biological images and other objects

14 Graphical Annotation
There may be more than one image or object associated with a specimen
No practical upper limit can be defined
Standards are still being defined
Each image or object may hold several pieces of information.
Automating annotation is still in the early stages.
Searching the images themselves for data is not feasible in large database systems.
Searching large strings of free-entry text is also inefficient
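
One common alternative to scanning free-entry text is an inverted index that maps each term to the annotations carrying it. The sketch below is only an illustration under that assumption; the function names and sample keywords are hypothetical and are not part of the proposed tool.

```python
from collections import defaultdict


def build_index(annotations):
    """Map each lowercase keyword to the set of annotation ids that carry it."""
    index = defaultdict(set)
    for ann_id, keywords in annotations.items():
        for word in keywords:
            index[word.lower()].add(ann_id)
    return index


def lookup(index, *terms):
    """Return the ids of annotations that carry every requested term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


# Made-up annotation ids and keywords, for illustration only.
annotations = {
    1: ["wing", "venation"],
    2: ["antenna", "segment"],
    3: ["wing", "colour"],
}
index = build_index(annotations)
print(lookup(index, "wing"))
print(lookup(index, "wing", "colour"))
```

A lookup touches only the entries for the requested terms instead of reading every annotation string, which is why keyword-style annotation scales better than free-text search.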

15 Graphical Annotation (cont.)
The I-NOTE software was initially used to define the requirements for developing a new piece of software that works with Morphbank on Windows XP and Linux.
The new tool will provide at least the ability to annotate any addressable object with Morphbank, to show that annotations can be mixed.

16 Morphology Publication Example
Riccardi, Annotation Nov 5, 2004

17 Example of Extensible Annotation
Riccardi, Annotation Nov 5, 2004

18 Example of Extensible Annotation
Riccardi, Annotation Nov 5, 2004

19 Example of Extensible Annotation
Riccardi, Annotation Nov 5, 2004

20 Source: http://www.iath.virginia.edu/inote/

21 Source: http://www.iath.virginia.edu/inote/

22 Limitations of I-note Software
Currently not supported
University of Virginia has cut funding for the project.
Used University programmers for development
Works only on a Windows 95 platform
Code is not maintainable; development was accomplished in a Java Development Environment
Development project was not documented.
Could not attach other objects or documents
Only worked on certain graphic images.
Annotations were not scalable with the image
Annotations were overlay images and had to be stored as full images.
Cannot address multiple objects.

23 Generic Annotation
Need to develop a method to store different annotations as objects
Need to develop a method to search different annotations for similar or associated information
Replace ad hoc queries with more systematic methods
Higher level of ontology for annotations
Need to determine the minimum amount of information needed to represent and access this object

24 Generic Annotation General Requirements
Platform and architecture independent
Stand-alone application that can function as a web service
Looking at both server and client side applications
Exchange of information must be done using web service features such as XML documents
Annotation on images must include:
Multiple annotations per image/object
Must not alter the original image/object
Must include references to points and areas
Must include text, graphics, and voice
Must include the ability to make general annotation remarks
Must be able to associate multiple objects with an annotation, including other annotations
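
Several of these requirements (XML exchange, multiple annotations per image, point and area references, leaving the original image untouched) can be sketched together in a few lines of Python. The element and attribute names below are illustrative assumptions, not the Morphbank XML format.

```python
import xml.etree.ElementTree as ET


def annotation_to_xml(image_id, author, remark, points=(), areas=()):
    """Build an XML element describing one annotation.

    The image is referenced by id only, so the original file is never altered.
    All element and attribute names here are illustrative assumptions.
    """
    root = ET.Element("annotation", {"imageId": str(image_id), "author": author})
    ET.SubElement(root, "remark").text = remark
    for x, y in points:
        ET.SubElement(root, "point", {"x": str(x), "y": str(y)})
    for x, y, radius in areas:
        ET.SubElement(root, "area", {"x": str(x), "y": str(y), "radius": str(radius)})
    return root


def annotations_document(annotations):
    """Wrap several annotations, for the same or different images, in one document."""
    doc = ET.Element("annotations")
    doc.extend(annotations)
    return ET.tostring(doc, encoding="unicode")


# Two made-up annotations on the same image, one point-based and one area-based.
xml_text = annotations_document([
    annotation_to_xml(7, "reviewer1", "Seta present", points=[(120, 45)]),
    annotation_to_xml(7, "reviewer2", "Damaged region", areas=[(300, 210, 25)]),
])
parsed = ET.fromstring(xml_text)
print(len(parsed.findall("annotation")))
```

Keeping the annotation in a separate document that only points at the image is one way to satisfy the "must not alter the original" requirement, in contrast to the overlay images that I-Note stored.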

25 MorphBank Annotation
[Diagram. Labels: Morphbank XML; Morphbank Viewer; Morphbank Browser; Applet; XML; MorphBank RDMS; Image Files]

26 Biological Database Problems
Taxonomy terms and definitions are not universally defined Any database system would have to accommodate different taxonomic structures Darwin Core standard is not sufficient to satisfy this problem Each Biological study group develops their own character codes and states There is no standardization Any database system would have to accommodate different character codes and states There is currently not enough justification for the different Biological communities to develop tight integration standards

27 Proposed MorphBank WebServices
[Layered architecture diagram. Labels, in the order extracted:
HIGH LEVEL WEBSERVICES: WORLD BROWSE; INSERTION AND UPDATE; BIOLOGICAL DATA ANALYSIS; SEARCH & DISCOVERY; ADMINISTRATION; DATA DISPLAY
CORE WEBSERVICES: ANNOTATION DISCOVERY; METADATA ANNOTATION; USER VALIDATION & SECURITY; ANNOTATION QUERY; BIOLOGICAL QUERY; ANNOTATION AGGREGATION; DATA VALIDATION; ANNOTATION DATA DISPLAY; BIOLOGICAL DATA DISPLAY; BIO DATA DISCOVERY
Web Services Access (update, insert, delete, query); SERVICE TRANSLATION LIBRARY; METADATA HOLDINGS
Data stores: Other Bio DB; Character State Catalog; MorphBank XML Files; Image XML Files; MorphBank DB; Image Files]
Based upon the Earth Systems Grid (ESG) Model

28 MorphBank Website
[Site map diagram. Labels: Intro Screen; Info/Help; WEB/DB Administration; Login; Restricted User; World Browse; Add; Update; Delete; Annotate; RU/Browse; Browse; data sets DS1, DS2, DS3; Working Data Set; Under Review; World Read]

29 Specimen Table
#
# Table structure for table 'specimen'
CREATE TABLE specimen (
  MorphBankSpecimenID int(32) NOT NULL auto_increment,
  CatalogNumber varchar(128) NOT NULL,
  DateLastModified date NOT NULL default ' ',
  InstitutionCode varchar(128),
  CollectionCode varchar(128),
  ScientificName varchar(128),
  BasisOfRecord char(1),
  TSN int(32),
  CollectionNumber varchar(128),
  FieldNumber varchar(128),
  CollectorName varchar(128),
  DateCollected date NOT NULL default ' ',
  TimeofDate time,
  ContinentOcean varchar(128),

30 Specimen Table – cont.
# CONTINUED FROM PREVIOUS PAGE.
  Country varchar(56),
  StateProvince varchar(56),
  County varchar(56),
  Locality varchar(56),
  Latitude double,
  Longitude double,
  CoordinatePrecision int(8),
  MinimumElevation int(32),
  MaximumElevation int(32),
  MinimumDepth int(32),
  MaximumDepth int(32),
  Sex varchar(8),
  PreparationType varchar(255),
  IndividualCount int(32),
  PreviousCatalogNumber varchar(128),
  RelationshipType varchar(128),
  RelatedCatalogItem varchar(128),
  DevelopmentalStage varchar(128),
  Notes varchar(255),
  PRIMARY KEY (MorphBankSpecimenID)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

31 Image Table
#
# Table Structure for Table 'image'
CREATE TABLE image (
  ImageID int(32) NOT NULL auto_increment,
  MorphBankSpecimenID int(32),
  ViewNumber int(32),
  ImageScale varchar(64),
  XDimensionPixels int(32) NOT NULL,
  YDimensionPixels int(32) NOT NULL,
  ResolutionInPixelsPerInch int(32) NOT NULL,
  OriginalFileName varchar(255) NOT NULL,
  Magnification varchar(128),
  ImageFileType varchar(128),
  PRIMARY KEY (ImageID)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

32 Viewtable
#
# Table Structure for Table 'viewtable'
CREATE TABLE viewtable (
  ViewNumber int(32) NOT NULL,
  ImagingTechnique varchar(128),
  ImagingPreparationTechnique varchar(128),
  SpecimenPart varchar(128),
  ViewAngle varchar(128),
  Sex varchar(8),
  DevelopmentalStage varchar(128),
  PRIMARY KEY (ViewNumber)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

33 Objectannotation Table
#
# Table Structure for Table 'imageannotation'
CREATE TABLE imageannotation (
  AcessionNumber int(32),
  ImageAnnotationSeqNo int(32) NOT NULL auto_increment,
  CatalogNumber varchar(128) NOT NULL,
  AnnotationLocX int(32),
  AnnotationLocy int(32),
  AnnotationRadius int(32),
  AnnotationTypeid int(32),
  PhylogeneticCharacterID int(32),
  PhylogeneticCharacterStateID int(32),
  AnnotationAuthor varchar(128),
  AnnotationDate date DEFAULT ' ',
  ImageID int(32),
  AnnotationObject varchar(255),
  PRIMARY KEY (ImageAnnotationSeqNo)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;
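
The AnnotationLocX, AnnotationLocy, and AnnotationRadius columns suggest that each image annotation marks a circular region. Under that assumption, the sketch below finds the annotations that cover a given pixel. It uses an in-memory SQLite database from the Python standard library rather than MySQL, keeps only a subset of the columns, and the helper name and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table; only a subset of columns is kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE imageannotation (
        ImageAnnotationSeqNo INTEGER PRIMARY KEY AUTOINCREMENT,
        ImageID INTEGER,
        AnnotationLocX INTEGER,
        AnnotationLocy INTEGER,
        AnnotationRadius INTEGER,
        AnnotationAuthor TEXT
    )
    """
)
# Made-up sample rows: (ImageID, x, y, radius, author).
rows = [
    (1, 100, 100, 20, "author_a"),
    (1, 400, 300, 50, "author_b"),
    (2, 100, 100, 20, "author_a"),
]
conn.executemany(
    "INSERT INTO imageannotation "
    "(ImageID, AnnotationLocX, AnnotationLocy, AnnotationRadius, AnnotationAuthor) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)


def annotations_at(conn, image_id, x, y):
    """Return sequence numbers of annotations on image_id whose circle contains (x, y)."""
    cur = conn.execute(
        """
        SELECT ImageAnnotationSeqNo
        FROM imageannotation
        WHERE ImageID = ?
          AND (AnnotationLocX - ?) * (AnnotationLocX - ?)
            + (AnnotationLocy - ?) * (AnnotationLocy - ?)
            <= AnnotationRadius * AnnotationRadius
        ORDER BY ImageAnnotationSeqNo
        """,
        (image_id, x, x, y, y),
    )
    return [row[0] for row in cur]


found = annotations_at(conn, 1, 110, 105)
print(found)
```

Storing the location as numbers rather than as an overlay image lets the database answer spatial questions directly, which I-Note's full-image overlays could not do.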

34 AnnotationType Table
#
# Table Structure for Table 'annotationtype'
CREATE TABLE annotationtype (
  annotationtypeID int(32) NOT NULL auto_increment,
  annotationtitle varchar(25),
  keywords varchar(255),
  description varchar(128),
  PRIMARY KEY (annotationtypeID)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

35 PhylogeneticCode Table
#
# Table Structure for Table 'phylogeneticcharacter'
CREATE TABLE phylogeneticcharacter (
  PhylogeneticCharID int(32) NOT NULL auto_increment,
  CharacterNumber int(32),
  PublicationID int(32),
  TSN int(32),
  CharacterDescription varchar(128),
  ViewID int(32),
  Sex varchar(8),
  Stage varchar(128),
  SimilarEntries varchar(128),
  RelatedCharacterID int(32),
  RelationType varchar(128),
  SuggestedTaxonRange varchar(128),
  PRIMARY KEY (PhylogeneticCharID)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

36 Phylogeneticstate Table
#
# Table Structure for Table 'phylogeneticstate'
CREATE TABLE phylogeneticstate (
  StateID int(32) NOT NULL auto_increment,
  phylogeneticcharID int(32) NOT NULL,
  Description varchar(128),
  ImageID int(32),
  AnnotationSequenceNumber int(32),
  PRIMARY KEY (StateID)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

37 SpecimenPhyChar Table
#
# Table Structure for Table 'SpecimenPhyChar'
CREATE TABLE SpecimenPhyChar (
  SpecimenPhyCharID int(32) NOT NULL auto_increment,
  SpecimenID int(32) NOT NULL,
  PhylogeneticCharID int(32),
  ImageID int(32),
  ImageAnnotationSeqNo int(32),
  PRIMARY KEY (SpecimenPhyCharID)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

38 Publication Table
#
# Table Structure for Table 'PublicationTable'
CREATE TABLE publicationtable (
  PublicationID int(32) NOT NULL auto_increment,
  PublicationAuthor varchar(128),
  PublicationYear char(4),
  PublicationJournal varchar(128),
  PublicationTitle varchar(128),
  PublicationPagesFrom int(32),
  PublicationPagesto int(32),
  PRIMARY KEY (PublicationID)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

39 UserTable
#
# Table Structure for Table 'UserTable'
CREATE TABLE usertable (
  UserID int(32) NOT NULL auto_increment,
  Level int(8),
  UIN int(8),
  PIN int(16),
  Name varchar(128),
  # (column name missing in the source) varchar(128),
  Affiliation varchar(128),
  Address varchar(255),
  Country varchar(128),
  GroupID int(32),
  PRIMARY KEY (UserID)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

40 GroupTable
#
# Table Structure for Table 'Grouptable'
CREATE TABLE grouptable (
  GroupID int(32) NOT NULL,
  GroupName varchar(128) NOT NULL,
  User int(32),
  PRIMARY KEY (GroupID)
) TYPE=MyISAM #DEFAULT CHARSET=latin1;
;

41 Expected Challenges
The effort is contingent upon development of a reliable annotation toolset
Development of a generic biological schema
Integration of web services with the new MorphBank system and other biological database systems
Obtaining consensus among the different participants on basic biology ontology issues
Possible use of a general biological thesaurus

42 Masters Thesis/Projects
MorphBank Requirements Analysis (Thesis/Project)
MorphBank Module Implementation (Project)
MorphBank Security (Thesis/Project)
MorphBank Mirror Site Implementation (Thesis/Project)
MorphBank Operational Site Procedures (Project)

43 Masters Thesis/Projects
Biological Image eXchangE System (BIXES): a method and associated software to allow heterogeneous biological image database systems to exchange images and metadata (project/thesis)
Biological image search technique (School of Computation Sciences research project/thesis)

44 Conclusion
More efficient search on large scientific data systems
Demonstrate that this application works for biological databases
Show that this is feasible for any scientific application
Provide a new and supported annotation tool set that can be used across the web

