Exploitation of Structural Similarity in Semi-Structured Bioinformatics Data for Efficient Storage Construction Dongkyoo Shin Sejong.

Presentation transcript:

Exploitation of Structural Similarity in Semi-Structured Bioinformatics Data for Efficient Storage Construction
Dongkyoo Shin, Sejong University, InCoB 2007

Multimedia & Internet Laboratory, Sejong University 2/20
Table of contents
    Abstract
    Background
    Methods
    Results
    Conclusions

Abstract (1)
Background
– Many studies address storing XML data
    Reduce the number of joins between tables
    Not suited to microarray data, which has a distinctive hierarchy
– Hierarchical feature of the microarray data model
    A few core values occur repeatedly
– New approach for capturing the feature
    Classify elements with similar structure into a group
    Design a common database table for the group

Abstract (2)
Results
– Database schema created by our approach
    Remarkably reduces the number of table joins
    Improves the performance of storing and loading XML-based microarray data
Conclusions
– Mining the structural similarity of elements is an efficient way to improve performance for microarray data

Background (1)
DTD (Document Type Definition)-dependent approach
– Map one element to one table
    For each e ∈ E, #(S) ≥ 1 OR #(A) ≥ 1 -> define_Class(e)
    For each Se ∈ S -> Add_attributes_of_Class(e)
    Se ∈ SequenceType -> Define_multivalued_att(Se, e)
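The mapping on this slide can be read as: every element with at least one sub-element or attribute becomes a class (table), and sub-elements of a sequence type become multi-valued attributes. A minimal Python sketch of that reading follows. The `Element` record, the `repeated` field, and the element names are hypothetical stand-ins for a parsed DTD, not part of the original slides.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical, simplified view of one DTD element declaration."""
    name: str
    attributes: list = field(default_factory=list)    # A
    sub_elements: list = field(default_factory=list)  # S
    repeated: set = field(default_factory=set)        # sub-elements of sequence type


def map_elements_to_classes(elements):
    """Return {class name: {column name: 'single' or 'multi'}}."""
    classes = {}
    for e in elements:
        # Only elements with at least one sub-element or attribute get a class.
        if not e.sub_elements and not e.attributes:
            continue
        columns = {a: "single" for a in e.attributes}
        for se in e.sub_elements:
            # Sequence-type sub-elements become multi-valued attributes.
            columns[se] = "multi" if se in e.repeated else "single"
        classes[e.name] = columns
    return classes


# Illustrative element names only.
dtd = [
    Element("Experiment", attributes=["identifier"],
            sub_elements=["Description", "BioAssay"], repeated={"BioAssay"}),
    Element("Description"),
]
classes = map_elements_to_classes(dtd)
print(classes)
```

Because each qualifying element yields its own class, the resulting schema grows with the DTD, which is the drawback the later slides address.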

Background (2)
Inline technique
– Reduce the complexity of the DTD (Document Type Definition)
    For each e, #(S) == 1 AND Se ∈ SequenceType -> Add_Multi-valued_attribute_of_ParentClass(e)

Background (3)
Drawbacks of previous approaches
– DTD-dependent
    The database schema is as complex as the DTD
– Inline technique
    Depends strongly on the number of omissible elements
New design approach for microarray database
– Capture similar structural features of microarray data
– Need a fast and simple way to mine the structural features

Background (5)
Microarray data and MAGE (Microarray Gene Expression) standards
– Research groups share microarray data with others and use it to answer their biological questions
– MGED Society's standard definitions
    MIAME (Minimum Information About a Microarray Experiment)
    MAGE-OM and MAGE-ML
        – Exchange object model and format for MIAME
– Structural feature of MAGE-OM
    A varied set of objects defining the same data types, including complex types

Background (6)
Decision tree
– A simple model that makes classification rules, correlations, and effects between variables easy to understand
– Suited to mining structural features of the MAGE-ML DTD itself (not MAGE-ML instances)
    All elements can be classified into three levels:
        – A root, a mediators group, and a bottoms group

Methods (1)
Classification of core features using a decision tree
– Terminology for expressing a complexType
    e: an element defined in XML Schema
    E: the set of elements e
    SE: the set of sub-elements of e
    a: an attribute of e
    A: the set of attributes of e
    SA: the set of attribute sets of all sub-elements of e
    complexType: structural information that consists of SE and (or) A of e
    Lowest child: an element without a sub-element
    Lowest parent: an element with a sub-element that is one of the lowest child elements
    PG (Parent Group): a set of candidate elements to be parents of a Lowest Child
    LPCG (Lowest Parent Candidate Group): a set of candidates to be Lowest Parent
    LCG (Lowest Child Group): a set of Lowest Child elements
    LPG (Lowest Parent Group): a set of Lowest Parent elements
    ULPG (Upper Level Parent Group): a set of upper-level parents, including elements that are neither Lowest Child nor Lowest Parent

Methods (2)
Expression of a complexType
– A complexType defines structural information of elements
    A set of arrays including data type
Definition of structural similarity
    SE_elex = {e_1, e_2, …, e_n}, SA_elex = {A_e1, A_e2, …, A_en}
    complexType(ele_x) = {SE_elex, SA_elex}
    Elements ele_x and ele_y are structurally similar when complexType(ele_x) == complexType(ele_y)
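One way to make this equality test concrete is to turn each complexType into a hashable signature. The sketch below is only an illustration: the function name `complex_type`, the choice to ignore declaration order, and the sample element names are assumptions, not taken from the slides.

```python
def complex_type(sub_elements, attributes):
    """Build a hashable structural signature {SE, SA} for one element.

    sub_elements: iterable of sub-element names (SE)
    attributes:   dict mapping each sub-element name to its attribute names (SA)
    Assumption: order of declaration does not matter, so sets are used.
    """
    se = frozenset(sub_elements)
    sa = frozenset((name, frozenset(attrs)) for name, attrs in attributes.items())
    return (se, sa)


# Hypothetical elements: a and b differ only in declaration order.
a = complex_type(["value", "unit"], {"value": ["type"], "unit": []})
b = complex_type(["unit", "value"], {"unit": [], "value": ["type"]})
c = complex_type(["value"], {"value": ["type"]})

print(a == b)  # structurally similar
print(a == c)  # not similar
```

Since the signature is hashable, it can also serve as a dictionary key when elements are grouped by structure.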

Methods (3)
Decision tree for recognizing the core features
– Condition 1: If rule 1 is satisfied, then e arrives at LCG. Otherwise, it arrives at PG.
– Condition 2: If rule 2 is satisfied, then e and the elements similar to e arrive at a new LCG.
– Condition 3: If rule 3 is satisfied, then e arrives at LPG. Otherwise, it arrives at ULPG.
– Condition 4: If rule 4 is satisfied, then e and the elements similar to e arrive at a new LPG.

Methods (4)
Classification rules
– Rule 1
    Decide whether an element belongs to group LCG or PG
    For each e_i ∈ E {
        if (number of elements in SE_ei == 0) {
            e_i is classified into LCG;
        } else {
            e_i is classified into PG;
        }
    }
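Rule 1 translates almost directly into code. The following Python sketch assumes the schema has already been parsed into a dictionary from element name to its list of sub-element names; that representation and the element names are hypothetical.

```python
def rule1(sub_elements_of):
    """Split elements into LCG (no sub-elements) and PG (at least one)."""
    lcg, pg = [], []
    for name, subs in sub_elements_of.items():
        if len(subs) == 0:
            lcg.append(name)  # Lowest Child Group
        else:
            pg.append(name)   # Parent Group
    return lcg, pg


# Illustrative schema, not taken from MAGE-ML.
schema = {
    "Experiment": ["Description", "BioAssay"],
    "BioAssay": ["Name"],
    "Description": [],
    "Name": [],
}
lcg, pg = rule1(schema)
print(lcg)
print(pg)
```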

Methods (5)
Classification rules
– Rule 2
    Classify multiple sets of LCG
    p = 0;
    For each e_i ∈ LCG_0 {
        Flag = 0;
        If (p > 0) {
            For q = 1 to p
                If (complexType(e_i) == complexType(element in LCG_q)) {
                    e_i is classified into LCG_q;
                    Flag = 1;
                }
        }
        If (Flag == 0) {
            For each e_j ∈ LCG_0
                if (complexType(e_i) == complexType(e_j)) {
                    p = p + 1;
                    e_i and e_j are classified into a new group LCG_p;
                }
        }
    }
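The grouping in Rule 2 can also be sketched with hash buckets instead of the nested loops on the slide. This is a substitute technique with the same intent (elements with equal complexType end up in one group), not the authors' exact procedure. The signatures and element names below are hypothetical, and leaving unmatched elements ungrouped is an assumption.

```python
from collections import defaultdict


def group_by_complex_type(signatures):
    """Group structurally identical elements.

    signatures: dict mapping element name -> hashable complexType signature.
    Returns (groups, ungrouped): groups holds lists of two or more similar
    elements (LCG_1 .. LCG_p); ungrouped holds elements with no match.
    """
    buckets = defaultdict(list)
    for name, signature in signatures.items():
        buckets[signature].append(name)
    groups = [names for names in buckets.values() if len(names) > 1]
    ungrouped = [names[0] for names in buckets.values() if len(names) == 1]
    return groups, ungrouped


# Lowest-child elements described only by their attribute names (illustrative).
lcg0 = {
    "Name": frozenset({"value"}),
    "Label": frozenset({"value"}),
    "Unit": frozenset({"symbol", "scale"}),
}
groups, ungrouped = group_by_complex_type(lcg0)
print(groups)
print(ungrouped)
```

Bucketing visits each element once, whereas the pairwise comparison on the slide is quadratic in the size of LCG_0.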

Methods (6)
Classification rules
– Rule 3
    Separate elements in PG into two groups: LPG and ULPG
    For each e_i ∈ PG {
        if (SE_ei ⊆ LCG) {
            e_i is classified into LPG;
        } else {
            e_i is classified into ULPG;
        }
    }
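A short Python sketch of Rule 3 follows. It reads the condition as "every sub-element of e_i is a Lowest Child", which is an assumption. The looser reading suggested by the Lowest parent definition (at least one sub-element in LCG) would replace the subset test with a non-empty intersection test. The dictionary representation and element names are hypothetical.

```python
def rule3(pg, lcg, sub_elements_of):
    """Split PG into LPG and ULPG.

    Assumption: an element is a Lowest Parent when all of its sub-elements
    are in LCG (subset reading of the rule's condition).
    """
    lcg_set = set(lcg)
    lpg, ulpg = [], []
    for name in pg:
        if set(sub_elements_of[name]) <= lcg_set:
            lpg.append(name)   # Lowest Parent Group
        else:
            ulpg.append(name)  # Upper Level Parent Group
    return lpg, ulpg


# Illustrative schema fragment.
sub_elements_of = {
    "Experiment": ["BioAssay", "Description"],
    "BioAssay": ["Name", "Label"],
}
lpg, ulpg = rule3(["Experiment", "BioAssay"],
                  ["Description", "Name", "Label"],
                  sub_elements_of)
print(lpg)
print(ulpg)
```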

Methods (7)
Classification rules
– Rule 4
    Classify multiple sets of LPG
    p = 0;
    For each e_i ∈ LPG_0 {
        Flag = 0;
        If (p > 0) {
            For q = 1 to p
                If (complexType(e_i) == complexType(element in LPG_q)) {
                    e_i is classified into LPG_q;
                    Flag = 1;
                }
        }
        If (Flag == 0) {
            For each e_j ∈ LPG_0
                if (complexType(e_i) == complexType(e_j)) {
                    p = p + 1;
                    e_i and e_j are classified into a new group LPG_p;
                }
        }
    }
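Rule 4 applies the same similarity grouping as Rule 2, this time to LPG. As a rough end-to-end illustration, the sketch below chains all four rules on a toy schema. Everything in it is an assumption made for the example: the tuple-based schema format, the element names, the subset reading of Rule 3, and a signature built from SE, A, and the attribute sets of the sub-elements.

```python
from collections import defaultdict


def classify(schema):
    """schema: dict name -> (tuple of sub-element names, tuple of attribute names)."""
    # Rule 1: elements without sub-elements go to LCG, the rest to PG.
    lcg = [n for n, (subs, _) in schema.items() if not subs]
    pg = [n for n in schema if n not in lcg]

    # Rule 3 (assumed subset reading): all sub-elements are Lowest Children.
    lpg = [n for n in pg if set(schema[n][0]) <= set(lcg)]
    ulpg = [n for n in pg if n not in lpg]

    def signature(name):
        # Hashable complexType: SE, A, and the attribute sets of sub-elements.
        subs, attrs = schema[name]
        sa = frozenset((s, frozenset(schema[s][1])) for s in subs)
        return (frozenset(subs), frozenset(attrs), sa)

    def similar_groups(names):
        # Rules 2 and 4: bucket elements whose signatures are equal.
        buckets = defaultdict(list)
        for n in names:
            buckets[signature(n)].append(n)
        return [g for g in buckets.values() if len(g) > 1]

    return {
        "LCG": lcg, "LPG": lpg, "ULPG": ulpg,
        "LCG_groups": similar_groups(lcg),
        "LPG_groups": similar_groups(lpg),
    }


# Illustrative schema, not taken from MAGE-ML.
schema = {
    "Experiment": (("BioAssay", "Array"), ("identifier",)),
    "BioAssay": (("Name", "Label"), ("identifier",)),
    "Array": (("Name", "Label"), ("identifier",)),
    "Name": ((), ("value",)),
    "Label": ((), ("value",)),
}
result = classify(schema)
print(result)
```

Each group in `LCG_groups` and `LPG_groups` is a candidate for one shared database table, which is how the approach cuts down the number of tables and joins.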

Result (1)
Database design by the proposed decision tree

Result (2)
Database space complexity

                 Raw schema    Classified schema
Total classes
Total tables
Total records
Total DB size    710 (Kb)      27 (Kb)

Time complexity

Result (3)
Reconstructing the XML Document

Conclusions
Proposed approach
– Mine elements with structural similarity from XML Schema for biological information
– Experimental result
    Mining the structural similarity of the object model suits microarray data and is more efficient than previous approaches
Future work
– Plan to extend the current classification rules to root, LCG, LPG, and ULPG respectively