CCCT-041 Semantic Extensions to Domain-Specific Markup Languages Aparna Varde, Elke Rundensteiner, Murali Mani, Mohammed Maniruzzaman and Richard D. Sisson Jr.


CCCT-041 Semantic Extensions to Domain-Specific Markup Languages Aparna Varde, Elke Rundensteiner, Murali Mani, Mohammed Maniruzzaman and Richard D. Sisson Jr. Worcester Polytechnic Institute (WPI) Worcester, Massachusetts, USA

CCCT-042 Introduction XML, the eXtensible Markup Language, is a widespread standard for storing and publishing data. Domain-specific markup languages are designed with XML tag sets. Standardization bodies extend these languages to include additional semantics. Aspects such as domain knowledge and XML constraints are important. Focus of paper: generic issues in extending markup languages.

CCCT-043 Domain-specific markup language Medium of communication for potential users of the domain. Users: industries, consumers, universities, research organizations, publishers etc. Follows XML syntax. Encompasses the semantics of the domain. Examples: MML (Medical Markup Language), MatML (Materials Science Markup Language).

CCCT-044 MML: Medical Markup Language Creates standards for medical data to be stored and accessed worldwide. MML modules include, e.g., “basic clinic information” and “surgery record information”. Used by primary care physicians, general surgeons etc. Specific information in sub-areas such as “ophthalmology” cannot be stored with these modules. Thus MML needs more semantics.

CCCT-045 Motivation for extension to markup languages Other domains have specific sub-areas, just as the medical domain has ophthalmology. Why not define a new markup language for each aspect? –Basic information typically resides in the generic language and needs cross-referencing, e.g., basic surgical details in ophthalmology. –Common information should not be stored twice. Hence it is advisable to extend the existing markup language with additional semantics.

CCCT-046 Extending the Materials Science Markup Language, MatML MatML: Materials Science Markup Language, an XML language for materials property data. Heat Treating: controlled heating and cooling of materials to achieve desired mechanical and thermal properties. The semantics of Heat Treating need to be included in MatML. At WPI, a Heat Treating extension to MatML is proposed. Several issues, both domain-specific and XML-related, are crucial here.

CCCT-047 General issues in extending any markup language Steps essential in markup language extension. Desired language features. XML schema constraints. Retrieval using XQuery.

CCCT-048 Steps essential in markup language extension
1. Understand domain semantics.
2. Model the data.
3. Conduct interviews.
4. Define the ontology.
5. Reiterate the ontology.
6. Outline the initial schema.
7. Revise the schema based on critical reviews.

CCCT Understand domain semantics Acquire domain knowledge: terminology, processes, entities etc. This helps determine the essential tags for storing data in the domain. Study the existing markup language in detail to understand exactly where it needs extension.

CCCT Model the data Build a data model after studying the domain. Use techniques such as Entity-Relationship (E-R) diagrams to represent domain entities, their properties and their relationships. (Figure: Subset of E-R diagram for Heat Treating.)

CCCT Conduct interviews The needs of potential users are important because they help determine the entities and attributes in the extension. Users: industries, universities, research organizations, publishers etc. Domain experts can identify the needs of users, so interview the domain experts.

CCCT Define the ontology The ontology serves as the established lingo for the domain, so it must be defined before design proceeds. Issues: Synonyms: two or more words with the same meaning, e.g., “salary” and “income” in the financial domain. Homographs: one word with multiple meanings, e.g., “share” in the financial domain could refer to “sharing of assets” or “shares in the stock market”. Clarify such terms with reference to context through the ontology.
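The sketch below shows how an ontology can settle the synonym and homograph issues from this slide. It uses a tiny hypothetical lookup table built only from the financial examples above: a synonym collapses to one canonical term, and a homograph is resolved only when a context is supplied. A real ontology would be far richer. This shows the idea, not the authors' ontology format.

```python
from typing import Dict, Optional

# Hypothetical mini-ontology built from the financial examples on this slide.
# Synonyms collapse to one canonical term.
SYNONYMS: Dict[str, str] = {
    "salary": "income",
    "income": "income",
}

# Homographs keep one entry per sense, keyed by the context that selects it.
HOMOGRAPHS: Dict[str, Dict[str, str]] = {
    "share": {
        "assets": "sharing of assets",
        "stock market": "shares in the stock market",
    },
}


def resolve_term(word: str, context: Optional[str] = None) -> str:
    """Map a word to its ontology term; homographs require a context."""
    key = word.lower()
    if key in HOMOGRAPHS:
        senses = HOMOGRAPHS[key]
        if context not in senses:
            raise ValueError(f"{word!r} is a homograph; give one of {sorted(senses)}")
        return senses[context]
    # Words that are not in the ontology pass through unchanged (lower-cased).
    return SYNONYMS.get(key, key)


print(resolve_term("Salary"))                 # income
print(resolve_term("share", "stock market"))  # shares in the stock market
```

Raising an error for a homograph without context forces the ambiguity to be resolved explicitly rather than silently picking one sense.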

CCCT Reiterate the ontology Once the ontology is established, it is useful to have another round of discussions with experts. These additional discussions may lead to further clarifications. –Example: remove existing entities or create new ones, based on terminology. The ontology is altered accordingly and then used for schema design. (Figure: High-level ontology for Heat Treating.)

CCCT Outline the initial schema The schema provides structure, i.e., defines the grammar for the markup language. Once the data model and ontology are approved by domain experts, outline the initial schema. Adhere to the syntax of the original markup language so that the schema can be accommodated as an extension. (Figure: Partial snapshot of schema for Heat Treating extension to MatML.)

CCCT Revise the schema based on critical reviews The initial schema serves as a medium of communication between designers and users. It is subject to further changes until domain experts are satisfied. Schema revision may involve several iterations, some of which include discussions with standards bodies. For the proposed extension to be accepted as a worldwide standard, it must be approved by experts and standards bodies.

CCCT-0416 Desired language features
1. Avoid redundancy.
2. Make information non-ambiguous.
3. Provide easy interpretability of data.
4. Capture domain constraints in the schema.

CCCT Avoid redundancy The markup language extension should avoid duplicate storage. Data stored in the original markup language should be cross-referenced in the extension. Example –In the medical domain, there should be cross-referencing between “basic clinic information” in the original language and “ophthalmological details” in the extension. The schema should be structured accordingly.
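Cross-referencing can be sketched as follows. The extension record carries a reference to the basic clinic information instead of a copy of it. Every element and attribute name here (`BasicClinicInfo`, `OphthalmologyDetails`, `clinicRef`) is hypothetical, and the record content is placeholder text. The example uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the ophthalmology extension points at the basic clinic
# information by id instead of storing a second copy of it.
DOC = """
<MedicalRecord>
  <BasicClinicInfo id="clinic-1">
    <PatientName>J. Doe</PatientName>
    <Physician>Dr. Smith</Physician>
  </BasicClinicInfo>
  <OphthalmologyDetails clinicRef="clinic-1">
    <Diagnosis>myopia</Diagnosis>
  </OphthalmologyDetails>
</MedicalRecord>
"""


def resolve_clinic_info(root: ET.Element, details: ET.Element) -> ET.Element:
    """Follow the clinicRef attribute back to the single stored BasicClinicInfo."""
    ref = details.get("clinicRef")
    match = root.find(f".//BasicClinicInfo[@id='{ref}']")
    if match is None:
        raise KeyError(f"dangling reference: {ref}")
    return match


root = ET.fromstring(DOC)
details = root.find("OphthalmologyDetails")
clinic = resolve_clinic_info(root, details)
print(clinic.findtext("PatientName"))  # J. Doe
```

Because the patient data exists only once, an update to `BasicClinicInfo` is automatically seen by every extension record that references it.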

CCCT Make information non-ambiguous Domain terminology, its semantics, and aspects such as synonyms and homographs are significant. The schema design should adhere to the ontology to avoid ambiguity. Annotations should be included within the schema to enhance clarity. Example: –For spectacle prescriptions in ophthalmology, include the meanings of the terms “myope” and “hypermetrope” in the schema as annotations.
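In XML Schema, such annotations go in `xsd:annotation`/`xsd:documentation` elements. The sketch below assumes a hypothetical fragment that declares `Myope` and `Hypermetrope` as elements, which is an illustrative choice of element names. It reads the annotations back into a glossary with the standard library, so readers and tools can look up a term's meaning directly from the schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment: each term carries its meaning as an annotation.
SCHEMA = f"""
<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="Myope" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>A person with myopia (nearsightedness).</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="Hypermetrope" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>A person with hypermetropia (farsightedness).</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
"""


def term_glossary(schema_text: str) -> Dict[str, str]:
    """Map each top-level element name to its xsd:documentation text."""
    ns = {"xsd": XSD_NS}
    root = ET.fromstring(schema_text)
    glossary = {}
    for elem in root.findall("xsd:element", ns):
        doc = elem.findtext("xsd:annotation/xsd:documentation", default="", namespaces=ns)
        glossary[elem.get("name")] = doc.strip()
    return glossary


for term, meaning in term_glossary(SCHEMA).items():
    print(f"{term}: {meaning}")
```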

CCCT Provide easy interpretability of data Data is stored using markup language tags. Readers should be able to interpret this data without much reference to the literature, so the schema design should be organized accordingly. Example: –In science and engineering domains, experimental conditions should be stored close to results to enhance readability.

CCCT Capture domain constraints in the schema Certain requirements imposed by the domain need to be captured in the schema, using the constraint features of XML Schema. Some constraints: –Primary key: to uniquely identify an entity. –Choice: to declare mutually exclusive elements. Example: in the financial domain, a person could be either “insolvent” (bankrupt) or an “asset-holder”, but not both.

CCCT-0421 XML schema constraints
1. Sequence constraint.
2. Disjunction constraint.
3. Key constraint.
4. Occurrence constraint.

CCCT Sequence constraint To declare a list of elements that must occur in order. Enclose the elements in a sequence group (xsd:sequence in XML Schema). Example: –In the Heat Treating extension, element “QuenchConditions” must occur before “Results”.
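The ordering rule can be sketched as a small check that reads the declared order from a hypothetical `xsd:sequence` fragment and compares it with an instance's children. This simplified version assumes each element occurs exactly once, which is XML Schema's default. A real deployment would validate against the full XSD with a schema-aware validator, which Python's standard library does not provide.

```python
import xml.etree.ElementTree as ET
from typing import List

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical fragment of the Heat Treating extension: QuenchConditions before Results.
SCHEMA = f"""
<xsd:complexType name="ExperimentType" xmlns:xsd="{XSD_NS}">
  <xsd:sequence>
    <xsd:element name="QuenchConditions" type="xsd:string"/>
    <xsd:element name="Results" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
"""


def sequence_order(schema_text: str) -> List[str]:
    """Element names in the order the xsd:sequence declares them."""
    root = ET.fromstring(schema_text)
    return [e.get("name") for e in root.findall("xsd:sequence/xsd:element", {"xsd": XSD_NS})]


def follows_sequence(instance_text: str, order: List[str]) -> bool:
    """True if the instance's children appear exactly in the declared order, once each."""
    children = [child.tag for child in ET.fromstring(instance_text)]
    return children == order


order = sequence_order(SCHEMA)
print(follows_sequence("<Experiment><QuenchConditions/><Results/></Experiment>", order))  # True
print(follows_sequence("<Experiment><Results/><QuenchConditions/></Experiment>", order))  # False
```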

CCCT Disjunction constraint To declare mutually exclusive elements, i.e., only one of them can exist. Enclose the elements in a choice group (xsd:choice in XML Schema). Example: –In Heat Treating, a part can be made by “Casting” OR “Powder Metallurgy”, not both.
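With its default occurrence bounds, an `xsd:choice` group requires exactly one of its alternatives. The sketch below mimics that rule on an instance document. The element names `Casting` and `PowderMetallurgy` are hypothetical spellings of the two manufacturing routes named on the slide.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Roughly equivalent schema declaration (element names are hypothetical):
#   <xsd:choice>
#     <xsd:element name="Casting" .../>
#     <xsd:element name="PowderMetallurgy" .../>
#   </xsd:choice>
CHOICES: Tuple[str, ...] = ("Casting", "PowderMetallurgy")


def satisfies_choice(part_xml: str, choices: Tuple[str, ...] = CHOICES) -> bool:
    """Mimic xsd:choice with default bounds: exactly one alternative is present."""
    part = ET.fromstring(part_xml)
    present = [name for name in choices if part.find(name) is not None]
    return len(present) == 1


print(satisfies_choice("<Part><Casting/></Part>"))                      # True
print(satisfies_choice("<Part><Casting/><PowderMetallurgy/></Part>"))   # False
```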

CCCT Key Constraint To declare an attribute to be a primary key, i.e., its value must be unique and non-null. Declare the attribute with type “xsd:ID” and use “required”. Example: –In Heat Treating, the name of the cooling medium (quenchant) is crucial because the purpose of the experiments is to categorize the quenchants, so the quenchant name is declared as a key.
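The two guarantees an `xsd:ID` attribute with `use="required"` gives are that the value is present and that it is unique within the document. The sketch below checks both on a hypothetical quenchant catalog whose names are illustrative labels. Values of type `xsd:ID` must also be valid XML names (no spaces, no leading digit), which is why the labels are written as single tokens. XML Schema's `xsd:key` is an alternative when a key should be scoped to particular elements.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical instance mirroring <xsd:attribute name="name" type="xsd:ID" use="required"/>.
# It deliberately contains one duplicate and one missing name.
DATA = """
<QuenchantCatalog>
  <Quenchant name="WaterAt25C"/>
  <Quenchant name="MineralOilA"/>
  <Quenchant name="MineralOilA"/>
  <Quenchant/>
</QuenchantCatalog>
"""


def key_violations(xml_text: str) -> List[str]:
    """Report quenchant names that are missing (null) or duplicated (not unique)."""
    seen = set()
    problems = []
    for i, q in enumerate(ET.fromstring(xml_text).iter("Quenchant")):
        name = q.get("name")
        if name is None:
            problems.append(f"Quenchant #{i} has no name")
        elif name in seen:
            problems.append(f"duplicate name {name!r}")
        else:
            seen.add(name)
    return problems


for problem in key_violations(DATA):
    print(problem)
```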

CCCT Occurrence constraint To declare the minimum and maximum permissible occurrences of an element. Specify minOccurs="x" and maxOccurs="y", where x and y denote the minimum and maximum occurrences respectively. The value maxOccurs="unbounded" means there is no upper bound on the number of occurrences. The value minOccurs="0" means that the element need not be stored even once. Example: –In Heat Treating, the cooling rate must be recorded at a minimum of 8 points in an experiment, with no upper bound. At most 3 graphs are stored per experiment, and storing at least one graph is not required.
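The two Heat Treating rules from this slide translate to bounds of (8, unbounded) for cooling-rate points and (0, 3) for graphs. The sketch below checks those bounds on an instance. The element names `CoolingRate` and `Graph` are hypothetical, and `None` stands in for `maxOccurs="unbounded"`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# (minOccurs, maxOccurs) per element; None stands for maxOccurs="unbounded".
OCCURS: Dict[str, Tuple[int, Optional[int]]] = {
    "CoolingRate": (8, None),  # minOccurs="8" maxOccurs="unbounded"
    "Graph": (0, 3),           # minOccurs="0" maxOccurs="3"
}


def occurrence_errors(experiment_xml: str) -> Dict[str, int]:
    """Return {element: count} for every element whose count breaks its bounds."""
    exp = ET.fromstring(experiment_xml)
    errors = {}
    for name, (lo, hi) in OCCURS.items():
        n = len(exp.findall(name))
        if n < lo or (hi is not None and n > hi):
            errors[name] = n
    return errors


ok = "<Experiment>" + "<CoolingRate/>" * 8 + "</Experiment>"
bad = "<Experiment>" + "<CoolingRate/>" * 7 + "<Graph/>" * 4 + "</Experiment>"
print(occurrence_errors(ok))   # {}
print(occurrence_errors(bad))  # {'CoolingRate': 7, 'Graph': 4}
```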

CCCT-0426 Retrieval using XQuery
1. Encourage users to store data in a case-sensitive manner.
2. Use tags to enhance querying efficiency.

CCCT Encourage users to store data in a case-sensitive manner XQuery is case-sensitive. Hence it is useful to emphasize consistent case when storing data using the markup language. This facilitates retrieval using XQuery.
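The effect of case on retrieval can be shown with Python's `ElementTree` path predicates, which compare text exactly. XQuery's default string comparison behaves the same way, so a query such as `//Quenchant[Name = "Water"]` would also miss a value stored as "water". The data below is hypothetical. Case-insensitive matching has to be done explicitly, for example with `lower-case()` in XQuery or `.lower()` in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical data in which the same quenchant was entered with inconsistent case.
DATA = """
<Quenchants>
  <Quenchant><Name>Water</Name></Quenchant>
  <Quenchant><Name>water</Name></Quenchant>
  <Quenchant><Name>Polymer</Name></Quenchant>
</Quenchants>
"""

root = ET.fromstring(DATA)

# Exact (case-sensitive) match: only "Water" is found.
exact = root.findall(".//Quenchant[Name='Water']")
print(len(exact))  # 1

# Case-insensitive retrieval needs explicit normalization.
folded = [q for q in root.iter("Quenchant") if q.findtext("Name", "").lower() == "water"]
print(len(folded))  # 2
```

Storing values with consistent case from the start avoids needing this normalization in every query.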

CCCT Use tags to enhance querying efficiency It is possible to anticipate typical user queries in a domain, so it is advisable to add a level of abstraction for faster retrieval of information. Example: –In Heat Treating, a user is likely to retrieve the name details of a quenchant without its property details. –Hence place additional enclosing tags around the quenchant information. –Thus the entire path of the quenchant need not be traversed for the name details. –This enhances querying efficiency.
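One possible layout is to group the name details apart from the property details. The grouping tags `NameDetails` and `PropertyDetails` below are hypothetical, because the slide does not preserve the actual tag names, and the values are placeholders. A single child step then returns the whole name group without descending into the property subtree. The sketch illustrates access paths only and does not measure performance.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: name details grouped apart from the (larger) property details.
DATA = """
<Quenchant>
  <NameDetails>
    <Name>QuenchantA</Name>
    <Category>oil</Category>
  </NameDetails>
  <PropertyDetails>
    <Viscosity units="cSt"/>
    <FlashPoint units="C"/>
  </PropertyDetails>
</Quenchant>
"""

quenchant = ET.fromstring(DATA)

# One child step reaches the whole name group; the children of PropertyDetails
# are never visited by this lookup.
name_details = quenchant.find("NameDetails")
print({child.tag: child.text for child in name_details})
```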

CCCT-0429 Conclusions Aspects of extending domain-specific markup languages are discussed here. These include the motivation for extension, the steps in extension, desired language features, XML constraints and retrieval considerations. An extension to MatML that includes Heat Treating semantics is proposed at the Center for Heat Treating Excellence (CHTE), WPI. The paper summarizes general issues in extending domain-specific markup languages.

CCCT-0430 Acknowledgments Database Systems Research Group in Department of Computer Science at WPI. Quenching Research Team in Department of Materials Science at WPI. Center for Heat Treating Excellence and its member companies.