
LoG: A Methodology for Metadata Registry-based Management of Scientific Data July 5, 2002 Doo-Kwon Baik


1 LoG: A Methodology for Metadata Registry-based Management of Scientific Data July 5, 2002 Doo-Kwon Baik baik@software.korea.ac.kr

2 Content
(July 5, 2002, CODATA/DSAO 2002)
- Motivation
- Objectives
- Related works
- Overview of the MDR
- The scientific data properties
- User levels and the data property
- Data visibility
- The conceptual model of the LoG
- A LoG framework
- An example
- Conclusions and future work

3 Motivation
- Existing data integration approaches focus only on technical research and system development; they do not consider the properties of the domain knowledge.

4 The Domain Knowledge
- The domain knowledge property is a very important factor in data integration.
- Many tasks and services depend on the domain knowledge properties:
  - The quality degree and the quantity scope of data integration are defined by the domain knowledge property.
  - Many other services, such as data services and application services, depend on it.
[Diagram: domain knowledge determines the quality degree of data integration, the quantity scope of data integration, data services (information providing), and application services.]

5 Objectives
- The objectives of our research are:
  - to solve the problems of the existing data integration approaches;
  - to analyze and define the domain knowledge properties (in this paper, we focus on scientific data);
  - to define the relationships among the domain knowledge properties, users, and metadata, i.e., the considerations for data integration;
  - to create a new methodology based on the results of the domain knowledge analysis, which we call LoG (Localization-based Global MDR methodology);
  - finally, to design a framework suited to the methodology.

6 Related works: Bottom-up approach (1/2)
- Existing data integration approaches are classified into the top-down approach and the bottom-up approach.
- The bottom-up approach is the most common one; the ontology-based methodology is representative of it.
[Diagram: analyze all factual databases (n databases); design and create a guideline, such as a global view, from those databases; then add new databases (c of them), so the number of databases becomes n + c.]

7 Related works: Bottom-up approach (2/2)
- Advantages
  - It can achieve complete data integration, because it uses a global guideline created by analyzing and designing across all databases.
- Disadvantages
  - Creating the global guideline is costly and time-consuming.
  - It is not suitable for very large-scale data integration.
  - It provides only a static integration management mechanism: whenever a new schema or database is added to the integrated database, the previous processes must be repeated, so costs and time grow geometrically.
  - It does not provide a standardized guideline, i.e., it is domain-dependent: each application domain defines and uses its own, different guidelines.

8 Related works: Top-down approach (1/2)
- The top-down approach aims to solve the problems of the bottom-up approach.
- MDR (ISO/IEC 11179), an international standard, is representative of it.
[Diagram: analyze all factual databases; design and create a standardized guideline, such as a global view (metadata elements), from those databases; define the schemas of new databases according to the standardized guideline.]

9 Related works: Top-down approach (2/2)
- Advantages
  - It reduces costs because the global guideline does not have to be rebuilt.
  - It provides a standardized schema, so all new databases can be built and managed consistently.
- Disadvantages
  - Like the bottom-up approach, it has a high initial cost, because a global view must be created by analyzing all legacy databases.
  - This is hard work for very large-scale integration.
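To make the cost argument of slides 6 to 9 concrete, here is a toy counting model. It is an illustrative assumption, not a model from the talk: each database analysis costs one unit; under the bottom-up approach, every time a new database is added the global guideline is rebuilt by re-analyzing all current databases; under the top-down approach, the n legacy databases are analyzed once and each new database costs one unit to define against the standardized guideline.

```python
def bottom_up_cost(n: int, c: int) -> int:
    """Toy model: total analysis units when c new databases are added one at a
    time to n existing ones, re-analyzing every database after each addition."""
    total = n  # initial analysis of the n factual databases
    for k in range(1, c + 1):
        total += n + k  # rebuild the global guideline over n + k databases
    return total


def top_down_cost(n: int, c: int) -> int:
    """Toy model: legacy databases are analyzed once (the high initial cost),
    and each new database is defined against the standardized guideline."""
    return n + c


if __name__ == "__main__":
    for c in (1, 5, 20):
        print(c, bottom_up_cost(10, c), top_down_cost(10, c))
```

With n = 10 and c = 20, the model gives 420 units for bottom-up versus 30 for top-down; both start from the same initial cost of n, which matches the slide's remark that the top-down approach is also expensive up front.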

10 Overview of the MDR: Definition
- Definition of MDR (Metadata Registry)
  - A system for registering, storing, and managing the specifications (metadata) of data elements
  - Evolution: ISO/IEC 11179; Metamodel of Data Registry (ANSI X3.285)
- Purpose
  - A metadata registry for data standardization
  - Support for data search and data specification
  - Support for data sharing among systems or organizations
  - A supporting system for creating, registering, and managing data elements
  - Support for users' understanding of the meaning, representation, and identification of data

11 Overview of the MDR: Basic concepts
- Data element
  - The basic unit of data management
  - The unit specifying the identification, context, and representation of the values of data
- Components of a data element
  - Object class: the data to be collected or stored
  - Property: the characteristics needed to identify and describe objects
  - Representation: the description of the representational form and value domain of each data element
[Diagram: object class + property form a data element concept; object class + property + representation form a data element. The links are annotated with 1:N and 1:1 cardinalities.]
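The component structure on this slide can be sketched as plain data types. In ISO/IEC 11179 terms, a data element concept pairs one object class with one property, and a data element pairs a data element concept with one representation; one concept can be shared by several data elements. The class and field names below are illustrative choices, not taken from the standard's metamodel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    name: str  # the thing about which data is collected or stored


@dataclass(frozen=True)
class Property:
    name: str  # a characteristic that identifies or describes the object class


@dataclass(frozen=True)
class DataElementConcept:
    object_class: ObjectClass  # exactly one object class
    prop: Property             # exactly one property

    @property
    def name(self) -> str:
        return f"{self.object_class.name} {self.prop.name}"


@dataclass(frozen=True)
class Representation:
    data_type: str
    min_size: int
    max_size: int


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept     # one concept, possibly shared with other elements
    representation: Representation  # exactly one representation


student_id = DataElementConcept(ObjectClass("Student"), Property("ID"))
as_number = DataElement(student_id, Representation("numeric", 7, 12))
as_text = DataElement(student_id, Representation("character", 7, 12))
```

Here the same concept "Student ID" is expressed by two data elements that differ only in representation.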

12 Overview of the MDR: Specification
- Specification of a data element: the basic attributes for specifying a data element, by classification:
  - Identification: identification of the data element
  - Definition: description of its meaning
  - Relation: relations among data elements
  - Representation: description of the data element's representation
  - Administration: description of data element management

13 Overview of the MDR: An Example
- Definition of a metadata element
  - Identifying attributes
    - Data element name: Student_ID
    - Identifier: 2002020177
    - Version: 1
    - Synonymous name: Student Number
    - Context: Student's ID
  - Definitional attribute
    - Definition: a unique number assigned to each student
  - Relational and representational attributes
    - Type: data element
    - Representation category: number
    - Representation form: code
    - Data type: numeric
    - Min. size: 7
    - Max. size: 12
    - Representation layout: N(12)
    - Data domain: reference to the student ID classification
  - Administrative attributes
    - Registration authority: KOREA UNIV.
    - Registration status: recorded
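The Student_ID specification can be written down as a single record grouped by the attribute classifications of slide 12, with a check of candidate values against its representation. This is a minimal sketch: the `DataElementSpec` type and its `accepts` method are hypothetical, and the check only covers the numeric data type with the min/max sizes given on the slide.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementSpec:
    # Identifying attributes
    name: str
    identifier: str
    version: int
    synonym: str
    context: str
    # Definitional attribute
    definition: str
    # Representational attributes
    data_type: str
    min_size: int
    max_size: int
    layout: str
    # Administrative attributes
    registration_authority: str
    registration_status: str

    def accepts(self, value: str) -> bool:
        """Hypothetical check: ASCII digits only, length within [min_size, max_size]."""
        if self.data_type != "numeric":
            raise NotImplementedError("only numeric elements are sketched here")
        digits_only = re.fullmatch(r"[0-9]+", value) is not None
        return digits_only and self.min_size <= len(value) <= self.max_size


STUDENT_ID = DataElementSpec(
    name="Student_ID", identifier="2002020177", version=1,
    synonym="Student Number", context="Student's ID",
    definition="A unique number assigned to each student",
    data_type="numeric", min_size=7, max_size=12, layout="N(12)",
    registration_authority="KOREA UNIV.", registration_status="recorded",
)
```

Note that the identifier 2002020177 identifies the data element in the registry; it is not itself a student number. A value such as the hypothetical "20021234" passes the check, while "12345" (too short) and "2002A234" (not numeric) do not.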

14 The scientific data properties
- Scientific data (knowledge) has the following properties:
  - General data: most people can understand and use it easily, and most databases in scientific fields contain similar or identical general data elements.
  - Specialized data: more complicated and detailed; general users cannot understand it, but the experts in a specific group are interested in it and can use it.
※ Building an MDR for all data as a whole is not necessary.

15 User levels and the data property
- Classification of users
  - Users are classified into two groups according to the scientific data property: general users and specialized users.
  - General users use general data at a high level and across many fields.
  - Specialized users are domain experts in a specific field.
    - They use both general data and specialized data.
    - They are further differentiated into more detailed fields, i.e., specialized users are distributed among several groups, and the experts in each group are independently interested in more specialized data.

16 Data visibility
- The quantity and the degree of specialization of data are differentiated into several levels according to the knowledge property, and each level has an independent data set.
[Diagram: the whole data set is divided into sets 1 to 5 across three visibility levels: data used by all users (general users), data used by specialized users (specialized users 1 … n), and data used within independent expert domain groups (detailed-specialized users 1 … n).]
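One way to read the visibility levels is as nested scopes: every user sees general data, a domain expert also sees their domain's data, and a detailed specialist additionally sees their subfield's data. The sketch below assumes that nesting; the scope strings, the `User` type, and the catalog contents (borrowed from the example on slide 21) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    name: str
    domain: Optional[str] = None    # None for a general user
    subfield: Optional[str] = None  # detailed specialization within the domain


def visible_scopes(user: User) -> set[str]:
    """Assumed nesting: general data, then domain data, then subfield data."""
    scopes = {"general"}
    if user.domain is not None:
        scopes.add(f"domain:{user.domain}")
        if user.subfield is not None:
            scopes.add(f"subfield:{user.domain}/{user.subfield}")
    return scopes


def visible_elements(user: User, catalog: dict[str, str]) -> list[str]:
    """Return the names of catalog elements whose scope the user can see."""
    scopes = visible_scopes(user)
    return sorted(name for name, scope in catalog.items() if scope in scopes)


CATALOG = {  # hypothetical element -> visibility scope assignments
    "Name": "general",
    "Biological Order Name": "domain:biology",
    "Chemical Molecular Formula Code": "domain:chemistry",
}
```

A general user sees only `Name`; a chemistry expert sees `Name` plus `Chemical Molecular Formula Code`, but not the biology element.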

17 The conceptual relation diagram
[Diagram: general users 1 … n and domain experts 1 … n; a global MDR; local MDRs 1 … m, one per domain (Domain 1 … Domain m), each over the factual databases of its domain (DB 11 … DB 1n, DB 21 … DB 2n, …, DB m1 … DB mn). The relations are labeled localization/globalization and specialization/generalization.]

18 The conceptual model of the LoG
- The LoG methodology has four layers:
  - User interface layer: provides the user interface environment for all users.
  - Global MDR layer (generalized layer): manages the global MDR for the most general and common data, which all users (general and specialized) use and access.
  - Local MDR layer (specialized layer): manages the local MDRs for the specialized data that experts use; the local MDRs may be structured hierarchically.
  - Factual database layer: manages the low-level, factual data.
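Because local MDRs may be hierarchical and sit below the global MDR, a natural reading is a tree of registries in which a lookup starts at the most specialized MDR and walks up toward the global one. The slides do not define a lookup order, so that walk-up rule, the `MDR` class, and the "organic chemistry" subfield below are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MDR:
    """A registry of metadata elements; parent is None for the global MDR."""
    name: str
    parent: Optional[MDR] = None
    elements: dict[str, str] = field(default_factory=dict)  # element -> format

    def register(self, element: str, fmt: str) -> None:
        self.elements[element] = fmt

    def resolve(self, element: str) -> Optional[tuple[str, str]]:
        """Look the element up here, then in each ancestor up to the global MDR."""
        mdr: Optional[MDR] = self
        while mdr is not None:
            if element in mdr.elements:
                return mdr.name, mdr.elements[element]
            mdr = mdr.parent
        return None


gmdr = MDR("GMDR")
gmdr.register("Name", "character(20)")
chemistry = MDR("LMDR chemistry", parent=gmdr)
chemistry.register("Chemical Molecular Formula Code", "character(100)")
organic = MDR("LMDR organic chemistry", parent=chemistry)  # hypothetical subfield
```

A lookup from the organic chemistry LMDR finds `Name` in the GMDR and the formula code in the chemistry LMDR, whereas the GMDR alone cannot see the specialized element.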

19 A LoG Framework (1/2)
[Diagram of the framework, by layer:
User interface layer: global user interface (general-user-level interface) and local user interface (expert-level interface).
Global MDR layer: general-user-level interface agent; GMDR agent (registration, classification); GMDR; GMeta repository.
Local MDR layer: expert-level interface agent; LMDR agent (registration, classification, authorization); LMDRs (LMDR 1, LMDR 2, … LMDR n); LMeta repository (sets of actual metadata).
Factual DB layer: DB 11 … DB 1n (Domain 1), DB 21 … DB 2n (Domain 2), …, DB m1 … DB mn (Domain m).]

20 A LoG Framework (2/2)
- Interface layer
  - Global user interface and local user interface sub-layers
- Global MDR layer
  - GMDR agent: manages the GMDR (global MDR) and the GMeta (global metadata repository).
  - GMDR (global MDR): a standardized guideline for general users and experts; the set of metadata elements used in common by all databases.
  - GMeta (global metadata repository): the set of actual metadata.
- Local MDR layer
  - LMDR agent: manages the LMDRs and the LMeta.
  - LMDRs (local MDRs): a standardized guideline for specialized users; a set of metadata elements that generalizes the data in each field or detailed field.

21 An Example
[Diagram: Name is registered in the GMDR; Biological Order Name and Chemical Molecular Formula Code are registered in the LMDRs, each shown alongside Name.]
- Name
  - Definition: the unique object name
  - Version: 1
  - Registration status: standard
  - Data type: character
  - Format: character(20)
- Biological Order Name
  - Definition: the systematic name that represents the biological species
  - Version: 1
  - Registration status: standard
  - Data type: character
  - Format: character(50)
- Chemical Molecular Formula Code
  - Definition: the code that represents the number of atoms of each element in a molecule of a chemical substance
  - Version: 1
  - Registration status: standard
  - Data type: character
  - Format: character(100)
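The three elements above can be checked against their registered formats. This sketch assumes that `character(n)` means "at most n characters" (the slide does not say whether the length is fixed or a maximum), and the `ELEMENTS` table and `conforms` helper are hypothetical.

```python
import re

_CHARACTER_FORMAT = re.compile(r"character\((\d+)\)")


def max_length(fmt: str) -> int:
    """Parse a 'character(n)' format string and return n."""
    match = _CHARACTER_FORMAT.fullmatch(fmt)
    if match is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    return int(match.group(1))


# element name -> (registry, format), taken from the example above
ELEMENTS = {
    "Name": ("GMDR", "character(20)"),
    "Biological Order Name": ("LMDR", "character(50)"),
    "Chemical Molecular Formula Code": ("LMDR", "character(100)"),
}


def conforms(element: str, value: str) -> bool:
    """Assumed rule: a value conforms if it fits within the format's length."""
    _, fmt = ELEMENTS[element]
    return len(value) <= max_length(fmt)
```

For instance, "Lepidoptera" conforms to Biological Order Name and "C6H12O6" to Chemical Molecular Formula Code, while a 21-character value would be rejected for the general Name element.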

22 Conclusions and Future work
- Conclusions
  - We considered and defined the domain knowledge property.
  - We proposed the LoG methodology based on the knowledge property. It:
    - partially provides a dynamic integration mechanism;
    - provides a standardization guideline based on ISO/IEC 11179, the international standard;
    - avoids the unnecessary cost of analyzing and designing all databases to create a global view.
- Future work
  - To analyze and define the domain knowledge property in detail
  - To implement a prototype based on the framework we described

23 Q/A. Thanks!

