Knowledge Management for Digital Libraries, Mediated Views, & Archives Bertram Ludäscher * Data and Knowledge Systems (DAKS) * San Diego.


Knowledge Management for Digital Libraries, Mediated Views, & Archives Bertram Ludäscher * Data and Knowledge Systems (DAKS) * San Diego Supercomputer Center U.C. San Diego * formerly: DICE

2 Data and Knowledge Systems (Re-)Organization Data and Knowledge Systems Labs (formerly “DICE”): –Data Grids (SRB et al.) –Advanced Query Processing (KRDB/MBM) –Knowledge-Based Integration (KRDB/MBM) –Knowledge and Information Discovery (Data Mining) –Spatial Information Systems (GIS) Project Areas: –Data and Knowledge Grids (GriPhyN, NVO, BIRN, I2T, GeoGrid, SciDAC/SDM,...) –Digital Libraries (DLI2, NSDL,...) –Persistent Archives (NARA, NHPRC).... R&D: –SRB/(E)MCAT, mySRB,... –XML/Model-based mediator: from proof-of-concept to reusable prototypes –KBA methodology, preliminary archival prototypes

3 The Message: Knowledge Management for Digital Libraries, Mediated Views, & Archives
Making data sources (DBs, collections, archives) “smarter” by adding semantics, context, “knowledge”:
=> extend scope of information integration, mediation, and digital library federation
=> “intelligent”/“informed” browsing and querying of information (e.g., via Topic Maps, “concept spaces”, “Semantic Web” tech.)
=> richer, more self-contained knowledge-based archives (KBA)
Which Knowledge Representation Formalisms?
–formal ontologies (domain maps), expressed in Description Logics (aka concept-definition/terminological languages)
–XML, RDF(S), DAML+OIL, Onto..., KIF, KQML, LOOM,....
Goal: Create “Executable Knowledge”... => Right mix between DB and KR technologies!

4 Outline Information Integration from a Database Perspective –examples, mediator approach, some technical challenges Part I: XML-Based Mediation –based on querying semistructured data & XML Part II: Model-Based Mediation –basic ideas & architecture, lifting data to knowledge sources –“glue maps” (domain maps, process maps) and ontologies –ongoing/future research: mix of DB & KR techniques Part III: Knowledge-Based Archives –how to add more semantics to archives Discussion

An Online Shopper’s Information Integration Problem
El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”
? Information Integration
[Figure labels: addall.com, amazon.com, A1books.com, half.com, barnes&noble.com, WWW, public library]
“One-World” Mediation

A Home Buyer’s Information Integration Problem
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?
? Information Integration
Sources: Realtor, Demographics, School Rankings, Crime Stats
“Multiple-Worlds” Mediation

7 Information Integration from a DB Perspective
Information Integration Challenge
–Given: data sources S_1,..., S_k (DBMS, web sites,...) and user questions Q_1,..., Q_n that can be answered using the S_i
–Find: the answers to Q_1,..., Q_n
The Database Perspective: source = “database”
=> S_i has a schema (relational, XML, OO,...)
=> S_i can be queried
=> define virtual (or materialized) integrated views V over S_1,...,S_k using database query languages
=> questions become queries Q_i against V(S_1,...,S_k)
Why a Database Perspective?
–scalability, efficiency, reusability (declarative queries),...
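The database perspective can be illustrated with a minimal Python sketch. It assumes two hypothetical sources with different schemas; the source contents, field names, and prices are invented for illustration and do not come from the slides. The integrated view V maps both schemas to a common one, and the user question becomes a query over V.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]

# Hypothetical sources S_1 and S_2 with different schemas (all data is made up).
S1 = [{"title": "Tractatus", "price": 12.0, "shipping": 4.0}]
S2 = [{"name": "Tractatus", "total_cost": 15.0},
      {"name": "Philosophical Investigations", "total_cost": 20.0}]


def view(s1: Iterable[Row], s2: Iterable[Row]) -> Iterator[Row]:
    """Integrated view V(S_1, S_2): maps both schemas to (title, cost, source)."""
    for r in s1:
        yield {"title": r["title"], "cost": r["price"] + r["shipping"], "source": "S1"}
    for r in s2:
        yield {"title": r["name"], "cost": r["total_cost"], "source": "S2"}


def cheapest(title: str, v: Iterable[Row]) -> Row:
    """Query Q against V: the cheapest offer for a given title."""
    return min((r for r in v if r["title"] == title), key=lambda r: r["cost"])


best = cheapest("Tractatus", view(S1, S2))
```

Because `view` is a generator, V is virtual here: nothing is materialized until the query iterates over it.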

8 PART I: XML-Based Mediation

9 Abstract XML-Based Mediator Architecture
[Figure: each source S_1, S_2, ..., S_k exports an XML view through a wrapper. The MEDIATOR combines these into the integrated XML view V, specified by the integrated view definition IVD(S_1,...,S_k). The USER/Client exchanges XML queries & results with the mediator, which evaluates the query Q o V(S_1,...,S_k).]

10 A Concrete (Future) XML-Based Mediator System
[Figure: sources S1, S2, S3 are accessed through XML wrappers (labels: XQuery, XScan, XPath, SQL, XSQL, http-get, XSLT). The MEDIATOR engine, an XQuery processor with an integrated view definition IVD, exports the XML integrated view. The USER/Client exchanges XML queries & results with the mediator (labels: XQuery, XPath, XSLT, XSQL).]
First Results & Demos: XMAS language and algebra, VXD evaluation, BBQ UI, [WebDB99] [SSD99] [SIGMOD99] [EDBT00] (w/ Papakonstantinou, Vianu,...)

11 Some Technical Challenges... XML Query Languages –DB community: QLs for semistructured data, e.g., TSIMMIS/MSL, Lorel, Yatl,..., Florid/F-logic [InfSystems98] –CSE/SDSC: XMAS [SSD99,WebDB99,EDBT00] –W3C: XPath, XSLT, XQuery (Working Draft, June 2001) DB Theory: Expressiveness/Complexity Trade-Off –querying: FO, (WF/S-)Datalog, FO(LFP), FO(PFP),..., all –reasoning: query satisfiability, containment, equivalence

12 ... Some More Technical Challenges ...
DB Practice: Query Composition
–compute Q o V(S_1,...,S_k) w/o computing all of V
=> “push Q through V into S_i”
=> in Datalog: view unfolding (resolution, unification) + simplification ~ top-down evaluation ~ magic sets
=> in XML: some solutions (Papakonstantinou,...)
Navigation-Driven Evaluation of Integrated View V:
–V materialized => warehousing approach
–V virtual => mediator approach
–V virtual & driven by user-navigation => VXD approach [EDBT00] (w/ Papakonstantinou, Velikhov)
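A rough Python sketch of what “push Q through V into S_i” buys, under simplifying assumptions: V is a plain union of two hypothetical sources with renamed attributes, and Q is a selection on the view attribute `title`. The `Source` class, the attribute mapping, and the data are all invented for illustration. The naive strategy ships every row to the mediator, while the composed strategy rewrites the selection into each source's own vocabulary.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class Source:
    """A queryable source that counts how many rows it ships to the mediator."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.shipped = 0

    def select(self, pred: Callable[[Row], bool]) -> List[Row]:
        out = [r for r in self.rows if pred(r)]
        self.shipped += len(out)
        return out


# View definition V: which source attribute corresponds to the view attribute "title".
ATTR = {"S1": "title", "S2": "name"}


def make_sources() -> Dict[str, Source]:
    return {
        "S1": Source([{"title": "Tractatus"}, {"title": "Ethics"}]),
        "S2": Source([{"name": "Tractatus"}, {"name": "Logic"}, {"name": "Monadology"}]),
    }


def answer_naive(sources: Dict[str, Source], title: str) -> List[Row]:
    """Compute all of V first, then apply Q."""
    v = [{"title": r[ATTR[n]]} for n, s in sources.items() for r in s.select(lambda _r: True)]
    return [r for r in v if r["title"] == title]


def answer_pushed(sources: Dict[str, Source], title: str) -> List[Row]:
    """Unfold V and push the selection of Q into each source query."""
    out: List[Row] = []
    for n, s in sources.items():
        attr = ATTR[n]
        out += [{"title": r[attr]} for r in s.select(lambda r, a=attr: r[a] == title)]
    return out


naive_sources, pushed_sources = make_sources(), make_sources()
naive_result = answer_naive(naive_sources, "Tractatus")
pushed_result = answer_pushed(pushed_sources, "Tractatus")
naive_shipped = sum(s.shipped for s in naive_sources.values())
pushed_shipped = sum(s.shipped for s in pushed_sources.values())
```

Both strategies return the same answer; only the number of rows shipped from the sources differs.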

13 XMAS: XML Matching And Structuring language
Integrated View Definition: “Find books from amazon.com and DBLP, join on author, group by authors and title”
CONSTRUCT $a1 $t $p { $p } { $a1, $t } WHERE $a1 : $t : IN "amazon.com" AND $a2 : $p : IN " " AND value( $a1 ) = value( $a2 )
[Figure: the XMAS query and the corresponding XMAS algebra plan.]
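The intent of this view definition (join on author, then group by author and title) can be approximated in Python. This is only a sketch of the semantics described in the slide title, not XMAS itself; the two in-memory lists stand in for the amazon.com and DBLP sources, and their contents are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the two sources: (author, title) and (author, publication).
amazon = [("Wittgenstein", "Tractatus"), ("Ullman", "Database Systems")]
dblp = [("Ullman", "Paper A"), ("Ullman", "Paper B"), ("Vianu", "Paper C")]


def integrated_view(
    books: List[Tuple[str, str]], pubs: List[Tuple[str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Join books and publications on author; group publications by (author, title)."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for a1, t in books:
        for a2, p in pubs:
            if a1 == a2:  # corresponds to value($a1) = value($a2)
                groups[(a1, t)].append(p)
    return dict(groups)


result = integrated_view(amazon, dblp)
```

With this data, only the author who appears in both sources contributes a group.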

14 XML (XMAS) Query Processing
[Figure: at compile time, the translator turns the XMAS query Q and the XMAS view definition V into algebraic plans; composition (Q o V) yields a composed plan, and the rewriter/optimizer produces an optimized plan. At run time, plan execution uses lazy VXD evaluation.]
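The idea of lazy, navigation-driven evaluation can be illustrated with Python generators: a source is accessed only when the client actually asks for the next item. This is an analogy rather than the VXD algorithm; the `source` wrapper, the item names, and the fetch log are assumptions made for the example.

```python
from itertools import islice
from typing import Iterator, List, Tuple

fetch_log: List[Tuple[str, int]] = []


def source(name: str, n: int) -> Iterator[str]:
    """Hypothetical wrapper: yields items one at a time and logs every fetch."""
    for i in range(n):
        fetch_log.append((name, i))
        yield f"{name}-item{i}"


def virtual_view() -> Iterator[str]:
    """Virtual integrated view: concatenates the sources without materializing them."""
    yield from source("S1", 1000)
    yield from source("S2", 1000)


# The client "navigates" to the first three results only.
first_three = list(islice(virtual_view(), 3))
```

Although the view ranges over 2000 items, only the three items the client navigated to are fetched, and the second source is never contacted.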

15 PART II: Model-Based Mediation

A Geoscientist’s Information Integration Problem
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
? Information Integration
Sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB)
“Complex Multiple-Worlds” Mediation

A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
? Information Integration
Sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)
“Complex Multiple-Worlds” Mediation

18 What’s the Problem with XML & Complex Multiple-Worlds?
XML is Syntax
–canonical syntax for labeled ordered trees
–a metalanguage, but all semantics lies outside of XML
DTDs => tags (= controlled vocabulary) + nesting
XML Schema => DTDs + data modeling
need anything else? => write comments!
–but: agreed-upon XML standards still a good thing!
Domain Semantics is complex:
–implicit assumptions, hidden semantics
=> sources seem unrelated to the non-expert
Need Structure and Semantics beyond XML trees!
=> employ richer OO models
=> make domain semantics and “glue knowledge” explicit
=> use ontologies to fix terminology and conceptualization
=> avoid ambiguities by using formal semantics

19 Information Integration Landscape
[Figure: systems plotted by conceptual distance (one-world to multiple-worlds) against conceptual complexity/depth (low to high). Technique regions: DB mediation techniques, Ontologies, KR formalisms, Model-Based Mediation. Labels: addall, book-buyer, home-buyer, 24x7 consumer, BLAST, EcoCyc, Cyc, WordNet, GO, UMLS, MIA, Entrez, RiboWeb, Tambis, Bioinformatics, Geoinformatics.]

20 From XML-Based to Model-Based Mediation Data and Knowledge Sharing Potential: Database Mediation + Knowledge Representation ________________________ = Model-Based Mediation Basic Ideas: –turn primary data sources into knowledge sources –employ secondary glue knowledge sources generic: UMLS,... specific: community/laboratory ontologies

XML-Based vs. Model-Based Mediation
XML-Based Mediation:
–Raw Data => XML Elements => XML Models
–Structural Constraints (DTDs), Parent, Child, Sibling,... e.g. A = (B*|C),D B =...
–No Domain Constraints
–Integrated-DTD := XML-QL(Src1-DTD,...)
Model-Based Mediation:
–Raw Data => (XML) Objects => Conceptual Models
–Classes (C1, C2, C3), Relations (R), is-a, has-a,...
–Logical Domain Constraints: IF ... THEN ...
–Glue Maps: DMs, PMs
–Integrated-CM := CM-QL(Src1-CM,...)
CM ~ {Descr.Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, DAML+OIL, …}

22 Information Integration Landscape Conceptual Distance (“number of hops”) –... speciality... (sub-)discipline... interdisciplinary concepts... –... one (micro) world... multiple worlds... Conceptual Complexity –complexity of interactions between relations, concepts, rules Level of Integration –“Let's put links to all our data on a web page!” –portals to primary (databases) and secondary information sources (literature): NCBI,... –specialized web services: (meta-)BLAST,... –integration services: MIA, Entrez,...

23 What’s the Glue? What’s in a Link?
Syntactic Joins
–link(X,Y) := X.SSN = Y.SSN (equality)
–link(X,Y) := X.UMLS-ID = Y.UID
“Speciality” Joins
–link(X,Y,Score) := BLAST(X,Y,Score) (similarity)
Semantic/Rule-Based Joins
–link(X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S>0.8 (homology, lub)
–link(X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y. (rule-based) e.g., X = secretase, B = beta amyloid, Y = Alzheimer’s disease
A Technical Challenge:
–“compile” semantic joins into efficient rule evaluation + syntactic joins
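The three kinds of joins can be sketched in Python as follows. Everything here is illustrative: `similarity` is a toy stand-in for BLAST (fraction of matching positions), the `isa` table and the sequences are invented, and the semantic join returns every common concept rather than only a least upper bound.

```python
from typing import Dict, List, Set, Tuple

Rec = Dict[str, str]


def syntactic_join(xs: List[Rec], ys: List[Rec], kx: str, ky: str) -> List[Tuple[str, str]]:
    """Equality join on two (differently named) key attributes."""
    return [(x["id"], y["id"]) for x in xs for y in ys if x[kx] == y[ky]]


def similarity(a: str, b: str) -> float:
    """Toy stand-in for BLAST: fraction of positions with identical characters."""
    n = max(len(a), len(b))
    return sum(c == d for c, d in zip(a, b)) / n if n else 0.0


def semantic_join(
    xs: List[Rec], ys: List[Rec], isa: Dict[str, Set[str]], threshold: float = 0.8
) -> List[Tuple[str, str, str]]:
    """Pairs (X, Y, C) where X isa C, Y isa C, and similarity(X, Y) > threshold."""
    out = []
    for x in xs:
        for y in ys:
            if similarity(x["seq"], y["seq"]) > threshold:
                common = isa[x["id"]] & isa[y["id"]]
                out += [(x["id"], y["id"], c) for c in sorted(common)]
    return out


def rule_join(produces: List[Tuple[str, str]], increased_in: List[Tuple[str, str]]):
    """X produces B, B increased_in Y  =>  (X, Y, [produces, B, increased_in])."""
    return [(x, y, ["produces", b, "increased_in"])
            for x, b in produces for b2, y in increased_in if b == b2]


p1 = {"id": "p1", "uid": "U7", "seq": "ACGTACGTAC"}
p2 = {"id": "p2", "uid": "U7", "seq": "ACGTACGTAA"}
p3 = {"id": "p3", "uid": "U9", "seq": "TTTTTTTTTT"}
isa = {"p1": {"protein", "calcium_sensor"}, "p2": {"protein", "calcium_sensor"}, "p3": {"protein"}}

equal_pairs = syntactic_join([p1], [p2, p3], "uid", "uid")
semantic_pairs = semantic_join([p1], [p2, p3], isa)
rule_pairs = rule_join([("secretase", "beta amyloid")],
                       [("beta amyloid", "Alzheimer's disease")])
```

The nested loops make the cost of a semantic join explicit, which is one way to read the technical challenge of compiling such joins into cheaper syntactic joins plus rule evaluation.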

24 Model-Based Mediation Methodology ...
Lift Sources to export Conceptual Models (CMs): CM(S) = OM(S) + KB(S) + CON(S)
Object Model OM(S):
–complex objects (frames), class hierarchy, OO constraints
Knowledge Base KB(S):
–explicit representation of (“hidden”) source semantics
–logic rules over OM(S)
Contextualization CON(S):
–situate OM(S) data using “glue maps” (GMs):
=> domain maps DMs (ontology) = terminological knowledge: concepts + roles
=> process maps PMs = “procedural knowledge”: states, events, transitions

25... Model-Based Mediation Methodology Integrated View Definition (IVD) –declarative (logic) rules with object-oriented features –defined over CM(S), domain maps, process maps –needs “mediation engineers” = domain + KRDB experts Knowledge-Based Querying and Browsing (runtime): –mediator composes the user query Q with the IVD... rewrites (Q o IVD), sends subqueries to sources... post-processes returned results (e.g., situate in context)

26 Model-Based Mediator Architecture
[Figure: sources S1, S2, S3 are lifted through (XML-Wrapper) CM-Wrappers and export conceptual models (labels: GCM, CM S1, CM S2, CM S3), with CM(S) = OM(S)+KB(S)+CON(S). The Mediator Engine (FL rule proc., LP rule proc., Graph proc., XSB Engine) uses the Integrated View Definition IVD and the “Glue” Maps GMs (Domain Maps DMs, Process Maps PMs), which provide the semantic context CON(S), and exports the CM (Integrated View). The USER/Client exchanges CM Queries & Results (exchanged in XML) with the mediator.]
First results & Demos: KIND prototype, formal DM semantics, PMs [SSDBM00] [VLDB00] [ICDE01] [NIH-HB01] (w/ Gupta, Martone)

27 Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR
Domain Map (DM) = labeled graph with concepts ("classes") and roles ("associations"); additional semantics: expressed as logic rules (F-logic)
Domain Expert Knowledge: Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).
[Figure: the domain map as a graph, and the DM in Description Logic.]

28 Source Contextualization & DM Refinement In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map... => sources can register new concepts at the mediator...

Example: ANATOM Domain Map

30 Browsing Registered Data with Domain Maps

31 Compilation: Domain Maps => F-Logic Rules
Domain Maps ~ Ontologies
=> DMs have a formal semantics via a translation to Description Logics (fragments of first-order logic):
C ==R=> D means $\forall x\, (C(x) \rightarrow \exists y\, D(y) \wedge R(x,y))$ (*)
Quiz: Neuron ==has=> Compartment means ?
Translation to deductive rules: e.g. F-logic = Datalog + OO features => Declarative + “Executable” Specification
–query evaluation with deductive rules => (*) as an integrity check, or derived knowledge,...
–reasoning over decidable fragments: checking concept satisfiability, subsumption, equivalence
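Read as an integrity check, (*) says that every instance of C must have at least one R-successor that is an instance of D. A minimal Python sketch over finite sets makes this concrete; the instance names are invented, and a real mediator would evaluate the rule in a deductive engine rather than in Python.

```python
from typing import Set, Tuple


def violations(C: Set[str], D: Set[str], R: Set[Tuple[str, str]]) -> Set[str]:
    """Instances x of C that have no y in D with (x, y) in R, i.e. witnesses against (*)."""
    return {x for x in C if not any((x, y) in R for y in D)}


# Hypothetical instance data for the edge Neuron ==has=> Compartment.
neurons = {"n1", "n2"}
compartments = {"c1"}
has = {("n1", "c1")}

missing = violations(neurons, compartments, has)
```

Here `n2` is reported because no compartment is registered for it. Under the "derived knowledge" reading, the mediator would instead infer that such a compartment exists but is unknown.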

Query Processing “Demo”
[Figure: query results in context; contextualization CON(Result) wrt. ANATOM. Annotation: provided by the domain expert and mediation engineer; deductive OO language (here: F-logic).]

Example: Inside Query Evaluation
Query: “How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?”
Evaluation plan:
–X1 := select targets of “output from parallel fiber” (push)
–X2 := “find and situate” X1 in ANATOM Domain Map
–X3 := subregion-closure(X2) (compute region of interest)
–X4 := select PROT-data(X3, Ryanodine Receptors) (push)
–X5 := compute aggregate(X4)
–display X5 in context (ANATOM)
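Steps X3 to X5 of such a plan can be sketched in Python. The `HAS_PART` graph is a small hypothetical fragment, not the actual ANATOM domain map, and the protein measurements are invented. The closure is a plain downward graph traversal.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical fragment of a domain map: region -> direct subregions.
HAS_PART: Dict[str, List[str]] = {
    "cerebellum": ["cerebellar cortex"],
    "cerebellar cortex": ["molecular layer", "Purkinje cell layer"],
}

# Hypothetical source data: (region, protein, amount).
PROT_DATA = [
    ("molecular layer", "Ryanodine Receptor", 3.0),
    ("Purkinje cell layer", "Ryanodine Receptor", 5.0),
    ("hippocampus", "Ryanodine Receptor", 2.0),
    ("molecular layer", "Calbindin", 1.0),
]


def subregion_closure(regions: Iterable[str]) -> Set[str]:
    """All given regions plus everything reachable downward via HAS_PART."""
    seen: Set[str] = set()
    stack = list(regions)
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(HAS_PART.get(r, []))
    return seen


def select_prot_data(regions: Set[str], protein: str) -> List[Tuple[str, float]]:
    """Selection that would be pushed to the protein source."""
    return [(r, amt) for r, p, amt in PROT_DATA if r in regions and p == protein]


def aggregate(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """Total amount per region."""
    totals: Dict[str, float] = {}
    for r, amt in rows:
        totals[r] = totals.get(r, 0.0) + amt
    return totals


x2 = {"cerebellar cortex"}                        # assumed result of situating X1
x3 = subregion_closure(x2)                        # region of interest
x4 = select_prot_data(x3, "Ryanodine Receptor")   # pushed selection
x5 = aggregate(x4)                                # distribution per region
```

The domain map is what restricts the protein query to the region of interest; without it, the mediator would have no basis for relating the two sources.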

34 Some Open Database & Knowledge Representation Issues Mix of Query Processing and Reasoning –e.g., FaCT, LOOM description logic reasoner for DMs? –reconciliation of DMs via argumentation frameworks (“games”) using well-founded and stable models of logic programs??? [ICDT97,PODS97,TCS00] Modeling “Process Knowledge” => Process Maps –formal semantics? (dynamic/temporal/Kripke models?) –executable semantics? (Statelog?) Graph Queries over DMs and PMs –expressible in F-logic [InfSystems98] –scalability? (UMLS Domain Map has millions of entries)...

35 Process Maps with Abstractions and Elaborations: => From Terminological to Procedural Glue nodes ~ states edges ~ processes, transitions blue/red edges: processes in Src1/Src2 general form of edges: how about these?

36 Models and Formal Approaches: Relating Theory to the World ©2000 by John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole, Pacific Grove, CA. All models are wrong, but some are useful!

37 Summary: Mediation Scenarios & Techniques
Federated Databases    | XML-Based Mediation   | Model-Based Mediation
One-World              | One-/Multiple-Worlds  | Complex Multiple-Worlds
Common Schema          | Mediated Schema       | Common Glue Maps
SQL, rules             | XML query languages   | DOOD query languages
Schema Transformations | Syntax-Aware Mappings | Semantics-Aware Mappings
Syntactic Joins        | Syntactic Joins       | “Semantic” Joins via Glue Maps
DB expert              | DB expert             | KRDB + domain expert

38 PART III: Knowledge-Based Archives

39 From XML-Based to Knowledge-Based Archives Collection-based archival with XML: save data "as is" plus... –... separate content from presentation –... tag your data (and take a lift in the info hierarchy) –... use a self-describing, semistructured data format (XML) Knowledge-based archival: now add... –... conceptual level information –... integrity constraints –... explanations/derivation rules: archiving only results y=f(x) vs. archiving the rules/function "f" (e.g. f = “the Florida procedure”...) => employ knowledge representation languages

40 Knowledge-Based Archival: Senate Example
Data provider says: “Please archive all records of legislative activities of the 106th Senate!”
Integrity constraints, e.g.:
(1) {senators_with_file} = UNION (sponsor, cosponsors, submitted_by)
(2) {senators} = {sponsors} = {co-sponsors}
Violation:
–the rhs is a SUPERSET of the lhs!
Exceptions:
–(Chafee, John), (Gramm, Phil), (Miller, Zell)
(Possible) Explanations:
–senators who joined (Zell), passed away (Chafee), were forgotten (Gramm)!?
Checking ICs: IF sponsor(X), not senator(X) THEN ADD(exception_log, missing_senator_info(X))
General form: IF condition THEN action, with Action = LOG, WARN, ABORT,...
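The integrity-check rule can be mimicked in a few lines of Python. This is a sketch under assumptions: the function name, the two supported actions, and the placeholder senator names are invented for illustration, and only the rule `IF sponsor(X), not senator(X) THEN ...` is taken from the slide.

```python
from typing import List, Set, Tuple


def check_sponsors(
    senators: Set[str], sponsors: Set[str], action: str = "LOG"
) -> List[Tuple[str, str]]:
    """IF sponsor(X), not senator(X) THEN apply the action (LOG or ABORT)."""
    exception_log: List[Tuple[str, str]] = []
    for x in sorted(sponsors - senators):
        if action == "ABORT":
            raise ValueError(f"missing senator info for {x}")
        exception_log.append(("missing_senator_info", x))
    return exception_log


# Placeholder data: one sponsor has no senator record.
senators = {"Senator A", "Senator B"}
sponsors = {"Senator A", "Senator B", "Senator C"}

log = check_sponsors(senators, sponsors)
```

Archiving the rule alongside the data, rather than only the cleaned result, is what lets a later reader see which records were exceptions and why.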

41 Maximizing “Self-Containedness”... Self-validating archives: add... –... "executable knowledge" (=rules) – "helping (bugging?) the data provider" => add the functionality and meaning of DTD (+Schema+IC+...) validation to the AIP => package the validator! Self-instantiating archives: add... –... "executable ingestion process" –“helping the archival engineer (aka archivist)” –…here is: looking over your shoulder… => add the functionality of database transformations to the AIP => package the transformers! BUT packaging validators and transformers increases infrastructure dependence!

42 Maximize “Self-Containedness”... …While Minimizing Infrastructure Dependence
Basic Idea: use a language of executable specifications for self-validation and self-instantiation!
=> Use “Bootstrapping” for Self-Validating & Self-Instantiating Archives
Example: DTD Validator in Logic (F-Logic, Datalog,…)
% specify
false IF P:X, not (P1.X):Y.
false IF P:X, not (P2.X):Y.
false IF P:X, not P[_-> _].
false IF P:X[N->_], not N=1, not N=2....
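To give a feel for a validator expressed as a small declarative specification, here is a Python sketch. It swaps the logic rules for regular expressions over child-tag sequences, so it is not the F-logic validator from the slide. The content model for `A` follows the pattern A = (B*|C),D; treating `B`, `C`, and `D` as empty elements is an assumption.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Declarative part: one content model per element, written as a regex over the
# sequence of child tags (each tag is followed by a single space).
CONTENT_MODEL = {
    "A": re.compile(r"((B )*|C )D "),  # A = (B*|C),D
    "B": re.compile(r""),              # assumed empty
    "C": re.compile(r""),              # assumed empty
    "D": re.compile(r""),              # assumed empty
}


def violations(elem: ET.Element) -> List[str]:
    """Collect all content-model violations in the subtree rooted at elem."""
    errors: List[str] = []
    model = CONTENT_MODEL.get(elem.tag)
    children = "".join(child.tag + " " for child in elem)
    if model is None:
        errors.append(f"undeclared element <{elem.tag}>")
    elif not model.fullmatch(children):
        errors.append(f"<{elem.tag}> has invalid children: {children.strip() or '(none)'}")
    for child in elem:
        errors.extend(violations(child))
    return errors


valid_doc = ET.fromstring("<A><B/><B/><D/></A>")
invalid_doc = ET.fromstring("<A><B/><C/><D/></A>")

valid_errors = violations(valid_doc)
invalid_errors = violations(invalid_doc)
```

The generic checking code is short and fixed; what changes per collection is the `CONTENT_MODEL` table, which could be packaged with the archived data.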

43 In Search of Semantics: What’s in a Rock? Name: –Basalt Description: –a hard, black volcanic rock with less than about 52 weight percent silica (SiO2). Colour: –When fresh it is black or greyish black; often weathers to a reddish or greenish crust. Texture: –Usually dense with no minerals identifiable in hand specimen; a freshly broken surface is dull in appearance. May be porphyritic. Structure: –Often vesicular and/or amygdaloidal. Xenoliths are relatively common and usually consist of olivine and pyroxene; they have a green colour... Mineralogy: –Phenocrysts are usually olivine (green, glassy), pyroxene (black, shiny) or plagioclase (white-grey, tabular). If olivine is present the rock is called olivine basalt. Microscopic examination shows the groundmass to consist of plagioclase (usually labradorite), pyroxene, olivine and magnetite, with a wide range of accessory minerals... Field relations: –Lava flows and narrow dykes and sills. The edges of dykes or sills are often finer grained than the centers or even glassy, due to rapid cooling on intrusion...

44 In Search of Semantics: What’s in a Rock? material = basalt shape =... weight =... date.formed =... date.found = 1799 a.d. place.found= Rashid place.found.common-name = Rosetta date.created = 196 BC language1 = Hieroglyphic language2 = Demotic language3 = Greek And private individuals shall also be allowed to keep the festival and set up the aforementioned shrine and have it in their homes, performing the aforementioned celebrations yearly, in order that it may be known to all that the men of Egypt magnify and honour the GOD EPIPHANES EUCHARISTOS the king, according to the law. This decree shall be inscribed on a stela of Hard stone in sacred [i.e. hieroglyphic] and native [i.e. demotic] and Greek characters and set up in each of the first, second, and third [rank] temples beside the image of the ever living king. So where did you find the semantics?

45 Summary: Towards Bootstrapping Knowledge-Based Archives Baron von Münchhausen, pulling himself out of the swamp enable addition of semantic annotations ("knowledge") via logic rules to AIPs add executable specifications of semantics => AIP += KP (knowledge package, i.e., logic rules) => self-validating archive add executable specifications of the ingestion network => AIP += IN (ingestion network,...more logic rules) => self-instantiating archive => bootstrapping knowledge-based archive with DTD/Schema/IC validation and ingestion transformations all expressed in a declarative logic program Outlook from the 2do list: build a prototype BARON = Bootstrapping Archive of Rules, Ontologies, and Ingestion Networks

46 Questions? Queries?

47 References
XML-Based and Model-Based Mediation:
–MBM: Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE), Heidelberg, Germany, IEEE Computer Society, 2001.
–VXD/Lazy Mediators: Navigation-Driven Evaluation of Virtual Mediated Views, B. Ludäscher, Y. Papakonstantinou, P. Velikhov, Intl. Conference on Extending Database Technology (EDBT), Konstanz, Germany, LNCS 1777, Springer, 2000.
–DOOD: Managing Semistructured Data with FLORID: A Deductive Object-Oriented Perspective, B. Ludäscher, R. Himmeröder, G. Lausen, W. May, C. Schlepphorst, Information Systems, 23(8), Special Issue on Semistructured Data, 1998.
STATELOG (Logic Programming with States):
–On Active Deductive Databases: The Statelog Approach, G. Lausen, B. Ludäscher, and W. May. In Transactions and Change in Logic Databases, Hendrik Decker, Burkhard Freitag, Michael Kifer, and Andrei Voronkov, editors. LNCS 1472, Springer, 1998.

48 References
–Towards Self-Validating Knowledge-Based Archives, Bertram Ludäscher, Richard Marciano, Reagan Moore, 11th Workshop on Research Issues in Data Engineering (RIDE), Heidelberg, IEEE Computer Society, April 2001; SDSC TR , January 18, 2001.
–Knowledge-Based Persistent Archives, Reagan Moore, SDSC TR , January 18, 2001.
–The Senate Legislative Activities Collection (SLA): a Case Study Infrastructure Research to Support Preservation Strategies, Richard Marciano, Bertram Ludäscher, Reagan Moore, SDSC TR , January 18, 2001.
–Reference Model for an Open Archival Information System (OAIS), Draft Recommendation, Consultative Committee for Space Data Systems, CCSDS R-1, May
–Digital Rosetta Stone: A Conceptual Model for Maintaining Long-term Access to Digital Documents, Alan R. Heminger, Steven B. Robertson