Target schema and domain evolution Source metadata preparation Source data preparation Metadata matching Target data instantiation Transformation and analysis.

Presentation transcript:

Target schema and domain evolution / Source metadata preparation / Source data preparation / Metadata matching / Target data instantiation / Transformation and analysis

Hypothesis formation
- Determine the likely arson suspect
  - limited transportation
  - fondness for patterns
- “One person living at the center of a geographic pattern of historical and current incidents”

Guess a task-specific schema
  {Name of person or incident: String,
   Incident cause: String,
   Location {street: string (from P.O. street list),
             city: string (from city list),
             zip: integer (from P.O. list in area of interest),
             lat, long: float (in area of interest)} }

Find sources to fill in target
Familiarization
Clarify semantics and domains
- Current events (xml)
- Historical events (xls)
- Historical events (html)
- People (xls)

Data assessment and profiling
Find or build extension functions
- spelling errors in cause field
- “Twinford Drive” inconsistent with geo data
- Geo data and park names switched on two other fires

Extension functions ready:
- split street, city, state, zip
- street name to lat/long
- zip code to lat/long
- bin causes as “suspicious” or null
- CSV to KML for map upload

Map source schemas to target
Fill in target relation, learning by example
Remove extraneous data by projection
De-duplicate entities & attributes
Visualization
Mapping of “inexpressible” data
Theory formulation
Verification
Identify missing pieces

T.name <- People_info.name | historical_events.id | current_events.fire.dtg
T.cause <- bin(historical_events.cause)
T.street, .city, .zip <- split(People_info.address)
T.lat, .long <- getLatLong(split(People_info.address)) | current_events.fire.latitude, .longitude | historical_events.Lat, .Lng

- Create target instance using CHIME (acceleration via learning by example)
- Select people and suspicious events
- Project down to {name, lat, long}
- Resolve duplicate entities with CHIME
- Convert to CSV, then KML
- Load to Google Maps
- Set icon colors for visibility
- Make judgement about “pattern”

An answer: Jimmy West
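As a rough illustration of the final steps on this slide (bin causes, select people and suspicious events, project down to {name, lat, long}, convert to CSV, then KML), here is a minimal Python sketch. It is not CHIME's implementation. The `TargetRow` class, its `kind` field, the keyword list used for binning, and every function name are hypothetical, introduced only for illustration; the slides do not say how causes are actually binned.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape


@dataclass
class TargetRow:
    kind: str              # hypothetical flag: "person" or "incident"
    name: str              # name of person or incident
    cause: Optional[str]   # "suspicious" or None after binning
    lat: float
    long: float


# Assumed keyword list; purely illustrative.
SUSPICIOUS_KEYWORDS = ("arson", "suspicious", "incendiary")


def bin_cause(raw_cause):
    """Bin a free-text cause as "suspicious" or None (null)."""
    if raw_cause is None:
        return None
    text = raw_cause.lower()
    return "suspicious" if any(k in text for k in SUSPICIOUS_KEYWORDS) else None


def select_and_project(rows):
    """Keep people and suspicious events, projected to (name, lat, long)."""
    return [(r.name, r.lat, r.long)
            for r in rows
            if r.kind == "person" or r.cause == "suspicious"]


def to_csv(projected):
    """Serialize projected tuples as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "lat", "long"])
    writer.writerows(projected)
    return buf.getvalue()


def csv_to_kml(csv_text):
    """Convert the CSV to a minimal KML document of point placemarks.

    KML coordinates are written as longitude,latitude.
    """
    placemarks = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        placemarks.append(
            "<Placemark><name>{}</name>"
            "<Point><coordinates>{},{}</coordinates></Point>"
            "</Placemark>".format(escape(rec["name"]), rec["long"], rec["lat"])
        )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + "".join(placemarks)
            + "</Document></kml>")
```

The resulting KML text could then be saved to a file for map upload. Deciding whether the plotted points form a "pattern" remains a human judgement, as the slide notes.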

Arson Suspect: Target Schema and Solution Map

Rescue Order: Target Schema and Solution List
John & Joan, then Jenny, then Jack

Target schema and domain evolution / Source metadata preparation / Source data preparation / Metadata matching / Target data instantiation / Transformation and analysis

Hypothesis formation

A fly (or many?) in the ointment…

Guess a task-specific schema
- We don’t know how to compute or verify a task-specific schema automatically

Find sources to fill in target
Familiarization
Clarify semantics and domains
Data assessment and profiling
Find or build extension functions
Map source schemas to target
Fill in target relation, learning by example
Remove extraneous data by projection
De-duplicate entities & attributes
Visualization
Mapping of “inexpressible” data
Theory formulation
Verification
Identify missing pieces

- Matching source to target requires semantic knowledge held only by humans
- Partial attribute values and unstructured data lack semantics; these must come from human knowledge (but the copy-and-paste action required is learnable by example!)
- Entity & attribute resolution requires human-guided choices, e.g. “John and Joan Smith resolve to just one household, but which purchase year is right?”
- Some things, like geometric and spatial recognition, require human interpretation
- “What’s missing, where do I find it?” is not computable by a machine
- No good language for semantics
- Don’t know how to compute “right” domains from source metadata
- Data in diverse formats too complex for rapid human review
- Need to determine keys and check FDs, then clean data
- No language to describe semantics of inputs and outputs of extension functions, so selection of functions cannot be automatic
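The item "determine keys and check FDs" can be made concrete with a small Python sketch. It assumes rows are plain dictionaries and only tests whether the observed data is consistent with a candidate key or functional dependency; whether the dependency is semantically meant to hold is still a human call. The function names are hypothetical.

```python
from collections import defaultdict


def is_candidate_key(rows, cols):
    """True if no two rows share the same values for the given columns."""
    keys = [tuple(row[c] for c in cols) for row in rows]
    return len(keys) == len(set(keys))


def fd_violations(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs against the data.

    Returns a dict mapping each lhs value that is associated with more
    than one distinct rhs value to the set of rhs values seen.
    An empty dict means the data does not contradict the dependency.
    """
    seen = defaultdict(set)
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        seen[left].add(right)
    return {left: rights for left, rights in seen.items() if len(rights) > 1}
```

Each violation returned by `fd_violations` points at rows a person would need to review and clean, for example two different cities recorded for the same zip code.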

Our approach:
- Assist users in
  - data familiarization
  - data assessment and profiling
  - mapping
  - entity/attribute resolution
- Let human judgment make the call
- Accelerate human effort via “learn-by-example”

Our integration research projects
- Quarry
- Infosonde
- CHIME

CHIME is… an information integration application to capture evolving human knowledge about task-specific data
- Evolving task-specific schema and entity sets
- Mapping diverse data to correct attributes and entities
- Learning by example to speed integration where possible
- Resolving entities and attributes, and recording user choices
- Navigating and revising the history of integration decisions made in a dataset
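To illustrate what "recording user choices" and "revising the history of integration decisions" might look like, here is a minimal Python sketch of an append-only decision log. This is an assumption for illustration, not a description of CHIME's internals: the `Decision` and `DecisionLog` classes and their fields are hypothetical, and the example values in the usage note are made up.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Decision:
    entity: str                  # resolved entity the choice applies to
    attribute: str               # attribute whose value was in conflict
    candidates: Tuple[Any, ...]  # conflicting values found in the sources
    chosen: Any                  # value the user picked
    note: str = ""               # optional rationale


@dataclass
class DecisionLog:
    entries: List[Decision] = field(default_factory=list)

    def record(self, decision):
        """Append a decision; earlier entries are never overwritten."""
        self.entries.append(decision)

    def history(self, entity, attribute):
        """All decisions made for one entity/attribute, oldest first."""
        return [d for d in self.entries
                if d.entity == entity and d.attribute == attribute]

    def current(self, entity, attribute):
        """The most recent choice, or None if nothing was decided yet."""
        past = self.history(entity, attribute)
        return past[-1].chosen if past else None
```

For example, a user might first record `Decision("Smith household", "purchase_year", (1998, 2001), 1998)` and later revise it by recording a second decision choosing 2001. `current` then returns 2001 while `history` still shows both choices.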

CHIME is… part of an architecture for capturing and sharing semantics, annotations, and usage of sub-document data

[Architecture diagram. Components: Information Integration Application, CHIME, Repository, Pub/Sub, UI, Web UI, Mark API, Doc API, Editing/Markup Application, Feed Browser, Ontology Inference and Authoring. Labeled functions: schema creation, entity resolution, attribute resolution, literal mark creation, copy/paste, mark creation, mark deployment to docs, context gathering, annotation, add to mark semantics, infer over mark semantics, mark semantics review, mark history review, mark browsing, mark searching, mark “headline” browsing, mark visit initiation, view marks in context, feed query creation. Labeled flows: document submission, populated schemas, new and updated documents, ontologies and thesauri, shared mark references.]

Metrics
- Scale-up improvement
- Scale-out improvement
- % of target schema successfully integrated
- % of identified user tasks automated or assisted
- % of data discrepancies detected, corrected automatically
- “cold-start” to “warm-start” time-to-solution ratio
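Two of these metrics reduce to simple arithmetic once their inputs are defined. The Python sketch below shows one possible reading; the definitions (an attribute counts as integrated if it was filled at all, and the ratio is cold-start time divided by warm-start time) are assumptions, since the slides do not define them.

```python
def percent_integrated(target_attributes, filled_attributes):
    """Percentage of target-schema attributes that were filled."""
    target = set(target_attributes)
    if not target:
        raise ValueError("target schema has no attributes")
    return 100.0 * len(target & set(filled_attributes)) / len(target)


def cold_to_warm_ratio(cold_start_seconds, warm_start_seconds):
    """Time-to-solution ratio; values above 1 mean the warm start was faster."""
    if warm_start_seconds <= 0:
        raise ValueError("warm-start time must be positive")
    return cold_start_seconds / warm_start_seconds
```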