Predicting Missing Provenance Using Semantic Associations in Reservoir Engineering
Jing Zhao, University of Southern California
Sep 19th, 2011

Outline
- Background and Introduction
- Our Approach: Annotation, Association Detection, Confidence Assignment, Prediction
- Evaluation
- Conclusion and Future Work

Provenance Information
- The provenance of a piece of data is the process that led to that piece of data [1].
- Uses of provenance: data quality assessment, data auditing, and repetition of data derivation.

[1] Moreau, L. (2010). The Foundations for Provenance on the Web. Foundations and Trends in Web Science, 2(2-3).

Incomplete Provenance in Reservoir Engineering
- Complicated domain datasets, e.g., reservoir models, containing large numbers of data items integrated from multiple data sources
- Provenance information is needed for data auditing and data quality control
- Provenance is often incomplete, due to:
  - legacy tools that do not support provenance functionality
  - manual provenance annotation
  - integrating operations, such as copy/paste across reservoir models
- Goal: predict missing provenance, i.e., the immediate parent process of a data item

Our Observations
- Data items may share the same provenance.
- Special semantic “connections” exist between data items with identical provenance.

Semantic Associations
- Sequences of relationships connecting two entities in the ontology graph [2][3]
- Express special semantic connections explicitly
- Reveal hidden data generation patterns

[2] B. Aleman-Meza, C. Halaschek, I. B. Arpinar, and A. Sheth, “Context-aware semantic association ranking,” in SWDB.
[3] K. Anyanwu and A. Sheth, “ρ-Queries: Enabling querying for semantic associations on the semantic web,” in WWW, 2003.
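A minimal sketch of what "sequences of relationships connecting two entities" can look like in code, assuming the ontology graph is stored as (subject, relationship, object) triples. Only the class names and `ReservoirContainsWell` come from the slides; the instances, the hypothetical `WellInRegion` relationship, the `^-1` suffix for traversing an edge backwards, and the `max_len` cutoff are illustrative assumptions, not the method used in the talk.

```python
from collections import deque

# Hypothetical ontology graph as (subject, relationship, object) triples.
# ReservoirContainsWell is from the slides; everything else is made up.
TRIPLES = [
    ("Reservoir1", "ReservoirContainsWell", "WellA"),
    ("Reservoir1", "ReservoirContainsWell", "WellB"),
    ("WellA", "WellInRegion", "Region1"),
    ("WellB", "WellInRegion", "Region1"),
]


def build_adjacency(triples):
    """Index every edge in both directions so a path may follow an edge backwards."""
    adj = {}
    for s, rel, o in triples:
        adj.setdefault(s, []).append((rel, o))
        adj.setdefault(o, []).append((rel + "^-1", s))  # inverse traversal
    return adj


def semantic_associations(adj, source, target, max_len=3):
    """Breadth-first search for all simple paths from source to target.

    Each association is returned as the tuple of relationship labels on the path.
    """
    results = []
    queue = deque([(source, (), {source})])
    while queue:
        node, rels, visited = queue.popleft()
        if node == target and rels:
            results.append(rels)
            continue
        if len(rels) == max_len:
            continue
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                queue.append((nxt, rels + (rel,), visited | {nxt}))
    return results
```

For this toy graph, `semantic_associations(build_adjacency(TRIPLES), "WellA", "WellB")` finds two associations: the two wells belong to the same reservoir, and they lie in the same region.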

Problem Definition
- Data set: a reservoir model
- Provenance of a data item
- Provenance indicator function
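One way to read the provenance indicator function, assuming that the provenance of a data item is recorded as the identifier of its immediate parent process and that missing provenance is stored as `None`. The function name and the three-valued result (1, 0, or unknown) are hypothetical; the slide's own formula is not recoverable from the transcript.

```python
from typing import Dict, Optional

# Hypothetical mapping: data item -> immediate parent process (None = missing provenance).
Provenance = Dict[str, Optional[str]]


def provenance_indicator(prov: Provenance, d1: str, d2: str) -> Optional[int]:
    """Return 1 if d1 and d2 have identical provenance, 0 if not, None if either is unknown."""
    p1, p2 = prov.get(d1), prov.get(d2)
    if p1 is None or p2 is None:
        return None
    return int(p1 == p2)


prov = {"d1": "P1", "d2": "P1", "d3": "P2", "d4": None}
print(provenance_indicator(prov, "d1", "d2"))  # 1
print(provenance_indicator(prov, "d1", "d3"))  # 0
print(provenance_indicator(prov, "d1", "d4"))  # None
```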

Use Semantic Associations for Prediction


Bootstrapping

Annotation
- Domain ontology
  - Domain classes: Reservoir, Well, Region
  - Relationships: ReservoirContainsWell
- Domain entities: instances of the domain classes
- Annotation function
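A small sketch of the annotation step, assuming the annotation function is a plain lookup from data item identifiers to domain entities and that each entity is typed by one of the three domain classes named on the slide. The `DomainEntity` type, the data item identifiers, and the entity names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

DOMAIN_CLASSES = {"Reservoir", "Well", "Region"}


@dataclass(frozen=True)
class DomainEntity:
    name: str
    domain_class: str

    def __post_init__(self) -> None:
        # Every domain entity must be an instance of one of the domain classes.
        if self.domain_class not in DOMAIN_CLASSES:
            raise ValueError(f"unknown domain class: {self.domain_class}")


# Hypothetical annotation function: data item id -> annotating domain entity.
ANNOTATION: Dict[str, DomainEntity] = {
    "well_a_perforation": DomainEntity("WellA", "Well"),
    "well_b_perforation": DomainEntity("WellB", "Well"),
    "region1_pressure": DomainEntity("Region1", "Region"),
}


def annotate(item_id: str) -> DomainEntity:
    """Return the domain entity that annotates a data item."""
    return ANNOTATION[item_id]
```

Keeping entities immutable (`frozen=True`) makes them hashable, so they can serve directly as nodes in an ontology graph.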

Association Detection
Using historical datasets with complete provenance:
1. Identify data items with identical provenance.
2. Identify the domain entities that annotate them.
3. Compute the semantic associations between these entities in the ontology graph.
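The three steps can be sketched as follows, under the assumptions that historical provenance is a mapping from data items to their generating process, that the annotation function is a dictionary, and that an association is the sequence of relationship labels on a simple path of bounded length. The toy triples, item names, `WellInRegion`, and the `max_len` bound are hypothetical; the result counts how often each association connects items with identical provenance.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy inputs.
TRIPLES = [
    ("Reservoir1", "ReservoirContainsWell", "WellA"),
    ("Reservoir1", "ReservoirContainsWell", "WellB"),
    ("WellA", "WellInRegion", "Region1"),
    ("WellB", "WellInRegion", "Region1"),
]
ANNOTATION = {"d1": "WellA", "d2": "WellB", "d3": "Region1"}
PROVENANCE = {"d1": "P1", "d2": "P1", "d3": "P2"}  # complete provenance


def build_adjacency(triples):
    adj = defaultdict(list)
    for s, rel, o in triples:
        adj[s].append((rel, o))
        adj[o].append((rel + "^-1", s))  # allow traversing edges backwards
    return adj


def paths(adj, src, dst, max_len):
    """Relationship-label sequences of all simple paths src -> dst with at most max_len edges."""
    found = []

    def walk(node, rels, visited):
        if node == dst and rels:
            found.append(tuple(rels))
            return
        if len(rels) == max_len:
            return
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                walk(nxt, rels + [rel], visited | {nxt})

    walk(src, [], {src})
    return found


def detect_associations(provenance, annotation, adj, max_len=2):
    # Step 1: group data items by identical provenance.
    groups = defaultdict(list)
    for item, process in provenance.items():
        groups[process].append(item)
    counts = Counter()
    for items in groups.values():
        for a, b in combinations(sorted(items), 2):
            # Step 2: look up the annotating domain entities.
            ea, eb = annotation[a], annotation[b]
            # Step 3: record every association between them in the ontology graph.
            counts.update(paths(adj, ea, eb, max_len))
    return counts
```

Here only `d1` and `d2` share a process, so the two associations between `WellA` and `WellB` are each counted once.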

Confidence of Association
- The confidence of an association A is the probability that two data items have identical provenance, given that their annotating domain entities are connected by A.
- Conditional confidence
- Calculation
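The slide's exact calculation is not recoverable from the transcript. As a hedged sketch, assume the confidence of an association A is estimated as a relative frequency over pairs of data items in the historical datasets: among pairs whose annotating entities are connected by A, the fraction that have identical provenance, i.e. conf(A) ≈ N_same(A) / N(A). The input format below is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def association_confidence(
    observations: Iterable[Tuple[Hashable, bool]],
) -> Dict[Hashable, float]:
    """Estimate conf(A) = P(identical provenance | annotating entities connected by A).

    Each observation is (association, identical_provenance) for one pair of
    data items taken from a historical dataset with complete provenance.
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for assoc, identical in observations:
        total[assoc] += 1
        same[assoc] += int(identical)
    return {assoc: same[assoc] / total[assoc] for assoc in total}
```

With observations `[("A1", True), ("A1", True), ("A1", False), ("A2", False)]`, association `A1` gets confidence 2/3 and `A2` gets 0.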

Prediction
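The transcript does not spell out the prediction rule. A plausible sketch, assuming each candidate process is scored by summing the confidences of the associations that link the item with missing provenance to items whose provenance is known, and the highest-scoring process is predicted. The summation rule and all names are assumptions; taking the maximum confidence instead of the sum would be an equally reasonable variant.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def predict_provenance(
    links: Iterable[Tuple[str, str]],
    known_provenance: Dict[str, str],
    confidence: Dict[str, float],
) -> Optional[str]:
    """Predict the parent process of one data item with missing provenance.

    links: (other_item, association) pairs, where `association` connects the
    target item's annotating entity to other_item's annotating entity.
    """
    scores: Dict[str, float] = defaultdict(float)
    for other, assoc in links:
        process = known_provenance.get(other)
        if process is not None:
            scores[process] += confidence.get(assoc, 0.0)
    if not scores:
        return None  # no evidence: leave the provenance missing
    return max(scores, key=scores.get)
```

For example, one link of confidence 0.9 to an item produced by `P1` outweighs two links of confidence 0.3 and 0.4 to an item produced by `P2`, so `P1` is predicted.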


Experiment Setup
- Use cases: two types of reservoir models
  - Type 1: ~1000 data items per dataset
  - Type 2: ~500 data items per dataset
- Historical datasets: ~2000 datasets, created by duplicating real dataset samples following the patterns learnt from those samples
- Test set: 10% of the historical datasets, with provenance randomly dropped
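A sketch of the test-set construction, assuming each historical dataset is a mapping from data items to their generating process. The 10% test fraction comes from the slide; the `drop_rate` value and the fixed seed are hypothetical, since the transcript does not say how much provenance was dropped.

```python
import random
from typing import Dict, List, Optional, Tuple

Dataset = Dict[str, str]  # data item -> generating process


def make_test_set(
    datasets: List[Dataset],
    test_fraction: float = 0.1,
    drop_rate: float = 0.3,
    seed: int = 0,
) -> Tuple[List[Dataset], List[Dataset], List[Dict[str, Optional[str]]]]:
    """Split off a test set and randomly hide provenance in it.

    Returns (training datasets, ground-truth test datasets, masked test datasets),
    where a masked entry of None marks provenance that was dropped.
    """
    rng = random.Random(seed)
    shuffled = list(datasets)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    masked = [
        {item: (None if rng.random() < drop_rate else process) for item, process in ds.items()}
        for ds in test
    ]
    return train, test, masked
```

Keeping the unmasked test datasets alongside the masked ones lets predictions be scored against the dropped ground truth.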

Baseline Approaches
- Baseline 1: for a data item annotated by an entity e, select the generation process that was most frequently used to create data items annotated by e in the historical datasets.
- Baseline 2: instead of using semantic associations, consider only the provenance similarity between pairs of domain entities.
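Baseline 1 is simple enough to sketch directly, assuming the historical datasets have been flattened into (entity, process) pairs, one per data item annotated by that entity. The function names are hypothetical; ties between equally frequent processes are broken by first occurrence, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def train_baseline1(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each entity to the process most frequently used for data items it annotates."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for entity, process in pairs:
        counts[entity][process] += 1
    return {entity: c.most_common(1)[0][0] for entity, c in counts.items()}


def predict_baseline1(model: Dict[str, str], entity: str) -> Optional[str]:
    """Predict provenance for a data item annotated by `entity` (None if the entity is unseen)."""
    return model.get(entity)
```

Unlike the association-based approach, this baseline ignores how entities relate to one another in the ontology graph.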

Results of Use Case 1 [figure]: (a) 500, (b) 1000, and (c) 2000 historical datasets

Results of Use Case 2 [figure]: (a) 500, (b) 1000, and (c) 2000 historical datasets

Conclusion and Future Work
- Predict missing provenance using:
  - semantic associations: hidden semantic “connections” between fine-grained data items sharing identical provenance
  - analysis of historical datasets: dataset → ontology graph → dataset
- Future work: inconsistent provenance, more complicated provenance, and a provenance integration framework