1 CIS607, Fall 2005 Semantic Information Integration Presentation by Dayi Zhou Week 4 (Oct. 19)

2 Questions from Homework 2
About the algorithm:
– Can you go over exactly how DCM picks possible positive and negative correlations between schemas? -- Amanda
– I would like to know more about the “sparseness problem” and the “rare attribute problem.” -- Paea
– I think the matching selection is interesting; their algorithm is quite simple but seems to work. I wonder, though, whether it would work as well in applications other than Web interfaces. Web interfaces do not seem to be like real schemas, which have more complex matchings/mappings. But maybe with a user interface, combined with their heuristics, some very good results could be found? -- Paea
– What can be done with smaller data sets to improve the likelihood that truly semantically related matchings will result from choosing $M_1$ over $M_2$ whenever $m_n(M_1) > m_n(M_2)$?
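
The first two questions ask how DCM decides that two attributes are positively or negatively correlated. As a rough sketch, assuming our reading of the paper's H-measure, H = f01 * f10 / (f+1 * f1+), computed from the 2x2 contingency table of two attributes over a collection of query-interface schemas: H near 1 means the attributes rarely co-occur (a synonym candidate), and H near 0 means they almost always co-occur (a grouping candidate). The thresholds `t_p` and `t_n`, the use of 1 - H as the positive score, and the toy schemas are our own assumptions; the paper's actual measures may add conditions not reproduced here. Note that f00 (schemas containing neither attribute) never enters H, which bears on the sparseness question: padding the collection with unrelated forms leaves H unchanged.

```python
from typing import Iterable, Set, Tuple


def contingency(schemas: Iterable[Set[str]], a: str, b: str) -> Tuple[int, int, int, int]:
    """Return (f11, f10, f01, f00) for attributes a and b over a collection of schemas."""
    f11 = f10 = f01 = f00 = 0
    for schema in schemas:
        has_a, has_b = a in schema, b in schema
        if has_a and has_b:
            f11 += 1
        elif has_a:
            f10 += 1
        elif has_b:
            f01 += 1
        else:
            f00 += 1
    return f11, f10, f01, f00


def h_measure(f11: int, f10: int, f01: int) -> float:
    """H = f01 * f10 / (f+1 * f1+); f00 does not appear in the formula."""
    f1_plus = f11 + f10  # schemas containing a
    f_plus1 = f11 + f01  # schemas containing b
    if f1_plus == 0 or f_plus1 == 0:
        raise ValueError("both attributes must occur in at least one schema")
    return (f01 * f10) / (f_plus1 * f1_plus)


def classify(h: float, t_p: float = 0.8, t_n: float = 0.8) -> str:
    """Hypothetical thresholds: 1 - H >= t_p means positive, H >= t_n means negative."""
    if 1 - h >= t_p:
        return "positive"  # attributes tend to co-occur: grouping candidate
    if h >= t_n:
        return "negative"  # attributes rarely co-occur: synonym candidate
    return "neither"


# Toy book-search forms (hypothetical data, not from the paper).
schemas = [
    {"author", "title", "subject"},
    {"first name", "last name", "title"},
    {"author", "title", "isbn"},
    {"first name", "last name", "category"},
    {"author", "subject"},
]

if __name__ == "__main__":
    for a, b in [("first name", "last name"), ("author", "last name")]:
        f11, f10, f01, _ = contingency(schemas, a, b)
        h = h_measure(f11, f10, f01)
        print(a, "/", b, "H =", h, classify(h))
```

With these toy forms, "first name"/"last name" get H = 0.0 (positive) and "author"/"last name" get H = 1.0 (negative). A pair such as "author"/"title", which co-occurs often but not always, falls in between; deciding how to treat such frequent pairs is exactly where manually chosen thresholds matter.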

3 Questions from Homework 2 (cont’d)
About the data mining approach:
– I don’t fully understand the data preparation part. Do the authors take form fields as attribute names and the query results as field data (i.e., domain data)? If so, how can the authors explore all existing data? Do they try all combinations in the web forms? -- Zebin
– Another possibility is to take form fields as attribute names and the database data as domain data, but then this is dubious: why not study database schema matching directly? A database schema should be closer to the domain semantics. One justification for form extraction is that database schemas are hard to obtain in practice (they are mostly hidden), but then the authors should explain how they obtain domain data through web forms, and why such data correspond to domain data. Consequently, they can’t use database data to evaluate their matching algorithms (unless they assume a “perfect” data preparation algorithm). Moreover, some form data could apparently be a view generated from several other fields, and the authors don’t mention such correlations. Furthermore, I am slightly dubious about the authors’ assumption that attributes co-occurring on the same page can’t be synonyms. Finally, web form extraction might not be that direct: for example, if multimedia data are used to represent the query result, how will the extraction tools identify the domain value? -- Zebin

4 Questions from Homework 2 (cont’d)
About the data mining approach:
– Why not keep attribute groups that do not have a synonym, and ask the user to provide likely synonyms as future matching candidates? -- Shiwoong
– Why is the number of possible m:n matchings exponential? Can we improve this algorithm? -- DH
– The thresholds for the positive and negative correlation measures are set manually, which is not very robust. Is there any way to apply a learning algorithm to DCM? -- Jiawei
– What’s the challenge in matching multi-domain schemas? -- Jiawei
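
On why the number of possible m:n matchings is exponential, a quick counting illustration (ours, not necessarily the paper's argument): a complex matching relates disjoint groups of attributes, and n attributes already yield 2^n - 1 non-empty candidate groups. The number of ways to split n attributes into disjoint groups is the Bell number B_n, which grows faster than c^n for any constant c. The following sketch computes B_n with the Bell triangle.

```python
def bell_numbers(n_max: int) -> list[int]:
    """Return [B_0, B_1, ..., B_n_max] using the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every later entry is its left neighbour plus the entry above that neighbour.
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])  # the first entry of row n is B_n
    return bells


if __name__ == "__main__":
    bells = bell_numbers(10)
    for n in range(1, 11):
        print(f"n={n:2d}  candidate groups={2**n - 1:5d}  partitions B_n={bells[n]}")
```

For n = 10 attributes this gives 1,023 candidate groups and 115,975 partitions, which is why exhaustive enumeration is impractical and some pruning (such as correlation thresholds) is needed.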

5 Questions from Homework 2 (cont’d)
Other questions about DCM:
– Is there anything that uses DCM on the Web to produce Dogpile-like results? For example, something that could match the future deep-Web online product catalogs of Circuit City, BestBuy, and Fry's, finding a specified product that may be sold at several different stores so that a user can compare offers for the best-qualified product or the lowest price. -- Amanda
– Could the authors get better results with measures other than the H-measure, perhaps a confidence measure?
– Won’t the name-based matchings fall prey to homonyms?
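
For the question about a confidence measure, it may help to compute both measures on the same counts. This sketch assumes our reading of the paper's H-measure, H = f01 * f10 / (f+1 * f1+), and the standard association-rule confidence conf(A -> B) = f11 / f1+. H is symmetric in the two attributes, while confidence is directional, so a confidence-based variant of DCM would have to pick a direction or combine both (for example, by taking the minimum); neither measure depends on f00. The counts are hypothetical, and nothing here says which measure would give better matching results.

```python
def h_measure(f11: int, f10: int, f01: int) -> float:
    """H = f01 * f10 / (f+1 * f1+), symmetric in the two attributes."""
    f1_plus = f11 + f10  # forms containing A
    f_plus1 = f11 + f01  # forms containing B
    return (f01 * f10) / (f_plus1 * f1_plus)


def confidence(f_both: int, f_antecedent_only: int) -> float:
    """Association-rule confidence conf(X -> Y) = f_both / (number of forms containing X)."""
    return f_both / (f_both + f_antecedent_only)


# Hypothetical counts: A appears in 10 forms, B in 40, and they co-occur in 8.
f11, f10, f01 = 8, 2, 32

if __name__ == "__main__":
    print("H(A, B)      =", h_measure(f11, f10, f01))
    print("H(B, A)      =", h_measure(f11, f01, f10))  # swap roles of A and B
    print("conf(A -> B) =", confidence(f11, f10))
    print("conf(B -> A) =", confidence(f11, f01))
```

Here H(A, B) = H(B, A) = 0.16, whereas conf(A -> B) = 0.8 and conf(B -> A) = 0.2: the same pair looks strongly associated from A's side and weakly associated from B's side.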