Visualization of Heterogeneous Data
Mike Cammarano, Xin (Luna) Dong, Bryan Chan, Jeff Klingner, Justin Talbot, Alon Halevy, Pat Hanrahan

Homogeneous data is easy.
Company    Founded  Headquarters  Logo
Microsoft           N, W
Enron               N, 95.3 W
Google              N, W

Multiple sources?
Collaborative content
Semi-structured data
{{Infobox Writer
| bgcolour = silver
| name = Edgar Allan Poe
| image = Edgar_Allan_Poe_2.jpg
| caption = This [[daguerreotype]] of Poe was taken in
| birth_date = {{birth date|1809|1|19|mf=y}}
| birth_place = [[Boston, Massachusetts]] [[United States|U.S.]]
| death_date = {{death date and age|1849|10|07|1809|01|19}}
| death_place = [[Baltimore, Maryland]] [[United States|U.S.]]
| occupation = Poet, short story writer, editor, literary critic
| movement = [[Romanticism]], [[Dark romanticism]]
| genre = [[Horror fiction]], [[Crime fiction]], [[Detective fiction]]
| magnum_opus = The Raven
| spouse = [[Virginia Eliza Clemm Poe]]...

DBpedia.org
According to DBpedia.org:
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. The DBpedia dataset currently provides information about more than 1.95 million “things”, including at least:
  80,000 persons
  70,000 places
  35,000 music albums
  12,000 films

Database size
We use a subset of DBpedia, mostly infoboxes and geonames.
  30 M triples
  2.5 GB
We currently use an in-memory database.
Hardware: dual-processor, dual-core AMD Opteron 280s with 8 GB RAM.

A glimpse inside DBpedia

Kerry: dbp:PLACE_OF_BIRTH -> dbp:latitude -> 39° 41´ 45˝ N
Poe: dbp:birth_place -> w3c:owl#sameAs -> geonames:latitude

Heterogeneity
Types: decimal vs. sexagesimal coordinates (39.70 vs. 39° 41´ 45˝ N)
Names: PLACE_OF_BIRTH vs. birth_place
Paths: dbp:PLACE_OF_BIRTH -> dbp:latitude vs. dbp:birth_place -> w3c:owl#sameAs -> geonames:latitude
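The type mismatch above (decimal vs. sexagesimal coordinates) can be bridged with a small normalization step. The following is a minimal sketch only, not the authors' code: the function names and the regular expression are assumptions made for illustration.

```python
import re

def sexagesimal_to_decimal(text):
    """Convert a coordinate such as 39° 41´ 45˝ N to signed decimal degrees."""
    # Degrees, minutes, seconds separated by any non-digit symbols, then a hemisphere letter.
    m = re.match(r"\s*(\d+)\D+(\d+)\D+(\d+(?:\.\d+)?)\D*([NSEW])\s*$", text)
    if m is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = m.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value

def normalize_coordinate(value):
    """Accept either a decimal value or a sexagesimal string (hypothetical helper)."""
    try:
        return float(value)
    except ValueError:
        return sexagesimal_to_decimal(value)

latitude = normalize_coordinate("39° 41´ 45˝ N")
print(round(latitude, 2))
```

Rounded to two places, the sexagesimal value from the slide agrees with the decimal 39.70 shown beside it.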

Scenario / Demo

Vision: Self-configuring data

Contributions
Visualize heterogeneous data represented as a graph of relationships between objects
Describe inputs to a visualization:
  Visualization template
  Set of keywords per attribute
Find attributes needed for a visualization by searching paths
  Within an iterative process of search, visualization, and refinement
Present algorithm for finding and ranking paths based on keywords
  Efficiently enumerate paths
    A*
    Random sampling
  Rank according to:
    Keywords
    Heuristics about graph structure

Integrate searching and visualization
  Search for potentially desirable paths
  Visualize results
  Refine path selections in context

Matching problem
Find the best path to a number for “state latitude”
[Figure: example graph. Nodes and literals: Dianne Feinstein, Harry Reid, 42.4, 39.0, blue, 4. Edge labels: state, capital, latitude, pop, birth place, spouse, party, house leader, name, color, governor, children.]

Basic algorithm
Find the best path to a number for “state latitude”
1. Explore graph
2. Find paths ending in a number
3. Score and rank paths using TF/IDF
Candidate paths:
  state.capital.latitude
  state.pop
  spouse.birth_place.latitude
  state.governor.children
[Figure: example graph, as on the "Matching problem" slide.]
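The three steps of the basic algorithm can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the toy graph, its node names (`state_1`, `capital_1`, and so on), the population value, and the assignment of 39.0 and 42.4 to particular latitudes are all hypothetical, and the scoring is a plain textbook TF/IDF over edge-label terms.

```python
import math
from collections import Counter

# Hypothetical toy graph: node -> list of (edge label, target).
# Targets are either other nodes or literal values.
GRAPH = {
    "senator": [("state", "state_1"), ("spouse", "spouse_1"), ("name", "Dianne Feinstein")],
    "state_1": [("capital", "capital_1"), ("pop", 36.5), ("governor", "governor_1")],
    "capital_1": [("latitude", 39.0)],
    "spouse_1": [("birth_place", "city_1")],
    "city_1": [("latitude", 42.4)],
    "governor_1": [("children", 4)],
}

def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def numeric_paths(graph, start, max_depth=3):
    """Steps 1 and 2: explore the graph and keep paths that end in a number."""
    found = []
    def walk(node, labels):
        for label, target in graph.get(node, []):
            path = labels + (label,)
            if is_number(target):
                found.append((path, target))
            elif len(path) < max_depth:
                walk(target, path)
    walk(start, ())
    return found

def terms(path):
    """Split edge labels such as birth_place into lowercase terms."""
    return [t for label in path for t in label.lower().split("_")]

def rank_paths(paths, keywords):
    """Step 3: score each path as a small document of terms using TF/IDF."""
    docs = [Counter(terms(path)) for path, _ in paths]
    n = len(docs)
    def idf(term):
        df = sum(1 for doc in docs if term in doc)
        return math.log(n / df) if df else 0.0
    scored = []
    for (path, value), doc in zip(paths, docs):
        length = sum(doc.values())
        score = sum(doc[k] / length * idf(k) for k in keywords)
        scored.append((score, path, value))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

ranked = rank_paths(numeric_paths(GRAPH, "senator"), ["state", "latitude"])
for score, path, value in ranked:
    print(".".join(path), value, round(score, 3))
```

On this toy graph, state.capital.latitude scores highest because it matches both keywords with few extra terms.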

Improving execution time
New pruning techniques since the paper submission:
  A*
  Bidirectional search on terms
  Random sampling

Pruning techniques
Most paths do not correspond to a “state latitude”. How can we avoid such bad paths?
Kinds of bad paths marked on the graph:
  No mention of latitude
  Many unrelated terms
  No potential paths
[Figure: example graph, as on the "Matching problem" slide.]

Pruning techniques / A* Search
Use a scoring function that penalizes unrelated terms.
Then an A* search ignores paths with many such terms.
[Figure: example graph, as on the "Matching problem" slide, with a path marked "Many unrelated terms".]
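The pruning idea can be sketched as a cost-ordered search in which each edge label unrelated to the keywords adds a penalty, and any path whose penalty exceeds a budget is dropped. The slides do not give the actual scoring function or heuristic, so this sketch assumes a unit penalty per unrelated label and a zero heuristic, which makes it A* in its degenerate uniform-cost form. The graph, node names, and the `max_unrelated` parameter are hypothetical.

```python
import heapq
from itertools import count

# Hypothetical toy graph: node -> list of (edge label, target).
GRAPH = {
    "senator": [("state", "state_1"), ("spouse", "spouse_1"), ("party", "party_1")],
    "state_1": [("capital", "capital_1"), ("pop", 36.5), ("governor", "governor_1")],
    "capital_1": [("latitude", 39.0)],
    "spouse_1": [("birth_place", "city_1")],
    "city_1": [("latitude", 42.4)],
    "governor_1": [("children", 4)],
    "party_1": [("color", "blue"), ("house_leader", "leader_1")],
    "leader_1": [("name", "Harry Reid")],
}

def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def label_cost(label, keywords):
    """Penalty of 0 if the label shares a term with the keywords, else 1."""
    return 0 if set(label.lower().split("_")) & set(keywords) else 1

def pruned_search(graph, start, keywords, max_unrelated=1, max_depth=3):
    """Expand cheapest paths first and drop paths with too many unrelated labels."""
    tie = count()  # tie-breaker so the heap never compares nodes
    frontier = [(0, next(tie), start, ())]
    hits, examined = [], 0
    while frontier:
        cost, _, node, labels = heapq.heappop(frontier)
        for label, target in graph.get(node, []):
            examined += 1
            new_cost = cost + label_cost(label, keywords)
            if new_cost > max_unrelated:
                continue  # pruned: too many unrelated terms
            path = labels + (label,)
            if is_number(target):
                hits.append((new_cost, path, target))
            elif len(path) < max_depth:
                heapq.heappush(frontier, (new_cost, next(tie), target, path))
    return hits, examined

hits, examined = pruned_search(GRAPH, "senator", ["state", "latitude"])
total_edges = sum(len(edges) for edges in GRAPH.values())
print(sorted(path for _, path, _ in hits), examined, "of", total_edges, "edges examined")
```

The surviving candidates would still be ranked afterwards; the search only avoids descending into branches such as spouse.birth_place or party.house_leader that have already accumulated too many unrelated terms.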

A* pruning results
Senators on map
[Table: average # of edges examined at each depth (1, 2, 3, 4), full enumeration. Rows: Image, Name, latitude.]
[Table: average # of edges examined at each depth (1, 2, 3, 4), using A*. Rows: Image, Name, latitude.]

Pruning techniques / Random Sampling
Do normal A* search for n randomly chosen nodes.
Only search known hits for the remaining nodes.
Prevents repeatedly checking where there are likely no paths.
[Figure: example graph, as on the "Matching problem" slide, shown first for Dianne Feinstein and then for John Kerry, with paths marked "A hit!" and "No potential paths".]
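The sampling strategy can be sketched as follows: run the expensive search on a few randomly chosen seed instances, remember which label paths produced hits, and for every other instance only follow those known paths. This is a sketch under assumptions, not the authors' code: the toy graph and its values are hypothetical, and plain enumeration (`full_search`) stands in for the A* search.

```python
import random

# Hypothetical toy data: three instances that share one path shape.
GRAPH = {
    "senator_a": [("state", "state_a"), ("name", "A")],
    "senator_b": [("state", "state_b"), ("name", "B")],
    "senator_c": [("state", "state_c"), ("name", "C")],
    "state_a": [("latitude", 39.0)],
    "state_b": [("latitude", 42.4)],
    "state_c": [("latitude", 36.2)],
}

def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def full_search(graph, start, max_depth=3):
    """Stand-in for the expensive search: every path from start ending in a number."""
    found = []
    def walk(node, labels):
        for label, target in graph.get(node, []):
            path = labels + (label,)
            if is_number(target):
                found.append((path, target))
            elif len(path) < max_depth:
                walk(target, path)
    walk(start, ())
    return found

def follow(graph, start, labels):
    """Cheap lookup: follow one known label path and return the numbers reached."""
    nodes = [start]
    for label in labels:
        nodes = [target for node in nodes
                 for edge, target in graph.get(node, []) if edge == label]
    return [value for value in nodes if is_number(value)]

def sampled_search(graph, instances, n_seeds, seed=0):
    rng = random.Random(seed)
    seeds = set(rng.sample(instances, min(n_seeds, len(instances))))
    results, known_paths = {}, set()
    for node in seeds:  # full search only on the seed nodes
        results[node] = full_search(graph, node)
        known_paths.update(path for path, _ in results[node])
    for node in instances:  # remaining nodes: only check known hits
        if node not in seeds:
            results[node] = [(path, value) for path in sorted(known_paths)
                             for value in follow(graph, node, path)]
    return results

results = sampled_search(GRAPH, ["senator_a", "senator_b", "senator_c"], n_seeds=1)
print(results)
```

The trade-off is that a path appearing only on non-seed instances is never discovered, so the number of seeds controls how much recall is given up for speed.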

Sampling results
Average # edges examined at all depths (columns: Seed nodes (10), Others (89)):
  Image: 92082
  Name: 4035
  State
  Latitude
  Longitude
  TOTAL
Total edges examined:
  without sampling: 7360×99=
  with sampling: 7360× ×89=

Performance
Runtime for senators’ example:
[Table columns: State latitude, State longitude, Image, Name, Instances, total sec]
Runtime for astronauts’ example:
[Table columns: Mission launch, Mission insignia, Name, Instances, total sec]
Runtime for each field in countries’ example:
[Table columns: GDP per capita, Inflation, Flag, Name, Instances, total sec]
Performance now interactive
With new pruning techniques, ~100x faster than reported in paper.

Variations – senators’ flags versus birth places

Timeline of manned spaceflight

Scatterplot of inflation vs. GDP

Precision / Recall
Senators – state latitude (columns: Correct, Incorrect):
  Accepted: 6434
  Rejected: 10
Countries – gdp per capita (columns: Correct, Incorrect):
  Accepted: 20658
  Rejected: 90
Senators – image (columns: Correct, Incorrect):
  Accepted: 866
  Rejected: 06

Summary
Visualize heterogeneous data represented as a graph of relationships between objects
Produce visualizations conforming to templates by searching for needed attributes
Present algorithm for finding and ranking paths based on keywords
  Efficiently enumerate paths
  Rank
Now fast enough for interactive use
High precision and recall

Future work
Improvements
  UI support for initial discovery and query refinement
  Robustness of terms / Improved ranking
  Automatic selection of visualization
  Visualizing missing data
  Visualizations that reflect result relevance (selective emphasis)
Deploy on the web
  Wikipedia
  The whole web

Acknowledgements
Funding sources:
  Boeing
  RVAC
  CALO
Tools and data:
  DBpedia
  MIT SIMILE project timeline
  Tom Patterson’s map artwork

The end!

Pruning techniques / Bidirectional Search
Before A*, search one step back from each literal, following only edges that match keywords.
This saves one step during forward A* search.
[Figure: example graph, as on the "Matching problem" slide, with a path marked "No mention of latitude".]
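The backward step can be sketched over a list of triples: look at each numeric literal, keep it only if the edge leading into it matches a keyword, and record the node one step back. A forward search could then stop as soon as it reaches one of these recorded nodes. This is a minimal sketch, not the authors' implementation: the triples, node names, and the function `literal_sources` are hypothetical.

```python
# Hypothetical triples: (subject, edge label, object).
TRIPLES = [
    ("capital_1", "latitude", 39.0),
    ("city_1", "latitude", 42.4),
    ("state_1", "pop", 36.5),
    ("governor_1", "children", 4),
    ("state_1", "capital", "capital_1"),
    ("party_1", "color", "blue"),
]

def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def literal_sources(triples, keywords):
    """One step back from each numeric literal, along keyword-matching edges only."""
    wanted = {k.lower() for k in keywords}
    sources = {}
    for subject, label, obj in triples:
        if not is_number(obj):
            continue  # only literals that could serve as the target number
        if wanted & set(label.lower().split("_")):
            sources.setdefault(subject, []).append((label, obj))
    return sources

goals = literal_sources(TRIPLES, ["state", "latitude"])
print(goals)
```

Literals reached by edges such as pop or children never become goals, which corresponds to the "No mention of latitude" case on the slide.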

Need for multiple paths