Published by Imogene Barber. Modified over 9 years ago.
1
Data and Information Systems Laboratory, University of Illinois Urbana-Champaign
CS 512, Jan 18, 2010
WinaCS Project
Web Entity Extraction and Mapping: Discovering and Propagating Context
Tim Weninger
Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL
2
Past, Present, Future
Past: Entity search and retrieval is one of the dreams of the Web (TBL)
Present: Ranking and Retrieval, a bi-directional approach
1) Information Networks
2) Web mining and Information Extraction
   a) List Finding
   b) Entity-page Discovery
   c) Entity-page Mapping
Future: InfoBase Project, information extraction via Schema Discovery
3
Finding lists on the Web is Hard! (KDD Explorations Dec. 2010)
1. Google Sets
2. WebTables
3. Mining Data Records (MDR)
4. World Wide Tables (WWT)
5. Tag Path Clustering
6. RoadRunner
7. SEAL
8. Visual List Extraction
9. VIsion-based Page Segmentation (VIPS)
10. Visualized Element Nodes Table extraction (VENTex)
4
Why is finding lists important?
Example lists:
Jiawei Han, ChengXiang Zhai, Kevin Chang, Dan Roth, Marianne Winslett
Sarita Adve, Tarek Abdelzaher, Vikram Adve, Gul Agha, …
Charu Aggarwal, Deepayan Chakrabarti, Ed Chang, Kevin Chang, Olivier Chapelle, Chris Clifton, Jiawei Han, …
Uses: Correction, Inference, Disambiguation, Recommendation, etc.
5
Our list finding algorithm (Accepted: WWW 2011)
6
List Finding for Entity Page Discovery
7
Growing Parallel Paths (Accepted: WWW 2011)
Result: (figure omitted)
8
Mapping Pages to Records (CIKM’10)
9
Mapping Pages to Records (CIKM’10)
Example
A_p1 = {People, Faculty, Dan Roth, Personal Site}
A_p2 = {Research, Data Mining, Dan Roth, Personal Site}
Bag of Anchors: {Research: 1, People: 1, Faculty: 1, Data Mining: 1, Dan Roth: 2, Personal Site: 2}
Sorted Bag of Anchors: A_u;v1 = {Dan Roth: 2/2 = 1, Research: 1/2 = 0.5, Data Mining: 1/2 = 0.5, Personal Site: 2/5 = 0.4, People: 1/3 = 0.33, Faculty: 1/3 = 0.33}
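The arithmetic in this example can be reproduced with a short sketch. It is only an illustration, not the CIKM’10 implementation: the function names are hypothetical, and the denominators (2, 2, 2, 5, 3, 3) are copied from the slide's ratios because the slide does not say what they count.

```python
from collections import Counter

# Anchor-text paths leading to the same page, copied from the example.
A_p1 = ["People", "Faculty", "Dan Roth", "Personal Site"]
A_p2 = ["Research", "Data Mining", "Dan Roth", "Personal Site"]

# ASSUMPTION: denominators are taken verbatim from the slide's ratios.
# The slide does not define them, so they are treated here as given inputs.
DENOMINATORS = {
    "Dan Roth": 2, "Research": 2, "Data Mining": 2,
    "Personal Site": 5, "People": 3, "Faculty": 3,
}

def bag_of_anchors(paths):
    """Count how often each anchor text occurs across all paths."""
    bag = Counter()
    for path in paths:
        bag.update(path)
    return bag

def sorted_bag(bag, denominators):
    """Normalize each count by its denominator and sort by score, highest first."""
    scored = {anchor: count / denominators[anchor] for anchor, count in bag.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

bag = bag_of_anchors([A_p1, A_p2])
ranked = sorted_bag(bag, DENOMINATORS)
for anchor, score in ranked:
    print(f"{anchor}: {score:.2f}")
```

With the slide's numbers, "Dan Roth" ranks first because it appears in both paths and has the smallest denominator relative to its count, while the generic anchor "Personal Site" is pushed down despite also appearing twice.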
10
CSMap
Locations of the top 25 computer science departments, generated automatically by extracting and ranking 5-digit numbers from entity Web pages.
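A minimal sketch of the extract-and-rank idea follows. The slide does not say how the 5-digit numbers are ranked, so ranking by the number of pages that mention each candidate is an assumption, as are the function name and the made-up sample page text.

```python
import re
from collections import Counter

# Match exactly five digits that are not part of a longer digit run.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

def rank_zip_candidates(pages):
    """Rank 5-digit candidates by how many pages mention them (assumed criterion)."""
    counts = Counter()
    for text in pages:
        # Count each candidate at most once per page.
        counts.update(set(FIVE_DIGITS.findall(text)))
    return counts.most_common()

# Hypothetical page snippets for one department, for illustration only.
sample_pages = [
    "201 N. Goodwin Ave, Urbana, IL 61801",
    "Siebel Center, Urbana IL 61801, room 12345",
    "Call 2173334428",
]
candidates = rank_zip_candidates(sample_pages)
print(candidates[0])
```

The lookarounds keep phone numbers and other long digit strings from contributing spurious 5-digit matches; the top-ranked candidate would then be geocoded to place the department on the map.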
11
Next Steps: The hard part!
Infer categories/schemas from a set of Web pages
Example: What do these entities have in common?
Name, Address, ZipCode, Publications, Collaborators, Organizations
How can we infer this schema? Wikipedia?
How can we populate it?
12
Idea! Propagating schemas
13
Next Steps: The hardest part!
Columns: Name, Address, ZipCode, Organizations, Collaborators, Publications
Jiawei Han: A, 1, FK
Tarek Abdelzaher: B, 2, FK
Gerald DeJong: C, 3, FK
Michael Heath: D, 4, FK
Legend: Given, Inferred
This can be modeled as a heterogeneous information network.
Thus, ranking and clustering are possible.
So are semantic search, keyword search and typal search.
Cube operations are possible.
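One way to picture the heterogeneous information network is as a graph whose nodes carry a type (the column name) and a value. The sketch below is an assumption about the representation, not the WinaCS design; it uses only the slide's placeholder values for Address and ZipCode and leaves out the FK columns, whose contents the slide does not give.

```python
from collections import defaultdict

# Rows copied from the slide; "A".."D" and "1".."4" are the slide's placeholders.
RECORDS = [
    {"Name": "Jiawei Han", "Address": "A", "ZipCode": "1"},
    {"Name": "Tarek Abdelzaher", "Address": "B", "ZipCode": "2"},
    {"Name": "Gerald DeJong", "Address": "C", "ZipCode": "3"},
    {"Name": "Michael Heath", "Address": "D", "ZipCode": "4"},
]

def build_network(records, key="Name"):
    """Return an undirected adjacency map over typed nodes (type, value)."""
    adjacency = defaultdict(set)
    for record in records:
        entity = (key, record[key])
        for attribute, value in record.items():
            if attribute == key:
                continue
            node = (attribute, value)
            # Link the entity to each of its attribute values, in both directions.
            adjacency[entity].add(node)
            adjacency[node].add(entity)
    return adjacency

network = build_network(RECORDS)
print(sorted(network[("Name", "Jiawei Han")]))
```

Because every node keeps its type, queries such as "all people sharing a ZipCode" reduce to a two-hop walk through a node of type ZipCode, which is the kind of structure ranking and clustering over information networks rely on.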
14
WinaCS: An information-network-based Web search engine
15
Questions? Challenges?