1
A Web of Concepts
Dalvi et al.
Presented by Andrew Zitzelberger
2
Vision
Transform hyperlinked bags of words into a semantically rich, aggregate view of information on the web.
3
Concept
Things of interest
– Searching for information
– Accomplishing a task
  Reservations, etc.
4
Instances
Record of a concept
– Restaurant
  Gochi (19980 Homestead Rd Cupertino CA)
– Academia?
  Publications, research institutions
5
Instance Representation
Loosely-structured record (lrec)
– Attribute-key, value pairs
– Unique id field
  Entity matching problem
– Metadata
  Attribute list
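A minimal Python sketch of what an lrec might look like, assuming attributes are stored as free-form key/value pairs with no fixed schema. The class name `LRec`, the multi-valued attribute map, and the random id are illustrative assumptions, not the representation used by Dalvi et al.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass
class LRec:
    """Hypothetical loosely-structured record for one concept instance."""
    concept: str
    # Assumption: a key may carry several values; no schema is enforced.
    attributes: Dict[str, List[str]] = field(default_factory=dict)
    # Assumption: a random id stands in for the unique id field. Deciding
    # whether two records describe the same real-world entity is the
    # entity matching problem and is not handled here.
    rec_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def add(self, key: str, value: str) -> None:
        """Attach one attribute-key, value pair to the record."""
        self.attributes.setdefault(key, []).append(value)

    def attribute_list(self) -> List[str]:
        """Metadata: the attribute keys currently present on this record."""
        return sorted(self.attributes)


gochi = LRec(concept="restaurant")
gochi.add("name", "Gochi")
gochi.add("address", "19980 Homestead Rd Cupertino CA")
```

Calling `gochi.attribute_list()` returns the keys seen so far, which is one simple way to keep the attribute list as metadata rather than as a schema fixed in advance.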
6
Domain
Set of related concepts
– Academic community domain = {publications, people, conferences}
7
Usage Study
Instance vs. Concept Search
yelp.com
– Month of queries resulting in a click (restaurants)
– 59% specific business URL
– 19% search URL, either specific business or group
– 11% specific group URL
8
Usage Study
Concept Attribute Search
Remove restaurant name and location information from query
Co-occurring words:
– Menu (3%), coupons (1.8%), online, weekly specials, locations (1.5%)
– Nutrition, to go, delivery, careers, cod
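The counting step described on this slide can be sketched roughly as follows. The function name, the whitespace tokenization, and the toy queries are assumptions made for illustration; they are not the study's data or method.

```python
from collections import Counter


def cooccurring_terms(queries, name_terms, location_terms):
    """Share of queries containing each word once name/location words are removed."""
    stop = {t.lower() for t in name_terms} | {t.lower() for t in location_terms}
    counts = Counter()
    for query in queries:
        remaining = [w for w in query.lower().split() if w not in stop]
        # Count each word at most once per query.
        counts.update(set(remaining))
    total = len(queries)
    return {word: n / total for word, n in counts.items()} if total else {}


# Toy queries, invented for this example only.
queries = [
    "gochi cupertino menu",
    "gochi menu",
    "gochi cupertino coupons",
    "gochi cupertino",
]
shares = cooccurring_terms(queries, ["gochi"], ["cupertino"])
```

On the toy input, `menu` appears in half of the queries and `coupons` in a quarter; the percentages on the slide would come from the same kind of ratio over a real query log.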
9
Usage Study
Aggregation Value
59% clicked on at least one other URL
35% clicked on at least two other URLs
Small manual evaluation indicates pages are often about the same business.
10
Usage Study
Concepts vs. Browsing
42% of homepage visits are from search engine
– Immediately following URL
  11.5% location
  9% menu
  1% coupons
10.5% of user trails contain more than one distinct instance of the restaurant concept
11
Extraction
Create new records from the web
– Information extraction
– Linking
– Analysis
  Meta-data tagging (cuisine type)
12
Domain-centric vs. Site-centric Extraction
Site-centric extraction
– Wrappers for page structure
– Probabilistic models (CRF)
Domain-centric extraction
– Fields of interest
– Statistical properties (single zip code, etc.)
– Structure components (lists, link relationships)
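As one illustration of a statistical property, the "single zip code" cue could be checked with a sketch like the one below. The regular expression (a two-letter state code followed by a US ZIP code), the function names, and the sample page text are all assumptions for this example, not part of the presented system.

```python
import re

# Assumption: a ZIP code is 5 digits (optionally ZIP+4) right after a
# two-letter state abbreviation. Requiring the state code keeps 5-digit
# street numbers from being counted as ZIP codes.
ZIP_RE = re.compile(r"\b[A-Z]{2}\s+(\d{5})(?:-\d{4})?\b")


def distinct_zip_codes(text):
    """Return the set of distinct 5-digit ZIP codes found in the text."""
    return {match.group(1) for match in ZIP_RE.finditer(text)}


def looks_like_single_instance_page(text):
    """Heuristic: a page about one restaurant mentions exactly one ZIP code."""
    return len(distinct_zip_codes(text)) == 1


# Made-up page text, for illustration only.
single_page = "Example Bistro, 12345 Main St, Springfield, CA 90001. Open daily."
listing_page = "Locations: Springfield, CA 90001; Shelbyville, CA 90002"
```

Such a check is site-independent: it relies on a property of the restaurant domain rather than on the layout of any one website, which is the contrast with wrappers that the slide draws.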
13
Domain-centric Extraction
Aggregator mining
– Learn from extracted knowledge (similar menus)
Matching
– Text is “about” a record (restaurant review)
14
Application
Aggregation
15
Application
Session Optimization
User understanding
– Historical modeling
– Session modeling
Content understanding
Example: Birks
– Birks and Mayors (luxury jewelers) vs. Birk’s Steakhouse
16
Application
Browse Optimization
Alternatives: (Restaurants)
– Similar type of cuisine
– Similar location
– Similar quality
Augmentations: (Camera)
– Batteries
– Memory cards
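The "alternatives" idea can be sketched as a simple filter over restaurant records. The field names (`cuisine`, `city`, `rating`), the rating-gap threshold, and the sample records are hypothetical; the slides do not specify how similarity is computed.

```python
def alternatives(current, candidates, max_rating_gap=0.5):
    """List candidates similar to `current`, with the reasons they qualify."""
    results = []
    for cand in candidates:
        if cand["id"] == current["id"]:
            continue  # a record is not an alternative to itself
        reasons = []
        if cand["cuisine"] == current["cuisine"]:
            reasons.append("cuisine")
        if cand["city"] == current["city"]:
            reasons.append("location")
        # Assumption: "similar quality" means ratings within a small gap.
        if abs(cand["rating"] - current["rating"]) <= max_rating_gap:
            reasons.append("quality")
        if reasons:
            results.append((cand["id"], reasons))
    return results


# Invented records, for illustration only.
a = {"id": "a", "cuisine": "japanese", "city": "Cupertino", "rating": 4.5}
b = {"id": "b", "cuisine": "japanese", "city": "San Jose", "rating": 4.5}
c = {"id": "c", "cuisine": "steak", "city": "Cupertino", "rating": 2.0}
d = {"id": "d", "cuisine": "thai", "city": "Oakland", "rating": 2.0}

suggested = alternatives(a, [a, b, c, d])
```

Augmentations (batteries or memory cards for a camera) would need a different relation between concepts, such as "accessory of", rather than attribute similarity within one concept.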
17
Concept Search
Result Pages
– shows multiple records
Concept Pages
– information about an instance
Article Pages
– a piece of authored text
18
Advertising
Increase in targeted advertisements
Target concepts rather than keywords
19
Challenges
Transfer learning
– Transfer extractor knowledge
Tracking uncertainty
– Accuracy issues
– “Web of concepts is not a one time affair”
Wrapper problems
Concept updates
Relevance Measures
– User satisfaction
20
Related Work
Information Extraction/Integration Systems
Dataspace Systems
Semantic Web
21
Future Work
Enrich representation model
– Path storage to data
– Provenance, versions, uncertainty
– Hierarchical relationships (containment or inheritance)
Ranking of disparate sources