Data Cloud Yury Lifshits Yahoo! Research


1 Data Cloud Yury Lifshits Yahoo! Research http://yury.name

2 My Beliefs
The key challenge in web search is structured search
– Part 1: What is structured search?
The key challenge in structured search is collecting data
– Part 2: Data distribution and the idea of Data Cloud
– Part 3: Demo: numeric data distribution
The key challenge in collecting data is incentive design
– Part 4: Economics of data distribution

3 Structured Search

11 Data
Structured data
Entity unit: identifier
Metadata:
– Explicit key-value pairs
– Relational properties
– Evaluation
Semi-structured data
Content unit: body (text, video, audio, or image)
Metadata:
– Explicit key-value pairs
– Relational properties
– Evaluation
Data = data of entities + data of content
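
The two kinds of data unit on this slide can be sketched as record types. This is only an illustration: the class and field names (`Metadata`, `Entity`, `ContentUnit`, `media_type`), the numeric rating, and the sample values are assumptions, not part of the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Metadata:
    """The three kinds of metadata listed on the slide."""
    key_values: Dict[str, str] = field(default_factory=dict)        # explicit key-value pairs
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (relation name, target identifier)
    evaluation: Optional[float] = None                              # e.g. a rating; the scale is an assumption


@dataclass
class Entity:
    """Structured data: an identifier plus metadata."""
    identifier: str
    metadata: Metadata = field(default_factory=Metadata)


@dataclass
class ContentUnit:
    """Semi-structured data: a body (text, video, audio, or image) plus metadata."""
    body: str        # the text itself, or a reference to a media file
    media_type: str  # "text", "video", "audio", or "image"
    metadata: Metadata = field(default_factory=Metadata)


# Hypothetical sample data: an event entity and a comment that refers to it.
concert = Entity(
    "event/42",
    Metadata({"city": "SF", "price_usd": "15"}, [("performer", "artist/7")], 4.5),
)
review = ContentUnit("Great show!", "text", Metadata(relations=[("about", "event/42")]))
```

Sharing one `Metadata` type between entities and content mirrors the slide, where both units carry the same three metadata categories.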

12 Structured Search
Factoid search: "what's the value of property X of object Y"
Entity hubs
– Domain hubs
Structured object search: "all concerts this weekend in SF under $20, sorted by popularity"
– Time focus
– Ranking focus
– Relations focus
Structured content search: "all videos with Tom Brady", "all comments and blog posts about Bing"
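
The structured object query on this slide combines a type filter, a location filter, a time window, a price bound, and a ranking. A minimal sketch over in-memory records follows; the records, field names, and dates are all hypothetical and serve only to show the shape of such a query.

```python
from datetime import date

# Hypothetical event records; none of this data comes from the talk.
EVENTS = [
    {"name": "Jazz Night", "type": "concert", "city": "SF",
     "date": date(2009, 6, 13), "price_usd": 15, "popularity": 80},
    {"name": "Rock Fest", "type": "concert", "city": "SF",
     "date": date(2009, 6, 14), "price_usd": 45, "popularity": 95},
    {"name": "Indie Set", "type": "concert", "city": "SF",
     "date": date(2009, 6, 14), "price_usd": 10, "popularity": 90},
    {"name": "Folk Evening", "type": "concert", "city": "LA",
     "date": date(2009, 6, 13), "price_usd": 12, "popularity": 70},
    {"name": "Tech Talk", "type": "lecture", "city": "SF",
     "date": date(2009, 6, 13), "price_usd": 0, "popularity": 60},
]


def search(events, kind, city, start, end, max_price):
    """Filter by type, city, date window, and price, then rank by popularity."""
    hits = [
        e for e in events
        if e["type"] == kind
        and e["city"] == city
        and start <= e["date"] <= end   # time focus
        and e["price_usd"] < max_price
    ]
    # Ranking focus: most popular first.
    return sorted(hits, key=lambda e: e["popularity"], reverse=True)


weekend = search(EVENTS, "concert", "SF", date(2009, 6, 13), date(2009, 6, 14), 20)
names = [e["name"] for e in weekend]
```

Every clause of the query maps to a typed field, which is what makes the request answerable from structured data rather than from keyword matching.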

13 Yury's Wishlist
Business-generated data: products, services, news, wishlists, contact data
Reality stream, sensors: what happened where
Expert knowledge: glossary, issues, typical solutions, object databases, related objects graph
Events: sport, concerts, education, corporate, community, private
Market graph & signals: like, interested, use, following, want to buy; votes and ratings

14 Search as a Platform
[Diagram labels: Classic search, App 1, App 2, App 3, App 4; Structured Data, Web index, Post analysis, Query analysis]

15 Data Cloud How to collect all structured data in one place?

16 Data Producers
People: forums, wikis, mail groups, blogs, social networks
Enterprises: product profiles, corporate news, professional content
Sensors: GPS modules, web cameras, traffic sensors, RFID
Transactional data

17 Data Distributors
A data distributor is any technical solution that accumulates, organizes, and provides access to structured and semi-structured data
Data publisher: the original distributor of some data
Data retailer: a consumer-facing distributor of some data

18 Data Consumers
Humans
– Email
– Aggregators: news, friend feeds, RSS readers
– Search
– Browsing / random walks
Intelligence projects
– Recommendation systems
– Trend mining

19 Data Cloud
A Data Cloud is a centralized, fully functional data distribution service
Success metric for a data cloud strategy = the total "value" of data on the cloud

20 To-Cloud Solutions
Extraction
– DBpedia.org, "web tables"
Semantic markup, data APIs
– Yahoo! SearchMonkey
Feeds
– Yahoo! Shopping
– Disqus.com, js-kit.com, Facebook Connect
Direct publishing

21 On-Cloud Solutions
Ontology maintenance
– Freebase
Normalization, de-duplication, antispam
Named entity recognition, metadata inference, ranking
Data recycling (cross-references)
– Amazon Public Data Sets
– Viral license
Hosted search
– Yahoo! BOSS

22 From-Cloud Solutions
Search, audience
– Y! SearchMonkey, Google Base
Data API, dump access, update stream
Custom notifications
– Gnip.com
Data cloud as a primary backend
Access control
– Ad distribution (AT&T and Yahoo! Local deal)

23 Demo: webNumbr.com Joint work with Paul Tarjan

25 webNumbr.com: Import
Crawl numbers from the web
URL + XPath + regex
Create "numbr pages"
Update their values every hour
Keep the history
Anyone can create a numbr: http://webnumbr.com/create
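
The URL + XPath + regex recipe can be sketched with the Python standard library. This is not webNumbr's actual code: the function name, the sample markup, and the timestamp are assumptions, and the page is an inline string instead of a fetched URL so the example runs offline. `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so real pages would likely need an HTML-tolerant parser.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical page content standing in for a fetched URL.
PAGE = "<html><body><p>Followers: <span id='count'>1,234</span></p></body></html>"


def extract_number(markup: str, xpath: str,
                   pattern: str = r"-?\d+(?:\.\d+)?") -> Optional[float]:
    """Select a node with XPath, then pull the first number out of its text."""
    root = ET.fromstring(markup)
    node = root.find(xpath)
    if node is None:
        return None
    # Drop thousands separators before matching.
    text = "".join(node.itertext()).replace(",", "")
    match = re.search(pattern, text)
    return float(match.group(0)) if match else None


# "Keep the history": one (timestamp, value) pair per update.
history: List[Tuple[str, float]] = []
value = extract_number(PAGE, ".//span[@id='count']")
if value is not None:
    history.append(("2009-06-13T10:00", value))
```

An hourly update would amount to re-fetching the page on a schedule and appending one more pair to `history`.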

26 webNumbr.com: Export Embed code Graphs Search & browse RSS

27 Economics of Data Distribution Joint work with Ravi Kumar and Andrew Tomkins

28 Network Effect in Two-Sided Markets
Two-sided market = every product serves consumers of two types, A and B
Cross-side network effect: the more type-A users product X has, the more attractive it is for type-B consumers, and vice versa
Examples: operating systems, credit cards, e-commerce marketplaces
Reference: "Two-sided network effects: A theory of information product design", G. Parker, M.W. Van Alstyne, N. Bulkley, M. Van Alstyne

29 Basic model
Distributors D_1, …, D_k
Each producer/consumer joins only one distributor
Initial shares (p_1, c_1), …, (p_k, c_k)
A new consumer selects distributor i with probability proportional to p_i
A new producer selects distributor i with probability proportional to c_i
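
The basic model is easy to simulate. The sketch below assumes that one consumer and one producer arrive at each step, in that order; the slide does not specify the arrival schedule, and the function name and initial counts are illustrative.

```python
import random


def simulate(producers, consumers, steps, seed=0):
    """Grow producer and consumer counts under the cross-side rule.

    producers[i] and consumers[i] are the initial counts p_i and c_i
    of distributor i. Returns the final (producers, consumers) lists.
    """
    rng = random.Random(seed)
    p = list(producers)
    c = list(consumers)
    k = len(p)
    for _ in range(steps):
        # New consumer: picks a distributor in proportion to producer counts.
        i = rng.choices(range(k), weights=p)[0]
        c[i] += 1
        # New producer: picks a distributor in proportion to consumer counts.
        j = rng.choices(range(k), weights=c)[0]
        p[j] += 1
    return p, c


p_final, c_final = simulate([3, 1], [3, 1], steps=1000)
producer_shares = [n / sum(p_final) for n in p_final]
```

Running it with different seeds gives a feel for how much the long-run shares depend on early arrivals.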

30 Basic model
[Figure with labels a1, a2, a3, a4]

31 Market Shares Dynamics
Theorem 1: Market shares will stabilize
Theorem 2: With a super-linear preference rule, one of the distributors will tip
Theorem 3: With a sub-linear preference rule, market shares will flatten
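
The slide does not define the super-linear and sub-linear rules. One common reading, assumed here, is that a newcomer picks distributor i with probability proportional to share_i ** alpha, with alpha > 1 for super-linear and alpha < 1 for sub-linear. Under that assumption, a quick calculation shows which way each rule pushes the leader.

```python
def join_probabilities(shares, alpha):
    """Probability that a newcomer joins each distributor.

    Assumed rule: weight_i = share_i ** alpha. alpha = 1 is the basic
    (linear) model; alpha > 1 is super-linear; alpha < 1 is sub-linear.
    """
    weights = [s ** alpha for s in shares]
    total = sum(weights)
    return [w / total for w in weights]


shares = [0.6, 0.4]
linear = join_probabilities(shares, 1.0)       # leader attracts exactly its share
superlinear = join_probabilities(shares, 2.0)  # leader attracts more than its share
sublinear = join_probabilities(shares, 0.5)    # leader attracts less than its share
```

With alpha = 2 the leader draws newcomers faster than its current share, which is consistent with tipping; with alpha = 0.5 it draws them more slowly, which is consistent with flattening. This illustrates the direction of the effect and is not a proof of either theorem.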

32 External Factor
Preference rule with external factor: e_i + c_i/(c_1 + … + c_k)
Theorem 4: Market shares will stabilize at the ratio e_1 : e_2 : … : e_k
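
A heuristic check of Theorem 4: shares stop moving in expectation when the probability that a newcomer joins distributor i equals i's current share. The sketch below iterates that condition to its fixed point. It assumes the symmetric rule e_i + p_i/(p_1 + … + p_k) for new consumers, which the slide does not state, and the values of e are hypothetical.

```python
def fixed_point(e, iterations=100):
    """Iterate the expected joining probabilities until they settle.

    x[i] is the producer share and y[i] the consumer share of distributor i.
    A new producer's weight for i is e[i] + y[i]; a new consumer's weight
    is e[i] + x[i] (the second rule is an assumption).
    """
    k = len(e)
    total = sum(e) + 1.0  # the weights e[i] + share[i] sum to sum(e) + 1
    x = [1.0 / k] * k
    y = [1.0 / k] * k
    for _ in range(iterations):
        # Simultaneous update: both right-hand sides use the old x and y.
        x, y = ([(e[i] + y[i]) / total for i in range(k)],
                [(e[i] + x[i]) / total for i in range(k)])
    return x, y


e = [0.5, 0.3, 0.2]  # hypothetical external factors
x, y = fixed_point(e)
expected = [v / sum(e) for v in e]
```

Both share vectors end up proportional to e, matching the ratio e_1 : e_2 : … : e_k in the theorem. The calculation only locates the stationary point of the expected dynamics; it does not show that the random process converges to it.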

33 Coalition Data Cloud

34 Coalitions
Theorem 5: If all market shares are below 1/sqrt(k), a coalition (sharing data) is profitable for all distributors
Corollary: Coalitions are not monotone
Example: 5 : 4 : 1 : 1
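
The hypothesis of Theorem 5 is simple to test for a given market. The helper below assumes that "market share" means a distributor's fraction of the total and reads the 5 : 4 : 1 : 1 figures as relative sizes; both readings are assumptions, as is the function name.

```python
import math


def below_coalition_threshold(sizes):
    """True if every distributor's share is strictly below 1/sqrt(k)."""
    k = len(sizes)
    total = sum(sizes)
    threshold = 1 / math.sqrt(k)
    return all(size / total < threshold for size in sizes)


example = below_coalition_threshold([5, 4, 1, 1])     # largest share 5/11, threshold 1/2
borderline = below_coalition_threshold([3, 1, 1, 1])  # largest share is exactly 1/2
```

For k = 4 the threshold is 1/2, so the first market satisfies the condition while the second does not, because the inequality is strict.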

35 Model Variations
Same-side network effect
Different p-to-c and c-to-p rules
Multi-homing (overlapping audiences)
n^2 vs. n log n revenue models
Mature market: newcomer rate = departing rate
Diverse market (many types of producers and consumers)
Entering and departing distributors
Directed coalitions

36 Challenges

37 Marketing Data demand? Data offerings? Requirements for distribution technology?

38 Incentive design Incentives for data sharing? Centralized or distributed? –For profit or non-profit? Data licensing and ownership? Monetizing data cloud?

39 More Challenges
Prototyping:
– Data marketplace: open data & data demand
– Search plugins: related objects, glossaries, object timelines
– Publishing tools for structured data
– Data client: structured news, bookmarking, notifications
Tech design:
– Access management
– Namespace design
User interface:
– Structured search UI
– Discovery UI

40 Thanks! Follow my research: http://twitter.com/yurylifshits http://yury.name/blog

