Data Cloud Yury Lifshits Yahoo! Research

My Beliefs
The key challenge in web search is structured search.
  Part 1: What is structured search?
The key challenge in structured search is collecting data.
  Part 2: Data distribution and the idea of the Data Cloud
  Part 3: Demo: numeric data distribution
The key challenge in collecting data is incentive design.
  Part 4: Economics of data distribution

Structured Search

Data
Structured data
  Entity unit:
    Identifier
    Metadata:
    – Explicit key-value pairs
    – Relational properties
    – Evaluation
Semi-structured data
  Content unit:
    Body: text, video, audio, or image
    Metadata:
    – Explicit key-value pairs
    – Relational properties
    – Evaluation
Data = data of entities + data of content
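
One way to picture the two kinds of units on this slide is as a pair of record types that share a metadata shape. This is a minimal sketch, not the speaker's schema: the class names, field names, and sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    # Field names are illustrative assumptions, one per bullet on the slide.
    key_values: dict = field(default_factory=dict)  # explicit key-value pairs
    relations: list = field(default_factory=list)   # (relation, target identifier) pairs
    evaluation: dict = field(default_factory=dict)  # e.g. ratings or votes


@dataclass
class Entity:
    """Structured data: an identifier plus metadata."""
    identifier: str
    metadata: Metadata = field(default_factory=Metadata)


@dataclass
class Content:
    """Semi-structured data: a body plus metadata."""
    body: str  # text, or a reference to a video, audio, or image file
    metadata: Metadata = field(default_factory=Metadata)


# Hypothetical sample data: a venue entity and a review that refers to it.
venue = Entity("venue:example-hall", Metadata(key_values={"city": "SF"}))
review = Content(
    body="Great acoustics.",
    metadata=Metadata(
        relations=[("about", venue.identifier)],
        evaluation={"rating": 4},
    ),
)
```

The relational property on the review is what ties a content unit back to an entity unit, which is the link that "data of entities + data of content" relies on.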

Structured Search
Factoid search: "what's the value of property X of object Y"
Entity hubs
– Domain hubs
Structured object search: "all concerts this weekend in SF under $20, sorted by popularity"
– Time focus
– Ranking focus
– Relations focus
Structured content search: "all videos with Tom Brady", "all comments and blog posts about Bing"
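
The structured object query on this slide can be read as a filter over explicit fields followed by a ranking step. The sketch below assumes a hypothetical `Event` record and made-up sample data; it only illustrates the shape of such a query, not any particular search engine's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    name: str
    kind: str
    city: str
    day: date
    price: float     # ticket price in dollars
    popularity: int  # hypothetical popularity score


def structured_search(events, kind, city, days, max_price):
    """Filter on explicit fields, then rank by popularity (highest first)."""
    hits = [
        e for e in events
        if e.kind == kind and e.city == city and e.day in days and e.price < max_price
    ]
    return sorted(hits, key=lambda e: e.popularity, reverse=True)


# Made-up sample data for illustration only.
WEEKEND = {date(2009, 6, 6), date(2009, 6, 7)}
EVENTS = [
    Event("Jazz Night", "concert", "SF", date(2009, 6, 6), 15.0, 40),
    Event("Indie Showcase", "concert", "SF", date(2009, 6, 7), 18.0, 75),
    Event("Symphony Gala", "concert", "SF", date(2009, 6, 6), 60.0, 90),
    Event("Open Mic", "concert", "Oakland", date(2009, 6, 6), 5.0, 20),
    Event("Tech Meetup", "community", "SF", date(2009, 6, 7), 0.0, 55),
]

results = structured_search(EVENTS, "concert", "SF", WEEKEND, 20.0)
print([e.name for e in results])  # ['Indie Showcase', 'Jazz Night']
```

The time, ranking, and relations focuses listed on the slide correspond to the `days` filter, the sort key, and (not shown here) joins against related entities.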

Yury's Wishlist
Business-generated data: products, services, news, wishlists, contact data
Reality stream, sensors: what has happened, and where
Expert knowledge: glossary, issues, typical solutions, object databases, related-objects graph
Events: sport, concerts, education, corporate, community, private
Market graph & signals: like, interested, use, following, want to buy; votes and ratings

Search as a Platform
(Diagram labels: Classic search, App 1, App 2, App 3, App 4, Structured Data, Web index, Post analysis, Query analysis)

Data Cloud How to collect all structured data in one place?

Data Producers
People: forums, wikis, mail groups, blogs, social networks
Enterprises: product profiles, corporate news, professional content
Sensors: GPS modules, web cameras, traffic sensors, RFID
Transactional data

Data Distributors
A data distributor is any technical solution to accumulate, organize, and provide access to structured and semi-structured data.
Data publisher: the original distributor of some data
Data retailer: a consumer-facing distributor of some data

Data Consumers
Humans
– Aggregators: news, friend feeds, RSS readers
– Search
– Browsing / random walks
Intelligence projects
– Recommendation systems
– Trend mining

Data Cloud
A Data Cloud is a centralized, fully functional data distribution service.
Success metric for a data cloud strategy = the total "value" of data on the cloud

To-Cloud Solutions
Extraction
– DBpedia.org, "web tables"
Semantic markup, data APIs
– Yahoo! SearchMonkey
Feeds
– Yahoo! Shopping
– Disqus.com, js-kit.com, Facebook Connect
Direct publishing

On-Cloud Solutions
Ontology maintenance
– Freebase
Normalization, de-duplication, antispam
Named entity recognition, metadata inference, ranking
Data recycling (cross-references)
– Amazon Public Data Sets
– Viral license
Hosted search
– Yahoo! BOSS

From-Cloud Solutions
Search, audience
– Y! SearchMonkey, Google Base
Data API, dump access, update stream
Custom notifications
– Gnip.com
Data cloud as a primary backend
Access control
– Ad distribution (AT&T and Yahoo! Local deal)

Demo: webNumbr.com Joint work with Paul Tarjan

webNumbr.com: Import
Crawl numbers from the web
URL + XPath + regex
Create "numbr pages"
Update their values every hour
Keep the history
Anyone can create a numbr
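
The "URL + XPath + regex" recipe can be sketched as follows: an XPath expression selects a node, and a regular expression pulls the number out of that node's text. This is an illustrative sketch, not webNumbr's actual code. To stay self-contained it parses an inline, well-formed snippet with the standard library's limited XPath support instead of fetching a URL; the function name, markup, and pattern are assumptions, and real-world HTML would usually need a more forgiving parser.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for a fetched page (assumed to be well-formed markup).
PAGE = """<html><body>
  <div id="stats"><span class="count">Members: 1,234 online</span></div>
</body></html>"""


def extract_number(markup, xpath, pattern):
    """Select a node with XPath, then extract a number from its text with a regex."""
    root = ET.fromstring(markup)
    node = root.find(xpath)
    if node is None or node.text is None:
        return None
    match = re.search(pattern, node.text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


value = extract_number(PAGE, ".//span[@class='count']", r"([\d,]+(?:\.\d+)?)")
print(value)  # 1234.0
```

Running such an extraction on a schedule and appending each (timestamp, value) pair to a log would cover the hourly update and history items; that part is left out here.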

webNumbr.com: Export
Embed code
Graphs
Search & browse
RSS

Economics of Data Distribution Joint work with Ravi Kumar and Andrew Tomkins

Network Effect in Two-Sided Markets
Two-sided market = every product serves consumers of two types, A and B
Cross-side network effect: the more type-A users product X has, the more attractive it is for type-B consumers, and vice versa
Examples: operating systems, credit cards, e-commerce marketplaces
"Two-sided network effects: A theory of information product design", G. Parker, M.W. Van Alstyne, N. Bulkley, M. Van Alstyne

Basic model
Distributors D1, …, Dk
Each producer/consumer joins only one distributor
Initial shares (p1, c1), …, (pk, ck)
A new consumer selects distributor i with probability proportional to pi
A new producer selects distributor i with probability proportional to ci
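
The basic model can be simulated directly. The sketch below is one possible reading of the slide, with assumptions labeled: arrivals alternate (one consumer, then one producer per step), which the slide does not specify, and the function name and seed handling are illustrative.

```python
import random


def simulate(initial, steps, seed=0):
    """Simulate arrivals under the linear preference rule.

    initial: list of (p_i, c_i) pairs, the starting producer and consumer
    counts for each distributor.
    """
    rng = random.Random(seed)
    producers = [p for p, _ in initial]
    consumers = [c for _, c in initial]
    indices = range(len(initial))
    for _ in range(steps):
        # A new consumer picks distributor i with probability proportional to p_i.
        i = rng.choices(indices, weights=producers)[0]
        consumers[i] += 1
        # A new producer picks distributor j with probability proportional to c_j.
        # (Alternating the two arrival types is an assumption of this sketch.)
        j = rng.choices(indices, weights=consumers)[0]
        producers[j] += 1
    return producers, consumers


producers, consumers = simulate([(5, 5), (3, 3), (2, 2)], steps=1000)
total = sum(consumers)
print([round(c / total, 3) for c in consumers])  # consumer market shares
```

Repeating the run with different seeds gives a feel for how much the final shares depend on early arrivals.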

Basic model
(Diagram; node labels: a1, a2, a3, a4)

Market Shares Dynamics
Theorem 1: Market shares will stabilize
Theorem 2: With a super-linear preference rule, the market will tip to one of the distributors
Theorem 3: With a sub-linear preference rule, market shares will flatten
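
The qualitative difference between the three regimes can be seen in a small numerical experiment. This is an illustration, not the proof behind the theorems. It assumes a power-law preference rule in which a newcomer's weight for distributor i is the opposite side's count raised to an exponent `alpha` (so `alpha > 1` is super-linear and `alpha < 1` is sub-linear), and it tracks expected arrivals deterministically instead of sampling.

```python
def step(producers, consumers, alpha):
    """Add one unit of expected consumer mass and one unit of producer mass."""
    p_weights = [p ** alpha for p in producers]
    c_weights = [c ** alpha for c in consumers]
    p_total, c_total = sum(p_weights), sum(c_weights)
    # Consumers are attracted by producers; producers are attracted by consumers.
    new_consumers = [c + w / p_total for c, w in zip(consumers, p_weights)]
    new_producers = [p + w / c_total for p, w in zip(producers, c_weights)]
    return new_producers, new_consumers


def leader_share(alpha, steps=2000):
    """Consumer share of the initially larger of two distributors after `steps`."""
    producers, consumers = [6.0, 4.0], [6.0, 4.0]  # assumed starting split of 60/40
    for _ in range(steps):
        producers, consumers = step(producers, consumers, alpha)
    return consumers[0] / sum(consumers)


for alpha in (0.5, 1.0, 2.0):
    print(alpha, round(leader_share(alpha), 3))
```

Under these assumptions the leader's share stays at 0.6 for `alpha = 1.0`, climbs above 0.6 for `alpha = 2.0`, and drifts toward 0.5 for `alpha = 0.5`, which matches the direction of Theorems 2 and 3.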

External Factor
Preference rule with external factor: ei + ci/(c1 + … + ck)
Theorem 4: Market shares will stabilize at the ratio e1 : e2 : … : ek

Coalition Data Cloud

Coalitions
Theorem 5: If all market shares are below 1/sqrt(k), a coalition (sharing data) is profitable for all distributors
Corollary: Coalitions are not monotone
Example: 5 : 4 : 1 : 1
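
As a quick arithmetic check, the hypothesis of Theorem 5 can be evaluated for the 5 : 4 : 1 : 1 example. The sketch only tests the threshold condition on market shares; it does not model profitability or the non-monotonicity claim, and the function name is an assumption.

```python
import math


def all_below_threshold(sizes):
    """Return True if every distributor's market share is below 1/sqrt(k)."""
    k = len(sizes)
    total = sum(sizes)
    threshold = 1 / math.sqrt(k)
    return all(size / total < threshold for size in sizes)


# With k = 4 the threshold is 0.5; the largest share is 5/11, about 0.455.
print(all_below_threshold([5, 4, 1, 1]))  # True
# With k = 2 the threshold is about 0.707; a 9 : 1 split exceeds it.
print(all_below_threshold([9, 1]))        # False
```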

Model Variations
Same-side network effect
Different p-to-c and c-to-p rules
Multi-homing (overlapping audiences)
n^2 vs. n log n revenue models
Mature market: newcomer rate = departing rate
Diverse market (many types of producers and consumers)
Newcoming and departing distributors
Directed coalitions

Challenges

Marketing
Data demand?
Data offerings?
Requirements for distribution technology?

Incentive design
Incentives for data sharing?
Centralized or distributed?
– For profit or non-profit?
Data licensing and ownership?
Monetizing the data cloud?

More Challenges
Prototyping:
– Data marketplace: open data & data demand
– Search plugins: related objects, glossaries, object timelines
– Publishing tools for structured data
– Data client: structured news, bookmarking, notifications
Tech design:
– Access management
– Namespace design
User interface:
– Structured search UI
– Discovery UI

Thanks! Follow my research: