AnHai Doan University of Wisconsin Big Data, Big Knowledge, and Big Crowd.

Presentation transcript:

The world has changed; now everything is data centric
– everyone collects, stores, analyzes TBs and PBs of data
To manage data in this new world, need 3B technologies:
– lot of data → need big data technologies to scale up algorithms
– data is noisy, unstructured, heterogeneous → need a lot of domain knowledge to understand
  such knowledge is often captured in big knowledge bases
– algorithms are imperfect, certain things humans do better, need humans in the loop, scale is such that there are not enough human developers → need crowdsourcing with big crowd

Examples
Semantic analysis of the Twitter stream
– process tweets per sec, need fast data infrastructure
– to recognize entities, e.g., “go giant!”, need a big KB
– KB being built in real time using crowdsourcing
Product matching for e-commerce
– build 500+ matchers to match products
  one matcher per category: toy, electronics, clothes, etc.
– match 500K electronics products with 500K → need Hadoop
– use a KB to match numerous synonyms: soft cover = paperback, etc.
– use crowdsourcing to generate training and testing data
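
The product-matching example can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the matchers described in the talk: the toy synonym table `SYNONYM_KB`, the token-overlap (Jaccard) score, the 0.8 threshold, and all function names are hypothetical. It shows two ideas from the slide: titles are normalized against a KB of synonyms before comparison, and products are only compared within the same category.

```python
from collections import defaultdict

# Hypothetical toy KB: maps surface forms to one canonical term.
SYNONYM_KB = {
    "soft cover": "paperback",
    "softcover": "paperback",
    "hard cover": "hardcover",
}


def normalize(text, kb=SYNONYM_KB):
    """Lowercase, collapse whitespace, and replace known synonyms."""
    result = " ".join(text.lower().split())
    # Replace longer phrases first so multi-word synonyms are handled first.
    for phrase in sorted(kb, key=len, reverse=True):
        result = result.replace(phrase, kb[phrase])
    return result


def jaccard(a, b):
    """Token-set overlap between two strings, in [0, 1]."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match_by_category(left, right, threshold=0.8):
    """Match (category, title) records, comparing only within a category.

    Grouping by category stands in for "one matcher per category";
    a real system would likely train a separate matcher for each.
    """
    by_category = defaultdict(list)
    for category, title in right:
        by_category[category].append(title)

    matches = []
    for category, title in left:
        for candidate in by_category[category]:
            score = jaccard(normalize(title), normalize(candidate))
            if score >= threshold:
                matches.append((title, candidate))
    return matches
```

Without the synonym step, "soft cover" and "paperback" share no tokens, so two listings of the same book would score lower and could fall below the threshold.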

Big Knowledge Technologies
Everyone is now building KBs
– IT companies: Google, Microsoft, …
– e-retailers: Amazon, Walmart, …
– stodgy behemoths: Johnson Controls, GE, …
– tiny startups, academia, …
User communities are building KBs (e.g., biomedical)
There will be not just data centers, but also knowledge centers
– KBs and tools that use such KBs
– critical for understanding data (e.g., tweets)
How do we help people build KBs? Knowledge centers?
– a next important direction for data integration research

Big Crowd Technologies
Industry has been doing these for years
For us it’s not a fad, it’s fundamental
– as data management increasingly involves semantic problems
Have gotten off to a good start (platforms / problems)
Need hands-off crowdsourcing
– no developer in the loop, otherwise will not scale
– e.g., crowdsourcing 500 product matching problems, one per category
Need crowdsourcing for the masses
– e.g., journalist wants to match two political lists of donors
Need “grand challenges” for crowdsourcing?
– e.g., something like Wikipedia?
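
One building block of hands-off crowdsourcing is combining several workers' answers on the same question without a developer inspecting them. The sketch below uses plain majority voting with an agreement cutoff. This is an assumption chosen for illustration, not the method used in the talk, and the function name, the `min_agreement` parameter, and the input layout are all hypothetical.

```python
from collections import Counter


def aggregate_votes(votes, min_agreement=0.6):
    """Combine crowd answers per candidate pair by majority vote.

    votes: dict mapping a pair id to a list of worker answers
           (True = "these two records match").
    Returns a dict mapping each pair id to True or False, or to None
    when there are no answers or agreement is below min_agreement.
    """
    decisions = {}
    for pair_id, answers in votes.items():
        if not answers:
            decisions[pair_id] = None
            continue
        label, count = Counter(answers).most_common(1)[0]
        agreed = count / len(answers) >= min_agreement
        decisions[pair_id] = label if agreed else None
    return decisions
```

Pairs that come back as `None` could be sent to more workers rather than to a developer, which is what keeps the loop hands-off.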