Big Data What is it? Why is it important? How should we train for it?


Big Data: What is it? Why is it important? How should we train for it?
  Raghu Ramakrishnan
  Technical Fellow, Microsoft
  Head, Big Data Engineering
  Head, Cloud Information Services Lab

Data is Now a Core Asset
  Ability to cost-effectively do things we couldn't dream of before
  More data: more is observable, measurable
  Less cost: capture, storage, compute
  Easier: rent-and-go with cloud services
  Scalable analytic tools: get more out of data
  Biggest bottleneck: trained people!

Big Data Applications

Web Scale
  1.5+M read rps
  4 Key Platforms
  ~2B User IDs
  10,000+ servers
  10 Geo Zones
  ~750M Unique Users*
  175+M Users in US
  285+M Mail users
  102B emails/month
  40+ Countries
  13+B Ad serves/day
  11B visits/month
  All data from July/Aug. Worldwide unless indicated to the contrary.
  * Yahoo!-branded sites

Content Optimization
  http://visualize.yahoo.com
  Key Features
    Package Ranker (CORE): Ranks packages by expected CTR based on data collected every 5 minutes
    Dashboard (CORE): Provides real-time insights into performance by package, segment, and property
    Mix Management (Property): Ensures editorial voice is maintained and user gets a variety of content
    Package Rotation (Property): Tracks which stories a user has seen and rotates them after the user has seen them for a certain period of time
  Key Performance Indicators
    Lifts in quantitative metrics, with editorial voice preserved:
    +300% clicks vs. one size fits all
    +79% clicks vs. randomly selected
    +43% clicks vs. editor selected
  Recommended links; News Interests; Top Searches
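The Package Ranker bullet (rank by expected CTR over a recent window of data) can be illustrated with a small sketch. This is not the CORE implementation. The `PackageStats` type, the package names, the counts, and the smoothing prior are all hypothetical, chosen only to show why a ranker might shrink the raw click-through rate of packages that have few views.

```python
from dataclasses import dataclass


@dataclass
class PackageStats:
    """Clicks and views for one content package in one collection window."""
    name: str
    clicks: int
    views: int


def estimated_ctr(stats, prior_clicks=1.0, prior_views=50.0):
    # Smoothed CTR: add pseudo-counts so that a package with very few
    # views is not ranked on noise alone. The prior values are assumptions.
    return (stats.clicks + prior_clicks) / (stats.views + prior_views)


def rank_packages(window):
    # Highest estimated CTR first.
    return sorted(window, key=estimated_ctr, reverse=True)


# Hypothetical counts for a single 5-minute window.
window = [
    PackageStats("sports", clicks=30, views=1000),
    PackageStats("finance", clicks=12, views=200),
    PackageStats("breaking", clicks=1, views=5),
]
ranked = [p.name for p in rank_packages(window)]
print(ranked)
```

With a raw clicks/views ratio, "breaking" (1 click in 5 views) would lead the list. With the assumed prior it falls behind "finance", which has far more evidence behind its rate.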

CORE Modeling Overview
  Offline Modeling
    Input: large amount of historical data (user event streams)
    Exploratory data analysis
    Regression, feature selection, collaborative filtering (factorization)
    Seed online models & explore/exploit methods at good initial points
    Reduce the set of candidate items
  Online Learning
    Input: near real-time user feedback
    Online regression models, time-series models
    Model the temporal dynamics
    Provide fast learning for per-item models
  Explore/Exploit
    Multi-armed bandits
    Find the best way of collecting real-time user feedback (for new items)
  Infrastructure
    Data streams
    Hadoop pipelines
    PNUTS NoSQL Store
    Hadoop Pig (SQL)
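The explore/exploit idea on this slide can be sketched with epsilon-greedy, one of the simplest multi-armed bandit strategies. The slide does not say which bandit method CORE uses, so treat this purely as an illustration. The class name, the item names, and the click probabilities in the usage note are hypothetical.

```python
import random


class EpsilonGreedy:
    """Show the item with the best observed CTR most of the time, and a
    random item with probability epsilon so that new items get feedback."""

    def __init__(self, items, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {item: 0 for item in items}
        self.clicks = {item: 0 for item in items}

    def ctr(self, item):
        shown = self.shows[item]
        return self.clicks[item] / shown if shown else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))  # explore
        return max(self.shows, key=self.ctr)          # exploit

    def update(self, item, clicked):
        self.shows[item] += 1
        self.clicks[item] += int(clicked)


def simulate(true_ctr, rounds=5000, epsilon=0.1, seed=0):
    # true_ctr maps item -> click probability; it stands in for real users.
    bandit = EpsilonGreedy(true_ctr, epsilon, seed)
    user = random.Random(seed + 1)
    for _ in range(rounds):
        item = bandit.choose()
        bandit.update(item, user.random() < true_ctr[item])
    return bandit
```

For example, `simulate({"a": 0.02, "b": 0.05, "c": 0.10})` returns a bandit whose `shows` counts indicate how traffic was split between items. Seeding the counts from an offline model, as the slide suggests, would replace the zero initial values used here.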

Extracted metadata affects search ranking

Traditional ecology
  Field work
  Experiments
  Theorizing
  (Slide courtesy Drew Purves, MSR)

Some huge questions we can't answer
  What would we do if a new disease hit wheat? Or the pollinators died out?
  Will forests accelerate or slow climate change?
  How can we feed 9+ billion humans, with less water, less oil and less phosphorus?
  How can we safely genetically engineer crops?
  How can we optimize supply chains to minimize environmental impact?
  Is there enough water for both agriculture and industry in the future?
  How many species are there on Earth? How can we predict them?
  Will the world become more fire prone as the climate changes?
  (Slide courtesy Drew Purves, MSR)

Enter a new kind of ecology?
  Traditional Ecology | "Joined up" Ecology
  Qualitative insights | Quantitative predictions
  Driven by academic curiosity | Driven by society's needs
  Fragmented into subdisciplines, divorced from other fields of study | Integrated across subdisciplines, and with other fields of study
  Divorced from policy | Connected to policy
  Huge shortage of all data | Huge abundance of some data, huge shortage of other data
  Computation and statistics an afterthought: a necessary evil! | Computational techniques and statistics central, and exciting!
  A disparate set of software tools | An interoperable suite of software tools
  (Slide courtesy Drew Purves, MSR)

Data Created by People: Kinect
  The Kinect is an array of sensors: depth, audio, RGB camera…
  SDK provides a 3D virtual skeleton: 20 points around the body, 30 frames per second.
  Inexpensive and ubiquitous: between 60M and 70M sold by May 2013.
  (Slide courtesy Assaf Schuster, Technion)

Telemetry
  Data: Time series of logs from user activity and system probes (live and historical archives)
    CTP (Customer Touch Points) data
    STP (System Touch Points) data
  Goal: Determine possible causes of outages, particularly the long ones
    Predictive and forensic
  Planned steps:
    Identification of the team that can resolve an outage
    Visualization of time series to understand long outages that are difficult to resolve
    Discover and learn patterns associated with outage trends and use them to predict outages

IoT: Connected Devices, Continuous Observations, Instantaneous Action, Rich History
  http://blogs.cisco.com/news/the-internet-of-things-infographic/
  2008: more things than people
  2020: 50 billion
  Gartner: 50 billion intelligent devices by 2015 and 275 exabytes per day of data being sent across the Internet by 2020

Big Data Links
  H.V. Jagadish et al., CACM (to appear). Big Data and its Technical Challenges.
  D. Agrawal et al., CACM 56(6):92-101 (2013). Content Recommendation on Web Portals.
  Gary Marcus, New Yorker, April 3 2013. http://www.newyorker.com/online/blogs/elements/2013/04/steamrolled-by-big-data.html
  Gary Marcus, New Yorker, May 23 2013. http://www.newyorker.com/online/blogs/culture/2012/05/google-knowledge-graph.html
  Alan Feuer, NYT, Mar 23, 2013. http://www.nytimes.com/2013/03/24/nyregion/mayor-bloombergs-geek-squad.html?pagewanted=all&_r=0
  Quentin Hardy, NYT, Nov 28, 2012. http://bits.blogs.nytimes.com/2012/11/28/jeff-hawkins-develops-a-brainy-big-data-company/
  Steve Lohr, NYT, Feb 11, 2012. http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
  Economist, Nov 2011. http://www.economist.com/blogs/dailychart/2011/11/big-data-0
  Forbes special report on Big Data, February 2012. http://www.forbes.com/special-report/data-driven.html
  T. Hey et al. (Eds) (2013). The Fourth Paradigm: Data-Intensive Scientific Discovery.
  J. Manyika et al., McKinsey Global Institute, May 2011. Big Data: The Next Frontier for Innovation, Competition, and Productivity.
  NPR, Nov 2011. http://www.npr.org/2011/11/30/142893065/the-search-for-analysts-to-make-sense-of-big-data