Published by Garey West. Modified over 8 years ago
1
Learn Big Data Application Development on Windows Azure
Wen-ming Ye (叶文铭), Sr. Technical Evangelist, Microsoft Corporation
7
Characteristics of Big Data
10
Hadoop on Windows and Windows Azure
“The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago.” – Ted Kummert
11
[Diagram: files stored across servers]
13
MapReduce – Workflow
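The map, shuffle and reduce stages can be illustrated with a small in-memory simulation. This is a minimal sketch in plain Python, not Hadoop's Java API: the function names are hypothetical, and word count stands in as the classic example job. The mapper emits (key, value) pairs, the shuffle groups and sorts values by key (the framework does this between the two phases), and the reducer folds each group into one result.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word of every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all emitted values by key, with keys in sorted order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three stages the way a single MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(run_job(["the cloud and the data", "data in the cloud"]))
```

In a real cluster each mapper and reducer runs on a different node over its own slice of the input; the sketch keeps everything in one process so the data flow is easy to follow.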
14
Hadoop Architecture
15
The Hadoop Ecosystem (simplified)
16
Scenario 1: Storing and Data-Mining Logs
Dealing with Data Exhaust
17
[Diagram: Logs pass through ETL; only some data reaches the Data Warehouse, and the excess data is left out]
18
[Diagram: Logs → Raw Data “Store it All” Cluster → Data Warehouse]
19
Demo 1: Log Analysis with Hive
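In Hive, a per-level breakdown of the logs would be a query along the lines of `SELECT level, COUNT(*) FROM logs GROUP BY level;`, where the table `logs` and column `level` are hypothetical. As a hedged sketch of what that GROUP BY computes, the Python below parses log lines in an assumed `date time LEVEL message` layout and counts lines per level, skipping malformed lines. Hive itself would compile such a query into MapReduce jobs over the raw files stored in the cluster.

```python
from collections import Counter

# Hypothetical log excerpt in an assumed "date time LEVEL message" layout.
SAMPLE_LOG = """\
2012-01-01 10:15:01 INFO  service started
2012-01-01 10:15:07 WARN  slow response from storage
2012-01-01 10:16:42 ERROR request failed
2012-01-01 10:17:03 INFO  request served
malformed line
"""


def parse_level(line):
    """Return the log level (third whitespace-separated field), or None if the line is malformed."""
    fields = line.split()
    if len(fields) < 3:
        return None
    return fields[2]


def count_by_level(text):
    """Plain-Python counterpart of: SELECT level, COUNT(*) FROM logs GROUP BY level."""
    levels = (parse_level(line) for line in text.splitlines())
    return Counter(level for level in levels if level is not None)


print(dict(count_by_level(SAMPLE_LOG)))
```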
20
Invisible devices vs. laptops / tablets / smartphones
Trillions of networked nodes vs. billions of networked devices
Low-bandwidth last-mile connection vs. high-bandwidth access
Mostly addressed by local schemes vs. global addressing
Machine-centric vs. user-centric
Sensing focus vs. communication focus
21
Scenario 2: Understand Unstructured Data Using Mathematical Relations
Computing the meaning of words
22
“Friends don’t let friends just do word count”
23
Cosine similarities: keyword matching vs. LSA
Word pair                                             Keyword   LSA
Doctor / Doctor                                       1.0       1.0
Doctor / Physician                                    0.0       0.8
Doctor / Surgeon                                      0.0       0.7
Document pair                                         Keyword   LSA
Doctors operate on patients / Physicians do surgery   0.0       0.8
the radius of spheres / a circle's diameter           0.00      0.55
the radius of spheres / the music of spheres          0.75      0.01
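The keyword column for the document pairs can be reproduced with plain bag-of-words cosine similarity. This minimal sketch assumes whitespace tokenization, lower-casing and no stemming or stop-word removal; the helper names are hypothetical. Pairs that share no words score 0.0, while "the radius of spheres" and "the music of spheres" share three of their four words and score 3 / (2 × 2) = 0.75. The LSA column cannot be reproduced this way, because it depends on a reduced semantic space learned from a corpus.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased term-frequency vector; no stemming or stop-word removal."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_similarity(text_a, text_b):
    return cosine_similarity(bag_of_words(text_a), bag_of_words(text_b))


pairs = [
    ("Doctors operate on patients", "Physicians do surgery"),  # 0.00
    ("the radius of spheres", "a circle's diameter"),          # 0.00
    ("the radius of spheres", "the music of spheres"),         # 0.75
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {keyword_similarity(a, b):.2f}")
```

This is exactly the weakness the slide points at: keyword matching rates the unrelated "music of spheres" pair highest and misses the doctor/physician paraphrase entirely.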
24
Semantic Matrix Model
Corpus:
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey
[Diagram: word-document matrix built from the corpus, with a document vector and a word vector highlighted]
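The word-document matrix for these nine titles can be built in a few lines. This is a minimal sketch under stated assumptions: the 12 index terms are taken to be the words that occur in more than one title, excluding function words such as "of", "the", "a" and "and", and tokens are lower-case runs of letters. The names `TERMS`, `tokenize` and `word_document_matrix` are hypothetical. Each column of the result is a document vector and each row is a word vector.

```python
import re

# The nine titles from the slide.
CORPUS = {
    "c1": "Human machine interface for ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "System and human system engineering testing of EPS",
    "c5": "Relation of user perceived response time to error measurement",
    "m1": "The generation of random, binary, ordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}

# Assumed index terms: words occurring in more than one title, minus function words.
TERMS = ["human", "interface", "computer", "user", "system", "response",
         "time", "eps", "survey", "trees", "graph", "minors"]


def tokenize(text):
    """Lower-case the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_document_matrix(corpus, terms):
    """Rows are words, columns are documents; each cell counts occurrences."""
    docs = list(corpus)
    tokens = {d: tokenize(corpus[d]) for d in docs}
    matrix = [[tokens[d].count(t) for d in docs] for t in terms]
    return docs, matrix


docs, matrix = word_document_matrix(CORPUS, TERMS)
print("term      " + " ".join(docs))
for term, row in zip(TERMS, matrix):
    print(f"{term:<10}" + " ".join(f"{n:>2}" for n in row))
```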
25
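Singular Value Decomposition (SVD) of the words-by-contexts matrix: $X = U \Sigma V^{T}$, where the rows of $U$ are word vectors, $\Sigma$ is the diagonal matrix of singular values, and the rows of $V$ are context (document) vectors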
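The decomposition can be sketched with NumPy's `numpy.linalg.svd`. This is an illustration, not the demo's code: the matrix holds word-by-document counts for the nine-title corpus under an assumed set of 12 index terms, and keeping k = 2 singular values is an illustrative choice. Truncating to the k largest singular values, $X_k = U_k \Sigma_k V_k^{T}$, can place words that never share a document, such as "human" and "user", close together.

```python
import numpy as np

TERMS = ["human", "interface", "computer", "user", "system", "response",
         "time", "eps", "survey", "trees", "graph", "minors"]
DOCS = ["c1", "c2", "c3", "c4", "c5", "m1", "m2", "m3", "m4"]

# Rows are words, columns are documents; cells count occurrences in each title.
X = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],  # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],  # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],  # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],  # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],  # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # eps
    [0, 1, 0, 0, 0, 0, 0, 0, 1],  # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],  # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # minors
], dtype=float)

# Thin SVD: X = U @ diag(s) @ Vt, singular values s in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)


def rank_k_approximation(k):
    """Keep only the k largest singular values: the reduced space LSA works in."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


X2 = rank_k_approximation(2)
human, user = TERMS.index("human"), TERMS.index("user")
print("singular values:", np.round(s, 2))
print("human vs user, raw counts:  ", round(cosine(X[human], X[user]), 2))
print("human vs user, rank-2 space:", round(cosine(X2[human], X2[user]), 2))
```

In the raw counts "human" and "user" never co-occur, so their similarity is zero; after truncation both load on the same dominant dimension because they appear alongside the same words (for example "system" and "interface").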
26
Different Similarity Measures Emphasize Different Aspects
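The point of this slide can be made concrete with three common measures applied to the same term-count vectors. This is a hedged sketch with hypothetical vectors: cosine similarity ignores length, so a document and a copy twice as long score 1.0; Euclidean distance treats that same pair as √5 apart; Jaccard similarity looks only at which terms are present, not how often.

```python
import math


def cosine_similarity(a, b):
    """Angle-based: ignores vector length, so documents with the same proportions match."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Magnitude-sensitive: grows when one vector is a scaled-up copy of the other."""
    return math.dist(a, b)


def jaccard_similarity(a, b):
    """Set overlap of the terms present in each vector; counts are ignored."""
    present_a = {i for i, x in enumerate(a) if x}
    present_b = {i for i, y in enumerate(b) if y}
    return len(present_a & present_b) / len(present_a | present_b)


short_doc = [1, 2, 0]   # hypothetical term counts
long_doc = [2, 4, 0]    # same proportions, twice as long
other_doc = [1, 0, 3]   # different emphasis

for name, doc in [("long_doc", long_doc), ("other_doc", other_doc)]:
    print(f"short_doc vs {name}: "
          f"cosine={cosine_similarity(short_doc, doc):.2f}, "
          f"euclidean={euclidean_distance(short_doc, doc):.2f}, "
          f"jaccard={jaccard_similarity(short_doc, doc):.2f}")
```

Which measure is appropriate depends on whether document length, term frequency, or mere term presence should count as a difference.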
27
Demo 2: Interactive Analysis with PTVS (Python Tools for Visual Studio)
28
Examples of Non-Text Applications
29
Scenario 3: Item-based Recommendations
Machine Learning
30
Item-based Recommendation Engine
[Diagram: next recommended item for user]
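The idea behind an item-based recommender can be sketched without Mahout. In this minimal, hypothetical example, each item is represented by the vector of ratings users gave it, items are compared with cosine similarity, and each item a user has not rated is scored by summing its similarity to the items the user has rated, weighted by those ratings. The data, names and this simple scoring rule are assumptions for illustration, not Mahout's API or its exact algorithm.

```python
import math

# Hypothetical purchase data: user -> {item: rating}; 1 means "bought".
RATINGS = {
    "u1": {"A": 1, "B": 1},
    "u2": {"A": 1, "B": 1, "C": 1},
    "u3": {"B": 1, "C": 1, "D": 1},
    "u4": {"C": 1, "D": 1},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, user, n=2):
    """Rank unrated items by rating-weighted similarity to the user's rated items."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {
        candidate: sum(cosine(vectors[candidate], vectors[item]) * rating
                       for item, rating in seen.items())
        for candidate in vectors
        if candidate not in seen
    }
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


print(recommend(RATINGS, "u1"))  # ['C', 'D']
```

For user u1, item C ranks first because it co-occurs with both of u1's items (A and B), while D co-occurs only with B.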
31
Demo 3: “Hello Recommendation Engines” Using Mahout
32
Common Algorithms for Machine Learning and Search
33
Scenario 4: Collective Intelligence
Social Media Data
35
Demo 4: Capturing and Storing Twitter Data
39
References
40
DBI210: Harnessing Big Data with Hadoop
Hadoop Hands-on Labs in the Azure HOL section (get a code from me)
Find me later at the Windows Azure booth: Wed, 10:30 – 1:00
41
Meetwindowsazure.com
@WindowsAzure @ms_teched
Download Windows Azure: Windowsazure.com/teched
Hands-On Labs
42
Connect. Share. Discuss. http://europe.msteched.com
Learning: Microsoft Certification & Training Resources, www.microsoft.com/learning
TechNet: Resources for IT Professionals, http://microsoft.com/technet
Resources for Developers: http://microsoft.com/msdn
43
Evaluations: submit your evals online at http://europe.msteched.com/sessions