Published by Zoe Alexander. Modified over 8 years ago.
1 Divya Jain, Oct 10th, 2014. Big Data Products: Where do I start?
2 Agenda: Example WebApp, Problem Definition, Big Data Stack, Q&A
3 DinnerLite !! Awesome Healthy Dinner Recipes
4 DinnerLite !! Awesome Healthy Dinner Recipes. BIG DATA PROBLEM: (1) No Usage Analytics or Stats; (2) No Personalized View
5 DinnerLite!! Understanding the problem (Audience, Frequency, Output). (1) Usage Statistics: Internal, Batch Mode, Graphs/Reports. (2) Personalized View: External, Real-Time, Recommendations.
6 Building the Stack DinnerLite !! Usage Logs ? Internal External
7 What solutions are present? Source: http://sranka.files.wordpress.com/2014/01/bigdata.jpg
8 What are Big Data Technologies? Ingestion, Platform, Processing, Consumption
9 DinnerLite!! Ingestion, Platform, Processing, Consumption. Sqoop: Imports Data from Relational Database. Kafka: Distributed Publish-Subscribe Messaging System. Storm / Spark Streaming: Distributed Realtime.
10 DinnerLite !! (1) Usage Statistics: Kafka, Storm. (2) Personalized View: Kafka, Spark Streaming. Ingestion, Platform, Processing, Consumption
11 Building the Stack DinnerLite !! Usage Logs ? Internal External Pub-sub messaging (Kafka) Batch Consumer Real-Time Consumer
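The pub-sub layer on this slide lets a batch consumer and a real-time consumer read the same usage-log stream at their own pace. The sketch below illustrates that idea with a small in-memory stand-in. It is not the Kafka client API, and `TopicLog`, `publish`, `poll`, the group names, and the sample events are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class TopicLog:
    """In-memory stand-in for a pub-sub topic (hypothetical, not Kafka's API).

    Records are appended to a single log; each consumer group keeps its own
    read offset, so groups do not interfere with one another.
    """

    def __init__(self):
        self._records = []
        self._offsets = defaultdict(int)  # group name -> next index to read

    def publish(self, record):
        self._records.append(record)

    def poll(self, group, max_records=None):
        """Return unread records for `group` and advance that group's offset."""
        start = self._offsets[group]
        end = len(self._records)
        if max_records is not None:
            end = min(end, start + max_records)
        self._offsets[group] = end
        return self._records[start:end]


# Made-up usage events for illustration only.
usage_logs = TopicLog()
for event in [
    {"user": "u1", "recipe": "kale-salad"},
    {"user": "u2", "recipe": "lentil-soup"},
    {"user": "u1", "recipe": "lentil-soup"},
]:
    usage_logs.publish(event)

# The real-time consumer handles events one at a time.
realtime_first = usage_logs.poll("realtime", max_records=1)

# The batch consumer picks up everything accumulated so far.
batch_all = usage_logs.poll("batch")

# The real-time consumer resumes from its own offset, unaffected by the batch read.
realtime_rest = usage_logs.poll("realtime")
```

Keeping a separate offset per consumer group is what allows the internal (batch) and external (real-time) paths to share one ingestion pipeline.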
12 DinnerLite !! Platform, Processing, Consumption, Ingestion. HDFS: Distributed File Systems. MapReduce / Spark: Distributed Processing. Hive / Shark: Analysis and Summarization.
13 DinnerLite !! 12 Usage Statistics HDFS Map Reduce Hive Personalized View HDFS Spark Shark PlatformProcessingConsumptionIngestion
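For the usage-statistics path, the MapReduce step can be pictured as a map, shuffle, and reduce over log lines. The following is a minimal single-process sketch of that pattern, not Hadoop code. The tab-separated `user<TAB>recipe` log format, the function names, and the sample lines are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(log_line):
    """Emit (recipe, 1) for each well-formed 'user<TAB>recipe' line (assumed format)."""
    parts = log_line.strip().split("\t")
    if len(parts) == 2:
        yield parts[1], 1


def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the per-recipe counts."""
    return key, sum(values)


def count_views(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Made-up log lines; the last one is malformed and is skipped by map_phase.
sample_logs = ["u1\tkale-salad", "u2\tlentil-soup", "u1\tlentil-soup", "malformed"]
view_counts = count_views(sample_logs)
```

On a real cluster the map and reduce functions would run in parallel across HDFS blocks; a Hive query with a `GROUP BY` on the recipe column would express the same aggregation declaratively.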
14 DinnerLite !! Platform, Processing, Consumption, Ingestion. Zookeeper: Distributed Configuration Management. Oozie: Distributed Workflow Management. Yarn: Distributed Resource Management.
15 Building the Stack DinnerLite !! Usage Logs, Hadoop / Spark Cluster, Internal, External, Pub-sub messaging (Kafka), Batch Consumer, Real-Time Consumer, Pig/Hive, HDFS, MapReduce, Spark, Shark, Oozie, Yarn
16 DinnerLite !! Platform, Processing, Consumption, Ingestion. R / Matlab / Scikit; Mahout / Weka / MLlib. Data Mining and Data Analysis Tools; Data Analysis and Machine Learning Libraries.
17 DinnerLite !! 12 Usage Statistics R Mahout Personalized View Scikit MLLib PlatformProcessingConsumptionIngestion
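The personalized view calls for recommendations, but the slides do not say which algorithm is used. As one possible illustration, the sketch below uses simple item co-occurrence ("users who viewed this recipe also viewed...") in plain Python rather than Scikit or MLlib. The function names and the sample viewing data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(views_by_user):
    """Count how often two recipes were viewed by the same user."""
    co = defaultdict(Counter)
    for recipes in views_by_user.values():
        for a, b in permutations(set(recipes), 2):
            co[a][b] += 1
    return co


def recommend(user, views_by_user, co, top_n=3):
    """Rank unseen recipes by co-occurrence with the recipes the user has viewed."""
    seen = set(views_by_user.get(user, ()))
    scores = Counter()
    for recipe in seen:
        for other, count in co.get(recipe, {}).items():
            if other not in seen:
                scores[other] += count
    return [recipe for recipe, _ in scores.most_common(top_n)]


# Made-up viewing history for illustration only.
views = {
    "u1": ["kale-salad", "lentil-soup"],
    "u2": ["lentil-soup", "tofu-stir-fry"],
    "u3": ["lentil-soup", "tofu-stir-fry", "quinoa-bowl"],
    "u4": ["kale-salad"],
}
co_counts = build_cooccurrence(views)
u1_recs = recommend("u1", views, co_counts)
u4_recs = recommend("u4", views, co_counts)
```

In the stack described here, a model like this would be trained on the cluster and its output served to the web app in real time; the in-memory dictionaries above only stand in for that pipeline.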
18 Building the Stack DinnerLite !! Usage Logs, Hadoop / Spark Cluster, Internal, External, Pub-sub messaging (Kafka), Batch Consumer, Real-Time Consumer, Pig/Hive, HDFS, MapReduce, Spark, Shark, Oozie, Yarn, Mahout, MLlib
19 DinnerLite!! Platform, Consumption, Ingestion, Processing. MySQL / Postgres: Traditional Relational Databases. OpenTSDB: Time-Series Databases. Cassandra / HBase: NoSQL, Distributed, Key-Value, Column Stores.
20 DinnerLite!! (1) Usage Statistics: MySQL, OpenTSDB. (2) Personalized View: HBase, Memcache. Platform, Consumption, Ingestion, Processing
21 Building the Stack DinnerLite !! Usage Logs, Hadoop / Spark Cluster, Internal, External, Pub-sub messaging (Kafka), Batch Consumer, Real-Time Consumer, Pig/Hive, HDFS, MapReduce, Spark, Shark, Oozie, Yarn, Mahout, MLlib, HBase, MySQL
22 DinnerLite !! Awesome Healthy Dinner Recipes
23 Thank You! Sounds Interesting? Box is Hiring!! Source https://www.sac.edu/StudentServices/InternationalStudents/Calendar%20of%20Events/questions-and-answers.jpg
24 Extra slides
25 What is Big Data? “In God we trust. All others must bring data.” – W. Edwards Deming. BIG DATA: Volume, Velocity, Variability, Veracity
26 Why does it matter? "Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom." - Clifford Stoll. Big Data, Information, Knowledge, Better Products