Modern Approaches of Customer’s Dream Distribution Across the Cluster Evgenij Kozhevnikov, Samara AUGUST 4, 2015.

Presentation transcript:

1 Modern Approaches of Customer’s Dream Distribution Across the Cluster Evgenij Kozhevnikov, Samara AUGUST 4, 2015

2 About me
1+ years of production experience in BigData: Edmunds.com, BigData CC
3+ years of development experience in BigData: Hadoop, Spark, Storm, Akka
6+ years of development experience: Java EE, IBM WebSphere, Spring

3 Successful Business - Growing Business

4 Growing Business – Growing Load

5 Our software should be ready to grow with business
1. Pay for what you need now, not for what you plan.
2. Growth should not require any changes to the application.
3. Where there is one growing app, more growing apps will follow.

6 Our software should be ready to grow with business

7 Caching Proxy CDN

8 Our software should be ready to grow with business Caching Proxy CDN NoSQL Distributed cache

9 Our software should be ready to grow with business Caching Proxy CDN NoSQL Distributed cache

10 Does Edmunds need a cluster solution?
1. What trends do we have now?
2. Is the quality of the vehicle catalog good enough?
3. Is our advertising efficient?
4. What results do we get from A/B testing?
5. What can we recommend to our clients?
6. Where is the car that the client needs?
7. How many leads were sent to the dealer?
8. Is the dealer successful?
9. Are our visitors real people rather than robots?
10. What is our revenue this year?
11. Are we growing?
12. Are our dealers growing?

11 Does Edmunds need a cluster solution?
1. What trends do we have now?
2. Is the quality of the vehicle catalog good enough?
3. Is our advertising efficient?
4. What results do we get from A/B testing?
5. What can we recommend to our clients?
6. Where is the car that the client needs?
7. How many leads were sent to the dealer?
8. Is the dealer successful?
9. Are our visitors real people rather than robots?
10. What is our revenue this year?
11. Are we growing?
12. Are our dealers growing?
Answering these questions is not a competitive advantage: all competitors do that.

12 Does Edmunds need a cluster solution?
Growing amount of data, growing number of tasks.
Fast access to the whole data set is needed.
Historical data is as important as new data.
Hardware resources must be dynamically extensible.
Several independent applications must be able to run on the same cluster.
Each application run requires a specific amount of resources.
A convenient monitoring tool and a fault-tolerant system are needed.
Code should be readable, and distributed algorithms should be maintainable.
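The requirements on slide 12 (several independent applications on one cluster, each asking for a specific amount of resources, with hardware added over time) can be illustrated with a toy allocator. This is only a sketch under stated assumptions: the `Node`, `Request`, and `allocate` names, the first-fit policy, and the sample numbers are hypothetical and are not taken from the talk or from any real resource manager.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cores: int
    free_mem_gb: int


@dataclass
class Request:
    app: str
    cores: int
    mem_gb: int


def allocate(nodes, requests):
    """First-fit placement: put each request on the first node that has room."""
    placement = {}
    for req in requests:
        for node in nodes:
            if node.free_cores >= req.cores and node.free_mem_gb >= req.mem_gb:
                # Reserve the resources so later requests see the reduced capacity.
                node.free_cores -= req.cores
                node.free_mem_gb -= req.mem_gb
                placement[req.app] = node.name
                break
        else:
            # No node has enough capacity: the app waits until the cluster grows.
            placement[req.app] = None
    return placement


# Hypothetical cluster and application requests.
nodes = [Node("node-1", 8, 32), Node("node-2", 4, 16)]
requests = [
    Request("reports", 6, 24),
    Request("recommendations", 4, 8),
    Request("ab-tests", 4, 16),
]
placement = allocate(nodes, requests)

# Extending the hardware lets the waiting application start
# without any change to the application itself.
nodes.append(Node("node-3", 8, 32))
retry = allocate(nodes, [Request("ab-tests", 4, 16)])
```

Here `placement` leaves `ab-tests` unplaced until `node-3` joins the cluster, after which `retry` places it there. Real schedulers use richer policies (queues, fairness, preemption); first-fit is used only to keep the example short.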

13 MAPREDUCE YARN Hadoop-based solutions

14 MapReduce across YARN Node

15 MapReduce across YARN: Resource Manager, Name Node, Node

16 MapReduce across YARN: Standby Resource Manager, Active Resource Manager, Hadoop Client, MR Application Master, Name Node, Data Node (MR Executor), Data Node (MR Executor)
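As a reminder of what the MapReduce executors on the data nodes compute, the map, shuffle, and reduce phases can be sketched in a single process. This is a minimal illustration, assuming a word-count job over made-up input splits; the function names are hypothetical and nothing here calls Hadoop or YARN.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Two hypothetical input splits; on a cluster each would be handled by a separate executor.
splits = [["Honda Civic", "Toyota Camry"], ["honda accord"]]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(mapped))
# counts["honda"] == 2
```

On a real cluster the Application Master asks the Resource Manager for containers and the map and reduce tasks run inside them, close to the data where possible; the sketch keeps only the data flow.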

17 SPARK YARN Hadoop-based solutions

18 Spark across YARN: Standby Resource Manager, Active Resource Manager, Spark Client, Spark Application Master, Name Node, Data Node (Spark Executor), Data Node (Spark Executor)

19 SPARK MESOS Mesosphere-based solutions

20 Spark across Mesos: Standby Mesos Master, Active Mesos Master, Spark Client, Spark Scheduler, Name Node, Data Node (Spark Executor), Data Node (Spark Executor)
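In the Mesos layout on slide 20 the scheduler belongs to the framework (Spark), not to the master: Mesos offers resources to the framework, and the framework's scheduler decides which offers to accept. The sketch below imitates that exchange in plain Python. It is an assumption-laden toy: `run_offers`, the agent names, and the CPU-only offers are hypothetical, and the code does not use the Mesos API.

```python
def run_offers(offers, pending_tasks):
    """Framework-side scheduler: accept an offer if it fits the next pending task."""
    launched, declined = [], []
    tasks = list(pending_tasks)
    for agent, cpus in offers:
        if tasks and cpus >= tasks[0][1]:
            # The offer is large enough: launch the next task on this agent.
            name, _needed = tasks.pop(0)
            launched.append((name, agent))
        else:
            # Too small (or nothing left to run): hand the offer back to the master.
            declined.append(agent)
    return launched, declined, tasks


# Hypothetical offers (agent, free CPUs) and executors the framework wants to start.
offers = [("agent-1", 2), ("agent-2", 8), ("agent-3", 4)]
pending = [("executor-a", 4), ("executor-b", 4)]
launched, declined, waiting = run_offers(offers, pending)
```

With these inputs both executors start (on `agent-2` and `agent-3`) and the small offer from `agent-1` is declined, so the master can offer it to another framework sharing the cluster.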

21 WHAT NEXT
Myriad: YARN on Mesos; efficient access to Hadoop resources; dynamic nature of Mesos
Kubernetes: resource manager for Docker-based infrastructure; solution from Google
Akka Cluster: efficient model for vertical and horizontal scaling; freedom to choose the way of distribution
Task-specific tools: Apache Storm; Hive/Pig/Cascading…; NoSQL solutions; Kafka/Sqoop/Flume…; Chef/Puppet/Ansible…; Docker/Rocket/CoreOS; Data Science

22 Modern Approaches of Customer’s Dream Distribution Across the Cluster Evgenij Kozhevnikov, Samara