Hive @ Uber Mohammad Islam D A T A.

Presentation transcript:

Hive @ Uber
Mohammad Islam
DATA

Data @ Uber
[Diagram: Kafka, Ingestion Layer, HDFS, Sharded MySQL DB]

Data @ Uber
Characteristics specific to Uber data:
- Out-of-order data arrival
- Duplicate records (machine failure/replay)
- Highly nested structure
- Geo information
(Speaker note: introduce Hive and our work)
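The first two characteristics (out-of-order arrival and replayed duplicates) can be illustrated with a small sketch. It is not taken from the talk: the field names `event_id` and `event_time` are hypothetical, and the approach shown (keep one record per key, then order by event time) is just one common way to cope with such data.

```python
from typing import Dict, Iterable, List


def dedupe_latest(records: Iterable[dict]) -> List[dict]:
    """Keep one record per event_id, preferring the latest event_time.

    Arrival order is ignored, so late or replayed records are treated
    the same way as on-time ones.
    """
    latest: Dict[str, dict] = {}
    for rec in records:
        key = rec["event_id"]  # hypothetical unique key, not from the talk
        seen = latest.get(key)
        if seen is None or rec["event_time"] > seen["event_time"]:
            latest[key] = rec
    # Sort by event time (then key) so the output order does not depend
    # on the order in which records happened to arrive.
    return sorted(latest.values(), key=lambda r: (r["event_time"], r["event_id"]))


arrived = [
    {"event_id": "b", "event_time": 2},
    {"event_id": "a", "event_time": 1},  # arrived late
    {"event_id": "b", "event_time": 2},  # replayed duplicate
]
cleaned = dedupe_latest(arrived)
```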

hDrone: Data registration service
Registration includes:
- Creating a new table
- Adding a new partition
- Schema evolution
- Registration backfill
Pros:
- Central control
- Data producers do not need to handle the details
Cons:
- Yet another service to manage
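As a rough illustration of the "add a new partition" step, the sketch below builds the HiveQL statement a registration service might issue. This is not hDrone's code: the helper name, the table name `rawdata.trips`, the partition column `datestr`, and the path are all hypothetical. Only the `ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION` syntax comes from Hive itself.

```python
import re
from typing import Dict

_TABLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_literal(value: str) -> str:
    # Reject quotes and backslashes instead of escaping them, to keep
    # the sketch simple and avoid building an unsafe statement.
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in literal: {value!r}")
    return value


def add_partition_ddl(table: str, partition: Dict[str, str], location: str) -> str:
    """Build an idempotent ADD PARTITION statement for one partition."""
    if not _TABLE.match(table):
        raise ValueError(f"bad table name: {table!r}")
    parts = []
    for column, value in partition.items():
        if not _COLUMN.match(column):
            raise ValueError(f"bad partition column: {column!r}")
        parts.append(f"{column}='{_check_literal(value)}'")
    spec = ", ".join(parts)
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{_check_literal(location)}'"
    )


ddl = add_partition_ddl(
    "rawdata.trips", {"datestr": "2016-01-01"}, "/data/trips/2016-01-01"
)
```

Using `IF NOT EXISTS` makes the statement safe to re-run, which matters for a backfill that may revisit partitions that are already registered.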

hDrone: Data registration service
[Diagram: INotify, Hive, Hive Registration Task, HDFS, ThreadPool]
(Speaker note: introduce next slide/Janus catchUp)
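The slide only names the components, so the following is one plausible reading rather than the actual design: paths reported by HDFS notifications are handed to registration tasks that run on a thread pool. The function name and the `register` callback are hypothetical stand-ins for whatever talks to Hive.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def dispatch_registrations(
    new_paths: Iterable[str],
    register: Callable[[str], None],
    max_workers: int = 4,
) -> List[str]:
    """Run register(path) for every path on a thread pool.

    Returns the paths whose registration raised, in submission order,
    so the caller can retry them later.
    """
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Dicts preserve insertion order, so results are checked in the
        # same order the tasks were submitted.
        futures = {pool.submit(register, path): path for path in new_paths}
        for future, path in futures.items():
            try:
                future.result()
            except Exception:
                failed.append(path)
    return failed


def fake_register(path: str) -> None:
    # Hypothetical registration task; fails for one path to show retry input.
    if "bad" in path:
        raise RuntimeError("registration failed")


retry_later = dispatch_registrations(
    ["/data/a", "/data/bad", "/data/c"], fake_register
)
```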

Janus
Janus: Unified query execution service
(Speaker note: introduce expected features)

Expected Feature: Transaction
- Hive transaction support: update/delete/insert
- Required for incremental ingestion
- Issue: only ORC supports it
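To show why incremental ingestion needs update and delete as well as insert, here is a minimal in-memory sketch that applies a change log to a snapshot. It is a conceptual stand-in, not Hive code: the `(operation, key, row)` tuple layout and the sample rows are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, str, Optional[dict]]  # (operation, key, row); hypothetical layout


def apply_changes(snapshot: Dict[str, dict], changes: List[Change]) -> Dict[str, dict]:
    """Return a new snapshot with the changes applied in order."""
    result = dict(snapshot)
    for op, key, row in changes:
        if op in ("insert", "update"):
            result[key] = row
        elif op == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


table = {"1": {"city": "SF"}, "2": {"city": "NYC"}}
log = [
    ("update", "1", {"city": "Oakland"}),
    ("delete", "2", None),
    ("insert", "3", {"city": "Seattle"}),
]
merged = apply_changes(table, log)
```

Without update and delete, the only way to reflect such a log is to rewrite the affected data in full, which is the cost incremental ingestion is meant to avoid.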

Expected Feature: Geo
- Geo/spatial query support
- Uber's business is inherently geo-aware
- City Ops staff may not be technical (limited SQL experience)
- The Esri library can be a good start, but more may be needed
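A typical geo query asks whether a point falls inside a region such as a city boundary. The sketch below shows that test with the standard ray-casting method in plain Python. It is an illustration of the kind of predicate a spatial library provides, not the Esri implementation, and the square "city" is made-up data. Points exactly on the boundary are not treated specially.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square boundary, for illustration only.
city = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
pickup_inside = point_in_polygon(2.0, 2.0, city)
pickup_outside = point_in_polygon(5.0, 2.0, city)
```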

Hive (auto) Tuning
- Hive has a bunch of knobs for better performance
- They are not easy for everybody to remember
- It would be excellent if the Hive execution/planner engine could auto-set the best configuration
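As a small example of deriving a knob from the workload instead of asking users to remember it, the sketch below estimates a reducer count from input size. It mirrors the rule behind the Hive settings `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max`. The default values used here are assumptions that vary by Hive version, and this is not the auto-tuning the slide asks for.

```python
import math


def estimate_reducers(
    input_bytes: int,
    bytes_per_reducer: int = 256 * 1024 * 1024,  # assumed default; check your version
    max_reducers: int = 1009,  # assumed default; check your version
) -> int:
    """Estimate a reducer count: one reducer per chunk of input, capped."""
    if input_bytes <= 0:
        return 1
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


one_gib = estimate_reducers(1024 ** 3)
huge = estimate_reducers(10 ** 15)
```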

More...
- HS2 stability
- Column-level security (for non-Hive apps)
- Parquet performance
- Locking
- Memory
- HA

Q & A