1
Presto
Nipa Das, Ye Jee Kim, Murphy Potts, Sadaf Mirzai
2
Agenda
What is Presto?
History of Presto
Architecture
Pluggable Backends
Applications & Business Opportunities
Pros
Cons
Citations
3
What is Presto?
Open source query engine that uses Structured Query Language (SQL)
Created by Facebook
Runs queries against data sources ranging from gigabytes to petabytes
Allows fast analytics
Can combine data from multiple sources in a single query
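
To illustrate the last point, here is a minimal sketch of what a query spanning two sources might look like. Presto addresses tables as catalog.schema.table, so a single statement can reference more than one backend. The catalog, schema, table, and column names below are hypothetical and are not taken from the presentation; the Python code only assembles the SQL text and does not connect to a Presto cluster.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical tables: page views stored in Hive, user profiles in MySQL.
PAGE_VIEWS = qualified_name("hive", "web", "page_views")
USERS = qualified_name("mysql", "crm", "users")

# One SQL statement that joins data from both (assumed) catalogs.
FEDERATED_QUERY = (
    "SELECT u.country, count(*) AS views\n"
    f"FROM {PAGE_VIEWS} AS v\n"
    f"JOIN {USERS} AS u ON v.user_id = u.user_id\n"
    "GROUP BY u.country"
)

print(FEDERATED_QUERY)
```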
7
Applications + Business Opportunities

Facebook
Warehouse data is stored in a few large Hadoop/HDFS-based clusters
Massive 300 PB data warehouse
Development started in Fall 2012, when the warehouse data grew to petabyte size
Rolled out to the whole company by Spring 2013
Over 1,000 employees use Presto daily, running more than 30,000 queries that together scan over a petabyte per day

Netflix
25 PB data warehouse stored in Amazon S3
Presto queries the data directly in the Amazon S3 bucket
Over 350 active users and about 3,000 queries daily
Product decisions are very data driven; Presto allows for consumer and product insight
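
The usage figures above can be turned into rough per-query and per-user averages. This is only back-of-the-envelope arithmetic: it treats "over a petabyte" as exactly one decimal petabyte (so the scan figure is a lower bound) and "over 1,000" and "over 350" users as exactly 1,000 and 350.

```python
PB = 10**15  # decimal petabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes

# Facebook figures from the slide (lower bounds where the slide says "over").
facebook_users = 1_000
facebook_queries_per_day = 30_000
facebook_bytes_scanned_per_day = 1 * PB

# Netflix figures from the slide.
netflix_users = 350
netflix_queries_per_day = 3_000

# Average data scanned by a single Facebook query, in GB.
facebook_gb_per_query = facebook_bytes_scanned_per_day / facebook_queries_per_day / GB

# Average number of queries each user runs per day.
facebook_queries_per_user = facebook_queries_per_day / facebook_users
netflix_queries_per_user = netflix_queries_per_day / netflix_users

print(f"Facebook: about {facebook_gb_per_query:.1f} GB scanned per query")
print(f"Facebook: about {facebook_queries_per_user:.0f} queries per user per day")
print(f"Netflix: about {netflix_queries_per_user:.1f} queries per user per day")
```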
8
Airbnb Airpal Launch
Airpal: web-based query execution tool that leverages Facebook’s PrestoDB to facilitate data analysis (2014)
1.5 PB warehouse
No need to install Presto locally; Airpal is a web UI for PrestoDB

Features
Optional access control for users
Ability to search and find tables
See metadata, partitions, schemas, and sample rows
Write queries in an easy-to-read editor
Submit queries through a web interface
Track query progress
Get the results back through the browser as a CSV
Create a new Hive table based on the results of a query
Save queries once written
Searchable history of all queries run within the tool

Other businesses that use Presto: LinkedIn, Groupon, Uber, Twitter, Dropbox
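
As a rough illustration of the "results back as a CSV" feature, the sketch below serializes a small result set to CSV text using Python's standard csv module. It is not Airpal's actual implementation; the function name and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_csv(columns, rows):
    """Serialize column names and result rows into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)   # header line
    writer.writerows(rows)     # one line per result row
    return buffer.getvalue()


# Hypothetical query result: views per country.
sample = rows_to_csv(["country", "views"], [("US", 10), ("KR", 7)])
print(sample)
```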
9
Pros
Interactive queries
Optimized for latency
Joins with a large fact table and many smaller dimension tables
Create jobs
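
The join shape named above, one large fact table joined to several small dimension tables, can be sketched as follows. To keep the example runnable without a cluster it uses Python's built-in sqlite3 module as a stand-in for Presto; the table names, columns, and data are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a real warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_device  (device_id  INTEGER PRIMARY KEY, kind TEXT);
CREATE TABLE fact_views  (country_id INTEGER, device_id INTEGER, views INTEGER);
INSERT INTO dim_country VALUES (1, 'US'), (2, 'KR');
INSERT INTO dim_device  VALUES (1, 'mobile'), (2, 'desktop');
INSERT INTO fact_views  VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (1, 1, 3);
""")

# The fact table is joined to each dimension table on its key,
# then aggregated by the descriptive dimension columns.
STAR_JOIN = """
SELECT c.name, d.kind, SUM(f.views) AS total_views
FROM fact_views AS f
JOIN dim_country AS c ON f.country_id = c.country_id
JOIN dim_device  AS d ON f.device_id  = d.device_id
GROUP BY c.name, d.kind
ORDER BY c.name, d.kind
"""

result = conn.execute(STAR_JOIN).fetchall()
for row in result:
    print(row)
```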
10
Cons
Limit on the amount of data a query can process: all data must be held in memory, or the query will fail
Lacks the ability to write output data back to tables
If processing fails, the entire query must be re-run
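
Because a failed query cannot resume from a partial result, a client typically has to resubmit it from the start. The sketch below shows one way that could look. The QueryFailed exception and the run_query callable are hypothetical placeholders and not part of any Presto client API.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class QueryFailed(Exception):
    """Hypothetical error raised when a query dies part-way through."""


def run_with_full_retry(run_query: Callable[[], T], max_attempts: int = 3) -> T:
    """Re-run the whole query from scratch until it succeeds or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[QueryFailed] = None
    for _ in range(max_attempts):
        try:
            # No partial progress is kept: each attempt starts over.
            return run_query()
        except QueryFailed as error:
            last_error = error
    raise QueryFailed(f"query failed after {max_attempts} attempts") from last_error
```

A caller would pass a zero-argument function that submits the query and returns its rows. Since each attempt repeats all of the work, a small max_attempts keeps wasted computation bounded.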
11
Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook.
12
Thank you! Questions?