The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs
Analytics System Landscape
What does this mean for Big Data Practitioners?
Gives them a lot of power!
Even the mighty may need a little help
Challenges for Practitioners. App Developers, Data Scientists: Which system to use for the app that I am developing? Features (e.g., graph data), performance (e.g., claims like System A is 50x faster than B), resource efficiency, growth and scalability, multi-tenancy.
Challenges for Practitioners. App Developers, Data Scientists: Which system to use for the app that I am developing? Different parts of my app have different requirements: compose best-of-breed systems, or use a one-size-fits-all system? System Admins: Managing many systems is hard! CIO: Total Cost of Ownership (TCO)?
One Approach
Useful, But …
How a user uses BigFrame: BigFrame Interface, then Benchmark Generator, then Benchmark Driver for System Under Test (HBase, Hive, MapReduce)
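The role of the Benchmark Driver in this flow can be illustrated with a minimal sketch. Everything named here (`BenchmarkDriver`, `RecordingDriver`, `run_benchmark`, the data set and query names) is hypothetical and is not BigFrame's actual API; the sketch only assumes that a driver hides the system under test behind a small load-and-query interface.

```python
import time
from abc import ABC, abstractmethod
from typing import Dict, List


class BenchmarkDriver(ABC):
    """Hypothetical driver interface; one subclass per system under test."""

    @abstractmethod
    def load(self, datasets: List[str]) -> None:
        """Load the generated data sets into the system."""

    @abstractmethod
    def run_query(self, name: str) -> None:
        """Execute one named benchmark query."""


class RecordingDriver(BenchmarkDriver):
    """Toy stand-in that records calls instead of contacting a real system."""

    def __init__(self) -> None:
        self.loaded: List[str] = []
        self.executed: List[str] = []

    def load(self, datasets: List[str]) -> None:
        self.loaded.extend(datasets)

    def run_query(self, name: str) -> None:
        self.executed.append(name)


def run_benchmark(driver: BenchmarkDriver,
                  datasets: List[str],
                  queries: List[str]) -> Dict[str, float]:
    """Load the data, run each query once, and return wall-clock seconds per query."""
    driver.load(datasets)
    timings: Dict[str, float] = {}
    for name in queries:
        start = time.perf_counter()
        driver.run_query(name)
        timings[name] = time.perf_counter() - start
    return timings


# Illustrative data set and query names, not taken from BigFrame.
driver = RecordingDriver()
timings = run_benchmark(driver, ["web_sales", "tweets"], ["q1", "q2"])
print(sorted(timings))
```

Under this assumption, supporting another system would mean writing one more `BenchmarkDriver` subclass while the generated benchmark stays unchanged.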
bspec: Benchmark Specification (HBase, Hive, MapReduce). 1. Data for initial load 2. Data refresh pattern 3. Query streams 4. Evaluation metrics
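The four parts of a bspec map naturally onto a small record type. The sketch below is one possible in-memory shape, not BigFrame's real format: the class name `BenchmarkSpec`, its field names, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BenchmarkSpec:
    """Hypothetical container mirroring the four parts of a bspec."""
    initial_load: List[str]          # 1. data sets for the initial load
    refresh_pattern: str             # 2. how the data is refreshed over time
    query_streams: List[List[str]]   # 3. one list of query names per stream
    metrics: List[str]               # 4. evaluation metrics to report

    def validate(self) -> None:
        """Raise ValueError if a required part of the spec is empty."""
        if not self.initial_load:
            raise ValueError("a bspec needs data for the initial load")
        if not self.query_streams or any(not s for s in self.query_streams):
            raise ValueError("a bspec needs at least one non-empty query stream")
        if not self.metrics:
            raise ValueError("a bspec needs at least one evaluation metric")


# Illustrative values only.
spec = BenchmarkSpec(
    initial_load=["web_sales", "tweets"],
    refresh_pattern="none",
    query_streams=[["q1", "q2"], ["q3"]],
    metrics=["latency_seconds"],
)
spec.validate()
print(len(spec.query_streams), "query streams")
```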
What does the user (want to) specify? BigFrame Interface
The 3 Vs
bigif: BigFrame's InputFormat. Data Variety: relational, text, array, graph. Data Volume: small, medium, large. Data Velocity: at rest, slow, fast. Query Variety: micro, macro. Query Volume: query concurrency & classes. Query Velocity: exploratory, continuous.
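The data and query dimensions on this slide amount to a small parameter space that a user's input can be checked against. The following is a minimal sketch of such a check; the key names, the `check_input` helper, and the integer `query_concurrency` setting are assumptions, and the actual bigif syntax is not shown in these slides.

```python
from typing import Dict, List, Set

# Allowed values per dimension, copied from the slide; key names are hypothetical.
ALLOWED: Dict[str, Set[str]] = {
    "data_variety": {"relational", "text", "array", "graph"},
    "data_volume": {"small", "medium", "large"},
    "data_velocity": {"at rest", "slow", "fast"},
    "query_variety": {"micro", "macro"},
    "query_velocity": {"exploratory", "continuous"},
}


def check_input(user_input: dict) -> List[str]:
    """Return a list of problems found in the input; an empty list means it is valid."""
    problems: List[str] = []
    for key, allowed in ALLOWED.items():
        if key not in user_input:
            problems.append(f"missing {key}")
            continue
        value = user_input[key]
        # Accept either a single value or a collection of values.
        chosen = set(value) if isinstance(value, (list, set, tuple)) else {value}
        unsupported = chosen - allowed
        if unsupported:
            problems.append(f"{key}: unsupported {sorted(unsupported)}")
    # Query volume is modeled here as a concurrency level (an assumption).
    concurrency = user_input.get("query_concurrency", 1)
    if not isinstance(concurrency, int) or concurrency < 1:
        problems.append("query_concurrency must be a positive integer")
    return problems


example = {
    "data_variety": ["relational", "text"],
    "data_volume": "large",
    "data_velocity": "at rest",
    "query_variety": "macro",
    "query_velocity": "exploratory",
    "query_concurrency": 4,
}
problems = check_input(example)
print(problems)
```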
Benchmark Generation: Benchmark Generator
Application Domain Modeled Currently: E-commerce sales, promotions, recommendations; social media sentiment & influence
Application Domain Modeled Currently: Item, Customer, Web_sales, Promotion, Tweets, Relationships
Application Domain Modeled Currently: Item, Web_sales, Promotion
Benchmark Generation: Benchmark Generator
Use Case I: Exploratory BI. Large volumes of relational data; mostly aggregation and few joins. Can Spark's performance match that of an MPP DB?
Use Case II: Complex BI. Large volumes of relational data; even larger volumes of text data; combined analytics.
Use Case III: Dashboards. Continuously-updated dashboards; large volume and velocity of relational and text data.
Use Case IV: Does One Size Fit All? A growing set of applications has to process relational, text, & graph data. Compose best-of-breed systems or use a one-size-fits-all system?
Use Case V: Multi-tenancy and SLAs. Big data deployments are increasingly multi-tenant and need to meet SLAs.
Working with the Community. First release of BigFrame planned for August 2013, with feedback from benchmark developers (BigBench). Open-source with extensibility APIs: Benchmark Drivers for more systems; Utilities (accessed through the Benchmark Driver to drill down into system behavior during benchmarking); instantiate the BigFrame pipeline for more app domains.
"Benchmarks shape a field (for better or worse) …" -- David Patterson, Univ. of California, Berkeley. Benchmarks meet different needs for different people: end customers, application developers, system designers, system administrators, researchers, CIOs. BigFrame helps users generate benchmarks that best meet their needs.