A quick trip from code profiling to file formats
Daniel Lanza, CERN Database group tCSC 27th May 2016
Agenda: importing data into Big Data solutions
- Tool for importing (Java profiler for distributed applications)
- Target file format (in numbers)
- Compatible applications
- Partitioning
- Backups
- Keep data up to date
- …
Java profiler for distributed applications
[Diagram: import into three Parquet outputs]
Java profiler for distributed applications
CPU-bound! Throughput: 5 MB/s, 5 MB/s, 5 MB/s
Image source: T. White, Hadoop: The Definitive Guide
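Whether a job is CPU-bound (rather than waiting on disk or network) can be checked crudely by comparing the CPU time it consumes with the wall-clock time it takes. The talk profiles Java; the following is only a minimal Python sketch of that check, and the names `cpu_wall_ratio`, `busy` and `waiting` are hypothetical. On the JVM, `ThreadMXBean.getCurrentThreadCpuTime()` from `java.lang.management` gives the analogous per-thread CPU counter.

```python
import time


def cpu_wall_ratio(work, *args):
    """Run work(*args) and return (result, CPU seconds / wall-clock seconds).

    A ratio near 1.0 for a single-threaded task means it was computing almost
    all the time (CPU-bound); a ratio near 0.0 means it was mostly waiting.
    time.process_time() counts CPU time of the whole process (all threads),
    so a multi-threaded job can report a ratio above 1.0.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = work(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else 0.0)


def busy(n):
    """Hypothetical CPU-bound stand-in: pure arithmetic, no I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def waiting(seconds):
    """Hypothetical I/O-bound stand-in: the thread sleeps instead of computing."""
    time.sleep(seconds)
```

Running `cpu_wall_ratio(busy, 2_000_000)` should typically report a ratio close to 1.0, and `cpu_wall_ratio(waiting, 0.2)` one close to 0.0. A ratio near 1.0 says where the time goes, not which code is responsible; that is what a profiler is for.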
Java profiler for distributed applications
Image source: T. White, Hadoop: The Definitive Guide
Java profiler for distributed applications
FlameGraph
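A flame graph is built from many sampled call stacks: each distinct stack becomes one line of "folded" text (frame names joined by semicolons, root first, followed by a sample count), which `flamegraph.pl` from Brendan Gregg's FlameGraph repository renders as an SVG. The talk uses a Java profiler; the sketch below is a Python analogue under stated assumptions: it relies on CPython's `sys._current_frames()` (an implementation detail, hence the underscore), samples at a fixed interval, and the names `fold`, `profile`, `hot_loop` and `workload` are hypothetical.

```python
import sys
import threading
import time
from collections import Counter


def fold(stacks):
    """Collapse root-first stacks into 'a;b;c count' lines, sorted by stack."""
    counts = Counter(";".join(stack) for stack in stacks)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def profile(fn, *args, interval=0.005):
    """Run fn(*args) in this thread while a helper thread samples its stack."""
    target = threading.get_ident()
    stacks = []
    done = threading.Event()

    def sampler():
        while not done.is_set():
            frame = sys._current_frames().get(target)  # CPython-specific
            stack = []
            while frame is not None:
                stack.append(frame.f_code.co_name)
                frame = frame.f_back
            if stack:
                stacks.append(stack[::-1])  # flamegraph.pl expects root first
            time.sleep(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        result = fn(*args)
    finally:
        done.set()
        thread.join()
    return result, fold(stacks)


def hot_loop(n):
    """Hypothetical hot spot: should dominate the samples."""
    return sum(i * i for i in range(n))


def workload():
    return [hot_loop(200_000) for _ in range(20)]
```

Writing the folded lines returned by `profile(workload)` to a file and running `flamegraph.pl stacks.folded > flame.svg` should draw the graph; the widest boxes are the functions where most samples landed, which is where optimization effort pays off.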
Java profiler for distributed applications
500% faster!
Java profiler for distributed applications
Still CPU-bound. Throughput: 11 MB/s, 11 MB/s, 11 MB/s
Image source: T. White, Hadoop: The Definitive Guide
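Reading the throughput figures on the two "CPU-bound" slides as three parallel streams whose rates add up (an assumption; the transcript shows only the numbers), the end-to-end effect of the optimization works out as:

$$
\frac{11\ \text{MB/s}}{5\ \text{MB/s}} = 2.2 \;\; (+120\%\ \text{per stream}), \qquad
3 \times 5\ \text{MB/s} = 15\ \text{MB/s} \;\longrightarrow\; 3 \times 11\ \text{MB/s} = 33\ \text{MB/s}.
$$

This end-to-end ratio is not the same quantity as the "500% faster!" figure on the earlier slide; the transcript does not say what that figure was measured on.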
Java profiler for distributed applications
[Diagram: import into three Parquet outputs]
Java profiler for distributed applications
CERN DB blog entry, Git repos
File formats in numbers
Joining columns from different tables (1400 columns in total)
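One reason column count matters so much in such a benchmark is projection: Parquet lays data out as column chunks within row groups, so a reader can fetch only the columns a query needs, whereas a row-oriented file interleaves every column in every record. The sketch below is a toy byte-count model of that effect, not Parquet's actual encoding; the table contents and the names `build_table`, `row_layout`, `column_layout` and `scan_cost` are hypothetical.

```python
def build_table(n_cols, n_rows):
    """Hypothetical wide table: cell (c, r) holds the string 'c{c}r{r}'."""
    return {f"col{c}": [f"c{c}r{r}" for r in range(n_rows)] for c in range(n_cols)}


def row_layout(table):
    """Row-oriented bytes (CSV-like): each line interleaves every column."""
    names = list(table)
    n_rows = len(table[names[0]])
    lines = (",".join(table[name][r] for name in names) for r in range(n_rows))
    return "\n".join(lines).encode()


def column_layout(table):
    """Column-oriented bytes: one independent blob per column."""
    return {name: "\n".join(values).encode() for name, values in table.items()}


def scan_cost(layout, wanted):
    """Bytes a reader must touch to obtain the `wanted` columns."""
    if isinstance(layout, bytes):
        # Row layout: wanted values are interleaved with all the others,
        # so the whole blob has to be read and parsed.
        return len(layout)
    # Column layout: read only the blobs that were asked for.
    return sum(len(layout[name]) for name in wanted)


table = build_table(n_cols=1400, n_rows=100)
wanted = ["col0", "col1399"]
row_cost = scan_cost(row_layout(table), wanted)
col_cost = scan_cost(column_layout(table), wanted)
print(f"row layout: {row_cost} bytes, column layout: {col_cost} bytes")
```

With 2 of 1400 columns requested, the columnar reader touches a tiny fraction of the bytes the row reader must scan. Real formats add compression, encodings and metadata, so measured gains will differ from this model.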
Messages to take away
- Profile your application if you run into performance issues.
- Choose the file format and compression carefully.
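A quick way to act on the second message is to measure candidate codecs on a sample of your own data before committing to one. Parquet itself offers codecs such as SNAPPY, GZIP and ZSTD; the minimal sketch below uses only Python's standard-library codecs as stand-ins (zlib uses the same DEFLATE algorithm as GZIP), and the sample data and the names `sample_payload` and `compare_codecs` are hypothetical.

```python
import bz2
import lzma
import time
import zlib

# Standard-library stand-ins for the codecs a file format might offer.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def sample_payload(n_rows=20_000):
    """Hypothetical CSV-like data with repetitive values."""
    lines = (f"{i},{'ok' if i % 7 else 'fail'},{i * 0.5}" for i in range(n_rows))
    return "\n".join(lines).encode()


def compare_codecs(payload):
    """Return {codec: (compressed bytes, compression ratio, seconds to compress)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(payload)
        elapsed = time.perf_counter() - start
        if decompress(packed) != payload:  # all three codecs are lossless
            raise ValueError(f"{name} round trip failed")
        results[name] = (len(packed), len(payload) / len(packed), elapsed)
    return results


if __name__ == "__main__":
    for name, (size, ratio, secs) in compare_codecs(sample_payload()).items():
        print(f"{name:5s} {size:8d} bytes  ratio {ratio:5.1f}  {secs * 1000:7.1f} ms")
```

The trade-off to look for is size against CPU time: stronger codecs usually shrink data further but cost more time per megabyte, which matters for a job that is already CPU-bound.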
Questions / Feedback
Acknowledgements: Zbigniew Baranowski, Maciej Grzybek, Kacper Surdy, Joeri Hermans