Hadoop? Cloud data warehousing? Machine learning? NoSQL?
Ecosystems around open source projects are very active
Basis in commodity hardware
Scale out, and cloud
Change in economics of computing power
Change in economics of storage
Imagine if instead of:
[Figure: table stored row by row, with columns Employee ID, Age, Income]
You have:
[Figure: the same table stored column by column: Employee ID (1, 2, 3), Age, Income]
Perf: values you wish to aggregate are adjacent
Efficiency: great compression from identical or nearly identical values in proximity
Fast aggregation and high compression mean huge volumes of data can be stored and processed in RAM
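The column-oriented layout above can be illustrated with a minimal Python sketch. The sample values, the helper names `to_columns` and `run_length_encode`, and the choice of run-length encoding as the compression scheme are all assumptions made for illustration; the slide does not name a specific engine or codec.

```python
from itertools import groupby

# Hypothetical sample data; the slide gives no actual values.
rows = [
    {"employee_id": 1, "age": 30, "income": 50000},
    {"employee_id": 2, "age": 30, "income": 52000},
    {"employee_id": 3, "age": 31, "income": 52000},
]


def to_columns(rows):
    """Pivot a list of row dicts into one contiguous list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Aggregation reads one contiguous list instead of touching every row.
total_income = sum(columns["income"])

# Identical values sitting next to each other compress well.
encoded_age = run_length_encode(columns["age"])
```

Here the aggregate only scans the `income` column, and the `age` column shrinks because repeated values are adjacent; real columnar engines apply the same two ideas with far more sophisticated encodings.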
[Figure: MapReduce data flow. Inputs feed mappers; mapper output is grouped by key (K1, K2, K3) and passed to reducers, which write the output.]
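The mapper, group-by-key, and reducer flow in the diagram can be sketched in a few lines of single-process Python. This is a toy word count under stated assumptions: the function names `mapper`, `shuffle`, `reducer`, and `map_reduce` are illustrative only and are not Hadoop APIs, and a real job would distribute each stage across machines.

```python
from collections import defaultdict


def mapper(record):
    """Emit one (key, value) pair per word in an input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group mapper output by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Combine all values for one key into a single output pair."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in mapper(record))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


word_counts = map_reduce(["a b a", "b c"])
```

Each distinct word plays the role of a key (K1, K2, K3 in the diagram), and every reducer call sees all values for exactly one key.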
Impala + Kafka
Data Lake
Store raw data centrally in HDFS
Use different processing engines for different analyses