Hadoop for Data Warehousing


1 Hadoop for Data Warehousing
Scott Person, Senior Consultant, Tiber Solutions

2 Who?
Greg Jones, Matt Roberts, Scott Person

3 What?
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware.

4 What?

5 Why use Hadoop?
Highly scalable – faster access to more data
Built for analytics – better access to more data
Modular
Storage, query engines, metadata, and scripting are distinct components
Many query engines share the same “tables” in the form of the Hive metastore
Quickly evolving
This is the first time I’ve regularly used the date parameter in a Google search. Anything over a year old is of suspect relevance.

6 “Figure it out…”
Goal of this effort…
Demystify Hadoop
Identify the pitfalls and challenges
Build institutional knowledge
Create a Hadoop data warehouse framework
Has anyone read the book or seen the movie “The Big Short”? Steve Eisman’s (Steve Carell) right-hand man was preoccupied with how their investment was going to go wrong. In quite colorful language, he kept asking how they were being taken advantage of by Deutsche Bank. I want to figure out if there’s a reason why no one else is doing this.

7 The crux
Moving target
How do we model SCD type 1/2 data when updates are hard?
KUDU?!
Hard!
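One common way to frame the SCD question above is to never update rows at all: each change is appended as a new version, and the type 1 (current only) and type 2 (full history) views are derived at read time. The sketch below illustrates that idea in plain Python. It is not the approach the presenters chose, and the `Version` record, the function names, and the sample carrier data are all hypothetical.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Optional, Tuple


class Version(NamedTuple):
    key: str              # business key, e.g. a carrier code (hypothetical)
    name: str             # the tracked attribute
    effective_from: date  # when this version became true


def current_view(log: List[Version]) -> Dict[str, Version]:
    """Type 1 style view: only the most recent version per key."""
    current: Dict[str, Version] = {}
    for row in log:
        best = current.get(row.key)
        if best is None or row.effective_from >= best.effective_from:
            current[row.key] = row
    return current


def append_change(log: List[Version], key: str, name: str, day: date) -> None:
    """Record a change by appending a row; existing rows are never modified."""
    latest = current_view(log).get(key)
    if latest is None or latest.name != name:
        log.append(Version(key, name, day))


def history_view(log: List[Version]) -> List[Tuple[str, str, date, Optional[date]]]:
    """Type 2 style view: every version, with effective_to derived from the next one."""
    by_key: Dict[str, List[Version]] = {}
    for row in log:
        by_key.setdefault(row.key, []).append(row)

    out: List[Tuple[str, str, date, Optional[date]]] = []
    for key in sorted(by_key):
        versions = sorted(by_key[key], key=lambda v: v.effective_from)
        for i, v in enumerate(versions):
            # The open-ended (current) version has no end date.
            end = versions[i + 1].effective_from if i + 1 < len(versions) else None
            out.append((v.key, v.name, v.effective_from, end))
    return out


log: List[Version] = []
append_change(log, "XX", "Example Air", date(2000, 1, 1))
append_change(log, "XX", "Example Airlines", date(2010, 6, 1))
append_change(log, "XX", "Example Airlines", date(2012, 1, 1))  # unchanged, so nothing is appended
```

The trade-off is that reads do more work (and the table grows with every change) in exchange for writes that only ever append, which is the operation append-oriented storage handles well.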

8 Where are we?
Sample data set acquired – all domestic commercial flights from 2000 to present (122M records)
Candidate architecture created
Exploring data modeling ideas – trading data size for performance
Running benchmarks

9 What’s next?
Basically everything…
We are on the front end of this effort. Everything that we’ve done can be done better.

10 Want to help?
This is a relay
You can be Hadooping in an hour
Basecamp
Instructions on getting started (files area)
To-do items to work on
Rock throwers welcome

