Big Data, Anton Boyko. Agenda: What is Big Data? Why Big Data? How to Big Data?


1 Big Data, Anton Boyko

2 Agenda: What is Big Data? Why Big Data? How to Big Data?

3 What is Big Data? Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Gigabytes, terabytes, petabytes…

4 Data growth. Big Data: Volume 10x; Velocity 4.3; Variety 85%

5 How to process Big Data? The traditional way vs. the appropriate way

6 Move data to compute

7 Move compute to data. Fast storage vs. fast CPU and fast networking. Linear scalability.
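A minimal in-memory sketch of the contrast between slides 6 and 7, assuming a word count as the workload (the slides do not name one). The three "nodes" and their lines are hypothetical, and the number of characters shipped is only a rough stand-in for network traffic. In the traditional layout every raw line travels to one machine; when compute moves to the data, each node counts its own lines and sends only its partial counts, which are then merged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cluster: each inner array is the data stored on one node.
string[][] nodes =
{
    new[] { "big data big ideas", "data moves slowly" },
    new[] { "compute moves fast", "big compute" },
    new[] { "data data data", "ideas" },
};

// Traditional way: ship every raw line to a single machine, then count there.
int shippedRaw = nodes.Sum(lines => lines.Sum(l => l.Length));
var centralCounts = CountWords(nodes.SelectMany(lines => lines));

// Move compute to data: each node counts its own lines and ships only
// its partial (word, count) pairs, which are then merged.
var partials = nodes.Select(lines => CountWords(lines)).ToList();
int shippedPartials = partials.Sum(p => p.Sum(kv => kv.Key.Length + kv.Value.ToString().Length));

var mergedCounts = new Dictionary<string, int>();
foreach (var nodeCounts in partials)
    foreach (var (word, count) in nodeCounts)
        mergedCounts[word] = mergedCounts.GetValueOrDefault(word) + count;

Console.WriteLine($"Characters shipped (raw lines):      {shippedRaw}");
Console.WriteLine($"Characters shipped (partial counts): {shippedPartials}");

// Counts the space-separated words in a sequence of lines.
static Dictionary<string, int> CountWords(IEnumerable<string> lines)
{
    var counts = new Dictionary<string, int>();
    foreach (var w in lines.SelectMany(l => l.Split(' ', StringSplitOptions.RemoveEmptyEntries)))
        counts[w] = counts.GetValueOrDefault(w) + 1;
    return counts;
}
```

Both approaches produce the same totals; the saving grows as the data becomes larger and more repetitive relative to the number of distinct keys on each node.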

8 Map/Reduce workflow
File system → Mappers (find matches) → Reducers (combine matches) → DFS temp → Mappers (invert keys and values) → Reducer (combine results)
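This two-pass layout appears to follow Hadoop's bundled grep example, in which a first job counts regular-expression matches into a temporary directory and a second job inverts each pair so that the framework's sort orders the matches by count. Below is a minimal in-memory sketch under those assumptions: the log lines and the pattern are hypothetical, a list stands in for the DFS temp directory, and LINQ grouping and ordering stand in for Hadoop's shuffle and sort.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input lines stored in the distributed file system.
string[] input =
{
    "error: disk full", "warning: slow disk", "error: disk full",
    "error: timeout", "warning: slow disk", "error: disk full",
};
var pattern = new Regex(@"(error|warning): [a-z ]+");

// Job 1. Mappers find matches and emit (match, 1); reducers combine the
// values for each key into a total. The output stands in for "DFS temp".
List<(string Match, int Count)> dfsTemp = input
    .SelectMany(line => pattern.Matches(line).Select(m => (Key: m.Value, Value: 1)))
    .GroupBy(pair => pair.Key)                                    // shuffle: group by key
    .Select(g => (Match: g.Key, Count: g.Sum(pair => pair.Value))) // reduce: combine matches
    .ToList();

// Job 2. Mappers invert each pair to (count, match); sorting on the new key
// orders the results by count, and a single reducer collects them
// (highest count first, ties broken by the match text).
List<(int Count, string Match)> results = dfsTemp
    .Select(pair => (Count: pair.Count, Match: pair.Match))       // map: invert key and value
    .OrderByDescending(pair => pair.Count)                        // sort by the new key
    .ThenBy(pair => pair.Match, StringComparer.Ordinal)
    .ToList();

foreach (var (count, match) in results)
    Console.WriteLine($"{count}\t{match}");
```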

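9 Map/Reduce – how it works

    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;
    using Microsoft.Hadoop.MapReduce;

    public class NamespaceMapper : MapperBase
    {
        // Matches C# using directives such as "using System.Linq;".
        private static readonly Regex UsingDirective =
            new Regex(@"\busing\s+[A-Za-z0-9_.]+;");

        // Override the Map method; it is called once per input line.
        public override void Map(string inputLine, MapperContext context)
        {
            foreach (Match match in UsingDirective.Matches(inputLine))
            {
                // Just emit the namespace directive with a count of 1.
                context.EmitKeyValue(match.Value, "1");
            }
        }
    }

    public class NamespaceReducer : ReducerCombinerBase
    {
        // Receives each key together with all of its values and totals the occurrences.
        public override void Reduce(
            string key, IEnumerable<string> values, ReducerCombinerContext context)
        {
            // Sum rather than count the values, so the result stays correct
            // if this class also runs as a combiner on partial counts.
            context.EmitKeyValue(key, values.Sum(v => int.Parse(v)).ToString());
        }
    }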
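To see what Hadoop does with a mapper and reducer like the two classes above without a cluster, the sketch below replays that logic in plain C#, without the Hadoop SDK: a regular expression for using directives runs over each input split, a summing reduce step runs once per split as a combiner, the combined pairs are grouped by key (the shuffle), and the same reduce step runs again per key. The sample source lines and the two-way split are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: lines of C# source, divided into two input splits.
string[][] splits =
{
    new[] { "using System;", "using System.Linq;", "using System;" },
    new[] { "using System.Linq;", "namespace Demo { }", "using System;" },
};

var usingDirective = new Regex(@"\busing\s+[A-Za-z0-9_.]+;");

// Map: emit (directive, "1") for every using directive on a line.
IEnumerable<(string Key, string Value)> Map(string line) =>
    usingDirective.Matches(line).Select(m => (Key: m.Value, Value: "1"));

// Reduce (also used as the combiner): sum the values for one key.
(string Key, string Value) Reduce(string key, IEnumerable<string> values) =>
    (key, values.Sum(v => int.Parse(v)).ToString());

// Each split is mapped and then combined locally, before any data is shuffled.
var combined = splits.SelectMany(split => split
    .SelectMany(line => Map(line))
    .GroupBy(kv => kv.Key, kv => kv.Value)
    .Select(g => Reduce(g.Key, g)));

// Shuffle: group the combiner output by key, then run the final reduce.
var totals = combined
    .GroupBy(kv => kv.Key, kv => kv.Value)
    .Select(g => Reduce(g.Key, g))
    .OrderBy(kv => kv.Key, StringComparer.Ordinal)
    .ToList();

foreach (var (directive, total) in totals)
    Console.WriteLine($"{directive}\t{total}");
```

Here "using System;" totals 3 even though the final reduce receives only two values for it ("2" from the first split and "1" from the second); a reducer that counted its values instead of summing them would report 2.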

10 Traditional RDBMS vs. Map/Reduce

Aspect       Traditional RDBMS        Map/Reduce
Data size    Terabytes                Exabytes (or more)
Schema       Static                   Dynamic
Access       Interactive and batch    Batch only
Scaling      Nonlinear                Linear

11 Hadoop – an implementation of the Map/Reduce engine

12 Hadoop ecosystem

13 Offering: ODBC for Excel and PowerPivot; Windows Server or Windows Azure; C#, Java, JavaScript

14 Demo

15 Pricing
Head node: a single extra-large instance (8 CPU cores, 14 GB RAM), $0.32 per hour or $238 per month.
Compute nodes: one or more large instances (4 CPU cores, 7 GB RAM), $0.16 per hour or $119 per month per instance.
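The monthly figures on this slide match the hourly rates multiplied by 744 hours, that is, a 31-day month of continuous use: 0.32 × 744 = 238.08 and 0.16 × 744 = 119.04. The sketch below applies that same assumption to estimate a cluster of one head node plus a chosen number of compute nodes; the rates are the ones on the slide, not current Azure prices, and the four-node cluster is a hypothetical example.

```csharp
using System;

// Hourly rates from the slide (historical; current Azure prices differ).
const decimal HeadNodePerHour = 0.32m;
const decimal ComputeNodePerHour = 0.16m;
// Assumption: the slide's monthly figures correspond to 31 days of continuous use.
const int HoursPerMonth = 31 * 24; // 744

// Estimated monthly cost of one head node plus `computeNodes` compute nodes.
decimal MonthlyCost(int computeNodes) =>
    (HeadNodePerHour + computeNodes * ComputeNodePerHour) * HoursPerMonth;

Console.WriteLine($"Head node alone:        ${HeadNodePerHour * HoursPerMonth:F2}");
Console.WriteLine($"One compute node:       ${ComputeNodePerHour * HoursPerMonth:F2}");
Console.WriteLine($"Head + 4 compute nodes: ${MonthlyCost(4):F2}");
```

With four compute nodes the estimate is (0.32 + 4 × 0.16) × 744 = $714.24 per month.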

16 Questions? Anton Boyko (Антон Бойко), boyko.ant@live.com

