Introduction to Big Data -- and what it means to database professionals Haidong “Alex” Ji http://www.haidongji.com http://www.jimetrics.com http://github.com/haidong.

1 Introduction to Big Data -- and what it means to database professionals
Haidong “Alex” Ji

2 About me
Independent consultant
Database administration: SQL Server, MySQL
Open source hacker, automation specialist

3 Big Data Intro
Let’s start with some use cases.
Please stop me at any time with questions. Feel free to add your insights; I want to learn from you too!

4 Political campaign prediction
Nate Silver vs. TV pundits: accurate prediction
Shakespeare: [But] men may construe things after their fashion; Clean from the purpose of the things themselves
The Obama campaign’s Data Game: For more than a year, Chicago’s crack analytics team scoured voter files, Facebook, and consumer databases to create a revolutionary data-mining and modeling system that allowed them to predict which voters were persuadable, increase the potency of their ads and s, and run 66,000 election simulations every single night

5 Culinary research
Flavor correlation
A hypothesis, which over the past decade has received attention among some chefs and food scientists, states that ingredients sharing flavor compounds are more likely to taste well together than ingredients that do not.
Is the hypothesis true? Big Data techniques can be used to test it.

6 Netflix and Amazon recommendation systems
Collaborative filtering: using other users’ rankings
Content filtering: using movie/book information
Clustering methods for data analysis: hierarchical, K-means
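
To make the collaborative filtering idea on this slide concrete, here is a minimal sketch in Python. The rating data, the user names, and the helper names `cosine` and `predict` are all hypothetical and chosen only for illustration; production recommenders such as those at Netflix or Amazon are far more elaborate. The sketch predicts a user's rating for an unseen item as a similarity-weighted average of other users' ratings.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 5},
    "cat": {"A": 1, "C": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = cosine(ratings[user], theirs)
        num += weight * theirs[item]
        den += weight
    return num / den if den else None


# ann has not rated "D"; bob's tastes are closer to hers than cat's,
# so the prediction leans toward bob's rating of 5.
print(predict("ann", "D"))
```

Content filtering would instead compare attributes of the items themselves (genre, author, and so on) rather than other users' ratings.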

7 DNA sequencing research
Moving from sequencing individual organisms to entire ecosystems.

8 Twitter public sentiment
Public sentiment toward a particular product
Assign values to words
Aggregate those values to generate a sentiment score
Natural language processing
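
The "assign values to words, then aggregate" approach can be sketched in a few lines of Python. The word list `WORD_SCORES`, its values, and the sample tweets are made up for illustration; real sentiment work uses much larger lexicons and has to handle negation, sarcasm, and context.

```python
import re

# Hypothetical word values: positive words score above zero, negative below.
WORD_SCORES = {
    "love": 2, "great": 1, "good": 1,
    "bad": -1, "broken": -2, "hate": -2,
}


def score(text):
    """Sum the values of the known words in one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(WORD_SCORES.get(word, 0) for word in words)


def aggregate(tweets):
    """Average per-tweet score: a crude overall sentiment for a product."""
    scores = [score(t) for t in tweets]
    return sum(scores) / len(scores) if scores else 0.0


tweets = [
    "I love this phone, great camera!",     # love + great = 3
    "Screen arrived broken. Bad support.",  # broken + bad = -3
]
print(aggregate(tweets))  # the two tweets cancel out: 0.0
```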

9 Google Flu Trends
Flu trend research based on search terms
Traditional flu surveillance is very important, but most health agencies focus on a single country or region and only update their estimates once per week. Google Flu Trends is currently available for a number of countries around the world and is updated every day, providing a complement to these existing systems.

10 Hardware and software technologies typically used by Big Data
Hardware: multicore CPUs, GPU processing, memory locality, cloud computing
Software: Hadoop/MapReduce, NoSQL, SQL databases, column-store databases, business analytics, statistical modeling, machine learning, graph databases, visualization

11 Data challenges
Our ability to analyze data lags behind our ability to collect it
Scaling: scaling up or scaling out
For small to medium-sized companies, even storing the data can be a challenge, which leads to…

12 Cloud computing
Computing as a utility
Storage
Processing power
Prebuilt special-purpose computing: Hadoop
Databases: DynamoDB, ElastiCache…
Content delivery: CloudFront

13 Hadoop/MapReduce
Non-structured data
Converting that to JSON/XML
Feeding data into the mapper
Reducer producing the aggregated result
Use cases: user sentiment research, search term analysis, DNA sequencing
My opinion: it is overrated!
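
The mapper and reducer flow on this slide can be imitated in plain Python, with no Hadoop involved. This is only a single-process sketch of the programming model, applied to the search term analysis use case; the function names `mapper`, `shuffle`, `reducer`, and `run` and the sample queries are illustrative assumptions, not Hadoop APIs.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (term, 1) pair for every search term in one input line."""
    for term in line.lower().split():
        yield term, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Aggregate all values for one key into a single result."""
    return key, sum(values)


def run(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


queries = ["flu symptoms", "flu shot", "Flu symptoms"]
print(run(queries))  # {'flu': 3, 'symptoms': 2, 'shot': 1}
```

In a real cluster the mapper and reducer run on many machines in parallel and the shuffle moves data over the network; the sketch keeps only the logical steps.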

14 NoSQL
Key-value store
Less stringent
Document-oriented
Mostly JSON
Eventual consistency
Mostly used in social network sites

15 SQL response
Column-store databases
In-memory OLTP, such as Hekaton

16 Statistical analysis
Applying stochastic models to a set of data
A lot can be learned using simple methods:
Sum and average (also moving average) can tell you a lot
Linear regression model
Logistic regression model
NLP, clustering
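
Two of the simple methods named on this slide, the moving average and linear regression, fit in a few lines of Python without any statistics package. The function names `moving_average` and `linear_fit` and the sample numbers are hypothetical; the regression is ordinary least squares on a single variable.

```python
def moving_average(xs, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(xs[i:i + window]) / window
        for i in range(len(xs) - window + 1)
    ]


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


print(moving_average([1, 2, 3, 4, 5], 3))      # [2.0, 3.0, 4.0]
print(linear_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```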

17 Machine learning
Given data, find a function that predicts y from x: no model of nature implied
Automate automation!
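
One way to see "find a function that predicts y from x, with no model of nature implied" is a nearest-neighbor predictor: it assumes nothing about how x produces y and simply returns the label of the closest training example. The sketch below is a deliberately tiny illustration; `fit_nearest_neighbor` and the sample data are hypothetical names and values.

```python
def fit_nearest_neighbor(xs, ys):
    """Return a function that predicts y for a new x from the closest known x."""
    pairs = list(zip(xs, ys))

    def predict(x):
        # Pick the training pair whose x is nearest, and return its y.
        nearest = min(pairs, key=lambda pair: abs(pair[0] - x))
        return nearest[1]

    return predict


# Hypothetical training data: a numeric reading and the label observed for it.
predict = fit_nearest_neighbor([1, 5, 10], ["low", "mid", "high"])
print(predict(4))  # closest training x is 5, so this prints "mid"
```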

18 Graph database
Nodes and edges
Social network relationships
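
The node-and-edge model can be sketched with an adjacency map in Python. This is not how any particular graph database stores data; the `Graph` class, its method names, and the people in the example are assumptions made for illustration. The friends-of-friends query is a typical social network question that is awkward in SQL but natural on a graph.

```python
from collections import defaultdict


class Graph:
    """Undirected graph: people are nodes, friendships are edges."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def friends_of_friends(self, person):
        """People two hops away who are not already direct friends."""
        direct = self.adj.get(person, set())
        two_hops = set()
        for friend in direct:
            two_hops |= self.adj[friend]
        return two_hops - direct - {person}


g = Graph()
g.add_edge("alice", "bob")
g.add_edge("bob", "carol")
g.add_edge("carol", "dave")
print(g.friends_of_friends("alice"))  # {'carol'}
```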

19 Data visualization
A picture is worth a thousand words!
The human eye has the highest-bandwidth channel to the brain
Effective communication of information
Clarity
Integrity
Stimulate viewer engagement

20 What does this mean to me?

21 1. Don’t panic
Keep doing a good job at your current position

22 2. Keep learning
Column-store databases
In-memory OLTP
Current BI-complementary technologies:
Data analytics: statistical packages (SAS, R, etc.)
Data visualization and reporting tools
Learn from good MOOCs on edX, Coursera

23 Data Science Venn Diagram from Drew Conway

24 Q/A, discussion
What does this mean for me? For us?

