The Scientific Importance of Big Data
Xia Li, Tennessee Technological University


2 The Scientific Importance of Big Data
• Financial benefits are the major motivation of big data research
• The technical challenges brought by big data
• The object of "data science"
• The common question behind data -- relationship network
• Causality and relationship
• Big data in social science
• Complexity in data processing
• Changes in the way of thinking

3 Financial benefits
• According to IDC (International Data Corporation), the amount of data created and copied in 2011 exceeded 1.8 zettabytes (1 ZB = 10^21 bytes)
• About 75% of this data comes from individuals (mainly pictures, videos, and music), far more than the total volume of all printed material, about 200 petabytes (1 PB = 10^15 bytes)
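
The slide's figures use different decimal byte prefixes, so a quick calculation helps put them side by side. This is a minimal sketch that assumes SI units (1 PB = 10^15 bytes, 1 ZB = 10^21 bytes); the variable names are illustrative only.

```python
# Decimal (SI) byte units, as assumed for the slide's figures.
PB = 10**15
ZB = 10**21

created_2011 = 1.8 * ZB    # data created and copied in 2011 (IDC figure on the slide)
printed_total = 200 * PB   # slide's figure for all printed material
individual_share = 0.75    # share of the 2011 data attributed to individuals

individual_bytes = individual_share * created_2011
ratio = created_2011 / printed_total

print(f"Data from individuals: {individual_bytes / ZB:.2f} ZB")
print(f"2011 data vs. all printed material: about {ratio:,.0f}x")
```

On these assumptions, the data created in 2011 alone is on the order of 9,000 times the slide's estimate for all printed material.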

4 Financial benefits
• Google uses very large computing clusters and MapReduce software to process 400 PB of data per month
• On Facebook, registered users upload more than 1 billion photos, and the log files generated each day exceed 300 TB
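
MapReduce splits a job into a map step that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce step that combines each group. The single-process word-count sketch below only illustrates that data flow; it is not Google's implementation, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a real cluster, map and reduce tasks run in parallel on many machines
    # and the shuffle moves data over the network; here it all runs in one process.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data science"]))
```

Because each map call and each reduce call touches only its own input, a framework can spread them across thousands of machines, which is what makes volumes such as 400 PB per month tractable.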

5 The technical challenges
• Six US government departments launched big data research projects to "form a unique branch of learning including mathematics, statistics, computer algorithm"
• Most of these projects focus on data engineering rather than data science
• Their focus includes analysis algorithms and system efficiency

6 The technical challenges
• Multiscale anomaly detection
• Threat planning in networks
• Machine reading
• Real-time analysis of streaming data
• Nonlinear random data compression
• Scalable statistical analysis techniques
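
As one illustration of real-time analysis on streaming data, the sketch below flags outliers in a single pass with constant memory, using Welford's online algorithm for the running mean and variance plus a simple z-score threshold. This is a generic technique chosen for illustration, not the method of any of the projects listed; the class name, threshold, and warm-up length are assumptions.

```python
import math


class StreamingAnomalyDetector:
    """Flag values that lie far from the running mean of a data stream.

    Uses one pass and O(1) memory: the running mean and variance are kept
    with Welford's online algorithm, and a value is flagged when it is more
    than `threshold` standard deviations from the mean of the values seen
    before it.
    """

    def __init__(self, threshold: float = 3.0, warmup: int = 10) -> None:
        self.threshold = threshold
        self.warmup = max(warmup, 2)  # need at least 2 values for a sample std
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Test x against the current statistics, then fold x into them."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = std > 0 and abs(x - self.mean) > self.threshold * std
        # Welford's update of count, mean and m2
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


if __name__ == "__main__":
    detector = StreamingAnomalyDetector(threshold=3.0, warmup=10)
    readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 10.0, 25.0, 10.1]
    for value in readings:
        if detector.update(value):
            print(f"anomaly: {value}")
```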

7 The technical challenges
• New data representation methods
  - If the data representation is unsuitable, the analysis results are more prone to bias
• Data integration
  - Data from different locations must be combined before it can be processed
• Deduplication and efficient, low-cost data storage
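
Deduplication is often implemented by content addressing: data is split into chunks, each chunk is identified by a cryptographic hash, and a chunk whose hash has already been seen is not stored again. The sketch below assumes fixed-size chunks and SHA-256; many production systems use variable-size (content-defined) chunking instead, and the class and method names here are hypothetical.

```python
import hashlib


class DedupStore:
    """Content-addressed store in which identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # SHA-256 hex digest -> chunk bytes

    def put(self, data: bytes) -> list[str]:
        """Split data into fixed-size chunks and return their digests (the 'recipe')."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store the chunk only if unseen
            recipe.append(digest)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Reassemble the original data from its chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes actually held after deduplication."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    recipe = store.put(b"abcdabcdabcd")  # 12 logical bytes, one unique chunk
    print(store.stored_bytes(), store.get(recipe))
```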

8 The object of "data science"
• Big data research is about finding new knowledge; the data itself is not the research object
• As a research methodology, it is closely related to artificial intelligence techniques such as data mining, statistical analysis, and information retrieval

9 The object of "data science"
• The complexity of traditional algorithms can grow exponentially as the size and dimension of the problem grow
• For petabyte-scale big data, new methods are needed
• Traditional AI algorithms can accept O(N log N) or even O(N^3) complexity
• For big data problems, even O(N log N) is hard to accept
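
Rough arithmetic shows why even O(N log N) becomes a burden at this scale. The sketch assumes a hypothetical workload of N = 10^15 items (about one per byte of a petabyte) on a single machine performing 10^9 simple operations per second; real systems parallelize, but the relative gaps between the growth rates remain.

```python
import math

# Hypothetical workload: N = 10**15 items on one machine doing 10**9 ops/second.
N = 10**15
OPS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_needed(operations: float) -> float:
    """Wall-clock years to perform the given number of simple operations."""
    return operations / OPS_PER_SECOND / SECONDS_PER_YEAR


for label, ops in [
    ("O(N)", float(N)),
    ("O(N log N)", N * math.log2(N)),
    ("O(N^3)", float(N) ** 3),
]:
    print(f"{label:>11}: {ops:.2e} operations, about {years_needed(ops):.2e} years")
```

Under these assumptions a linear pass takes roughly 12 days, O(N log N) about a year and a half, and O(N^3) is far beyond any practical horizon.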

10 The common question behind data -- relationship network
• Big data is composed of individual data items and scattered connections between them
• Once the connections are combined, the data forms a network
  - Gene data becomes a gene network
  - World Wide Web data becomes a social network
• Big data exists within a complex, interconnected data network
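
The step from scattered connections to a network can be sketched by merging pairwise links into an adjacency list and then asking which items are reachable from one another. The gene names in the usage lines are hypothetical placeholders, and the functions are illustrative rather than taken from any particular library.

```python
from collections import defaultdict


def build_network(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Combine scattered pairwise connections into one undirected network,
    stored as an adjacency list (node -> set of neighbours)."""
    network: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def connected_component(network: dict[str, set[str]], start: str) -> set[str]:
    """All nodes reachable from `start`, found by iterative depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in network.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


if __name__ == "__main__":
    links = [("geneA", "geneB"), ("geneB", "geneC"), ("geneD", "geneE")]
    net = build_network(links)
    print(sorted(connected_component(net, "geneA")))
```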

11 The common question behind data -- relationship network
• The distribution of the World Wide Web
• It forms a scale-free network
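
In a scale-free network the fraction of nodes with degree k falls off roughly as a power law, P(k) ~ k^(-γ), so a few hubs have very many links while most nodes have only a few. One standard way to generate such a network is the Barabási-Albert preferential-attachment model (for which γ ≈ 3); the sketch below uses it purely to illustrate the idea and makes no claim about how the Web data on the slide was analyzed. The function names and parameters are assumptions.

```python
import random
from collections import Counter


def preferential_attachment(n: int, m: int, seed: int = 0) -> list[tuple[int, int]]:
    """Grow a graph in the style of the Barabasi-Albert model.

    Starts from a complete graph on m + 1 nodes; each new node then links to
    m distinct existing nodes chosen with probability proportional to degree.
    """
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `targets` once per incident edge, so a uniform
    # choice from this list is a degree-proportional choice of node.
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen: set[int] = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets.extend((new, t))
    return edges


def degree_distribution(edges: list[tuple[int, int]]) -> Counter:
    """Map each degree k to the number of nodes having that degree."""
    degree = Counter(v for e in edges for v in e)
    return Counter(degree.values())


if __name__ == "__main__":
    dist = degree_distribution(preferential_attachment(n=1000, m=2, seed=1))
    for k in sorted(dist)[:5]:
        print(f"degree {k}: {dist[k]} nodes")
    print("max degree:", max(dist))
```

Most nodes end up with the minimum degree m, while a handful of early, well-connected nodes become hubs, which is the heavy-tailed pattern the slide refers to.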

12 Causality and relationship
• Correlation (association) analysis aims to find the relationships hidden in data
• Measures of association: support, confidence, and interest (also called lift)
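
For an association rule A → B over a set of transactions, support is the fraction of transactions containing both A and B, confidence is the fraction of transactions containing A that also contain B, and interest (lift) compares that confidence with how often B occurs overall. The function name and the baskets in the usage lines are made-up illustrations.

```python
def rule_metrics(transactions: list[set[str]], a: str, b: str) -> dict[str, float]:
    """Support, confidence and interest (lift) of the association rule a -> b.

    Assumes both a and b occur in at least one transaction.
    """
    n = len(transactions)
    count_a = sum(1 for t in transactions if a in t)
    count_b = sum(1 for t in transactions if b in t)
    count_ab = sum(1 for t in transactions if a in t and b in t)

    support = count_ab / n             # P(a and b)
    confidence = count_ab / count_a    # P(b | a)
    lift = confidence / (count_b / n)  # P(b | a) / P(b); > 1 means positive association
    return {"support": support, "confidence": confidence, "lift": lift}


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
        {"bread", "milk"},
    ]
    print(rule_metrics(baskets, "bread", "milk"))
```

In this toy data the confidence of bread → milk is 0.75, yet the lift is below 1 because milk is common anyway, which is why confidence alone can be misleading.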

13 Causality and relationship
• A and B are correlated
  - The values of A and B vary together
  - We cannot say that A causes B
  - We cannot say that B causes A
• Strictly speaking, statistics cannot prove logical causality
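
A small simulation makes the point concrete: below, A and B never influence each other, yet they are strongly correlated because both depend on a hidden common cause Z (a confounder). The model and its parameters are assumptions chosen for illustration; with these settings the theoretical correlation is 1 / 1.25 = 0.8.

```python
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n: int = 10_000, seed: int = 42) -> tuple[list[float], list[float]]:
    """A and B each depend only on a hidden variable Z, never on each other."""
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                   # hidden common cause
        a_vals.append(z + rng.gauss(0, 0.5))  # A = Z + independent noise
        b_vals.append(z + rng.gauss(0, 0.5))  # B = Z + independent noise
    return a_vals, b_vals


if __name__ == "__main__":
    a, b = simulate()
    print(f"correlation(A, B) = {pearson(a, b):.3f}")
```

Changing A in this model would leave B untouched, so the strong correlation reflects the shared cause rather than any effect of one variable on the other.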

14 Big data in social science
• On Facebook, data is generated in an unplanned, ad hoc way
• Researchers need to extract valuable information from this data
• Big data in social science has some unique characteristics: it is multi-source and heterogeneous, interactive, social, bursty, and highly noisy

15 Big data in social science
• The future task is not to collect more and more data
• It is to mine useful knowledge from the data
• When a child learns to distinguish animals from cars, a few dozen example pictures are enough
• How to avoid collecting unnecessary samples therefore becomes a key problem

16 Complexity in data processing
• Classical theory
  - Time complexity: the time an algorithm uses
  - Space complexity: the memory an algorithm uses
• Data size complexity
  - Some problems can be solved only once the data size reaches a certain level
  - The relationship between the confidence probability of a prediction and the amount of data
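
One standard way to make the link between confidence and data size precise is Hoeffding's inequality, used here as an illustration rather than as the slide's own formulation. For n independent observations bounded in [0, 1], the sample mean is within ε of the true mean with probability at least 1 - δ once n ≥ ln(2/δ) / (2ε²). The function name is hypothetical.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n such that, by Hoeffding's inequality, the mean of n i.i.d.
    samples in [0, 1] lies within epsilon of the true mean with probability
    at least 1 - delta:  n >= ln(2 / delta) / (2 * epsilon**2)."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


if __name__ == "__main__":
    for eps in (0.05, 0.01):
        for delta in (0.05, 0.01):
            n = hoeffding_sample_size(eps, delta)
            print(f"epsilon={eps}, confidence={1 - delta:.0%}: n >= {n}")
```

For ε = 0.01 this gives 18,445 samples at 95% confidence (δ = 0.05) and 26,492 at 99% confidence (δ = 0.01): the required data grows only logarithmically with the demanded confidence but quadratically with the demanded precision.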

17 Changes in the way of thinking
• The fourth paradigm: data-intensive research
• "All models are wrong, and increasingly you can succeed without them"
• Petabyte-scale data can help us analyze without models or hypotheses
• When the data contains correlations, statistical algorithms can find new patterns unknown to previous methods

18 Thank you