Published by Kevin Harrell. Modified over 9 years ago.
1
The Scientific Importance of Big Data
Xia Li
Tennessee Technological University
2
The Scientific Importance of Big Data
  Financial benefits are the major motivation of big data research
  The technical challenges brought by big data
  The object of "data science"
  The common question behind data: the relationship network
  Causality and relationship
  Big data in social science
  Complexity in data processing
  Changes in the way of thinking
3
Financial benefits
  According to IDC (International Data Corporation), more than 1.8 zettabytes (1 ZB = 10^21 bytes) of data were created and copied in 2011
  About 75% of it came from individuals (mainly pictures, videos, and music), which is more than all printed data, about 200 petabytes (1 PB = 10^15 bytes)
4
Financial benefits
  Google uses very large-scale computing clusters and MapReduce software to process 400 PB of data in one month
  On Facebook, registered users upload more than 1 billion photos, and the log files generated each day exceed 300 TB
5
The technical challenges
  Six departments of the US government started big data research projects to "form a unique branch of learning including mathematics, statistics, computer algorithm"
  Most of these research projects focus on data engineering rather than data science
  The focus includes analysis algorithms and system efficiency
6
The technical challenges
  Multiscale anomaly detection
  Threat planning in networks
  Machine reading
  Real-time analysis of streaming data
  Non-linear random data compression
  Scalable statistical analysis techniques
7
The technical challenges
  New data representation methods
    If the data representation is not suitable, the analysis result is more prone to bias
  Data combination
    Data from different locations needs to be combined before it can be processed
  Deduplication and efficient, low-cost data storage
8
The object of "data science"
  Big data research is about how to find new knowledge; the data itself is not the research object
  As a research methodology, it is closely related to artificial intelligence techniques such as data mining, statistical analysis, and information retrieval
9
The object of "data science"
  The complexity of traditional algorithms grows exponentially as the size and dimension of the problem grow
  For big data at the PB level, new methods are needed
  Traditional AI algorithms can accept O(N log N) or even O(N^3) complexity
  For big data problems, even O(N log N) can hardly be accepted
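To make the complexity point concrete, here is a rough back-of-the-envelope sketch. The figures are assumptions for illustration only and do not come from the slides: N = 10^15 records (one per byte of a hypothetical 1 PB data set) and a single machine performing 10^9 basic operations per second.

```python
import math

def estimated_seconds(n, ops_fn, ops_per_second=1e9):
    """Rough runtime for an algorithm that needs ops_fn(n) basic operations."""
    return ops_fn(n) / ops_per_second

# Assumption for illustration: one record per byte of a 1 PB data set.
N = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600

# Operation counts for the growth rates named on the slide, plus O(N) as a baseline.
growth = {
    "O(N)": lambda n: n,
    "O(N log N)": lambda n: n * math.log2(n),
    "O(N^3)": lambda n: n**3,
}

years = {
    name: estimated_seconds(N, fn) / SECONDS_PER_YEAR
    for name, fn in growth.items()
}

for name, y in years.items():
    print(f"{name:>10}: about {y:.3g} years on one machine")
```

Under these assumptions a single linear pass takes on the order of days, O(N log N) takes more than a year, and O(N^3) is far out of reach. That is why near-linear or sampling-based methods become attractive at this scale.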
10
The common question behind data: the relationship network
  Big data is composed of individual data items and scattered connections
  Once the connections are combined, the result is a network
    Gene data becomes a gene network
    World Wide Web data becomes a social network
  Big data lives in a data network with complicated connections
11
The common question behind data: the relationship network
  The distribution of links in the World Wide Web
  From it, a scale-free network can be obtained
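One common way to see how a scale-free network can arise is preferential attachment, where new nodes prefer to link to nodes that are already well connected. The slide does not say which model it has in mind, so the following is only a minimal sketch under that assumption. The function name and the parameters `n_nodes`, `m`, and `seed` are hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a graph in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment(2000)
degree = Counter(v for e in edges for v in e)
histogram = Counter(degree.values())  # degree -> number of nodes with that degree

for k in sorted(histogram)[:5]:
    print(f"degree {k}: {histogram[k]} nodes")
print("largest degree:", max(degree.values()))
```

Most nodes end up with only a few links while a small number of hubs collect many. That heavy-tailed degree distribution is the signature of a scale-free network.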
12
Causality and relationship
  Correlation analysis finds the mutual relationships hidden in data
  Correlation measures: support, confidence, and interest
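The three measures can be illustrated on a toy set of transactions. This is a sketch only: the data is invented, `rule_metrics` is a hypothetical helper, and "interest" is assumed here to mean the measure usually called lift, that is, confidence divided by the overall frequency of the consequent.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)
    count_b = sum(1 for t in transactions if b <= t)
    count_ab = sum(1 for t in transactions if (a | b) <= t)
    support = count_ab / n                        # how often A and B occur together
    confidence = count_ab / count_a if count_a else 0.0   # estimate of P(B | A)
    lift = confidence / (count_b / n) if count_b else 0.0  # P(B | A) / P(B)
    return support, confidence, lift

# Invented example data: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

A lift near 1 means the two items occur together about as often as chance would predict. Values well above or below 1 suggest a positive or negative association.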
13
Causality and relationship
  A and B are related
    The values of A and B influence each other
    We cannot say that A causes B
    We cannot say that B causes A
  Strictly speaking, statistics cannot prove logical causality
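A small simulation shows why a relationship between A and B does not establish that either causes the other. The setup is an assumption made for illustration: a hidden common cause C drives both A and B, and neither variable influences the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)
n = 5000

# Hidden common cause C drives both A and B; A and B never touch each other.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 0.5) for ci in c]
b = [ci + rng.gauss(0, 0.5) for ci in c]
r_observed = pearson(a, b)

# "Intervene" on A: set it independently of C. B is generated exactly as before.
a_forced = [rng.gauss(0, 1) for _ in range(n)]
r_intervened = pearson(a_forced, b)

print(f"observed correlation:     {r_observed:.2f}")
print(f"after intervening on A:   {r_intervened:.2f}")
```

The observed correlation is strong, yet it vanishes once A is set independently of C. Correlation in the data alone cannot tell this situation apart from one where A really does drive B.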
14
Big data in social science
  On Facebook, data is generated randomly
  Researchers need to find valuable information in this data
  Big data in social science has some unique characteristics: it is multi-source and heterogeneous, interactive, socialized, bursty, and noisy
15
Big data in social science
  The future task is not to collect more and more data
  It is to mine useful knowledge from the data
  When a child learns to tell animals from cars, a few dozen sample pictures are enough
  How to eliminate unnecessary data sampling becomes a problem
16
Complexity in data processing
  Original theory
    Time complexity: the time used by an algorithm
    Space complexity: the memory used by an algorithm
  Data size complexity
    Some problems can only be solved once the data size reaches a certain level
    The relationship between the confidence of a prediction and the data level
17
Changes in the way of thinking
  The fourth paradigm
    Data-intensive research
  "All models are wrong, and increasingly you can succeed without them"
  Data at the PB level can help us analyze without models or hypotheses
  When data is correlated, statistical algorithms will find new patterns unknown to previous methods
18
Thank you