Data Mining, Data Science, Big Data
Data Science
Data Science aims to extract insights from large data, with less emphasis on algorithms and more emphasis on ‘outreach’. The term Data Science is about 10 years old and is very popular nowadays. Many people reinvent themselves as Data Scientists: data miners, statisticians, BI people, analysts, database developers.
Data Mining & Data Science
Data Mining; statistics; computational methods; dealing with large data; visualisation; involving domain knowledge; interpretable and interpreted results.
Big Data
Because you can…; administrative/financial reasons; cheap storage; Internet and social computing; Internet of Things, ubiquitous computing.
[Figure: cost per gigabyte in dollars, 1980 to 2010; axis from $0.01 to $1,000,000]
Cheap Storage
350 million photos are uploaded to Facebook per day; almost 20 additional racks are required per day.
[Image captions: 1956, IBM 350, 5 MB; 90 TB]
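As a rough sanity check of the rack figure, the arithmetic can be sketched as follows. Only the 350 million photos per day comes from the slide. The average photo size (5 MB) and the rack capacity (90 TB, taken here to be what the 90 TB figure refers to) are assumptions made for illustration, and the function name is hypothetical.

```python
# Back-of-envelope estimate of storage racks needed per day.
PHOTOS_PER_DAY = 350_000_000   # from the slide
AVG_PHOTO_MB = 5.0             # ASSUMPTION: average stored size per photo
RACK_CAPACITY_TB = 90.0        # ASSUMPTION: usable capacity of one rack


def racks_per_day(photos, photo_mb, rack_tb):
    """Return the number of racks filled per day (decimal units, 1 TB = 1e6 MB)."""
    total_tb = photos * photo_mb / 1_000_000
    return total_tb / rack_tb


if __name__ == "__main__":
    racks = racks_per_day(PHOTOS_PER_DAY, AVG_PHOTO_MB, RACK_CAPACITY_TB)
    print(f"{racks:.1f} racks per day")
```

Under these assumed values the estimate comes out just below 20 racks per day, which is consistent with the figure quoted on the slide; different assumptions would of course give a different number.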
Big Data
Many facets; people often focus on only one.
Very, very large data: CERN, Google, Facebook, Twitter, …
Analytics.
Internet-generated; social data; heterogeneous, unstructured data.
Large-scale technologies: MapReduce, Hadoop.
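To give a feel for the MapReduce programming model mentioned above, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop or any real MapReduce API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    pairs = []
    # In a real cluster, each document could be mapped on a different machine.
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    docs = ["big data big storage", "data science and data mining"]
    print(word_count(docs))
```

The point of the model is that the map step looks at one record at a time and the reduce step looks at one key at a time, so both can be spread over many machines without the programmer handling the distribution.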
Size-complexity trade-off
Technological restrictions produce a trade-off between size and complexity. Many Big Data projects are algorithmically not so complex. Embarrassingly parallel.
[Figure: complexity versus size, with CERN marked]
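The term "embarrassingly parallel" describes work that splits into chunks needing no communication with each other. A minimal sketch, assuming a made-up task (counting values above a hypothetical threshold of 100) and using threads only for simplicity; CPU-bound work in practice would more likely use processes or a cluster:

```python
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 100  # ASSUMPTION: arbitrary cut-off, for illustration only


def count_events(chunk):
    """Process one chunk independently: count values above the threshold."""
    return sum(1 for x in chunk if x > THRESHOLD)


def split(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_count(data, n_workers=4):
    """Count matching values by processing chunks independently, then summing."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_events, chunks))
    return sum(partial_counts)


if __name__ == "__main__":
    data = list(range(1000))
    print(parallel_count(data))
```

Because no chunk depends on any other, the only coordination needed is the final sum, which is why such problems scale to very large data despite being algorithmically simple.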