Published by Dayna Hutchinson, modified over 9 years ago
1
Large-Scale Data Representation
Erik Goodman, Daniel Kapellusch, Brennen Meland, Hyunjae Park, Michael Rogers
2
What is Data Representation?
Many considerations go into the choice of a particular data representation:
● Hardware
● Communication
● Data Generation Process
● Input-Output
● Data Sparsity
● Noise
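Data sparsity is one reason the choice of representation matters. When most entries are zero, storing only the nonzeros saves both memory and work. Below is a minimal Python sketch of the compressed sparse row (CSR) layout using plain lists. The function names `to_csr` and `csr_matvec` are hypothetical; production code would more likely use a library such as SciPy's sparse matrix types.

```python
from typing import List, Tuple


def to_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR arrays (values, col_indices, row_ptr).

    Only nonzero entries are stored, so storage grows with the number of
    nonzeros rather than with rows * cols.
    """
    values: List[float] = []
    col_indices: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_indices.append(j)
        # row_ptr[i + 1] marks where row i ends inside values/col_indices
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_matvec(values: List[float], col_indices: List[int],
               row_ptr: List[int], v: List[float]) -> List[float]:
    """Multiply a CSR matrix by a dense vector, touching only the nonzeros."""
    out: List[float] = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * v[col_indices[k]]
        out.append(s)
    return out


dense = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 2],
]
vals, cols, ptr = to_csr(dense)
print(vals, cols, ptr)                            # [3, 1, 2] [2, 0, 3] [0, 1, 1, 3]
print(csr_matvec(vals, cols, ptr, [1, 1, 1, 1]))  # [3.0, 0.0, 3.0]
```

In this example only 3 of the 12 entries are stored. The all-zero middle row costs a single `row_ptr` entry.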
3
The Taxonomy of Data Representation
Basic Data Structures
● Hash Tables, Inverted Indices, Tables/Relations, etc.
Mathematical Structures
● Sets, Vectors, Matrices, Graphs, Metric Spaces
Derived Mathematical Structures
● Clusters, Linear Projections, Data Samples, etc.
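An inverted index, one of the basic structures listed above, maps each term to the set of documents that contain it. Keyword queries then become set intersections instead of full scans. The sketch below is in-memory only and assumes whitespace tokenization and lowercase terms. The function names are hypothetical, and real search systems typically add stemming, posting-list compression, and on-disk storage.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def query_and(index: Dict[str, Set[int]], terms: List[str]) -> Set[int]:
    """Return ids of documents containing every query term (set intersection)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {
    1: "sparse matrices and graphs",
    2: "hash tables and inverted indices",
    3: "graphs and metric spaces",
}
index = build_inverted_index(docs)
print(sorted(index["graphs"]))                      # [1, 3]
print(sorted(query_and(index, ["and", "graphs"])))  # [1, 3]
```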
4
Design Goals
On the data acquisition side:
● Construct a structure that is sufficiently close to the data.
On the data analysis side:
● Construct a structure that is both flexible to describe and algorithmically tractable.
5
Challenge of Architecture and Algorithms
● Techniques designed for smaller-scale data analysis are not optimized to work with very large-scale data.
● A critical step going forward is determining the best way to represent the data.
Challenge of Combining Algorithmic and Statistical Perspectives
● Researchers from algorithmic and statistical backgrounds often approach data and its uses in incompatible ways.
● Correlation does not imply causation.
6
Challenge of Primitives
● Provide a framework that supports a broad scope of computations.
● Allow programming at a reasonable level of abstraction.
● Be applicable to a wide range of platforms.
Challenge of Manipulation and Integration of Heterogeneous Data
● Merging different datasets into a common representation is difficult.
● Tabular data vs. free-form text and images.
● Creating a meaningful visualization is difficult when dealing with many variables.
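One common way to bring tabular rows and free-form text into a shared representation is to map both into sparse feature dictionaries keyed by feature name. The sketch below assumes that approach: numeric fields are kept as values, categorical fields are one-hot encoded, and text becomes bag-of-words counts. The `num:`/`cat:`/`word:` prefixes and the function names are hypothetical choices, not a method from the slides. Images would need their own feature extractor, which is omitted here.

```python
from collections import Counter
from typing import Dict, Mapping

Features = Dict[str, float]


def featurize_record(record: Mapping[str, object]) -> Features:
    """Encode a tabular row: numeric fields stay numeric, others become one-hot."""
    feats: Features = {}
    for field, value in record.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            feats[f"num:{field}"] = float(value)
        else:
            feats[f"cat:{field}={value}"] = 1.0
    return feats


def featurize_text(text: str) -> Features:
    """Encode free-form text as bag-of-words term counts."""
    counts = Counter(text.lower().split())
    return {f"word:{w}": float(c) for w, c in counts.items()}


def merge(*parts: Features) -> Features:
    """Combine feature dicts from different sources into one sparse vector."""
    out: Features = {}
    for part in parts:
        for key, val in part.items():
            out[key] = out.get(key, 0.0) + val
    return out


row = {"age": 34, "city": "Boston"}
note = "prefers email prefers mornings"
combined = merge(featurize_record(row), featurize_text(note))
print(sorted(combined.items()))
# [('cat:city=Boston', 1.0), ('num:age', 34.0), ('word:email', 1.0),
#  ('word:mornings', 1.0), ('word:prefers', 2.0)]
```

Prefixing keys by source keeps features from different datasets from colliding. It also means that any downstream algorithm working on sparse vectors can consume both sources without special cases.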
7
Challenge of Heavy-Tailed and High-Variance Data
● Big data naturally leads to heavy-tailed distributions, especially in social and information networks.
● Extracting valuable signal from background noise is difficult.
● The heavy tail is often where new scientific phenomena manifest themselves.
http://math.stackexchange.com/questions/754972/probability-distribution-morphing-from-gaussian-to-heavy-tail
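Sampling makes the contrast between light and heavy tails concrete. The minimal sketch below uses Python's standard `random` module. It assumes a half-normal distribution as the light-tailed baseline and a Pareto distribution with shape α = 1.5 as the heavy-tailed case; both are illustrative choices, not taken from the slides. For each, it reports how much of the total sum the largest 1% of samples contribute. The function names are hypothetical.

```python
import random
from typing import List, Tuple


def top_share(samples: List[float], frac: float = 0.01) -> float:
    """Fraction of the total sum contributed by the largest `frac` of samples."""
    ordered = sorted(samples, reverse=True)
    k = max(1, int(len(ordered) * frac))
    return sum(ordered[:k]) / sum(ordered)


def compare_tails(n: int = 100_000, alpha: float = 1.5,
                  seed: int = 0) -> Tuple[List[float], List[float]]:
    """Draw light-tailed (half-normal) and heavy-tailed (Pareto) samples."""
    rng = random.Random(seed)
    light = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
    heavy = [rng.paretovariate(alpha) for _ in range(n)]  # support starts at 1
    return light, heavy


light, heavy = compare_tails()
print(f"half-normal: top 1% share = {top_share(light):.3f}, max = {max(light):.1f}")
print(f"Pareto(1.5): top 1% share = {top_share(heavy):.3f}, max = {max(heavy):.1f}")
```

For a Pareto distribution with α > 1, the theoretical share of the total held by the top fraction p is p^(1 - 1/α). For α = 1.5 that is about 0.22, versus roughly 0.04 for the half-normal. With α = 1.5 the variance is infinite, so the empirical Pareto share fluctuates noticeably from seed to seed. That instability is itself a symptom of the high-variance data the slide describes.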