Large Scale Data Representation Erik Goodman Daniel Kapellusch Brennen Meland Hyunjae Park Michael Rogers.


What is Data Representation?
Many considerations go into the choice of a particular data representation:
● Hardware
● Communication
● Data Generation Process
● Input-Output
● Data Sparsity
● Noise

The Taxonomy of Data Representation
Basic Data Structures
● Hash Tables, Inverted Indices, Tables/Relations, etc.
Mathematical Structures
● Sets, Vectors, Matrices, Graphs, Metric Spaces
Derived Mathematical Structures
● Clusters, Linear Projections, Data Samples, etc.
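As an illustration of one of the basic data structures named above, here is a minimal sketch of an inverted index built on a hash table. The function names (`build_inverted_index`, `search`) and the sample documents are hypothetical and chosen only for this example. Tokenization is a plain whitespace split, which is an assumption and not something the slides specify.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: whitespace tokenization is good enough for this sketch.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, *terms):
    """Return the ids of documents that contain every query term (AND query)."""
    postings = [set(index.get(term.lower(), ())) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical sample corpus.
docs = {
    1: "large scale data representation",
    2: "sparse data and noise",
    3: "graph representation of networks",
}
index = build_inverted_index(docs)
matches = search(index, "data", "representation")
```

A lookup touches only the posting lists of the query terms, not every document, which is why this structure is a common choice for large text collections.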

Design Goals
On the data acquisition side:
● Construct a structure that is sufficiently close to the data.
On the data analysis side:
● Construct a structure that is both flexible to describe and algorithmically tractable.

Challenge of Architecture and Algorithms
● Data analysis techniques designed for smaller scales are not optimized for very large scale data.
● A critical step moving forward is determining the best way to represent the data.

Challenge of Combining Algorithmic and Statistical Perspectives
● Researchers with algorithmic and statistical backgrounds approach data use in ways that are not always compatible.
● Correlation does not imply causation.

Challenge of Primitives
● Provide a framework for a broad scope of computations.
● Allow programming at a reasonable level of abstraction.
● Apply to a wide range of platforms.

Challenge of Manipulation and Integration of Heterogeneous Data
● Merging different datasets into a common representation is difficult.
● Tabular data vs. free-form text-based data and images.
● It can be difficult to create a meaningful data visualization when dealing with many variables.
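The integration problem above can be made concrete with a small sketch that brings a tabular source and a free-form text source into one common record format keyed by entity id. Everything here is hypothetical: the record layout (`source`, `id`, `fields`), the helper names, and the sample sensor data are assumptions made for illustration, not a method from the slides.

```python
import csv
import io


def records_from_csv(csv_text):
    """Turn each CSV row into a common record; the 'id' column identifies the entity."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "source": "table",
            "id": row["id"],
            # Values stay as strings; type conversion is left out of this sketch.
            "fields": {k: v for k, v in row.items() if k != "id"},
        }
        for row in reader
    ]


def record_from_text(entity_id, text):
    """Wrap free-form text in the same record shape, stored as lowercase tokens."""
    return {"source": "text", "id": entity_id, "fields": {"tokens": text.lower().split()}}


def merge(records):
    """Combine the fields of all records that share an id into one dictionary."""
    merged = {}
    for record in records:
        merged.setdefault(record["id"], {}).update(record["fields"])
    return merged


# Hypothetical inputs: one table and one free-text note about the same sensor.
table = "id,temperature\ns1,21.5\ns2,19.0\n"
note = record_from_text("s1", "Sensor recalibrated after storm")
merged = merge(records_from_csv(table) + [note])
```

Even in this toy case the sources disagree on structure: the table has fixed columns while the note has none, so the merged record for `s1` carries fields that `s2` lacks.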

Challenge of Heavy-Tailed and High-Variance Data
● Big data naturally leads to heavy-tailed distributions, especially for social and information networks.
● Valuable signal must be extracted from background noise.
● The heavy tail is often where new scientific phenomena manifest themselves.
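To see why heavy tails complicate analysis, the sketch below builds a synthetic, Zipf-like degree sequence and compares its mean with its median. The data is an assumption made purely for illustration (the node ranked r gets degree 1000 // r); it is not taken from any real network or from the slides.

```python
import statistics

# Assumption: a synthetic Zipf-like degree sequence for 1000 nodes,
# where the node ranked r has degree 1000 // r.
NUM_NODES = 1000
degrees = [NUM_NODES // rank for rank in range(1, NUM_NODES + 1)]

mean_degree = statistics.mean(degrees)
median_degree = statistics.median(degrees)

# Share of all edge endpoints held by the top 1% of nodes.
top_count = NUM_NODES // 100
top_share = sum(sorted(degrees, reverse=True)[:top_count]) / sum(degrees)

print(f"mean={mean_degree:.2f} median={median_degree} top 1% share={top_share:.2%}")
```

In this sequence a handful of high-degree nodes pulls the mean far above the median, so summaries and samples that ignore the tail describe the typical node but miss most of the mass.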