A computational tool for depth-based statistical analysis. Eynat Rafalin, Tufts University Computer Science Department.

The tool
- An easy-to-use, efficient, and expandable interface for statistical research, based on the notion of data depth.
- Intended for scientists with no computer science background.

Our goal
- Present the tool to the community
- Code/software available on request
- Run on real data
- Get feedback: Is such a tool needed? Additions/improvements?

General
- C++-based software (no additional tools/software needed)
- Simple interface that should allow the user to:
  - enter data files
  - sort the data points and filter unwanted data
  - perform calculations
  - present the results in an easy-to-understand graphical interface
  - save and output data for future use
- Fast
- Portable code

General description
- Data filter
- Contours display and selection
- Statistical modules
- Output: txt and Excel files, Geomview

Data filter
- Graphical user interface developed in C++
- Used to crop/manipulate a data set before it is fed into the statistical modules
- Fast and light
- Convenient, easy-to-use user interface
- Portable code (UNIX, Solaris, Linux, Windows)

Data filter

Statistical modules
- Depth contours (2D)
  - Half-space (location) depth contours, in optimal O(n^2) time
    - Supports two approaches for defining contours
    - Including the Tukey median and the bagplot
    - Including contours' parameters (size, etc.)
  - Convex hull peeling depth contours
  - Simplicial depth contours
- Tukey median computation (O(n log^3 n))
- Locating a new point in a set of depth contours (O(log n) query time)
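To make the half-space (location) depth named on this slide concrete, here is a minimal Python sketch that computes the depth of a single query point by brute force. It is not the tool's C++ implementation and does not reach the running times quoted above: it costs O(n^2) per query point. The function name `halfspace_depth` and the sample data are assumptions made for illustration. Depth is counted as a number of sample points, and the collinearity tests are exact only for integer (or otherwise exactly representable) coordinates.

```python
def halfspace_depth(q, points):
    """Half-space (Tukey) depth of q: the smallest number of sample points
    in any closed half-plane whose boundary line passes through q."""
    qx, qy = q
    vecs = [(x - qx, y - qy) for x, y in points]
    coincident = sum(1 for v in vecs if v == (0, 0))  # always counted
    best = len(points)
    for dx, dy in vecs:
        if (dx, dy) == (0, 0):
            continue
        # Candidate boundary: the line through q and this sample point.
        left = right = forward = backward = 0
        for vx, vy in vecs:
            cross = dx * vy - dy * vx
            if cross > 0:
                left += 1
            elif cross < 0:
                right += 1
            elif (vx, vy) != (0, 0):
                # On the line itself: ahead of q or behind it.
                if dx * vx + dy * vy > 0:
                    forward += 1
                else:
                    backward += 1
        # Tilting the line slightly around q drops one of the two
        # collinear groups from the chosen side, so take the smaller one.
        best = min(best, coincident + min(left, right) + min(forward, backward))
    return best


# Illustrative data: the corners of a square plus its center.
square = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
for query in [(2, 2), (0, 0), (10, 10)]:
    print(query, halfspace_depth(query, square))
```

The center of the square is the deepest point of this sample, a corner has depth 1, and any point outside the convex hull has depth 0.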

Approaches for defining depth contours
- P. Rousseeuw et al.: the k-th depth contour is the boundary of the set of points in the plane with depth ≥ k.
- R. Liu et al. (based on order statistics): the sample p-th central hull is the convex hull containing the most central fraction p of the sample points.
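The second definition (the sample p-th central hull) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: it ranks points by convex hull peeling depth (one of the depth functions the tool lists), breaks ties between equally deep points arbitrarily, and uses the hypothetical names `hull_with_boundary`, `peeling_depth`, and `central_hull`.

```python
import math


def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_with_boundary(points):
    """Convex hull by Andrew's monotone chain, keeping points on hull edges."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def chain(seq):
        out = []
        for p in seq:
            # Pop only on a clockwise turn so collinear boundary points stay.
            while len(out) >= 2 and cross(out[-2], out[-1], p) < 0:
                out.pop()
            out.append(p)
        return out

    return chain(pts)[:-1] + chain(pts[::-1])[:-1]


def peeling_depth(points):
    """Map each point to its peeling layer (outermost layer = 1)."""
    depth, remaining, layer = {}, set(points), 1
    while remaining:
        for p in hull_with_boundary(remaining):
            depth[p] = layer
        remaining.difference_update(depth)
        layer += 1
    return depth


def central_hull(points, p):
    """Convex hull of the ceil(p * n) deepest sample points."""
    depth = peeling_depth(points)
    k = math.ceil(p * len(depth))
    deepest = sorted(depth, key=lambda pt: -depth[pt])[:k]
    return hull_with_boundary(deepest)


# Illustrative data: two nested squares and a center point.
sample = [(0, 0), (6, 0), (6, 6), (0, 6),
          (2, 2), (4, 2), (4, 4), (2, 4), (3, 3)]
layers = peeling_depth(sample)
half_hull = central_hull(sample, 0.5)
print(layers)
print(half_hull)
```

With p = 0.5 the five deepest points are the center and the inner square, so the central hull is the inner square. Swapping in another depth function only requires replacing `peeling_depth`.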

Half-space (location) depth contours module
- Figure: depth contours for a sample set with 8 data points
- Figure: depth contours for a data set describing diabetic patients

Statistical modules (cont.)
- Plots
  - DD (Depth vs. Depth) plots, O(n^2) time
  - Shrinkage plots
  - Fan plots

DD (Depth vs. Depth) plots module
Two 2D data sets of 50 points each, drawn from normal distributions centered at (0,0) with different covariance matrices (the identity and 4 times the identity).
Figure axes: depth according to set A; depth according to set B.
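The coordinates of a DD plot can be sketched as follows: every point of the combined sample is mapped to the pair (depth with respect to set A, depth with respect to set B). This Python sketch is an assumption-laden illustration, not the tool's O(n^2) module. It approximates normalized half-space depth by checking a fixed number of evenly spaced directions, which gives an upper bound on the exact depth, and the names `approx_halfspace_depth` and `dd_plot_coordinates` are hypothetical. The data mimic the slide's setup: 50 points per set, both centered at (0,0), with standard deviations 1 and 2 (covariance identity and 4 times identity).

```python
import math
import random


def approx_halfspace_depth(z, sample, directions=90):
    """Approximate normalized half-space depth of z.

    Checks only `directions` evenly spaced half-planes through z, so the
    result is an upper bound on the exact depth.
    """
    best = len(sample)
    for i in range(directions):
        angle = 2 * math.pi * i / directions
        ux, uy = math.cos(angle), math.sin(angle)
        # Count sample points in the closed half-plane facing direction u.
        count = sum(1 for x, y in sample
                    if (x - z[0]) * ux + (y - z[1]) * uy >= 0)
        best = min(best, count)
    return best / len(sample)


def dd_plot_coordinates(set_a, set_b):
    """One (depth w.r.t. A, depth w.r.t. B) pair per point of A followed by B."""
    return [(approx_halfspace_depth(z, set_a), approx_halfspace_depth(z, set_b))
            for z in set_a + set_b]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
set_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
set_b = [(rng.gauss(0, 2), rng.gauss(0, 2)) for _ in range(50)]
dd = dd_plot_coordinates(set_a, set_b)
print(dd[:5])
```

Plotting the first coordinate against the second with any charting library gives the DD plot. If the two samples came from the same distribution, the points would be expected to concentrate near the diagonal.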

Fan plots
50 data points, drawn from a random distribution with covariance matrix 4 times the identity. The fans are created for data sets containing the 1/6, 2/6, ... central regions. For each region, the area of the convex hull (CH) of 2, 4, 6, ...% of the points is computed.
Figure axes: relative area (CH of p% / CH); percentile of points.
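The quantity on the vertical axis, the relative area (CH of p% / CH), can be sketched for a single data set as below. This is a rough Python illustration rather than the tool's implementation. It ranks points by brute-force simplicial depth (the slide does not say which depth is used), assumes points in general position, uses 30 points and 10% steps to keep the run short, and omits the nested 1/6, 2/6, ... regions that produce the individual fans. All function names are hypothetical.

```python
import random
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def simplicial_depth(z, sample):
    """Fraction of closed triangles on sample points that contain z."""
    inside = total = 0
    for a, b, c in combinations(sample, 3):
        d1, d2, d3 = cross(a, b, z), cross(b, c, z), cross(c, a, z)
        if (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0):
            inside += 1
        total += 1
    return inside / total


def hull_area(points):
    """Area of the convex hull (monotone chain, then the shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def chain(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out

    hull = chain(pts)[:-1] + chain(pts[::-1])[:-1]
    twice = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice) / 2


def relative_area_curve(sample, percents):
    """(p, area of CH of the p% deepest points / area of CH of all points)."""
    ranked = sorted(sample, key=lambda z: simplicial_depth(z, sample), reverse=True)
    full = hull_area(sample)
    curve = []
    for pct in percents:
        k = max(1, round(len(sample) * pct / 100))
        curve.append((pct, hull_area(ranked[:k]) / full))
    return curve


rng = random.Random(1)  # fixed seed so the sketch is reproducible
data = [(rng.gauss(0, 2), rng.gauss(0, 2)) for _ in range(30)]
curve = relative_area_curve(data, range(10, 101, 10))
for pct, rel in curve:
    print(pct, round(rel, 3))
```

Because the central subsets are nested, the relative area never decreases as p grows and reaches 1 at p = 100%.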

Graphical contour selection tool
Plots depth contours and selects data ranges.
Actions:
- Import/export
- Select points
- Depth slider
- Filter

Future work
- Run the tool on existing data sets
- Distribute preliminary versions and get user feedback
- Data filter
  - Group by row/column
  - Filter by row/column
  - Interactions between rows/columns (addition, substitution, logical operations)
- Statistical modules
  - Implement additional modules
  - Improve running times

Contributors
Prof. Diane Souvaine, Prof. Alva Couch, Eynat Rafalin, Michael Burr, Joe Handelman, James Hayes, Ori Taka, Alok Lal, Janet Luan, Kim Miller, Tim Mitchell, Nikolai Shvertner