How: Evaluation of visualization techniques for EDA: F+C


How: Evaluation of visualization techniques for EDA: F+C
Explain the challenges of displaying large data sets (how do we see a million+ data points? i.e., our age-old screen-space challenge).
Explain the design guideline of "overview, filter and detail-on-demand" and relate it to exploratory data analysis (EDA). For example, how do we generate an overview if the data is not known in advance to be hierarchical (i.e., what to display and what to leave out when the total number of data points exceeds the total available pixels)? How does one provide detail-on-demand: pan and zoom (PNZ), overview+detail (O+D), or focus+context (F+C)? Is that task- or data-dependent?
Explain how much we know about the effectiveness and applicability of these visualization techniques, and how we contribute to existing knowledge by evaluating them.

Theory
General F+C
Evaluations: lit review
Text
Graphical
Evaluations: real-life tasks
Tree (Adam/Dmitry study [CHI 2006] ++??)
Time-series data/xy data (LGE study)
Evaluations: human perceptual studies
IT4
IT5

How: EDA Systems
Apply study results to an application domain
Tools for data analysis
Line Graph Explorer (XY data)
Session Viewer (Web log data)
Information retrieval/management
music (MusicLand, MusicLand++ ??)
(Evita [CPSC534a report])
Previous literature search on personal information management for the Memoplex project

InfoVis'06: paper rejected; CHI 2007? VSS, APGV06 Venue?: paper AVI'06 Poster: InfoVis'05 CHI 2007

Visualization for EDA

What: Data
Explain what "large" means, using the Agilent high-throughput instrument data and the Google keynote study clickstream data.
Explain that the data sets are not only large but also growing, because data collection is relatively easy.
Explain the data itself: multivariate (?) with metadata (both continuous and categorical). The data may have hierarchical structure. Both of our examples are time-series data. The instrument data is continuous (subject to sampling errors), while the clickstream data is discrete (by click events).

What: EDA
Explain what analysts have been doing to analyze such large datasets. Essentially, they have been using some form of statistics (descriptive statistics, clustering, factor analysis, PCA) or machine learning.
Explain the pros (scalable) and cons of this approach: some methods are hypothesis-driven rather than data-driven, i.e., confirmatory rather than exploratory, so it can be difficult to discover patterns in the data. No cross-checking of results against the data?
Explain what EDA is, how people explore data visually, and why that is needed in the analysis.

Why: Visualization
Explain how visualization can help by working with established statistical methods (EDA -> generate hypothesis -> check with statistics/further analyze data with statistical methods -> visually verify analysis results -> visually display results for presentation/report).
Explain existing EDA visualizations for large data sets (TJ, space-filling...).

Other related documents and sites:
Time line (Year 2)
GPPF roadmap v.1
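
The screen-space question raised in the first slide (what to display when the number of data points exceeds the available pixels) can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a method taken from the slides: the function name `minmax_overview` and the parameter `width_px` are hypothetical, and per-column min/max aggregation is just one common way to build an overview of a long time series.

```python
import math


def minmax_overview(values, width_px):
    """Reduce a long series to one (min, max) pair per pixel column.

    Hypothetical helper: each pixel column summarizes a contiguous
    slice of the series, so extremes stay visible in the overview
    even when many data points share a single column.
    """
    if width_px <= 0:
        raise ValueError("width_px must be positive")
    n = len(values)
    if n == 0:
        return []
    # Never use more columns than there are points.
    columns = min(width_px, n)
    overview = []
    for col in range(columns):
        # Integer arithmetic splits the series into near-equal,
        # non-empty, non-overlapping slices.
        start = col * n // columns
        end = (col + 1) * n // columns
        chunk = values[start:end]
        overview.append((min(chunk), max(chunk)))
    return overview


# Usage: 100,000 samples drawn into an assumed 800-pixel-wide plot.
series = [math.sin(i / 1000.0) for i in range(100_000)]
overview = minmax_overview(series, 800)
print(len(overview))  # one (min, max) pair per pixel column
```

Drawing one vertical segment per column from its min to its max gives an overview; detail-on-demand would then re-run the same aggregation on the sub-range the analyst selects. Whether this kind of aggregation suits a given task or data set is exactly the open question the slides pose.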