Developing a Visual Analytics Approach to Analytic Problem-Solving. William Ribarsky, UNC Charlotte.

Two Key Statements The purpose of visualization is insight (and practical knowledge building) not pictures. Visual analytics is the integration of interactive visualization and analyses to solve complex reasoning problems.

The Future Environment: The Data Problem & the Complexity Problem The amount of data generated or observed will continue to outstrip the ability to analyze it in a deep way. The amount of data will continue to outstrip the ability to store it comprehensively. Comprehensive data sharing will become more and more difficult. Databases and warehouses are becoming opaque. Simulations and models will become even more complex and integrated.

The Future Environment: The Data Problem & the Complexity Problem Yet… The process cannot be entirely automated. Providing meaning and direction in the analysis process requires human involvement. Data, simulations, and simulation results are becoming so complex and large that their content is not completely knowable. They must be probed, explored, discovered. Humans (and many times expert humans) are a very expensive and/or limited resource. So, a significant aspect of the data and complexity problems is how to involve the human in an intimate partnership with the computer even when the problem becomes very complex and large.

What Can Visual Analytics Provide? It provides a human-centered approach to attack the human reasoning bottleneck. Visual analytics provides an approach that starts from integration of computer-based analysis methods and interactive visualization to support: Reasoning and evidence gathering at scale Exploration in context and uncovering of unforeseen relationships. Insight discovery. A main goal of visual analytics over the next 5-10 years will be to begin attacking the data and complexity problems and resolving the human reasoning bottleneck.

Financial Transaction Data Financial transactional data warehouses for large banks are very big (billions of records over many years). Knowing what to query for is a big problem. No transaction, by itself, is risky or fraudulent. Although data records tend to be structured or semi-structured, items can be missing, mis-categorized, have spelling or abbreviation variations, etc. There may be unstructured free text that can be valuable.

Challenges with Wire Fraud Detection (Bank of America Example)
- Size: more than 200,000 transactions per day
- No transaction by itself is suspicious
- Lack of an international wire standard: loosely structured data with inherent ambiguity
[Figure labels: Indonesia; Charlotte, NC; Singapore; London]

Challenges with Wire Fraud Detection: No Standard Form
- When a wire leaves Bank of America in Charlotte, the recipient can appear to be receiving it in London, Indonesia, or Singapore.
- Vice versa, when Charlotte receives a wire from Indonesia, the sender can appear to originate from London, Singapore, or Indonesia.
[Figure labels: Indonesia; Charlotte, NC; Singapore; London]

WireVis: Financial Transaction Analysis This work is supported by Bank of America and DHS. (Significantly wider deployment to other banks and financial analysts now under discussion.) Current practice has been to do database queries filtered by keywords, amounts, date, etc. and investigate using spreadsheets. This process is inadequate and inefficient because patterns of interest (e.g., fraud or risk) will change in unpredictable ways, it is difficult to be exploratory using query methods (especially for very large transactional databases), and analysts cannot see patterns over longer time periods.

The Pipeline for Financial Anomaly Analysis [Figure: pipeline stages Identify, Prioritize, Investigate, Report; other labels: All transaction activity, Interactive Visualization, Google]

WireVis: Using Keywords
Keywords are words used to filter all transactions; only transactions containing keywords are flagged.
- Highly secretive
- Typically include geographical information (country, city names), business types, specific goods and services, etc.
- Updated based on intelligence reports
- Ranges from words
- Could reduce the number of transactions by up to 90%
- Most importantly, gives a useful meaning (label) to each transaction
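The keyword filtering described on this slide can be illustrated with a minimal sketch. This is not the WireVis implementation: the keyword list, the `(id, text)` transaction layout, and the function names `label_transaction` and `flag_transactions` are all hypothetical, and matching is simplified to whole-word, case-insensitive lookup.

```python
import re

# Hypothetical keyword list; the real lists are confidential per the slide.
KEYWORDS = {"singapore", "electronics", "consulting"}

def label_transaction(text, keywords=KEYWORDS):
    """Return the sorted keywords that occur as whole words in the free text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(tokens & keywords)

def flag_transactions(transactions, keywords=KEYWORDS):
    """Keep only transactions that hit at least one keyword, with their labels."""
    flagged = []
    for tx_id, text in transactions:
        hits = label_transaction(text, keywords)
        if hits:
            flagged.append((tx_id, hits))
    return flagged

# Assumed sample data: (transaction id, free-text description).
transactions = [
    (1, "Payment for electronics, Singapore branch"),
    (2, "Monthly rent"),
    (3, "CONSULTING fee"),
]
flagged = flag_transactions(transactions)
```

Here transaction 2 is dropped because it matches no keyword, while the other two carry their matched keywords as labels, which is the sense in which keywords give meaning to each transaction.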

WireVis: Financial Transaction Analysis System Overview
- Heatmap View (Accounts to Keywords Relationship)
- Strings and Beads (Relationships over Time)
- Search by Example (Find Similar Accounts)
- Keyword Network (Keyword Relationships)
For full projects and publications, go to
Work by Remco Chang et al.
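As a rough illustration of the data behind an accounts-to-keywords heatmap, the sketch below counts keyword hits per account and lays them out as a matrix. The account names, keywords, and helper functions are assumptions made for the example; the slides do not describe how WireVis builds its heatmap.

```python
from collections import Counter

def account_keyword_counts(hits):
    """Count how often each keyword was hit by each account."""
    counts = Counter()
    for account, keywords in hits:
        for keyword in keywords:
            counts[(account, keyword)] += 1
    return counts

def to_matrix(counts):
    """Arrange the counts as rows of accounts by columns of keywords."""
    accounts = sorted({account for account, _ in counts})
    keywords = sorted({keyword for _, keyword in counts})
    matrix = [[counts[(a, k)] for k in keywords] for a in accounts]
    return accounts, keywords, matrix

# Assumed sample data: one entry per flagged transaction.
hits = [
    ("acct-A", ["gold", "london"]),
    ("acct-B", ["gold"]),
    ("acct-A", ["gold"]),
]
accounts, keywords, matrix = to_matrix(account_keyword_counts(hits))
```

Each cell of `matrix` would map to one heatmap cell, with the count driving its color.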

WireVis: Integrated with Full Transaction Database
Scalability
- We have connected to the data warehouse at Bank of America with millions of records, for wire transactions alone, over the course of a rolling year (13 months).
- Connecting to a database makes interactive visualization tricky.
Unexpected Results (Access through the VA interface!)
- Go to where the data is: operations relating to the data (e.g., clustering) are pushed onto the database.
[Figure labels: Database, Raw Data, Stored Procedure, Temp Tables, SQL, JDBC, WireVis Client]

WireVis: Integrated with Full Transaction Database
Performance Measurements
- Data-driven operations such as re-clustering, drilldown, and transaction search by keywords take 1-2 minutes in the worst case.
- All other interactions remain real time.
- No pre-computation or caching
- Single-CPU desktop computer
WireVis is in deployment with James Prices and the WireWatch team for testing and evaluation. It is the foundation for a substantial new project on risk analysis.
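The idea of pushing operations onto the database can be sketched as follows. The slides mention stored procedures, temp tables, SQL, and JDBC; this example instead uses Python's built-in `sqlite3` as a stand-in, with a hypothetical `wires` table, so it only illustrates the principle of aggregating inside the database rather than pulling raw rows to the client.

```python
import sqlite3

# In-memory database standing in for the transaction warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wires (account TEXT, keyword TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO wires VALUES (?, ?, ?)",
    [
        ("acct-A", "gold", 100.0),
        ("acct-A", "gold", 250.0),
        ("acct-B", "london", 75.0),
    ],
)

def keyword_summary(conn):
    """Let the database aggregate; the client receives only the summary rows."""
    cursor = conn.execute(
        "SELECT account, keyword, COUNT(*), SUM(amount) "
        "FROM wires GROUP BY account, keyword "
        "ORDER BY account, keyword"
    )
    return cursor.fetchall()

summary = keyword_summary(conn)
```

The client never sees the individual rows, only one row per account and keyword, which keeps the transferred data small even when the underlying table is very large.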

Some General Conclusions
WireVis is a general tool. Though it was developed to investigate money-laundering and fraud, it can be applied to everything from risk analysis to financial business intelligence. WireVis's power is due to:
- Contextualizing in terms that are meaningful to the analyst. The context may be in terms of keywords that encapsulate knowledge or tradecraft, specific procedures that describe types of transactions, or some other way.
- Organizing and discriminating among data using MDS, discriminating cluster analysis, filtering based on keywords, and other methods (but all based on the cognitive or conceptual space of the analysts).
- Supporting highly interactive exploration from overview to particular case.

Multimedia: Automated Video Content Analysis Work by Jianping Fan et al.

Multimedia: Automated Video Content Analysis. Audio and Video Analysis: Story Boundary Detection

News Topic Detection: Video Analysis. Video Scene Understanding and Search by Example

Multimedia: Automated Video Content Analysis. News Interestingness Prediction [Figure labels: News Story Collection, User Preference, Usage History, Predictor, Set of news stories, Interestingness] Result: analysis can automatically find news (or potentially other content) in unstructured media regardless of language.

EventRiver: Determining Events An event is an occurrence that happens at a specific time and draws continuous attention. Events are derived from a cluster of multimedia documents that have closely related content and coincide in time. Events are characterized by the semantics of their related documents, namely a group of interrelated significant keywords summarizing the major themes in the cluster, and the temporal information describing how the cluster strength changes over time. Work by Jing Yang et al.

EventRiver - Visually Exploring Broadcast News Videos The figure shows major CNN news from August 1 to 24 in 2006 (right) and a shoebox for examining an event in detail (left). Features: automatic incremental event extraction; event browsing and inspection; a rich set of navigation, search, and analysis tools.

EventRiver Exploration and Filtering Search by Example

Sentiment Analysis on RSS Feeds 50 RSS news feeds featuring the US Presidential Election in 2008 (10/9/2008 – 11/8/2008). Work by Daniel Keim and his team

EventRiver: Expanded Capabilities Geographic/Temporal Entity Extraction; Comparative Event Trend Analysis; Sentiment Analysis

A Data Model for News Streams Joint work between the U. Konstanz and UNC Charlotte teams

A Data Model for News Streams A (bursty) event: a temporally divided portion of a story, based on time series analysis of the statistics of clustered news. [Figure: a news story split into events A through E; axes: Date, Cluster Size]
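To make the notion of a bursty event concrete, the sketch below splits a story's daily cluster sizes into bursts. The slides only say that events come from time series analysis of clustered-news statistics; the fixed threshold used here is a deliberately simple stand-in, and the function name and sample numbers are assumptions.

```python
def split_into_bursts(daily_sizes, threshold):
    """Return inclusive (start, end) day ranges where cluster size >= threshold."""
    bursts = []
    start = None
    for day, size in enumerate(daily_sizes):
        if size >= threshold and start is None:
            start = day  # a burst begins
        elif size < threshold and start is not None:
            bursts.append((start, day - 1))  # the burst ended yesterday
            start = None
    if start is not None:
        bursts.append((start, len(daily_sizes) - 1))  # burst runs to the end
    return bursts

# Assumed sample data: number of clustered news items per day for one story.
sizes = [1, 5, 7, 2, 0, 6, 8, 9]
bursts = split_into_bursts(sizes, threshold=5)
```

Each returned range corresponds to one event of the story, so a single story with two separated surges of coverage yields two events.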

A Data Model for News Streams Are there any correlations between Story 1 and Story 2? Clustered news stories (Story 1, Story 2, …, Story n) are local and lack temporal information.

A Data Model for News Streams Are there any correlations between Story 1 and Story 2? Events contain both semantic and temporal information and act like routers that connect different news stories.

JRC European Media Monitor
News stream monitoring about 4000 sources from 1600 portals in 43 languages
- geo-tagged
- multilingual
- clustered (event detection) and categorized
- extracted entities
Work by Daniel Keim and his team

What is a Probe?
Pair consisting of:
- Region-of-Interest
- Coordinated Visualization
and some visual connection between them
- Rendered directly within the main visualization
- Can be directly interacted with
- Powerful in multiples

Why Probes?
- More massive simulations: computer experiments, requiring experimental probing of data collection and exploration of the simulation space.
- Massive observational networks: again, must be probed experimentally.
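One way to picture a probe as a data structure is a region of interest paired with a function that produces the coordinated view for the data inside it. This is a sketch under stated assumptions, not the ProbeVis or UrbanVis design: the `Region` and `Probe` classes, the rectangular region, and the mean-value summary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, value); layout is an assumption

@dataclass
class Region:
    """Axis-aligned rectangular region of interest."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

@dataclass
class Probe:
    """A region paired with the summary shown in its coordinated view."""
    region: Region
    summarize: Callable[[List[Point]], Optional[float]]

    def view(self, points: List[Point]) -> Optional[float]:
        inside = [p for p in points if self.region.contains(p[0], p[1])]
        return self.summarize(inside)

def mean_value(points: List[Point]) -> Optional[float]:
    """Mean of the value field, or None for an empty region."""
    return sum(v for _, _, v in points) / len(points) if points else None

points = [(1.0, 1.0, 10.0), (2.0, 2.0, 20.0), (8.0, 8.0, 5.0)]
west = Probe(Region(0, 0, 5, 5), mean_value)
east = Probe(Region(5, 5, 10, 10), mean_value)
west_mean, east_mean = west.view(points), east.view(points)
```

Placing two probes side by side, as with `west` and `east` here, is a small instance of the "powerful in multiples" point: the same summary can be compared across regions.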

UrbanVis, Before Work by Tom Butkiewicz, Remco Chang et al.

UrbanVis, After

Multitouch ProbeVis

Large scale urban land use simulation Difficult to see & understand details in context Difficult to compare & understand trends in different areas

Evaluation Learning-based Evaluation Describe and measure knowledge gain and insights discovered. Must separate out 3 types of learning: about the system, the data, and the cognitive task(s) at hand. New evaluation strategies and results have emerged.

A Few Words about Knowledge and Insight… Knowledge is compact. Knowledge begets knowledge. Knowledge is flexible, reusable, and generalizable. There are two types of insight: spontaneous insight and knowledge-building insight.

Long-Term Research Goals Establish design principles for visual analytics systems. Develop a predictive human cognitive model. Create a theory of interaction. Develop a process for evaluation of exploratory, investigative, insight discovery, and knowledge-building systems. Successfully attack large, complex real-world problems.

Questions?