1 Powerset Viewer: A Datamining Application Jordan Lee.

Slides:



Advertisements
Similar presentations
Location Recognition Given: A query image A database of images with known locations Two types of approaches: Direct matching: directly match image features.
Advertisements

McGraw-Hill/Irwin ©2008 The McGraw-Hill Companies, All Rights Reserved TECHNOLOGY PLUG-IN T4 PROBLEM SOLVING USING EXCEL Goal Seek, Solver & Pivot Tables.
Minimizing Seed Set for Viral Marketing Cheng Long & Raymond Chi-Wing Wong Presented by: Cheng Long 20-August-2011.
TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets Chun Chen 1, Feng Li 2, Beng Chin Ooi 2, and Sai Wu 2 1 Zhejiang University, 2 National.
Tutorial 10: Performing What-If Analyses
Architecture for Exploring Large Design Spaces John R. Josephson, B. Chandrasekaran, Mark Carroll, Naresh Iyer, Bryon Wasacz, Qingyuan Li, Giorgio Rizzoni,
Concepts of Database Management Seventh Edition
MOBILE BUDGET PLANNER & SHOPPING ASSISTANT Intuitive and on-the-go approach to budgeting Andrew Okoh Dr. Marko Popovic Kwan Hong Lee.
1 Powerset Explorer: A Datamining Application Jordan Lee.
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Steven Lau Academic Solutions Specialist Microsoft.
© Steven Alter, 2007, all rights reserved Database concepts Difference between a database and the Internet Reason for having a defined data structure Relational.
Abstract There is an existing program that allows students to visualize geometric shapes that a hand-drawn illustration just can’t match. Professor Cervone.
Project Access Hope BIS 412 Systems Analysis and Design Applications Spring 2004 Project Developers: Philip Kelly Joseph Shaughnessy Daniel Vickers Project.
Object Tracking for Retrieval Application in MPEG-2 Lorenzo Favalli, Alessandro Mecocci, Fulvio Moschetti IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR.
RETAIL STORE MANAGEMENT ERP SOFTWARE BY COPY RIGHT BY
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
File System. NET+OS 6 File System Architecture Design Goals File System Layer Design Storage Services Layer Design RAM Services Layer Design Flash Services.
Microsoft ® Office Project Portfolio Server 2007.
1 ProActive performance evaluation with NAS benchmarks and optimization of OO SPMD Brian AmedroVladimir Bodnartchouk.
Sudoku Project: SDS Taryn, Jin, Jehsang, Phil and Matt.
Data Modeling using Entity Relationship Diagramming (ERD)
Tutorial 10: Performing What-If Analyses
Institute for Mathematical Modeling RAS 1 Visualization in distributed systems. Overview. Remote visualization means interactive viewing of three dimensional.
Rendering Adaptive Resolution Data Models Daniel Bolan Abstract For the past several years, a model for large datasets has been developed and extended.
1 Chapter 7 Query-By-Example by Monica Chan CS157B Professor Lee.
Module 6: Quantifying gaps and measuring coverage ILO, 2013.
Scalable Analysis of Distributed Workflow Traces Daniel K. Gunter and Brian Tierney Distributed Systems Department Lawrence Berkeley National Laboratory.
1 Scalable Exploratory Data Mining of Distributed Geoscientific Data Authors : E.C Shek, R.R Muntz, E. Mesrobian and K. Ng by Sona Srinivasan.
Unit 4 – Design and produce multimedia products AO1 – Review several existing multimedia products Mr Farmer.
An Efficient Approach to Learning Inhomogenous Gibbs Models Ziqiang Liu, Hong Chen, Heung-Yeung Shum Microsoft Research Asia CVPR 2003 Presented by Derek.
Dakota State University.
Evolutionary Art with Multiple Expression Programming By Quentin Freeman.
Tracker data quality monitoring based on event display M.S. Mennea – G. Zito University & INFN Bari - Italy.
Expert Systems with Applications 34 (2008) 459–468 Multi-level fuzzy mining with multiple minimum supports Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang.
The Fundamentals: Algorithms, the Integers & Matrices.
$aveZone Milestone 3 $aveZone Milestone 3 Fifth team: Dima Reshidko Oren Gafni Shiko Raboh.
 Review the principles of cost-volume-profit relationships  Discuss Excel what-if analysis tools 2.
$aveZone Milestone 2 - Update $aveZone Milestone 2 - Update Fifth team: Dima Reshidko Oren Gafni Shiko Raboh Harel Cohen.
Rainbow: XML and Relational Database Design, Implementation, Test, and Evaluation Project Members: Tien Vu, Mirek Cymer, John Lee Advisor:
Powerpoint Templates Page 1 Powerpoint Templates Scalable Text Classification with Sparse Generative Modeling Antti PuurulaWaikato University.
Indexes and Views Unit 7.
VizDB A tool to support Exploration of large databases By using Human Visual System To analyze mid-size to large data.
AESuniversity NPI Reporting.  Session Overview  Terms and Definitions  Where It Fits Within MIS  Setup Process  User Process  Questions & Answers.
1 Circuitscape Design Review Presentation Team Circuitscape Mike Schulte Sean Collins Katie Rankin Carl Reniker.
Client-Server Paradise ICOM 8015 Distributed Databases.
MySQL and GRID status Gabriele Carcassi 9 September 2002.
Written by Changhyun, SON Chapter 5. Introduction to Design Optimization - 1 PART II Design Optimization.
1 An Arc-Path Model for OSPF Weight Setting Problem Dr.Jeffery Kennington Anusha Madhavan.
Minesweeper Solver Marina Maznikova Artificial Intelligence II Fall 2011.
Bitwise Sort By Matt Hannon. What is Bitwise Sort It is an algorithm that works with the individual bits of each entry in order to place them in groups.
CFR 250/590 Introduction to GIS, Autumn 1999 Raster Analysis I © Phil Hurvitz, raster1.ppt 1  Overview Grid themes Setting grid theme and analysis.
Getting Control GIS Projects Can “Spiral Out of Control” –Can’t find the data –Analysis can’t be reproduced –Takes too long to complete –Maps don’t match.
DATABASE CONTROLS Chapter 14. Access Controls Discretionary Access Controls Discretionary Access Controls Types of Restrictions : 1. Name-dependent restrictions.
1 Circuitscape Capstone Presentation Team Circuitscape Katie Rankin Mike Schulte Carl Reniker Sean Collins.
Description and exemplification use of a Data Dictionary. A data dictionary is a catalogue of all data items in a system. The data dictionary stores details.
THREE DIMENSIONAL-PALLET LOADING PROBLEM BY ABDULRHMAN AL-OTAIBI.
CEG 2400 FALL 2012 Windows Servers Network Operating Systems.
Maximizing Performance – Why is the disk subsystem crucial to console performance and what’s the best disk configuration. Extending Performance – How.
CO – CART Project Status Protocol Revision Subcommittee Update 08/17/2006.
5. CSA data mining: Alpha version Only key datasets Some combined Time periods returned Combined with time CSA profiles All data used will be downloadable.
Develop Schedule is the Process of analyzing activity sequences, durations, resource requirements, and schedule constraints to create the project schedule.
Visual Annotation of Clinically Important Anatomical Landmarks for VitreoRetinal Surgery Project Update Vincent Ng.
ADAMS/VIEW INTERFACE OVERVIEW
ADAMS/VIEW INTERFACE OVERVIEW
Potter’s Wheel: An Interactive Data Cleaning System
Microsoft Office Illustrated
COmmon Reference Environment
EMIS 8374 Arc-Path Formulation Updated 2 April 2008
Count on 2 (Over the bridge)
Presentation transcript:

1 Powerset Viewer: A Datamining Application Jordan Lee

2 Update

3 Completed Tools and Features – And relevant GUI widgets

4 Update Completed Tools and Features – And relevant GUI widgets Implemented animation between zoom states and automatic zooming

5 Update Completed Tools and Features – And relevant GUI widgets Implemented animation between zoom states and automatic zooming Increased alphabet size from 14 to 30 – Optimized calculations

6 Update Completed Tools and Features – And relevant GUI widgets Implemented animation between zoom states and automatic zooming Increased alphabet size from 14 to 30 – Optimized calculations Increased alphabet size from 30 to 45 – Realized set cardinality is, in practice, low – Using max set size of 10

7 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10)

8 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10) #2Addition of a single level of “marking”. #3Addition of multiple levels of “marking” (6) #4Addition of background marking to demarcate areas of sets containing different amounts of items.

9 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10) #2Addition of a single level of “marking”. #3Addition of multiple levels of “marking” (6) #4Addition of background marking to demarcate areas of sets containing different amounts of items. #5Implement multiple constraints

10 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10) #2Addition of a single level of “marking”. #3Addition of multiple levels of “marking” (6) #4Addition of background marking to demarcate areas of sets containing different amounts of items. #5Implement multiple constraints #6Increase maximum possible dataset size to at least 100.

11 Difficulties BigInteger solution to increase maximum alphabet caused massive slow-down – Recall: required BigIntegers to support > 30 alphabet size – Solution: redesign keys to use integers and create a bridge to map integers to BigInteger positions

12 BEFORE BRIDGE Incoming Set (Position = 982) Success! Incoming Set (Position = 2^32 + 1) CRASH! – Integer too large

13 AFTER BRIDGE Incoming Set (Position = 982) – Encode to Key #1 Success! Incoming Set (Position = 2^32 + 1) – Encode to Key #2 Success! Incoming Set (Position = arbitrarily large) – Encode to Key #3 Success!

14 Difficulties BigInteger solution to increase maximum alphabet caused massive slow-down – Recall: required BigIntegers to support > 30 alphabet size – Solution: redesign keys to use integers and create a bridge to map integers to BigInteger positions Expensive initial costs Grid size limited by integer restrictions – Solution: create grid on the fly

15 Benchmarks Low Cardinality First MEMORY (MB)SET COUNT 7610M 751M 74100, , ,000

16 Figure: Low Cardinality (10000 sets) 73 MB

17 Benchmarks (cont’d) Random Generated MEMORY (MB)SET COUNT

18 Figure: Random (176 sets) 71 MB

19 Questions and Comments