1 Powerset Explorer: A Datamining Application Jordan Lee.

Slides:



Advertisements
Similar presentations
Reporting Status or Progress. Define the Subject n Break the subject into areas of discussion n List the main subject components here.
Advertisements

Analysis of Algorithms: time & space Dr. Jeyakesavan Veerasamy The University of Texas at Dallas, USA.
© 2006 Pearson Addison-Wesley. All rights reserved10 A-1 Chapter 10 Algorithm Efficiency and Sorting.
Transactions and Recovery Checkpointing Souhad Daraghma.
Scalable Visualization with Accordion Drawing Tamara Munzner University of British Columbia Department of Computer Science joint work with James Slack,
Computer science is a field of study that deals with solving a variety of problems by using computers. To solve a given problem by using computers, you.
Mani-CS34311 CS3431 – Database Systems I Introduction Instructor: Murali Mani
Introduction to Analysis of Algorithms
Operating Systems CMPSCI 377 Lecture 11: Memory Management
© 2006 Pearson Addison-Wesley. All rights reserved10-1 Chapter 10 Algorithm Efficiency and Sorting CS102 Sections 51 and 52 Marc Smith and Jim Ten Eyck.
Explorations in Image Partition Encoding Sameer Agarwal Department of Computer Science and Engineering University of California, San Diego.
1 Powerset Viewer: A Datamining Application Jordan Lee.
Introduction to computational genomics – hands on course Gene expression (Gasch et al) Unit 1: Mapper Unit 2: Aggregator and peak finder Solexa MNase Reads.
© 2006 Pearson Addison-Wesley. All rights reserved10 A-1 Chapter 10 Algorithm Efficiency and Sorting.
Knight’s Tour Distributed Problem Solving Knight’s Tour Yoav Kasorla Izhaq Shohat.
Coursenotes CS3114: Data Structures and Algorithms Clifford A. Shaffer Yang Cao Department of Computer Science Virginia Tech Copyright ©
Introduction - The Need for Data Structures Data structures organize data –This gives more efficient programs. More powerful computers encourage more complex.
IE 607 Constrained Design: Using Constraints to Advantage in Adaptive Optimization in Manufacturing.
Introduction to Database Systems Motivation Irvanizam Zamanhuri, M.Sc Computer Science Study Program Syiah Kuala University Website:
Integrating Fine-Grained Application Adaptation with Global Adaptation for Saving Energy Vibhore Vardhan, Daniel G. Sachs, Wanghong Yuan, Albert F. Harris,
Algorithm Analysis. Algorithm An algorithm is a clearly specified set of instructions which, when followed, solves a problem. recipes directions for putting.
Campus Day, Week 0 January 14, Looking Ahead Enrollment update Associate Dean search BIT meetings Shelter in place drills and door locks.
SmartStruxure™ Lite Solution
This is Step 2 of a Mishap Notification that gathers expanded information and shall be completed within 48 hours of the mishap occurrence. Information.
DEVELOPING A FRAMEWORK FOR PROBLEM SOLVING COMPUTER COACHES Evan Frodermann 1, Qing (Xu) Ryan 1, Kristin Crouse 1, Ken Heller 1, Leon Hsu 2, Bijaya Aryal.
Scalable Analysis of Distributed Workflow Traces Daniel K. Gunter and Brian Tierney Distributed Systems Department Lawrence Berkeley National Laboratory.
Introduction  Client/Server technology is seen by many as the solution to the difficulty of linking together the various departments of corporation.
COMP 103 Hashing 2013-T2 Lecture 28 Thomas Kuehne School of Engineering and Computer Science, Victoria University of Wellington  Marcus Frean, Lindsay.
1 Wenguang WangRichard B. Bunt Department of Computer Science University of Saskatchewan November 14, 2000 Simulating DB2 Buffer Pool Management.
© 2011 Pearson Addison-Wesley. All rights reserved 10 A-1 Chapter 10 Algorithm Efficiency and Sorting.
CSC 230: C and Software Tools Rudra Dutta Computer Science Department Course Introduction.
111 © 2002, Cisco Systems, Inc. All rights reserved.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Framework for MDO Studies Amitay Isaacs Center for Aerospace System Design and Engineering IIT Bombay.
CSE 241 Computer Engineering (1) هندسة الحاسبات (1) Lecture #3 Ch. 6 Memory System Design Dr. Tamer Samy Gaafar Dept. of Computer & Systems Engineering.
Geoff Holmes and Bernhard Pfahringer COMP206-08S General Programming 2.
CS 206 Introduction to Computer Science II 09 / 18 / 2009 Instructor: Michael Eckmann.
2015-T2 Lecture 30 School of Engineering and Computer Science, Victoria University of Wellington  Lindsay Groves, Marcus Frean, Peter Andreae, and Thomas.
More DFDs Class 12.
Spring 2016 COMP 2300 Discrete Structures for Computation Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
BITS Pilani Pilani Campus Data Structure and Algorithms Design Dr. Maheswari Karthikeyan Lecture1.
CS223: Software Engineering Lecture 19: Unit Testing.
DBS Monitor and DAN CD Projects Report July 9, 2003.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 1.
Representing Tabular Data in Alfresco Share “Smooth Like Butter” Gary Cox Blue Fish Development Group.
1 CENG 351 CENG 351 Introduction to Data Management and File Structures Department of Computer Engineering METU.
Class Scheduling Using Constraint Satisfaction Victoria Donelson Garrett Grimsley.
BUFFALO: Bloom Filter Forwarding Architecture for Large Organizations Minlan Yu Princeton University Joint work with Alex Fabrikant,
Mohammed I DAABO COURSE CODE: CSC 355 COURSE TITLE: Data Structures.
Brute Force II.
FAST STUDENT Your Chance to Learn!. FAST STUDENT Your Chance to Learn!
Algorithm Efficiency and Sorting
CPSC-310 Database Systems
In quest of the operational database for real-time environmental monitoring and early warning systems Bartosz Baliś, Marian Bubak, Daniel Harezlak, Piotr.
CS422 Principles of Database Systems Course Overview
Phil Tayco Slide version 1.0 Created Nov. 26, 2017
Bin Packing Optimization
Unit Test Pattern.
Automation in IMS Can it help the shrinking talent pool
COT 5611 Operating Systems Design Principles Spring 2014
Mathematics Common Math Approaches
EECE 310 Software Engineering
Data Structures and Algorithms
Arrays .
Visual Studio 2010 SharePoint Development Tools Overview
Chapter 22, Part
Database administration
Alex Bolsoy, Jonathan Suggs, Casey Wenner
Presentation transcript:

1 Powerset Explorer: A Datamining Application Jordan Lee

2 Background

3 PAST – Datamining accomplished with human intuition

4 Background PAST – Datamining accomplished with human intuition PRESENT – Computer aided with AI and brute force CPU cycles

5 Background PAST – Datamining accomplished with human intuition PRESENT – Computer aided with AI and brute force CPU cycles FUTURE – Enter PowersetViewer….

6 Dataset

7 Alphabet – Items that can be found in transactions – Eg. Apples, bread, chips

8 Dataset Alphabet – Items that can be found in transactions – Eg. Apples, bread, chips Transaction – Sets of items (unordered) – Eg.Tx1 = { Apples, Chips } – Eg.Tx2 = { Bread }

9 Dataset Alphabet – Items that can be found in transactions – Eg. Apples, bread, chips Transaction – Sets of items (unordered) – Eg.Tx1 = { Apples, Chips } – Eg.Tx2 = { Bread } Transaction database – Collection of transactions (unordered, possibly repetitive) – Eg. Walmart transaction logs

10 Example Dataset Student enrollment database

11 Example Dataset Student enrollment database – Alphabet = courses { CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 }

12 Example Dataset Student enrollment database – Alphabet = courses { CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 } – Transaction = courses student is enrolled in # > { CPSC 124, PHIL120, ENGL112 }

13 Example Dataset Student enrollment database – Alphabet = courses { CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 } – Transaction = courses student is enrolled in # > { CPSC 124, PHIL120, ENGL112 } – Transaction DB = list of student course schedules

14 Example Dataset (cont’d)

15 Why? Why is this interesting?

16 Why? Why is this interesting? – Consumer transaction logs -> trends in consumer buying

17 Why? Why is this interesting? – Consumer transaction logs -> trends in consumer buying – Student enrollment database -> trends in enrollment What electives do most undergrad computer science students take? Departments can determine which joint majors would fit the student population.

18 Why? (cont’d) Dataset sizes growing exponentially

19 Why? (cont’d) Dataset sizes growing exponentially – Human intuition has reached its limits

20 Why? (cont’d) Dataset sizes growing exponentially – Human intuition has reached its limits – Require computers and AI (expensive)

21 Why? (cont’d) Dataset sizes growing exponentially – Human intuition has reached its limits – Require computers and AI (expensive) – Information visualization can scale the power of human intuition

22 Powerset Explorer Code base from TreeJuxtaposer (Munzner) – AccordianDrawer package

TreeJuxtaposer

24 Powerset Explorer Code base from TreeJuxtaposer (Munzner) – AccordianDrawer package Goals

25 Powerset Explorer Code base from TreeJuxtaposer (Munzner) – AccordianDrawer package Goals – Focus + context exploration using grids

26 Powerset Explorer Code base from TreeJuxtaposer (Munzner) – AccordianDrawer package Goals – Focus + context exploration using grids – Guaranteed visibility

27 Powerset Explorer Code base from TreeJuxtaposer (Munzner) – AccordianDrawer package Goals – Focus + context exploration using grids – Guaranteed visibility – Marking of groups

28 Milestones Status Update

29 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10)

30 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10) #2Addition of a single level of “marking”.

31 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10) #2Addition of a single level of “marking”. #3Addition of multiple levels of “marking” (6)

32 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10) #2Addition of a single level of “marking”. #3Addition of multiple levels of “marking” (6) #4Addition of background marking to demarcate areas of sets containing different amounts of items.

33 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10) #2Addition of a single level of “marking”. #3Addition of multiple levels of “marking” (6) #4Addition of background marking to demarcate areas of sets containing different amounts of items. #5Implement multiple constraints

34 Milestones Status Update #1Completion of the basic visualization of a randomized database of small set size (~10) #2Addition of a single level of “marking”. #3Addition of multiple levels of “marking” (6) #4Addition of background marking to demarcate areas of sets containing different amounts of items. #5Implement multiple constraints #6Increase maximum possible dataset size to at least 100.

35 Difficulties

36 Difficulties Multiple constraints difficult to implement on current server-side dataminer

37 Difficulties Multiple constraints difficult to implement on current server-side dataminer Can not enumerate a powerset of alphabet size greater than 14 elements (integer = 32 bits) – Solution: use java class BigInteger

38 Difficulties Multiple constraints difficult to implement on current server-side dataminer Can not enumerate a powerset of alphabet size greater than 14 elements (integer = 32 bits) – Solution: use java class BigInteger High CPU and memory usage – Solultion: upgrade computer!  hack

39 Current Status Reduced database

40 Current Status Property file – 0 CPSC PHIL ANTH MATH PSYC ENGL APSC MECH STAT SPAN FREN ECON LING EECE

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67 Questions?