Scalability for High Cardinality in Steerable MDS
CPSC 533C Project Presentation
Allan Rempel
December 19, 2005

Scalability
- Ability to run on very large data sets
- Both theoretical and practical efficiency
- Both time and space efficiency
- Effective use of resources – memory, disk, db
  - Avoid waste and thrashing
- Cardinality and dimensionality

Review - multidimensional scaling (MDS)
- Display multivariate abstract point data in 2D
  - Data from bioinformatics, financial sector, etc.
  - No inherent mapping in 2D space
  - p-dim embedding of q-dim space (p < q) where inter-object relationships are approximated in low-dimensional space
- Proximity in high-D -> proximity in 2D
  - High-dim distance between points (similarity) determines relative (x,y) position
  - Absolute (x,y) positions are not meaningful
- Want to see clusters, curves, etc.
  - Features that stand out from the noise

Review - multidimensional scaling (MDS)
- Refinement of algorithms
  - O(N^3), O(N^2), O(N log N)
  - 1996, 2002, 2003, 2004
  - Chalmers, Morrison, Ross, Jourdan, Melançon
- Progressive steerable MDS
  - Munzner et al., 2004, 2005

MDSteer++
- MDS visualization system developed by T. Munzner, M. Tory, D. Westrom, M. Williams
- C++ with Qt GUI, OpenGL
- Runs on Windows and Linux
- Test platform – P-III, 256 MB, Linux

MDSteer++

MDSteer++ limitations and proposal
- Theoretically, handle 300+ dimensions and 1,000,000+ points
- Practically, limited by system memory
  - ~100,000 points calculating distance on the fly
  - ~10,000 points with precomputed distance matrix
  - ~5000 points exhausted memory on test machine
- Big Question: Can we raise these limits by using a database? At what cost?
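The two memory regimes on this slide (distances computed on the fly vs. a precomputed distance matrix) can be illustrated with a minimal Python sketch. The class names `OnTheFlyDistances` and `PrecomputedDistances` are hypothetical and are not taken from MDSteer++; Euclidean distance and a full N x N matrix are assumptions made for illustration only.

```python
import math


class OnTheFlyDistances:
    """Keeps only the N x D coordinates and recomputes a distance per request."""

    def __init__(self, points):
        self.points = points

    def distance(self, i, j):
        # Assumption: Euclidean distance in the high-dimensional space.
        return math.dist(self.points[i], self.points[j])

    def stored_values(self):
        # Storage grows linearly with the number of points: N * D values.
        return sum(len(p) for p in self.points)


class PrecomputedDistances:
    """Computes every pairwise distance once and keeps a full N x N matrix."""

    def __init__(self, points):
        n = len(points)
        self.matrix = [
            [math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)
        ]

    def distance(self, i, j):
        return self.matrix[i][j]

    def stored_values(self):
        # Storage grows quadratically with the number of points: N * N values.
        return len(self.matrix) ** 2
```

Both classes answer the same `distance(i, j)` query; the trade is recomputation time against quadratic storage, which is one plausible reading of why the precomputed variant hits the memory limit at a smaller point count.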

Scaling issues
- Time complexity matters – Chalmers
- Time complexity doesn't matter – Jourdan
- Lots of things matter
  - Actual vs. perceived speed – progressiveness
  - What are the limits of the hardware?

Experimental results – Chalmers et al.
- 3-D data sets: 5000 – 50,000 points
- 13-D data sets: 2000 – 24,000 points
- Took less than 1/3 the time of the O(N^2) model
- Achieved lower stress when done
- Also compared against original O(N^3) model
  - 9 seconds vs. 577; and 24 vs. 3642
  - Achieved much lower stress (0.06 vs. 0.2)
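The stress figures quoted above measure how closely distances in the 2D layout match distances in the original space. The slides do not state which stress formula was used, so the sketch below assumes one common normalized form: the sum of squared differences between high-dimensional and layout distances, divided by the sum of squared high-dimensional distances. The function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalized_stress(high_points, layout_points):
    """Assumed stress form: sum((d_high - d_low)^2) / sum(d_high^2) over pairs."""
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(high_points)), 2):
        d_high = euclidean(high_points[i], high_points[j])
        d_low = euclidean(layout_points[i], layout_points[j])
        numerator += (d_high - d_low) ** 2
        denominator += d_high ** 2
    # A data set with all points coincident has no distances to preserve.
    return numerator / denominator if denominator else 0.0
```

Under this definition a layout that preserves every pairwise distance scores 0, and a layout that collapses all points onto one position scores 1, so lower is better.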

Experimental results – Chalmers et al

Comparison – Jourdan & Melançon
- Chalmers et al. is better for N < 5500
- Main diff is in parent-finding, represented by Fig. 3

Comparison – Jourdan & Melançon
- Experimental study confirms theoretical results
- This technique becomes better for N > 70,000

Multiscale MDS – Jourdan & Melançon
- Recursively defining the initial kernel set of points can yield much better real-time performance
- Same time complexity, but much faster execution

Experimental results
- Memory usage with/without 5000 node MDS run

Offloading distance matrix
- Step 1: Write to file instead of database
  - Database uses files anyway – save db overhead
  - Use /tmp for high speed/capacity – 3GB available
  - Similar speed and memory usage as before…
    - Probably because file is being cached
    - Probably similar scenario to memory being swapped
  - … until 2000 of 5000 points placed – thrashing
  - 2000 points = 16 MB; 5000 points = 100 MB
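A file-backed distance matrix of the kind described in Step 1 might look like the following Python sketch. It is not the MDSteer++ class: the name `FileDistanceMatrix`, the row-major layout, and the 4-byte float entries are all assumptions. The entry size was chosen because a full N x N matrix of 4-byte values reproduces the sizes quoted on the slide.

```python
import os
import struct
import tempfile

# Assumption: each distance is stored as a 4-byte little-endian float.
ENTRY = struct.Struct("<f")


def matrix_bytes(n, entry_size=ENTRY.size):
    """Size in bytes of a full n x n matrix with fixed-size entries."""
    return n * n * entry_size


class FileDistanceMatrix:
    """Keeps an n x n distance matrix on disk and reads entries by seeking."""

    def __init__(self, path, n):
        self.n = n
        self.file = open(path, "w+b")
        # Reserve the whole matrix up front; unset entries read back as 0.0
        # on platforms that zero-fill extended files.
        self.file.truncate(matrix_bytes(n))

    def _offset(self, i, j):
        # Row-major layout: entry (i, j) sits at a fixed byte offset.
        return (i * self.n + j) * ENTRY.size

    def set(self, i, j, value):
        # Distances are symmetric, so write both (i, j) and (j, i).
        for row, col in ((i, j), (j, i)):
            self.file.seek(self._offset(row, col))
            self.file.write(ENTRY.pack(value))

    def get(self, i, j):
        self.file.seek(self._offset(i, j))
        return ENTRY.unpack(self.file.read(ENTRY.size))[0]

    def close(self):
        self.file.close()
```

With these assumptions, `matrix_bytes(2000)` is 16,000,000 bytes and `matrix_bytes(5000)` is 100,000,000 bytes, matching the 16 MB and 100 MB figures. Every `get` and `set` costs a seek, so performance depends heavily on how much of the file the operating system can keep cached, which fits the slide's observation that thrashing began once the working set outgrew memory.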

Offloading distance matrix
- Step 2: Write to database
  - Offload some work to database server
  - Tried to connect to db over network
  - Tried to run MDSteer++ on db server
  - Tried to modify MDSteer++ for MDS only, no GUI
  - Suspect that database overhead, network comm., NFS vs. local disks would cause significant slowdown

MDSteer++ code modifications
- Build and run on Linux
- Fixed some bugs
- Class to handle file-based matrix
- Class to handle database-based matrix
- Restructured code to accommodate modes
- Partial separation of GUI from MDS back end

Future work
- Use .ui files for GUI specification
- Separate MDS from GUI functionality
  - What does MDSteer++ do if user not interactive?
- Continue examination of database use
- Library – provide hooks for use in other s/w
- Not require recompile for diff modes
- Clean compile w/o warnings
- Cross-platform with single code base
