
Uniform Distributions on Integer Matrices

The Problem  How many ways can I fill in a matrix with specified row and column sums? (Unknown entries are shown as ?, with row sums to the right and column sums below.)

    ?  ?  ?  | 2
    ?  ?  ?  | 2
    ?  ?  ?  | 3
    ----------
    2  2  3

A Quick Example  The same 3x3 grid: fill in the ? entries with nonnegative integers so every row and column adds up to its target.

    ?  ?  ?  | 2
    ?  ?  ?  | 2
    ?  ?  ?  | 3
    ----------
    2  2  3
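
To make the question concrete, here is a minimal brute-force sketch in Python (a sanity check only, not the talk's algorithm): it tries every possible fill of the 3x3 grid and counts the ones matching the margins. This is feasible only because the matrix is tiny, which is exactly the point of the rest of the talk.

    from itertools import product

    # Brute-force count of 3x3 nonnegative integer matrices with
    # row sums (2, 2, 3) and column sums (2, 2, 3).
    row_sums = (2, 2, 3)
    col_sums = (2, 2, 3)
    max_entry = max(row_sums)   # no cell can exceed its row sum, so 0..3 covers every case

    count = 0
    for cells in product(range(max_entry + 1), repeat=9):
        m = (cells[0:3], cells[3:6], cells[6:9])
        if all(sum(m[i]) == row_sums[i] for i in range(3)) and \
           all(sum(m[i][j] for i in range(3)) == col_sums[j] for j in range(3)):
            count += 1

    print(count)    # the number of valid fills for this small example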

Who Cares?  Theoretical motivation:  Computer scientists are interested in optimal algorithms for enumeration – this is a test bed  Practical applications:  Anything that’s a network or might be made to look like one

The Social Network  I have a bunch of friends and you have a bunch of friends and our friends have a bunch of friends and so on  This can be represented as a binary matrix

The Social Network  Suppose we see some patterns (structures) in this matrix  Question:  Are the structures simply a result of the fact that some people have more friends than others?  Or is something else going on?

The Social Network  Solution:  Fix the row and column sums (how many friends people have)  Sample uniformly from the matrices satisfying these sums  See whether the structures of interest are likely to occur by chance
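
For context, one common way to do that sampling in the binary case is a swap-based Markov chain. The Python sketch below illustrates the idea; it is an approximate MCMC approach from the wider literature, not the algorithm developed in this talk.

    import random

    def checkerboard_swaps(matrix, n_swaps=10000, rng=random):
        """In-place, margin-preserving random swaps on a 0/1 matrix (list of lists)."""
        n_rows, n_cols = len(matrix), len(matrix[0])
        for _ in range(n_swaps):
            r1, r2 = rng.sample(range(n_rows), 2)
            c1, c2 = rng.sample(range(n_cols), 2)
            a, b = matrix[r1][c1], matrix[r1][c2]
            c, d = matrix[r2][c1], matrix[r2][c2]
            # Only a "checkerboard" 2x2 submatrix ([[1,0],[0,1]] or [[0,1],[1,0]])
            # can be flipped without changing any row or column sum.
            if a == d and b == c and a != b:
                matrix[r1][c1], matrix[r1][c2] = 1 - a, 1 - b
                matrix[r2][c1], matrix[r2][c2] = 1 - c, 1 - d
        return matrix

After enough swaps the matrix is an approximately uniform draw with the same margins; how many swaps is "enough" is a genuine practical question, which is part of why exact methods are attractive.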

Darwin’s Finches  Biologists are also interested in these sorts of matrices  Though no longer obviously network problems  For example: Darwin observed some a number of species of finches distributed across a series of islands

Darwin’s Finches  Question:  Is the distribution of finches explained by the fact that some islands are bigger and can support more species than others and some finches are more common than others?  Or are competitive dynamics at work?

Darwin’s Finches  Solution:  Test by uniformly sampling with fixed number of species per island (column sums) and fixed number of islands on which a species is found (row sums)  Can test how likely observed combinations of species are to occur by chance

Computational Challenge  A small example…  How many 8x8 matrices with row & column sums = 8 do you think are there?

Computational Challenge  Answer:  1,046,591,482,728,408,076,517,376  (a.k.a. 1 septillion plus)

Computational Challenge  As many as that is, this is whittled down from more than 40^64 possibilities  Which is > 10^100 or about 10 times the age of the universe  Even though our algorithm doesn’t actually require us to sort though all these possibilities, we do have to handle the fact that many many (sub)problems lead to the same place  … parallel sort & removal of duplicates = lots of headaches

Strategies: Pros & Cons  Level-by-level vs. depth first + hashing  rank synchronization  Storage: distributed vs. master-worker model  Sorting:  Merge sort  Radix sort  Pre-computing coefficients?

Our Algorithm  Serial algorithm  multimaps, new maps, and more  Class to hold “Problems”  Not the most interesting from a parallel computing standpoint, but several days of coding…
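
The slides do not include the code, so here is only a rough serial sketch, in Python, of the counting idea as described (not the authors' implementation): build the matrix one row at a time, and treat two partial fills as the same subproblem whenever their remaining column sums match as multisets. That merging of equivalent subproblems is the deduplication the later slides depend on.

    from functools import lru_cache

    def count_matrices(row_sums, col_sums):
        """Exactly count nonnegative integer matrices with the given margins."""
        row_sums = tuple(row_sums)
        n_rows = len(row_sums)

        @lru_cache(maxsize=None)
        def count(remaining_cols, row):
            # remaining_cols: sorted tuple of column totals still to be filled in.
            if row == n_rows:
                return 1 if all(c == 0 for c in remaining_cols) else 0
            total = 0

            def split(j, left, new_cols):
                # Distribute this row's sum across the columns without
                # overshooting any column's remaining total.
                nonlocal total
                if j == len(remaining_cols):
                    if left == 0:
                        # Sorting merges equivalent subproblems into one cache entry.
                        total += count(tuple(sorted(new_cols)), row + 1)
                    return
                for x in range(min(left, remaining_cols[j]) + 1):
                    split(j + 1, left - x, new_cols + (remaining_cols[j] - x,))

            split(0, row_sums[row], ())
            return total

        return count(tuple(sorted(col_sums)), 0)

    # count_matrices([2, 2, 3], [2, 2, 3])   # the small example from the opening slides
    # count_matrices([8] * 8, [8] * 8)       # reproduces (slowly, in pure Python) the
    #                                        # septillion-scale count quoted earlier;
    #                                        # Python integers keep it exact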

Our Algorithm  Parallel step by step  Rank 0 gets initial problem, calculates random matrix used in load balancing for sort, and broadcasts  For each row, each rank goes though:  Step 0. Calculate new maps for my problems  Step 1. Package the new problems  Step 2. Send and receive all the info (6 all to all calls)  Step 3a. Unpack the received data from all ranks  Step 3b. Merge sort & deduplicate problems received  Step 4. Convert to array which stores final answer (portion of DAG) for this row  Print final DAG & total solutions to file

Did we get the right answer?  Multiple checks of the final answer …  Against small problems we could compute by hand  Against serial code written by Jeff Miller in Python  And lots of printouts & testing of individual function performance * Though note the answers are floating point … if we really wanted them correct down to the last digit, we would need some sort of arbitrary-precision implementation
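
To make the footnote concrete (a generic illustration, not a statement about the project's code): a double carries a 53-bit significand, so at the septillion scale individual matrices vanish into rounding error, whereas Python integers, or any arbitrary-precision type, stay exact.

    # The 53-bit significand of a double gives ~16 decimal digits of precision.
    print(float(2**53 + 1) == 2**53 + 1)   # False: the first integer a double cannot hold
    big = 1e24                              # roughly the scale of the 8x8 answer
    print(big + 1 == big)                   # True: adding one more matrix is invisible at this scale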

Strong Scaling

Weak Scaling(?)

Future Directions  Improve parallel sort  Better randomization & load balancing  Improve serial algorithm  Can we do a better job of anticipating duplicates?  Deal with really big numbers (arbitrary precision arithmetic)  Start using for real sampling problems!