Computational Statistics with Application to Bioinformatics

Slides:

Advertisements

Similar presentations

Generating Random Numbers

Advertisements

Unit 0, Pre-Course Math Review Session 0.2 More About Numbers

Chapter 8 – Introduction to Number Theory. Prime Numbers prime numbers only have divisors of 1 and self –they cannot be written as a product of other.

Random number generation Algorithms and Transforms to Univariate Distributions.

Prof. Ramin Zabih (CS) Prof. Ashish Raj (Radiology) CS5540: Computational Techniques for Analyzing Clinical Data.

Random Number Generators. Why do we need random variables? random components in simulation → need for a method which generates numbers that are random.

Probability Densities

The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press IMPRS Summer School 2009, Prof. William H. Press 1 4th IMPRS Astronomy.

The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press 1 Computational Statistics with Application to Bioinformatics Prof. William.

Computer ArchitectureFall 2008 © August 25, CS 447 – Computer Architecture Lecture 3 Computer Arithmetic (1)

Chapter 8 – Introduction to Number Theory Prime Numbers

Chapter 8 – Introduction to Number Theory Prime Numbers  prime numbers only have divisors of 1 and self they cannot be written as a product of other numbers.

Random Number Generation Pseudo-random number Generating Discrete R.V. Generating Continuous R.V.

ETM 607 – Random Number and Random Variates

MATH 224 – Discrete Mathematics

Probability theory 2 Tron Anders Moger September 13th 2006.

Analysis of Algorithms

CPSC 531: RN Generation1 CPSC 531:Random-Number Generation Instructor: Anirban Mahanti Office: ICT Class Location:

Chapter 7 Random-Number Generation

Basic Concepts in Number Theory Background for Random Number Generation 1.For any pair of integers n and m, m  0, there exists a unique pair of integers.

Random Number Generators 1. Random number generation is a method of producing a sequence of numbers that lack any discernible pattern. Random Number Generators.

Monte Carlo Methods.

The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press IMPRS Summer School 2009, Prof. William H. Press 1 4th IMPRS Astronomy.

Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.

Validating a Random Number Generator Based on: A Test of Randomness Based on the Consecutive Distance Between Random Number Pairs By: Matthew J. Duggan,

More on Efficiency Issues. Greatest Common Divisor given two numbers n and m find their greatest common divisor possible approach –find the common primes.

The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press 1 Computational Statistics with Application to Bioinformatics Prof. William.

Modular (Remainder) Arithmetic n = qk + r (for some k; r < k) eg 37 = (2)(17) + 3 Divisibility notation: 17 | n mod k = r 37 mod 17 = 3.

4.5 Inverse of a Square Matrix

DATA & COMPUTER SECURITY (CSNB414) MODULE 3 MODERN SYMMETRIC ENCRYPTION.

The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press 1 Computational Statistics with Application to Bioinformatics Prof. William.

Chapter 6: Continuous Probability Distributions A visual comparison.

Kernel Regression Prof. Bennett Math Model of Learning and Discovery 1/28/05 Based on Chapter 2 of Shawe-Taylor and Cristianini.

Chapter 6: Continuous Probability Distributions A visual comparison.

1.  How does the computer generate observations from various distributions specified after input analysis?  There are two main components to the generation.

Biostatistics Class 3 Probability Distributions 2/15/2000.

Page : 1 bfolieq.drw Technical University of Braunschweig IDA: Institute of Computer and Network Engineering  W. Adi 2011 Lecture-5 Mathematical Background:

Kernel Regression Prof. Bennett

Linear Algebra Review.

Chapter 7. Classification and Prediction

CSE565: Computer Security Lecture 7 Number Theory Concepts

Hexadecimal, Signed Numbers, and Arbitrariness

Great Theoretical Ideas in Computer Science

Normal Distribution.

Analysis of Algorithms

Introduction to Number Theory

CS1371 Introduction to Computing for Engineers

Lecture 2. Deviates from Other Distributions

Random Number Generators

Chapter 10: Solving Linear Systems of Equations

CSCE 411 Design and Analysis of Algorithms

LECTURE 05: THRESHOLD DECODING

Motif p-values GENOME 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.

Parallel Quadratic Sieve

Chapter 7 Random Number Generation

Stream Ciphers Day 18.

CS 475/575 Slide set 4 M. Overstreet Old Dominion University

Chapter 7 Random-Number Generation

LECTURE 05: THRESHOLD DECODING

Validating a Random Number Generator

Factoring RSA Moduli: Current State of the Art J

LECTURE 05: THRESHOLD DECODING

#22 Uncertainty of Derived Parameters

Random Number Generation

Mathematical Background for Cryptography

What we learn with pleasure we never forget. Alfred Mercier

Biologically Inspired Computing: Operators for Evolutionary Algorithms

Generating Random Variates

Mathematical Background: Extension Finite Fields

Presentation transcript:

Computational Statistics with Application to Bioinformatics Prof. William H. Press Spring Term, 2008 The University of Texas at Austin Unit 5: More on Random Deviate Generation The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

Unit 5: More on Random Deviate Generation (Summary) Derive the theory behind Xorshift random generators matrix representation large matrix powers by successive squaring Look at two good empirical tests of randomness GCD test Gorilla test Learn several methods for generating random deviates from arbitrary distributions transformation method rejection method ratio-of-uniforms method (“teardrops”) especially when combined with squeezes, e.g. Leva’s Normal algorithm The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

Theory of Xorshift generators General design philosophy: Full period 2N or 2N-1 Bit mixing on every bit Then pass a bunch of empirical tests Represent a word as a vector of N bits. Then the elementary operations are matrix multiplies, mod 2. 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 S-2 = 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 S+4 = So a generator is M = Sk3 Sk2 Sk1 The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

The biggest period we can hope for is 2N-1, since the zero vector is its own cycle of length 1. Test for this “full period” by M 2 N ¡ 1 = M ( 2 N ¡ 1 ) = f 6 for every prime factor f of 2N-1 (Why just these?) These huge powers of M are actually easily done by successive squaring M 2 ; 4 8 : and then assembling from the binary representation of the desired power. (Show an example.) The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

Now, finally, just a brute-force loop over all the possible values k1, k2, k3. Select full period ones. Do empirical tests. (Can use this same method to find full-period linear shift register taps = primitive polynomials mod 2.) The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

Bit mixing for an actually good (N=64) Xorshift: [i,j] = ndgrid(1:n,1:n); S = @(k) eye(n,n) + (i-j == k) spy(S(17)*S(-31)*S(8)) The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

k » N o r m a l ( ¹ ; ¾ ) ¼ : 8 4 n + 6 5 3 1 P ( j ) = 6 ¼ Examples of powerful empirical tests of randomness (Marsaglia and Tsang, 2002) The gcd test: Compute the gcd of successive random pairs Euclid’s algorithm for gcd: 366 = 1*297 + 69 297 = 4*69 + 21 69 = 3*21 + 6 21 = 3*6 + 3 6 = 2*3 + 0 k=5 j=3 empirical result (what you actually do is measure using a known-good generator) k » N o r m a l ( ¹ ; ¾ 2 ) ¼ : 8 4 n + 6 5 3 1 P ( j ) = 6 ¼ 2 theoretically calculable count in ~100 bins each for j and k chi-square test the results The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

P ( ) = e M » B i n o m a l ( 2 ; e ) ¹ = ¾ p The gorilla test: Example: N=4 on a 12-bit generator, testing position 10 The gorilla test: 010010100101 101001010001 011010101110 111010111010 101000010101 010101001010 010100100101 001001000010 101010101010 101010010101 010010001001 000001111100 101010100101 010100101010 010000100101 011110100101 From a random sequence of 2N words, form the sequence of 2N bits from some particular bit position. The number of times each unique N-bit word should appear somewhere in this sequence is ~Poisson(1) So and the number of words that don’t appear is these bits add 1 to bin 4 P ( ) = e ¡ 1 M » B i n o m a l ( 2 N ; e ¡ 1 ) ¹ = ¾ p this is actually not quite right. Computer scientists can you see why? (Think Boyer-Moore!) (a realistic N might be 20 to 30, 106 to 109 bins) Editorial: In the “old days” it was good enough to convert the N bit integer to a float value between 0 and 1, then test randomness of the resulting “real” values (in various ways). Nowadays, people depend on genuine bitwise randomness, so modern tests look much more number-theoretical, even when their basis is still empirical. The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

Random deviates from other univariate distributions There are lots of nice algorithms specific to common distributions, but there are also some fairly general methods: You must be able to calculate the inverse of the cdf of your distribution! Transformation method: The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

Rejection method: You must have a bounding function for which you are able to calculate the inverse of the cdf! The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

Ratio of Uniforms Method (some of the best features of both xformation and rejection) a line of constant x=v/u The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

jac = jacobian([v/u u^2],[u v]) jac = [ -v/u^2, 1/u] [ 2*u, 0] syms u v; jac = jacobian([v/u u^2],[u v]) jac = [ -v/u^2, 1/u] [ 2*u, 0] abs(det(jac)) ans = 2 The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

Ratio of Uniforms is particularly powerful when combined with squeezes outer squeeze For this particular example the desired distribution is integer valued (binomial deviates), hence the staircases. For a continuous distribution, there would just be a smooth curve between the squeezes (which would typically be too close together to see clearly). inner squeeze you only compute p(v/u) when you are between the squeezes! The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press

e.g., Leva’s algorithm for normal deviates: here, only ~1% of the (u,v) area is between the squeezes, requiring the calculation of the log The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press