Overview
A Quantum Computation Simulation Language
Anomaly Detection in the Windows Registry
Detecting Splice Sites in Genes
Rotationally Invariant Face Detection
Q-HSK: A Quantum Programming Language and Compiler
Katherine Heller, Krysta Svore, Maryam Kamvar (Al Aho)
What is Q-HSK?
A quantum computation simulation language and a quantum compiler.
Q-HSK enables simplified programming of quantum algorithms, with built-in graphics.
Many Worlds Interpretation
One formulation of quantum theory.
Each universe has a corresponding amplitude (a complex number), and $|\text{amplitude}|^2$ is the probability of that universe's existence.
[Figure: branching universes labelled u1, u2, u3, u4]
Qubits
A qubit is the quantum analogue of a classical bit.
It takes on the value 0, 1, or a superposition of both states:
$|\omega\rangle = \alpha|0\rangle + \beta|1\rangle$, where $|\alpha|^2 + |\beta|^2 = 1$
Equivalently (up to a global phase), $|\omega\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$
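The two qubit forms above can be checked numerically. The following Python sketch (the function names are illustrative, not part of Q-HSK) builds α and β from the angles θ and φ and confirms that the measurement probabilities |α|² and |β|² sum to 1.

```python
import cmath
import math


def qubit_from_angles(theta: float, phi: float) -> tuple[complex, complex]:
    """Return (alpha, beta) for cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    alpha = complex(math.cos(theta / 2))
    beta = cmath.exp(1j * phi) * math.sin(theta / 2)
    return alpha, beta


def measurement_probabilities(alpha: complex, beta: complex) -> tuple[float, float]:
    """Probabilities of measuring 0 and 1: |alpha|^2 and |beta|^2."""
    return abs(alpha) ** 2, abs(beta) ** 2


# theta = pi/2 gives an equal superposition of |0> and |1>.
p0, p1 = measurement_probabilities(*qubit_from_angles(math.pi / 2, 0.0))
print(round(p0, 6), round(p1, 6))  # 0.5 0.5
```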
Quantum Gates
Reversible: all quantum gates are unitary operators ($U^\dagger U = I$).
Universal gates: {U2, XOR}, i.e. one-qubit unitaries plus XOR (CNOT); Toffoli (universal for reversible classical logic).
Some common gates: Hadamard, QFT, CNOT.
[Figure: Hadamard (H) gates acting on |0› and |1›, showing the state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$]
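A minimal Python sketch of the Hadamard gate as a 2×2 matrix: it checks the unitarity condition U†U = I and applies H to |0› and |1›. The helper names are hypothetical; Q-HSK itself performs its matrix operations through CBLAS.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]  # Hadamard gate (real entries)


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(u):
    """Conjugate transpose U^dagger."""
    return [[complex(u[j][i]).conjugate() for j in range(len(u))] for i in range(len(u[0]))]


def is_unitary(u, tol=1e-12):
    """True if U^dagger U equals the identity within tol."""
    prod = matmul(dagger(u), u)
    n = len(u)
    return all(abs(prod[i][j] - (1 if i == j else 0)) < tol for i in range(n) for j in range(n))


def apply(u, state):
    """Apply gate u to a state vector."""
    return [sum(u[i][k] * state[k] for k in range(len(state))) for i in range(len(u))]


ket0, ket1 = [1, 0], [0, 1]
print(is_unitary(H))   # True
print(apply(H, ket0))  # both entries 0.7071...: (|0> + |1>)/sqrt(2)
print(apply(H, ket1))  # 0.7071..., -0.7071...: (|0> - |1>)/sqrt(2)
```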
Key Features of the Q-HSK Compiler
Familiar C-style syntax
Matrix operations via CBLAS
Complex and real data types
A quantum type, qreg
A graphical view of quantum algorithms:
Lucid representation of qubits, registers, and gates
Interactive user options (start, stop, pause, change animation rate)
Detailed text output to trace the algorithm
A Simple Example
int main()
{
    int a, i;
    qreg *q;
    q = create(5);
    i = 0;
    while (i < 5) {
        q[i] = (0.0, 0.0);
        i = i + 1;
    }
    q = computeHadamard(q);
    a = Measure(q);
    printf("This is the measure: %d", a);
    return 0;
}
[Figure: circuit diagram q → H → M]
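To show what the example computes, here is a hypothetical Python state-vector simulation of the same circuit. It assumes that `q[i] = (0.0, 0.0)` sets each qubit to the Bloch angles (θ, φ) = (0, 0), i.e. |0›, that `computeHadamard` applies H to every qubit, and that `Measure` returns the register's value as an integer; the slide does not confirm these semantics. Under those assumptions every outcome from 0 to 31 is equally likely.

```python
import math
import random


def hadamard_all(n_qubits: int) -> list[complex]:
    """Start from |00...0> and apply H to every qubit (state-vector simulation)."""
    state = [0j] * (2 ** n_qubits)
    state[0] = 1 + 0j                       # |00...0>
    s = 1 / math.sqrt(2)
    for k in range(n_qubits):               # H on qubit k
        bit = 1 << k
        for idx in range(len(state)):
            if idx & bit == 0:              # visit each (|..0..>, |..1..>) pair once
                a, b = state[idx], state[idx | bit]
                state[idx], state[idx | bit] = s * (a + b), s * (a - b)
    return state


def measure(state: list[complex], rng: random.Random) -> int:
    """Sample a basis state with probability |amplitude|^2."""
    probs = [abs(amp) ** 2 for amp in state]
    return rng.choices(range(len(state)), weights=probs)[0]


state = hadamard_all(5)
a = measure(state, random.Random(0))
print("This is the measure: %d" % a)  # some integer in 0..31
```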
Shor's Algorithm
Factors large numbers.
n: the number to factorize
x: a random number coprime to n
a: ranges from 0 to $q - 1$
q: chosen so that $n^2 \le q \le 2n^2$
r: the period of $x^a \bmod n$; finding it takes exponential time classically
One factor of n is $\gcd(x^{r/2} - 1, n)$ when r is even and $x^{r/2} \not\equiv -1 \pmod n$; this gcd is fast to compute classically.
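The classical steps can be sketched in Python. Here the period r is found by brute force, which is the exponential step that Shor's algorithm replaces with quantum period finding; q is assumed to be a power of two, as is usual. All function names are illustrative.

```python
from math import gcd


def choose_q(n: int) -> int:
    """Smallest power of two with q >= n^2 (so n^2 <= q < 2n^2)."""
    q = 1
    while q < n * n:
        q *= 2
    return q


def find_period(x: int, n: int) -> int:
    """Brute-force period r of x^a mod n: exponential in the bit length of n."""
    if gcd(x, n) != 1:
        raise ValueError("x must be coprime to n")
    r, value = 1, x % n
    while value != 1:
        value = (value * x) % n
        r += 1
    return r


def factor_from_period(x: int, n: int, r: int):
    """Return gcd(x^(r/2) - 1, n), or None if this x gives no nontrivial factor."""
    if r % 2 == 1:
        return None
    y = pow(x, r // 2, n)
    if y == n - 1:          # x^(r/2) = -1 (mod n): try another x
        return None
    return gcd(y - 1, n)


n, x = 15, 7
r = find_period(x, n)
print(choose_q(n), r, factor_from_period(x, n, r))  # 256 4 3
```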
Graphical Interface
Architecture of the Q-HSK Compiler
Program.q → lexical analyzer → syntax analyzer → semantic analyzer → translator → Program.cpp → g++ → executable
Compiler sources: lex.yy.c, y.tab.c, translate.c. The Java graphics component is compiled with javac.
One-Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses
Collaborators: Krysta Svore, Angelos Keromytis, Sal Stolfo
Host-Based Intrusion Detection Systems
Microsoft Windows is the most often attacked platform.
Current methods to combat attacks: virus scanners and security patches.
Problem: these do not protect against unknown attacks, so frequent updates are needed.
Host-based IDS: monitor system accesses to detect intrusions, applying data mining techniques.
The Windows Registry and RAD
Windows Registry: stores configuration settings for system parameters (security information, programs, etc.); programs query the registry for information.
Registry Anomaly Detection (RAD): an audit sensor, a model generator, and an anomaly detector.
Example registry access record:
Process: EXPLORER.EXE
Query: OpenKey
Key: HKCR\CKSUD\{B41DB860-8EE4-11D EA9FADC173CA}\shellex\MayChangeDefaultMenu
Response: SUCCESS
ResultValue: NOTFOUND
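A sketch of how an audit record like the one above might be split into fields, assuming each record is a single line of `Field: value` pairs using the five field names shown. If those five fields are the features, then 5 marginal checks P(X_i) plus 5 × 4 = 20 conditional checks P(X_i | X_j) give the 25 consistency checks used by the detection algorithm; that correspondence is an inference, not stated on the slide. The names `FIELDS`, `parse_record`, and `consistency_checks` are hypothetical.

```python
import re
from itertools import permutations

FIELDS = ["Process", "Query", "Key", "Response", "ResultValue"]
FIELD_RE = re.compile(r"\b(" + "|".join(FIELDS) + r"):\s*")


def parse_record(line: str) -> dict[str, str]:
    """Split a 'Field: value Field: value ...' audit line into a field -> value dict."""
    parts = FIELD_RE.split(line)
    # parts = [prefix, field1, value1, field2, value2, ...]
    return {field: value.strip() for field, value in zip(parts[1::2], parts[2::2])}


def consistency_checks(fields):
    """One check per marginal P(X_i) plus one per ordered pair P(X_i | X_j), i != j."""
    marginals = [(f,) for f in fields]
    conditionals = list(permutations(fields, 2))
    return marginals + conditionals


record = parse_record(
    r"Process: EXPLORER.EXE Query: OpenKey "
    r"Key: HKCR\CKSUD\{B41DB860-8EE4-11D EA9FADC173CA}\shellex\MayChangeDefaultMenu "
    r"Response: SUCCESS ResultValue: NOTFOUND"
)
print(record["Process"], record["ResultValue"])  # EXPLORER.EXE NOTFOUND
print(len(consistency_checks(FIELDS)))           # 25 = 5 + 5 * 4
```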
Probabilistic Anomaly Detection Algorithm
Computes 25 consistency checks: $P(X_i)$ and $P(X_i \mid X_j)$.
Multinomial with a hierarchical prior:
For observed elements i: $P(X = i) = C \cdot \frac{N_i + \alpha}{k_0\alpha + N}$
For unobserved elements i: $P(X = i) = (1 - C) \cdot \frac{1}{L - k_0}$
where $C = \frac{N}{N + L - k_0}$ and
N: total number of observations
$N_i$: number of observations of symbol i
α: "pseudo count" for each observed symbol
$k_0$: number of observed symbols
L: number of possible symbols
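The estimator can be written directly from these formulas. This Python sketch (the function name and the symbols "a" to "d" are hypothetical) also confirms that the probabilities sum to 1: the observed symbols receive total mass C, and the L − k_0 unobserved symbols share 1 − C.

```python
def pad_probability(counts: dict[str, int], symbol: str, alpha: float, L: int) -> float:
    """Multinomial estimate with a hierarchical prior.

    counts: observations per symbol (only observed symbols need entries)
    alpha:  pseudo count added to each observed symbol
    L:      number of possible symbols
    """
    N = sum(counts.values())
    k0 = sum(1 for c in counts.values() if c > 0)   # number of observed symbols
    C = N / (N + L - k0)
    if counts.get(symbol, 0) > 0:                   # observed element
        return C * (counts[symbol] + alpha) / (k0 * alpha + N)
    if L == k0:                                     # no unobserved symbols exist
        return 0.0
    return (1 - C) / (L - k0)                       # unobserved element


counts = {"a": 3, "b": 1}                           # "c" and "d" never observed
probs = {s: pad_probability(counts, s, alpha=1.0, L=4) for s in "abcd"}
print(round(sum(probs.values()), 12))               # 1.0
```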
One-Class SVMs
Analogous to a two-class SVM in which all of the data lies in the first class and the origin is the sole member of the second class.
Solve an optimization problem to find the rule f with maximal margin: $f(x) = \langle w, x\rangle + b$
This is equivalent to solving the dual quadratic programming problem over l training points:
$\min_\alpha \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j K(x_i, x_j)$ s.t. $0 \le \alpha_i \le \frac{1}{\nu l}$, $\sum_i \alpha_i = 1$
The kernel function projects input vectors into a feature space, allowing non-linear decision boundaries:
$\Phi: X \to \mathbb{R}^N$, $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j)\rangle$
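The dual can be solved with a simple pairwise (SMO-style) coordinate descent, sketched below in Python with a Gaussian kernel (σ = 1). This is an illustrative solver with hypothetical names, not necessarily the one used in these experiments. Each step moves α_i and α_j in opposite directions, so ∑α_i = 1 is preserved, and clipping keeps 0 ≤ α_i ≤ 1/(νl). The offset ρ is read off the free support vectors, and the decision value ∑_k α_k K(x_k, x) − ρ corresponds to f(x) in feature space with b = −ρ; negative values flag anomalies.

```python
import math


def gaussian(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))


def train_one_class_svm(X, nu=0.5, kernel=gaussian, sweeps=200):
    """Pairwise (SMO-style) solver for the one-class SVM dual:
    min 1/2 sum_ij a_i a_j K(x_i, x_j)  s.t.  0 <= a_i <= 1/(nu*l),  sum_i a_i = 1.
    Returns (alphas, rho, decision_function).
    """
    l = len(X)
    C = 1.0 / (nu * l)                          # upper bound on each alpha
    K = [[kernel(xi, xj) for xj in X] for xi in X]
    a = [1.0 / l] * l                           # feasible start when nu <= 1
    for _ in range(sweeps):
        for i in range(l):
            for j in range(i + 1, l):
                eta = K[i][i] + K[j][j] - 2 * K[i][j]
                if eta <= 1e-12:                # (near-)identical points: skip
                    continue
                gi = sum(K[i][k] * a[k] for k in range(l))
                gj = sum(K[j][k] * a[k] for k in range(l))
                s = a[i] + a[j]                 # kept fixed so sum(a) stays 1
                new_i = a[i] + (gj - gi) / eta  # exact minimizer along the pair
                new_i = min(max(new_i, max(0.0, s - C)), min(C, s))
                a[i], a[j] = new_i, s - new_i
    g = [sum(K[i][k] * a[k] for k in range(l)) for i in range(l)]
    free = [g[i] for i in range(l) if 1e-9 < a[i] < C - 1e-9]
    if free:                                    # free SVs lie on the boundary
        rho = sum(free) / len(free)
    else:                                       # midpoint of the KKT bounds on rho
        upper = [g[i] for i in range(l) if a[i] >= C - 1e-9]
        zero = [g[i] for i in range(l) if a[i] <= 1e-9]
        rho = (max(upper, default=min(g)) + min(zero, default=max(g))) / 2

    def decision(x):
        """sum_k a_k K(x_k, x) - rho: >= 0 for normal points, < 0 for anomalies."""
        return sum(a[k] * kernel(X[k], x) for k in range(l)) - rho

    return a, rho, decision


X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
alphas, rho, f = train_one_class_svm(X, nu=0.5)
print(f((5.0, 5.0)) < 0, f((0.05, 0.05)) > f((5.0, 5.0)))  # True True
```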
Experiments
Kernels:
Linear: $K(x, y) = x \cdot y$
Polynomial: $K(x, y) = (x \cdot y + 1)^d$
Gaussian: $K(x, y) = e^{-\|x - y\|^2 / (2\sigma^2)}$
Feature vectors: binary and frequency-based
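The three kernels and the two kinds of feature vectors, sketched in Python. The vocabulary of registry query types and the access list are hypothetical; the slide does not say exactly which items the binary and frequency-based vectors count.

```python
import math
from collections import Counter


def linear(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial(x, y, d=2):
    return (linear(x, y) + 1) ** d


def gaussian(x, y, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))


def binary_vector(items, vocabulary):
    """1 if the item occurs at least once, else 0."""
    present = set(items)
    return [1 if v in present else 0 for v in vocabulary]


def frequency_vector(items, vocabulary):
    """Number of occurrences of each vocabulary item."""
    counts = Counter(items)
    return [counts[v] for v in vocabulary]


vocab = ["OpenKey", "QueryValue", "CloseKey"]      # hypothetical vocabulary
accesses = ["OpenKey", "QueryValue", "QueryValue"]
b = binary_vector(accesses, vocab)                 # [1, 1, 0]
f = frequency_vector(accesses, vocab)              # [1, 2, 0]
print(linear(b, f), polynomial(b, f, d=2), gaussian(b, f))  # 3 16 0.6065...
```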
Results
Sequence Information for the Splicing of Human Pre-mRNA Identified by Support Vector Machine Classification
Collaborators: Xiang Zhang, Ilana Hefter, Christina Leslie, Larry Chasin
What Is Splicing?
[Figure: DNA with Exon 1, Intron, Exon 2 and the donor, branch, and acceptor sites marked; mRNA with Exon 1 joined to Exon 2]
Pseudo Exons
Consensus sequences:
Donor site: MAG|gtragt (M = A/C, r = a/g)
Acceptor site: $(y)_{10}$ncag|G (y = c/t, n = a/c/g/t)
Donor and acceptor sites are scored by their closeness to the consensus.
Identifying pseudo exons: intronic segments that have high-scoring "donor" and "acceptor" sites.
We look for discriminative signals in intronic regions near real and pseudo exons.
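A minimal scoring sketch, assuming "closeness to consensus" means the number of positions whose nucleotide is allowed by the IUPAC code at that position. The slide does not specify the scoring scheme (position weight matrices are a common alternative). Case is ignored and the | boundary markers are dropped; the names are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "Y": "CT", "N": "ACGT",
}

DONOR = "MAGGTRAGT"                 # MAG|gtragt, case and '|' dropped
ACCEPTOR = "Y" * 10 + "NCAGG"       # (y)10 ncag|G


def consensus_score(site: str, consensus: str) -> int:
    """Number of positions where the site matches the IUPAC consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have equal length")
    return sum(base.upper() in IUPAC[code] for base, code in zip(site, consensus))


print(consensus_score("CAGGTAAGT", DONOR))   # 9 (perfect donor match)
print(consensus_score("TTTGTAAGT", DONOR))   # 6
```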
String Kernels
Feature map: the number of times each contiguous substring of length k occurs in the sequence.
The dimension of the feature space is $N^k$, where N is the alphabet size (N = 4 for DNA).
Example: k = 2, sequence = ACCTGGTG
AA 0  AC 1  AG 0  AT 0
CA 0  CC 1  CG 0  CT 1
GA 0  GC 0  GG 1  GT 1
TA 0  TC 0  TG 2  TT 0
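The feature map and the resulting kernel (the inner product of two count vectors), sketched in Python with illustrative function names. The example reproduces the k = 2 counts for ACCTGGTG.

```python
from collections import Counter
from itertools import product


def spectrum(seq: str, k: int, alphabet: str = "ACGT") -> dict[str, int]:
    """Count every contiguous k-length substring; all N^k k-mers appear as keys."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {"".join(p): counts["".join(p)] for p in product(alphabet, repeat=k)}


def string_kernel(s: str, t: str, k: int) -> int:
    """Inner product of the two k-mer count vectors."""
    fs, ft = spectrum(s, k), spectrum(t, k)
    return sum(fs[m] * ft[m] for m in fs)


phi = spectrum("ACCTGGTG", 2)
print({m: c for m, c in phi.items() if c})  # {'AC': 1, 'CC': 1, 'CT': 1, 'GG': 1, 'GT': 1, 'TG': 2}
print(len(phi), string_kernel("ACCTGGTG", "ACCTGGTG", 2))  # 16 9
```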
Splice Kernels
Hypothesis: false splice sites are intrinsically defective because of bad combinations of internal nucleotides (nt).
All possible combinations of k internal nucleotides are features.
Example (k = 2): if the internal combination (3g, 5a) occurs, that feature's value is 1; otherwise it is 0.
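A sketch of the splice-kernel feature map, assuming (3g, 5a) means "g at position 3 and a at position 5" with 1-based positions, and treating every position of the input string as internal (the slide does not say which positions count as internal). The kernel value is then the number of active features that two sites share. Function names are hypothetical.

```python
from itertools import combinations


def splice_features(site: str, k: int = 2) -> set[tuple[tuple[int, str], ...]]:
    """Active features: every combination of k (position, nucleotide) pairs in the site.

    Positions are 1-based, so a site with g at position 3 and a at position 5
    activates the feature ((3, 'g'), (5, 'a')).
    """
    positions = [(i + 1, nt.lower()) for i, nt in enumerate(site)]
    return set(combinations(positions, k))


def splice_kernel(s: str, t: str, k: int = 2) -> int:
    """Inner product of the binary feature vectors = number of shared active features."""
    return len(splice_features(s, k) & splice_features(t, k))


site = "gtgaa"
print(((3, "g"), (5, "a")) in splice_features(site))  # True
print(len(splice_features(site)))                     # 10 = C(5, 2)
print(splice_kernel("gtgaa", "gtcaa"))                # 6
```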
Recursive Feature Selection
Normal vector to the hyperplane: $w = \sum_{i=1}^{m} y_i \alpha_i x_i$
If $|w_j|$ is large, the jth feature is important for SVM discrimination.
Approximation due to the degree-2 polynomial kernel: calculate $w_{up}$ and $w_{down}$ separately, then eliminate the bottom 50% of features for each.
Stop when the ROC score on an untouched test set drops below 90% of its original value.
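A sketch of the elimination loop in Python. For simplicity it computes w from explicit (linear) feature vectors and does not model the w_up/w_down split used for the degree-2 polynomial kernel. `train_w` and `roc_score` are hypothetical callbacks standing in for SVM retraining and ROC evaluation on the untouched test set.

```python
def normal_vector(X, y, alpha):
    """w = sum_i y_i alpha_i x_i for explicit (linear) feature vectors."""
    w = [0.0] * len(X[0])
    for xi, yi, ai in zip(X, y, alpha):
        for j, v in enumerate(xi):
            w[j] += yi * ai * v
    return w


def keep_top_half(w, feature_ids):
    """Keep the half of the features with the largest |w_j| (at least one)."""
    order = sorted(range(len(feature_ids)), key=lambda t: abs(w[t]), reverse=True)
    keep = order[:max(1, len(feature_ids) // 2)]
    return sorted(feature_ids[t] for t in keep)


def recursive_feature_selection(feature_ids, train_w, roc_score, ratio=0.9):
    """Halve the feature set until the ROC score would fall below ratio * original.

    train_w(features)   -> weight vector aligned with `features` (retrains the SVM)
    roc_score(features) -> ROC score on an untouched test set
    """
    baseline = roc_score(feature_ids)
    current = list(feature_ids)
    while len(current) > 1:
        candidate = keep_top_half(train_w(current), current)
        if roc_score(candidate) < ratio * baseline:
            break
        current = candidate
    return current


print(normal_vector([[1, 0, 2, 0], [0, 1, 0, 3]], [1, -1], [0.5, 0.5]))  # [0.5, -0.5, 1.0, -1.5]

# Toy stand-ins: a feature's id doubles as its weight; ROC stays high only while 6 and 7 survive.
selected = recursive_feature_selection(
    list(range(8)),
    train_w=lambda fs: [float(f) for f in fs],
    roc_score=lambda fs: 1.0 if {6, 7} <= set(fs) else 0.5,
)
print(selected)  # [6, 7]
```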
Results
[Table: ROC, specificity (note a), and CV (note b) for feature sets drawn from Flanks (US, DS), Splice Sites (3', 5'), and Exon Body; the +/− entries marking which features were used are not recoverable]
True positives detected (Splice Sites, Flanks, Exon Bodies): 32/37, 35/37, 37/
Rotationally Invariant Face Detection Using Multi-Resolution Histograms
Collaborators: Shikher Bisaria, Tony Jebara
Face Detection
Given a picture with faces, how do we determine where the faces are in the image? Which pixels are face pixels?
We would like to determine this with a system that:
runs in real time, and
recognizes rotated faces (e.g., when someone tilts their head to one side).
Gaussian Blurring
Face images are greyscale (.pgm files).
Successive levels of blur are obtained by convolving the previous blur level again with a two-dimensional Gaussian.
This is mathematically equivalent to two passes of a one-dimensional Gaussian, because $e^{-(m^2+n^2)/(2\sigma^2)} = e^{-m^2/(2\sigma^2)} \cdot e^{-n^2/(2\sigma^2)}$:
$g(i,j) = \frac{1}{2\pi\sigma^2}\sum_m\sum_n e^{-(m^2+n^2)/(2\sigma^2)} f(i-m, j-n) = \frac{1}{2\pi\sigma^2}\sum_m e^{-m^2/(2\sigma^2)} \sum_n e^{-n^2/(2\sigma^2)} f(i-m, j-n)$
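The separability identity can be checked numerically. This Python sketch implements both sides of the equation with zero padding outside the image and a kernel truncated at radius 3σ (the padding and truncation radius are assumptions), then builds successive blur levels. Function names are illustrative.

```python
import math


def blur_2d(f, sigma, radius):
    """Direct 2-D Gaussian convolution with zero padding (left-hand side)."""
    h, w = len(f), len(f[0])
    norm = 1.0 / (2 * math.pi * sigma ** 2)
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for m in range(-radius, radius + 1):
                for n in range(-radius, radius + 1):
                    if 0 <= i - m < h and 0 <= j - n < w:
                        acc += math.exp(-(m * m + n * n) / (2 * sigma ** 2)) * f[i - m][j - n]
            out[i][j] = norm * acc
    return out


def blur_separable(f, sigma, radius):
    """Same result as blur_2d via two 1-D passes (right-hand side)."""
    h, w = len(f), len(f[0])
    g = [math.exp(-(t * t) / (2 * sigma ** 2)) for t in range(-radius, radius + 1)]
    # Pass 1: sum over n (along each row).
    tmp = [[sum(g[n + radius] * f[i][j - n]
                for n in range(-radius, radius + 1) if 0 <= j - n < w)
            for j in range(w)] for i in range(h)]
    # Pass 2: sum over m (along each column), then apply 1/(2*pi*sigma^2).
    norm = 1.0 / (2 * math.pi * sigma ** 2)
    return [[norm * sum(g[m + radius] * tmp[i - m][j]
                        for m in range(-radius, radius + 1) if 0 <= i - m < h)
             for j in range(w)] for i in range(h)]


def blur_levels(f, sigma=1.0, radius=3, n_levels=3):
    """Original image followed by successively re-blurred copies."""
    levels = [f]
    for _ in range(n_levels):
        levels.append(blur_separable(levels[-1], sigma, radius))
    return levels


img = [[(3 * i + 5 * j) % 17 for j in range(6)] for i in range(5)]
direct, separable = blur_2d(img, 1.0, 3), blur_separable(img, 1.0, 3)
print(max(abs(x - y) for ra, rb in zip(direct, separable) for x, y in zip(ra, rb)) < 1e-9)  # True
```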
Multi-Resolution Histograms
Histogram-equalize the image.
Concatenate the histograms of the image taken after successive levels of Gaussian blurring.
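A Python sketch of histogram equalization and of concatenating per-level histograms, assuming 8-bit grey levels and 16 bins (both assumptions). Because a histogram ignores where pixels are, rotating the image by 90 degrees leaves it unchanged; for arbitrary angles the invariance is only approximate, since resampling alters pixel values. Function names are illustrative.

```python
def equalize(img, levels=256):
    """Histogram-equalize an image of integer grey levels 0..levels-1."""
    flat = [v for row in img for v in row]
    counts = [0] * levels
    for v in flat:
        counts[v] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running)
    n = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:                                # constant image
        return [[0] * len(row) for row in img]
    lut = [max(0, round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1)))
           for v in range(levels)]
    return [[lut[v] for v in row] for row in img]


def histogram(img, bins=16, levels=256):
    """Normalized histogram; works for integer or blurred (float) pixel values."""
    counts = [0] * bins
    for row in img:
        for v in row:
            counts[min(bins - 1, max(0, int(v * bins / levels)))] += 1
    total = sum(counts)
    return [c / total for c in counts]


def multires_histogram(blur_levels, bins=16):
    """Concatenate the histograms of every blur level."""
    return [x for level in blur_levels for x in histogram(level, bins)]


eq = equalize([[0, 64], [128, 255]])
print(eq)                                       # [[0, 85], [170, 255]]
rot = [list(r) for r in zip(*eq[::-1])]         # rotate 90 degrees
print(histogram(rot) == histogram(eq))          # True
print(len(multires_histogram([eq, rot, eq])))   # 48
```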
Average Histograms
Compute the average face and non-face multi-resolution histograms from the training set.
[Figure: Average Non-Face Histogram; Average Face Histogram]
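Optimization Problem
$\min_\alpha C(\alpha)$, where $C(\alpha) = \|H_{FAVG} - h_F\|^2 + \|H_{NFAVG} - h_{NF}\|^2$
with $h_F = \frac{1}{\sum_i \alpha_i}\sum_i \alpha_i h_i$ and $h_{NF} = \frac{1}{\sum_i (1-\alpha_i)}\sum_i (1-\alpha_i) h_i$,
such that $0 \le \alpha_i \le 1$ and $\sum_i \alpha_i = 1$.
Let $\beta_i = 1 - \alpha_i$, $Q_{ij} = \langle h_i, h_j\rangle$, $c_{\alpha,i} = \langle h_i, H_{FAVG}\rangle$, and $c_{\beta,i} = \langle h_i, H_{NFAVG}\rangle$, where N is the number of histograms $h_i$. Because $\sum_i \alpha_i = 1$ and $\sum_i \beta_i = N - 1$, expanding the norms and dropping the constant $\|H_{FAVG}\|^2 + \|H_{NFAVG}\|^2$ gives
$\min_{\alpha,\beta} \alpha^T Q\alpha + \frac{1}{(N-1)^2}\beta^T Q\beta - 2c_\alpha^T\alpha - \frac{2}{N-1}c_\beta^T\beta$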
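The expansion can be verified numerically. This Python sketch evaluates the original cost and the quadratic form for feasible α (with ∑α_i = 1) and checks that they differ by exactly ∥H_FAVG∥² + ∥H_NFAVG∥². The histogram values are made up for illustration, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def direct_cost(alpha, hs, HF, HNF):
    """||HF - h_F||^2 + ||HNF - h_NF||^2 with h_F, h_NF the weighted averages."""
    beta = [1 - a for a in alpha]
    dim = len(hs[0])
    hF = [sum(a * h[d] for a, h in zip(alpha, hs)) / sum(alpha) for d in range(dim)]
    hNF = [sum(b * h[d] for b, h in zip(beta, hs)) / sum(beta) for d in range(dim)]
    return (sum((x - y) ** 2 for x, y in zip(HF, hF))
            + sum((x - y) ** 2 for x, y in zip(HNF, hNF)))


def quadratic_cost(alpha, hs, HF, HNF):
    """a^T Q a + b^T Q b/(N-1)^2 - 2 c_a^T a - 2/(N-1) c_b^T b, with b = 1 - a."""
    N = len(hs)
    beta = [1 - a for a in alpha]
    Q = [[dot(hi, hj) for hj in hs] for hi in hs]
    c_a = [dot(h, HF) for h in hs]
    c_b = [dot(h, HNF) for h in hs]

    def quad(v):
        return sum(v[i] * Q[i][j] * v[j] for i in range(N) for j in range(N))

    return (quad(alpha) + quad(beta) / (N - 1) ** 2
            - 2 * dot(c_a, alpha) - 2 / (N - 1) * dot(c_b, beta))


hs = [[0.6, 0.4], [0.1, 0.9], [0.5, 0.5]]
HF, HNF = [0.55, 0.45], [0.2, 0.8]
alpha = [0.2, 0.3, 0.5]                                  # sum(alpha) = 1
gap = direct_cost(alpha, hs, HF, HNF) - quadratic_cost(alpha, hs, HF, HNF)
print(abs(gap - (dot(HF, HF) + dot(HNF, HNF))) < 1e-12)  # True
```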
Solve Using SMO
Optimize one pair $(\alpha_i, \alpha_j)$ at a time while holding $\alpha_i + \alpha_j = 1 - \sum_{k \ne i,j}\alpha_k$ fixed. Let $G_i = (Q\alpha)_i - \frac{1}{(N-1)^2}(Q\beta)_i - c_{\alpha,i} + \frac{1}{N-1}c_{\beta,i}$, which is half the partial derivative of the objective with respect to $\alpha_i$. The objective is quadratic along this direction, so its minimizer is
$\alpha_i^{NEW} = \alpha_i + \frac{G_j - G_i}{Q_{ii} + Q_{jj} - 2Q_{ij} + \frac{1}{(N-1)^2}(Q_{ii} + Q_{jj} - 2Q_{ij})}$
Bounds for $\alpha_i^{NEW}$: $L = 0$, $H = 1 - \sum_{k \ne i,j}\alpha_k$ (clip $\alpha_i^{NEW}$ to $[L, H]$)
$\alpha_j^{NEW} = \left(1 - \sum_{k \ne i,j}\alpha_k\right) - \alpha_i^{NEW}$
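A Python sketch of the pairwise update applied to the quadratic objective, cycling over all pairs; the sweep count, toy histograms, and function names are assumptions. Since the objective is convex, the result can be checked against a brute-force grid search over the simplex for N = 3.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def objective(alpha, Q, c_a, c_b):
    """a^T Q a + b^T Q b/(N-1)^2 - 2 c_a^T a - 2/(N-1) c_b^T b, with b = 1 - a."""
    N = len(alpha)
    beta = [1 - a for a in alpha]

    def quad(v):
        return sum(v[i] * Q[i][j] * v[j] for i in range(N) for j in range(N))

    return (quad(alpha) + quad(beta) / (N - 1) ** 2
            - 2 * dot(c_a, alpha) - 2 / (N - 1) * dot(c_b, beta))


def smo(hs, HF, HNF, sweeps=500):
    N = len(hs)
    Q = [[dot(hi, hj) for hj in hs] for hi in hs]
    c_a = [dot(h, HF) for h in hs]
    c_b = [dot(h, HNF) for h in hs]
    gamma, delta = 1 / (N - 1) ** 2, 1 / (N - 1)
    alpha = [1 / N] * N                             # feasible start

    def G(i):                                       # half the gradient wrt alpha_i
        qa = sum(Q[i][k] * alpha[k] for k in range(N))
        qb = sum(Q[i][k] * (1 - alpha[k]) for k in range(N))
        return qa - gamma * qb - c_a[i] + delta * c_b[i]

    for _ in range(sweeps):
        for i in range(N):
            for j in range(i + 1, N):
                eta = Q[i][i] + Q[j][j] - 2 * Q[i][j]
                if eta <= 1e-12:
                    continue
                H = alpha[i] + alpha[j]             # = 1 - sum_{k != i,j} alpha_k
                new_i = alpha[i] + (G(j) - G(i)) / ((1 + gamma) * eta)
                new_i = min(max(new_i, 0.0), H)     # bounds L = 0 and H
                alpha[i], alpha[j] = new_i, H - new_i
    return alpha, Q, c_a, c_b


hs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha, Q, c_a, c_b = smo(hs, HF=[1.0, 0.2], HNF=[0.5, 0.8])
grid = min(objective([a / 200, b / 200, 1 - (a + b) / 200], Q, c_a, c_b)
           for a in range(201) for b in range(201 - a))
print(objective(alpha, Q, c_a, c_b) <= grid + 1e-7)  # True
```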
Results