Explorations of Multidimensional Sequence Space

One symbol -> 1D: the coordinate is the pattern length

Two symbols -> dimension = pattern length. Pattern length 1 = 1D:

Two symbols -> dimension = pattern length. Pattern length 2 = 2D: dimensions correspond to positions, with two possibilities along each dimension. Note: here is a possible bifurcation: a larger alphabet could be represented as more choices along each positional axis!

Two symbols -> dimension = pattern length. Pattern length 3 = 3D:

Two symbols -> dimension = pattern length. Pattern length 4 = 4D, a.k.a. hypercube:

Two symbols -> dimension = pattern length
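A minimal Python sketch of the two-symbol rule, assuming each position is one axis and the two letters map to coordinates 0 and 1 (`to_vertex` and `hypercube` are hypothetical names, not taken from the slides). Joining patterns that differ at exactly one position reproduces the segment, square, cube and hypercube pictures: 2^n vertices and n * 2^(n-1) edges for pattern length n.

```python
from itertools import product

def to_vertex(pattern: str, alphabet: str = "01") -> tuple[int, ...]:
    """Map a pattern over a two-letter alphabet to a hypercube vertex.

    Each position becomes one axis; the symbol picks coordinate 0 or 1.
    """
    if len(alphabet) != 2:
        raise ValueError("this sketch assumes a binary alphabet")
    return tuple(alphabet.index(symbol) for symbol in pattern)

def hypercube(n: int, alphabet: str = "01"):
    """Return all 2**n vertices and the edges joining patterns that differ at one position."""
    patterns = ["".join(p) for p in product(alphabet, repeat=n)]
    vertices = {p: to_vertex(p, alphabet) for p in patterns}
    edges = [
        (a, b)
        for i, a in enumerate(patterns)
        for b in patterns[i + 1:]
        if sum(x != y for x, y in zip(a, b)) == 1  # Hamming distance 1
    ]
    return vertices, edges

vertices, edges = hypercube(4, "AB")
print(len(vertices), len(edges))  # 16 32
print(vertices["ABBA"])  # (0, 1, 1, 0)
```

Changing `alphabet` only relabels the axes; the graph is the same for any pair of letters.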

Three Symbols (another solution is to use more values for each dimension)
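The alternative noted here, more values per dimension, can be sketched by letting each positional axis take k values for a k-letter alphabet, so length-n patterns sit on a k^n lattice instead of a hypercube. This is an illustrative assumption; `grid_point` and `neighbours` are hypothetical helpers, and "neighbours" is taken to mean patterns that differ at a single position, of which there are n(k - 1).

```python
from itertools import product

def grid_point(pattern: str, alphabet: str) -> tuple[int, ...]:
    # One axis per position; each axis now takes len(alphabet) values instead of 2.
    return tuple(alphabet.index(symbol) for symbol in pattern)

def neighbours(pattern: str, alphabet: str) -> list[str]:
    # Patterns that differ from `pattern` at exactly one position.
    result = []
    for i, current in enumerate(pattern):
        for symbol in alphabet:
            if symbol != current:
                result.append(pattern[:i] + symbol + pattern[i + 1:])
    return result

alphabet = "ABC"
points = {"".join(p): grid_point("".join(p), alphabet) for p in product(alphabet, repeat=2)}
print(len(points))                      # 9 (3**2 points on a 3 x 3 grid)
print(points["CA"])                     # (2, 0)
print(len(neighbours("AB", alphabet)))  # 4 (= n * (k - 1) = 2 * 2)
```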

Four Symbols: i.e., with an alphabet of 4 we already get a hypercube (4D) at a pattern size of 2, provided we stick to binary values in each dimension (each position then needs two binary axes, since 2^2 = 4 symbols).

Hypercubes at alphabet sizes 2 and 4: 2-character alphabet, pattern size 4; 4-character alphabet, pattern size 2
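The four-symbol hypercube can be checked with a short sketch, assuming a hypothetical 2-bit code per letter (`TWO_BIT` and `binary_vertex` are illustrative names, and the DNA letters are used only as an example 4-letter alphabet). Each position spends two binary axes, so the 16 patterns of length 2 land exactly on the 16 vertices of a 4D hypercube, the same object as a binary alphabet at pattern size 4.

```python
from itertools import product

# Hypothetical 2-bit code; any bijection from the four letters onto {0,1}^2 would do.
TWO_BIT = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}

def binary_vertex(pattern: str) -> tuple[int, ...]:
    # Each position contributes two binary axes, so length n gives 2n dimensions.
    return tuple(bit for symbol in pattern for bit in TWO_BIT[symbol])

dinucleotides = ["".join(p) for p in product("ACGT", repeat=2)]
vertices = {binary_vertex(w) for w in dinucleotides}
cube_4d = set(product((0, 1), repeat=4))
print(len(vertices), vertices == cube_4d)  # 16 True
print(binary_vertex("GT"))  # (1, 0, 1, 1)
```

One consequence of the code choice: with the mapping above, A and C are adjacent (one bit apart) while A and T are two edges apart, so the binary encoding decides which substitutions count as neighbours.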

A three-symbol alphabet suggests a fractal representation

3-symbol fractal: enlarge, fill in; the outer pattern repeats the inner pattern = self-similar = fractal

3-character alphabet, pattern length 3: fractal

3-character alphabet, pattern length 4: fractal. Conjecture: for n -> infinity, the fractal might fill a 2D triangle. Note: check Mandelbrot.
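The slides do not spell out how the beads are placed, so the following is only one possible construction, assumed here for illustration: a chaos-game style placement in which each position moves a point halfway towards the triangle corner assigned to its symbol (`CORNERS`, `place` and `fractal_points` are hypothetical names). Exact fractions avoid floating-point ties, and the number of distinct points grows as 3^n. For this particular contraction ratio of 1/2, the limiting set is the Sierpinski triangle (Hausdorff dimension log 3 / log 2 ≈ 1.585), a useful reference point when checking the conjecture above.

```python
from fractions import Fraction
from itertools import product

# Hypothetical corner assignment: one corner of a triangle per symbol.
CORNERS = {"A": (Fraction(0), Fraction(0)),
           "B": (Fraction(1), Fraction(0)),
           "C": (Fraction(0), Fraction(1))}

def place(pattern: str) -> tuple[Fraction, Fraction]:
    """Start at the centroid, then for each symbol move halfway towards its corner."""
    x, y = Fraction(1, 3), Fraction(1, 3)
    for symbol in pattern:
        cx, cy = CORNERS[symbol]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y

def fractal_points(n: int) -> set[tuple[Fraction, Fraction]]:
    # One point per pattern of length n over the 3-letter alphabet.
    return {place("".join(p)) for p in product("ABC", repeat=n)}

for n in range(1, 5):
    print(n, len(fractal_points(n)))  # 3**n distinct points: 3, 9, 27, 81
```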

Same for a 4-character alphabet: 1 position, 2 positions, 3 positions

4-character alphabet, continued (cheating: I didn't actually add beads): 4 positions

4-character alphabet, continued (cheating: I didn't actually add beads): 5 positions

4-character alphabet, continued (cheating: I didn't actually add beads): 6 positions

4-character alphabet, continued (cheating: I didn't actually add beads): 7 positions

Animated GIF: 1-12 positions
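A companion sketch for four symbols, under the same assumed halfway-to-the-corner rule but with the letters on the corners of a square. This is the chaos game representation (CGR) used for DNA, swapped in here as an illustration rather than the exact bead layout on the slides; `CORNERS`, `place` and `grid_cells` are hypothetical names. With four corners and ratio 1/2, the 4^n words of length n land one per cell of a 2^n x 2^n grid, so the picture fills the whole square rather than leaving self-similar gaps.

```python
from fractions import Fraction
from itertools import product

# Hypothetical assignment of the four letters to the corners of a unit square.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def place(word: str) -> tuple[Fraction, Fraction]:
    # Start at the centre and move halfway towards each letter's corner in turn.
    x, y = Fraction(1, 2), Fraction(1, 2)
    for letter in word:
        cx, cy = CORNERS[letter]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y

def grid_cells(n: int) -> set[tuple[int, int]]:
    """Index of the 2**n x 2**n cell that each length-n word lands in."""
    size = 2 ** n
    cells = set()
    for letters in product("ACGT", repeat=n):
        x, y = place("".join(letters))
        cells.add((int(x * size), int(y * size)))
    return cells

for n in range(1, 5):
    # 4**n words and 4**n cells: every cell is hit exactly once.
    print(n, len(grid_cells(n)) == 4 ** n)
```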

Protein Space in Jalview

Alignment of V/F/A-ATPase ATP-binding SU (catalytic and non-catalytic SU)

UPGMA tree of V/F/A-ATPase ATP-binding SU, with a line dropped to partition (and colour) the 4 SU types (V/A cat. and non-cat., F cat. and non-cat.). Note that details of the tree
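Dropping a line across a UPGMA tree amounts to stopping the clustering at a chosen height. A minimal sketch of that, assuming a symmetric table of pairwise distances and the common convention that a merge at average distance D sits at height D/2 (`upgma_partition` is a hypothetical name; Jalview's own tree and colouring code is not reproduced here):

```python
from itertools import combinations

def upgma_partition(dist: dict[tuple[str, str], float], cut_height: float) -> list[set[str]]:
    """Average-linkage (UPGMA) clustering, stopped where a horizontal line at
    `cut_height` would cut the tree.

    `dist` maps each leaf pair (stored once, either order) to a distance.
    """
    def d(a: str, b: str) -> float:
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    def average(c1: frozenset, c2: frozenset) -> float:
        # UPGMA cluster distance = mean over all leaf pairs between the clusters.
        return sum(d(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    leaves = sorted({name for pair in dist for name in pair})
    clusters = [frozenset([leaf]) for leaf in leaves]
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        if average(c1, c2) / 2 > cut_height:
            break  # the next merge would sit above the cut line
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return [set(c) for c in clusters]
```

With two tight pairs far from each other, a cut between the two merge heights returns the two pairs as separate groups, which is what colouring by partition needs.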

PCA of V/F/A-ATPase ATP-binding SU, using colours from the UPGMA tree

Same PCA of V/F/A-ATPase ATP-binding SU, using colours from the UPGMA tree, but turned slightly. (Giardia A SU selected in grey.)

Same PCA of V/F/A-ATPase ATP-binding SU, using colours from the UPGMA tree, but with the 1st axis replaced by the 5th. (Eukaryotic A SU selected in grey.)

Same PCA of V/F/A-ATPase ATP-binding SU, using colours from the UPGMA tree, but with the 1st axis replaced by the 6th. (Eukaryotic B SU selected in grey; forgot rice.)
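The PCA views can be approximated with a sketch that one-hot encodes each alignment column and projects onto principal axes via `numpy.linalg.svd`. This is an assumption for illustration, not necessarily how Jalview computes its PCA (it may, for example, work from a pairwise similarity matrix); `one_hot` and `pca_coordinates` are hypothetical names, and the `axes` argument mirrors swapping the 1st axis for the 5th or 6th.

```python
import numpy as np

AMINO_ACIDS_AND_GAP = "ACDEFGHIKLMNPQRSTVWY-"

def one_hot(alignment: list[str], alphabet: str = AMINO_ACIDS_AND_GAP) -> np.ndarray:
    """Encode each aligned sequence as a 0/1 vector: one block of len(alphabet) per column."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    width = len(alignment[0])
    matrix = np.zeros((len(alignment), width * len(alphabet)))
    for row, seq in enumerate(alignment):
        if len(seq) != width:
            raise ValueError("aligned sequences must all have the same length")
        for col, symbol in enumerate(seq):
            matrix[row, col * len(alphabet) + index[symbol]] = 1.0
    return matrix

def pca_coordinates(matrix: np.ndarray, axes: tuple[int, ...] = (0, 1, 2)) -> np.ndarray:
    """Project rows onto the chosen principal axes (0-based, so 0 is the 1st axis)."""
    centred = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[list(axes)].T
```

Passing `axes=(4, 1, 2)` corresponds to replacing the 1st axis with the 5th; centring leaves at most n - 1 axes with non-zero variance, so that needs at least six sequences. Axis signs from the SVD are arbitrary, so a plot may come out mirrored relative to another tool.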

Problems: Jalview's approach requires an alignment.
Solution: use pattern absence/presence as coordinates.
Which patterns?
– GBLOCKS (new additions use PSSMs)
– CDD PSSM profiles
– It would be nice to stick to small words.
One could screen for words/motifs/PSSMs that have good resolving power:
– PCA with all of them; keep only the ones that contribute to the main axis
– Probably better: do a databank search and find how often each is present. One could generate random motifs (or all possible motifs) and check them out (criterion needs work).
– Empirical orthogonality
– Exhaustive vs. random
– How to judge discriminatory power (maybe a 5% significance level)
– Presence/absence: optimal discriminatory power?
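Alignment-free presence/absence coordinates can be sketched as follows, assuming plain overlapping k-letter words rather than GBLOCKS- or CDD-derived PSSMs (`kmers`, `presence_absence` and `rank_words` are hypothetical names). The ranking score, the difference in how often a word occurs in two labelled groups, is only a placeholder for the criterion that still needs work, not a significance test.

```python
def kmers(seq: str, k: int) -> set[str]:
    # Every word of length k occurring in seq (overlapping windows).
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def presence_absence(seqs: dict[str, str], k: int) -> tuple[list[str], dict[str, list[int]]]:
    """One 0/1 axis per word of length k observed in any sequence; no alignment needed."""
    found = {name: kmers(seq, k) for name, seq in seqs.items()}
    vocabulary = sorted(set().union(*found.values()))
    vectors = {name: [int(w in found[name]) for w in vocabulary] for name in seqs}
    return vocabulary, vectors

def rank_words(seqs: dict[str, str], groups: dict[str, str], k: int) -> list[tuple[str, float]]:
    """Rank words by |fraction of group X containing it - fraction of group Y containing it|.

    Words present in all or in no sequences score 0 and carry no information.
    """
    labels = sorted(set(groups.values()))
    if len(labels) != 2:
        raise ValueError("this sketch compares exactly two groups")
    vocabulary, vectors = presence_absence(seqs, k)
    members = {lab: [n for n in seqs if groups[n] == lab] for lab in labels}
    scores = []
    for j, word in enumerate(vocabulary):
        frac = [sum(vectors[n][j] for n in members[lab]) / len(members[lab]) for lab in labels]
        scores.append((word, abs(frac[0] - frac[1])))
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```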