CSE-700 Parallel Programming Assignment 6 POSTECH Oct 19, 2007 박성우.

2 Species and Sequences
[Figure: a species with its sequences Sequence 1, Sequence 2, ..., Sequence n]

3 Ortholog
[Figure: the last common ancestor sequence S gives rise by speciation to S1 in Human and S2 in Dog; S1 and S2 are orthologs]

4 Paralog
[Figure: within Human, sequence S gives rise by duplication to S1 and S1'; S1 and S1' are paralogs]
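The two slides above distinguish orthologs and paralogs by the event at the last common ancestor: speciation gives orthologs, duplication gives paralogs. A minimal sketch of that rule follows, assuming a gene tree whose internal nodes are labeled with their event. The `Node` class and the `relation` function are hypothetical helpers for illustration, not part of the provided assignment code.

```python
class Node:
    """A gene-tree node; internal nodes carry the event that created their children."""

    def __init__(self, name, event=None):
        self.name = name
        self.event = event  # "speciation" or "duplication" for internal nodes, None for leaves
        self.parent = None

    def add(self, child):
        """Attach child under this node and return it."""
        child.parent = self
        return child


def path_to_root(node):
    """Return node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(a, b):
    """Last common ancestor: the first ancestor of b that is also an ancestor of a."""
    seen = set(path_to_root(a))
    return next(n for n in path_to_root(b) if n in seen)


def relation(a, b):
    """Orthologs if the LCA is a speciation event, paralogs if it is a duplication."""
    event = lca(a, b).event
    if event == "speciation":
        return "ortholog"
    if event == "duplication":
        return "paralog"
    raise ValueError("the LCA carries no event label")


# Slide 3: S splits by speciation into S1 (Human) and S2 (Dog).
sp = Node("S", "speciation")
human_s1, dog_s2 = sp.add(Node("S1")), sp.add(Node("S2"))
print(relation(human_s1, dog_s2))  # ortholog

# Slide 4: within Human, S duplicates into S1 and S1'.
dup = Node("S", "duplication")
s1, s1_prime = dup.add(Node("S1")), dup.add(Node("S1'"))
print(relation(s1, s1_prime))  # paralog
```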

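5 Inparalog
[Figure: the last common ancestor S gives rise by speciation to S1 in Human and S2 in Chimpanzee; S1 then duplicates into S1 and S1' in Human. S1 and S1' are inparalogs, because the duplication happened after the speciation.]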

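6 Paralog - Outparalog
[Figure: the LCA (Last Common Ancestor) S first duplicates into S and S'; the Human/Dog speciation then yields S1 and S1' in Human and S2 and S2' in Dog. S1 and S1' are outparalogs, because the duplication happened before the speciation.]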
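Slides 5 and 6 classify a pair of paralogs relative to a particular speciation event: the pair is "in" when the duplication lies below that speciation in the gene tree and "out" when it lies above it. The sketch below encodes that rule on the two slide trees. It assumes event-labeled gene trees as before; `Node`, `lca` and `paralog_kind` are hypothetical names chosen for illustration.

```python
class Node:
    """A gene-tree node; internal nodes carry the event that created their children."""

    def __init__(self, name, event=None):
        self.name, self.event, self.parent = name, event, None

    def add(self, child):
        child.parent = self
        return child


def path_to_root(node):
    """Return node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(a, b):
    seen = set(path_to_root(a))
    return next(n for n in path_to_root(b) if n in seen)


def paralog_kind(a, b, speciation):
    """Classify paralogs a and b relative to the given speciation node."""
    dup = lca(a, b)
    if dup.event != "duplication":
        raise ValueError("a and b are not paralogs")
    if speciation in path_to_root(dup)[1:]:
        return "inparalog"   # duplication lies below (after) the speciation
    if dup in path_to_root(speciation)[1:]:
        return "outparalog"  # duplication lies above (before) the speciation
    raise ValueError("duplication and speciation are on different lineages")


# Slide 5: Human/Chimpanzee speciation first, then a duplication in Human.
root5 = Node("S", "speciation")
dup5 = root5.add(Node("Human S", "duplication"))
s1, s1p = dup5.add(Node("S1")), dup5.add(Node("S1'"))
root5.add(Node("S2"))  # Chimpanzee copy
print(paralog_kind(s1, s1p, root5))  # inparalog

# Slide 6: duplication S -> S, S' first, then each copy speciates into Human/Dog.
root6 = Node("LCA", "duplication")
sp_a, sp_b = root6.add(Node("S", "speciation")), root6.add(Node("S'", "speciation"))
h1, d2 = sp_a.add(Node("S1")), sp_a.add(Node("S2"))      # Human S1, Dog S2
h1p, d2p = sp_b.add(Node("S1'")), sp_b.add(Node("S2'"))  # Human S1', Dog S2'
print(paralog_kind(h1, h1p, sp_a))  # outparalog
```

In slide 6 the Human/Dog split appears once under each copy; either speciation node (`sp_a` or `sp_b`) can be passed as the reference, since both sit below the duplication.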

7 Coortholog
[Figure: sequences S1 and S1' in Species A and sequences S2 and S2' in Species B, related as co-orthologs]

8 Input
Assume a total of n species S1, S2, ..., Sn. For each pair of distinct species {Si, Sj}, a file gives the ortholog and paralog relations between them. Thus there are n(n - 1)/2 ortholog/paralog files.
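Assuming one file per pair of distinct species, with no file for a species paired with itself, the input files correspond to the unordered pairs {Si, Sj} with i < j, and there are n(n - 1)/2 of them. A short sketch that enumerates them follows; `pair_files` is a hypothetical helper name.

```python
from itertools import combinations


def pair_files(species):
    """One ortholog/paralog file per unordered pair of distinct species {Si, Sj}, i < j."""
    return list(combinations(species, 2))


species = ["S1", "S2", "S3", "S4"]
pairs = pair_files(species)
print(len(pairs))  # 6, i.e. 4 * 3 // 2
print(pairs[0])    # ('S1', 'S2')
```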

9 Seed Ortholog
[Figure: sequence Si in Species A and sequence Sj in Species B, linked with score 1.0 as a seed ortholog pair, together form a cluster]

10 Invariant: No Two Seed Orthologs for Any Sequence
[Figure: sequence Si in Species A linked with score 1.0 to both Sj and Sk in Species B, the situation this invariant rules out]
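The invariant above can be checked mechanically. The sketch reads it as holding within each pair of species, so that Si may still have one seed ortholog in each of several other species (as the Output slide's "seed ortholog from each pair of species" suggests); that reading, the tuple layout, and the name `invariant_violations` are assumptions.

```python
from collections import Counter


def invariant_violations(seed_pairs):
    """Find sequences with two or more seed orthologs in the same other species.

    seed_pairs: iterable of ((species, seq), (species, seq)) tuples.
    Returns the sorted (species, seq, other_species) keys seen more than once.
    """
    counts = Counter()
    for (sp_a, seq_a), (sp_b, seq_b) in seed_pairs:
        counts[(sp_a, seq_a, sp_b)] += 1
        counts[(sp_b, seq_b, sp_a)] += 1
    return sorted(key for key, c in counts.items() if c > 1)


ok = [(("A", "Si"), ("B", "Sj"))]
bad = ok + [(("A", "Si"), ("B", "Sk"))]  # slide 10: Si seeds with both Sj and Sk
print(invariant_violations(ok))   # []
print(invariant_violations(bad))  # [('A', 'Si', 'B')]
```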

11 Ortholog and Paralogs
[Figure: a cluster containing the seed ortholog pair Si (Species A) and Sj (Species B) with score 1.0, together with Si's paralog Si']

12 Output
Assume a total of n species S1, S2, ..., Sn. The output gives the ortholog and paralog relations among all these species as clusters. In each cluster:
– there is a seed ortholog from each pair of species;
– paralogs may be included.
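One way to read the output requirement is that every pair of species present in a cluster must be linked by a seed ortholog pair whose two members both lie inside that cluster, while extra paralogs are allowed. The checker below is a sketch of that reading only; the data layout and `missing_seed_pairs` are hypothetical.

```python
from itertools import combinations


def missing_seed_pairs(cluster, seed_pairs):
    """Return species pairs in `cluster` that lack a seed ortholog inside it.

    cluster: set of (species, seq) members; paralogs may be among them.
    seed_pairs: iterable of ((species, seq), (species, seq)) seed orthologs.
    """
    covered = set()
    for a, b in seed_pairs:
        if a in cluster and b in cluster:
            covered.add(frozenset((a[0], b[0])))
    species = sorted({sp for sp, _ in cluster})
    return [(x, y) for x, y in combinations(species, 2)
            if frozenset((x, y)) not in covered]


cluster = {("A", "S1"), ("A", "S1'"), ("B", "S2"), ("B", "S2'"), ("C", "S3")}
seeds = [(("A", "S1"), ("B", "S2")),
         (("B", "S2"), ("C", "S3")),
         (("A", "S1"), ("C", "S3"))]
print(missing_seed_pairs(cluster, seeds))      # []
print(missing_seed_pairs(cluster, seeds[:2]))  # [('A', 'C')]
```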

13 Example of Cluster [1]
[Figure: a cluster spanning species A, B, C and D, with sequences S1 and S1' (A), S2 and S2' (B), S3 and S3' (C), S4 and S4' (D)]

14 Example of Cluster [2]
[Figure: a cluster spanning species A, B, C and D, with sequences S1 and S1' (A), S2 and S2' (B), S3 and S3' (C), S4 and S4' (D)]

15 Bad Clusters [1]
[Figure: species A to E, with sequences S1 and S1' (A), S2 and S2' (B), S3 and S3' (C), S4 and S4' (D), S5 and S5' (E)]

16 Bad Clusters [2]
[Figure: species C (S3), D (S4, S4') and E (S6, S6'), together with sequences S4'' and S5]

17 Input File Format
Each line consists of:
– cluster number
– similarity score
– species name
– seed ortholog
– sequence name
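A minimal parser for this line format might look like the sketch below. It assumes whitespace-separated fields in the order listed, numeric score fields, and hypothetical sample names; the assignment's own parser is provided separately and may differ. Species and sequence names are interned to small integers so that later stages compare integers rather than strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    cluster: int   # cluster number
    score: float   # similarity score
    species: int   # interned species name
    seed: float    # seed-ortholog field (the slides mark seed orthologs with 1.0)
    sequence: int  # interned sequence name


class Interner:
    """Map each distinct name to a small integer id, in order of first appearance."""

    def __init__(self):
        self.ids = {}
        self.names = []

    def __call__(self, name):
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]


def parse(lines, species_ids, seq_ids):
    entries = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        cluster, score, species, seed, sequence = fields  # exactly five fields assumed
        entries.append(Entry(int(cluster), float(score), species_ids(species),
                             float(seed), seq_ids(sequence)))
    return entries


species_ids, seq_ids = Interner(), Interner()
sample = ["1 512 Human 1.0 h_seq1",   # hypothetical lines
          "1 512 Dog 1.0 d_seq1",
          "1 430 Human 0.87 h_seq2"]
entries = parse(sample, species_ids, seq_ids)
print(species_ids.names)  # ['Human', 'Dog']
```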

18 Goal
Implement ANY sequential algorithm.
– There is no definitive answer.
Then parallelize it.
A parser and an output module are provided:
– no string comparison
– all integer operations
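As one possible sequential starting point, not the intended solution, the sketch below merges pairwise clusters that share a sequence into multi-species groups with union-find, using only integer sequence ids. This naive merge does not by itself enforce the no-two-seed-orthologs invariant or the per-pair seed requirement from the Output slide; those checks would have to be layered on top. All names are hypothetical.

```python
def find(parent, x):
    """Root of x, with path halving (each visited node points to its grandparent)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def merge_clusters(num_sequences, pairwise_clusters):
    """pairwise_clusters: lists of integer sequence ids, one list per input cluster."""
    parent = list(range(num_sequences))
    for members in pairwise_clusters:
        for other in members[1:]:
            union(parent, members[0], other)
    groups = {}
    for members in pairwise_clusters:
        for seq in members:
            groups.setdefault(find(parent, seq), set()).add(seq)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical ids: clusters from files A-B, B-C and C-D.
print(merge_clusters(6, [[0, 1, 2], [2, 3], [4, 5]]))  # [[0, 1, 2, 3], [4, 5]]
```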