FISH Fast Identification of Segmental Homology University of North Carolina at Chapel Hill Shian-Gro Wu.

Outline
Introduction
Input data
How it works
– From markers to features
– From features to grid
– From grid to blocks

Introduction
FISH is software for the fast identification and statistical evaluation of segmental homologs.
[Figure labels: genome, contig, gene (marker)]

Introduction
[Figure: processing stages for a pair of contigs (contigA, contigB): markers, features, points, blocks]

Input data
Each map file lists the names and transcriptional orientation (if known) of all the markers on one contig.
Example (one marker per row; columns: gene name, transcriptional orientation)
At1g
At1g
At1g
At1g
At1g

Input data
Each match file lists all the homologies between markers in a pair of contigs.
Example (one homology per row; columns: gene name, gene name, match score)
At1g01010 At1g
At1g01010 At1g
At1g01010 At1g
At1g01010 At1g
At1g01010 At1g
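The two input files described on these slides could be read along the following lines. This is only a sketch: it assumes whitespace-separated columns, and the function names `parse_map` and `parse_match` are hypothetical. The slides do not show how orientation is encoded, so the orientation field is kept as an uninterpreted string.

```python
from typing import Iterable, List, Optional, Tuple


def parse_map(lines: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Read (marker name, orientation) rows, keeping the file order.

    Assumption: one marker per line, columns separated by whitespace.
    The orientation is None when the column is absent (orientation unknown).
    """
    markers = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        orientation = fields[1] if len(fields) > 1 else None
        markers.append((fields[0], orientation))
    return markers


def parse_match(lines: Iterable[str]) -> List[Tuple[str, str, float]]:
    """Read (marker, marker, score) rows from a match file.

    Assumption: three whitespace-separated columns; shorter lines are skipped.
    """
    matches = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        matches.append((fields[0], fields[1], float(fields[2])))
    return matches
```

Both functions accept any iterable of lines, so an open file handle or a list of strings works equally well.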

From markers to features
[Figure: processing stages for a pair of contigs (contigA, contigB): markers, features, points, blocks]

From markers to features
Step 1
– The positions and transcriptional orientations (when known) of the markers are read from a set of map files, one map file per contig. Markers within each map file must be ordered according to their physical positions on the contig.
– Individual homologies between markers are read from a set of match files. There is at least one, and no more than two, such files for each pair of contigs.
Example: contigs A, B, C → pairs A&A, A&B, A&C, B&A, B&B, …
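The pair listing in the example (A&A, A&B, A&C, B&A, B&B, …) is the ordered product of the contig list with itself. A minimal sketch, with `contig_pairs` as a hypothetical helper name:

```python
from itertools import product
from typing import List, Sequence, Tuple


def contig_pairs(contigs: Sequence[str]) -> List[Tuple[str, str]]:
    """List every ordered pair of contigs, including each contig with itself."""
    return list(product(contigs, repeat=2))


pairs = contig_pairs(["A", "B", "C"])
```

Because both A&B and B&A appear in the listing, an unordered pair of distinct contigs can be covered by up to two match files, which is consistent with the "at least one, and no more than two" rule on the slide.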

From markers to features
Step 2
– FISH performs detandemization, in which multiple markers may be collapsed into single features.
– Parameters: MIN Score and MAX Dist.
[Figure: markers a b c d e f g h collapse into features A B (B) C D (C) E F]

From markers to features
1. ScoreAB > MIN Score
[Figure: markA and markB linked by ScoreAB]
2. ScoreAC > MIN Score and ScoreBC > MIN Score
[Figure: markA, markB and markC linked by ScoreAB, ScoreAC and ScoreBC]
[Figure: markA and markB linked by ScoreAB, within the MAX Dist range]
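One way to read the two rules above is sketched below. It is an interpretation of the slides, not FISH's actual implementation. The assumptions are that a marker joins an existing feature only if its score against every marker already in that feature exceeds MIN Score, and that MAX Dist is measured in marker positions along the contig. The function name `detandemize` and the score dictionary layout are hypothetical.

```python
from typing import Dict, FrozenSet, List


def detandemize(
    markers: List[str],
    scores: Dict[FrozenSet[str], float],
    min_score: float,
    max_dist: int,
) -> List[List[str]]:
    """Collapse nearby, mutually matching markers on one contig into features.

    markers: marker names in physical order (names assumed unique).
    scores:  match score for each unordered marker pair, keyed by frozenset.
    """
    features: List[List[int]] = []  # each feature holds marker indices
    for i, name in enumerate(markers):
        for feat in features:
            # Assumed distance rule: the last member must be within max_dist positions.
            if i - feat[-1] > max_dist:
                continue
            # Assumed score rule: the marker must match every member of the feature.
            if all(
                scores.get(frozenset((name, markers[j])), 0.0) > min_score
                for j in feat
            ):
                feat.append(i)
                break
        else:
            features.append([i])  # no feature accepted the marker: start a new one
    return [[markers[j] for j in feat] for feat in features]


example = detandemize(
    list("abcdefgh"),
    {frozenset("bc"): 10.0, frozenset("df"): 10.0},
    min_score=5.0,
    max_dist=2,
)
```

With these made-up scores, markers b and c collapse into one feature and d and f into another, giving six features from eight markers, as in the slide's figure.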

From features to grid
[Figure: processing stages for a pair of contigs (contigA, contigB): markers, features, points, blocks]

From features to grid
To identify segmental homologies, FISH computes a grid for each pair of contigs. Points in the grid represent matches between pairs of features.
[Figure: grid with features fA1, fA2, fA3, fA4 of contigA on one axis and fB1, fB2, fB3, fB4 of contigB on the other; labeled points: Point A1B2, Point B2A4]
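The grid for one pair of contigs can be pictured as a set of (row, column) index pairs. The sketch below assumes that a point exists wherever any marker of a feature on the first contig matches any marker of a feature on the second. That rule and the name `grid_points` are assumptions, not taken from the FISH manual.

```python
from typing import Iterable, List, Set, Tuple


def grid_points(
    features_a: List[List[str]],
    features_b: List[List[str]],
    matches: Iterable[Tuple[str, str]],
) -> Set[Tuple[int, int]]:
    """Return the grid positions (i, j) where feature i of A matches feature j of B."""
    matched = set(matches)
    points = set()
    for i, feat_a in enumerate(features_a):
        for j, feat_b in enumerate(features_b):
            # Assumed rule: one matching marker pair is enough to place a point.
            if any((a, b) in matched for a in feat_a for b in feat_b):
                points.add((i, j))
    return points


demo_points = grid_points(
    [["a1"], ["a2", "a3"], ["a4"]],
    [["b1"], ["b2"]],
    [("a3", "b1"), ("a4", "b2")],
)
```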

From features to grid
Each position in the grid, whether or not a point is present, is called a cell. The number of cells follows from the number of features on each contig:
cell(contigA, contigB) = feature(contigA) × feature(contigB)
cell(contigC, contigC) = feature(contigC) × [feature(contigC) − 1] / 2
[Figure: grids for the contig pairs A, B and C, C]
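The two cell-count formulas can be checked with a small helper. The name `cell_count` is hypothetical. The self-comparison case counts each unordered pair of distinct features once, which is where the factor of one half comes from.

```python
from typing import Optional


def cell_count(n_features_a: int, n_features_b: Optional[int] = None) -> int:
    """Number of grid cells for a pair of contigs.

    Pass one argument to compare a contig with itself.
    """
    if n_features_b is None:
        # Self-comparison: feature(C) * [feature(C) - 1] / 2
        return n_features_a * (n_features_a - 1) // 2
    # Two different contigs: feature(A) * feature(B)
    return n_features_a * n_features_b
```

For example, two contigs with four features each give 16 cells, while a four-feature contig compared with itself gives 6.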

From features to grid
[Table columns: contig, markers, features; contig1, contig2, points, cells; ….]

From grid to blocks
Defining the neighborhood size
– FISH measures the distance between two points (X_i, Y_i) and (X_j, Y_j) using the Manhattan distance, |X_i − X_j| + |Y_i − Y_j|.
– To be considered neighbors, two points must be closer than a threshold distance that depends on m, the number of points, and n, the number of cells.
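The neighbor test can be sketched as follows. The threshold is passed in as a plain parameter (`threshold`, a hypothetical name) because the slide does not show how it is derived from m and n. Any further restrictions FISH may place on which points qualify as neighbors are not modeled here.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]


def manhattan(p: Point, q: Point) -> int:
    """Manhattan distance |X_i - X_j| + |Y_i - Y_j| between two grid points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def neighbors(point: Point, points: Iterable[Point], threshold: float) -> List[Point]:
    """Return the other points strictly closer to `point` than `threshold`."""
    return [q for q in points if q != point and manhattan(point, q) < threshold]
```

For instance, `manhattan((1, 2), (4, 6))` is 7, so those two points would be neighbors only for a threshold greater than 7.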

From grid to blocks
m: number of points; n: number of cells
[Figure: d_T versus m/n, for T = 0.05]

Result

From grid to blocks
Choosing among multiple neighbors
– A point may lie in the neighborhood of more than one other point.
– FISH ranks the cells within each neighborhood and chooses the neighbor with the highest rank. In the ranking, n is the number of cells in the point's neighborhood, d_c is the distance of the cell from the point under consideration, and w is the weight.

Reference
– User's Manual for Fast Identification of Segmental Homology. http://
– Fast identification and statistical evaluation of segmental homologies in comparative maps. abstract/19/suppl_1/i74

Thank You