Research at Genome BioInformatics Lab
Josep F. Abril Ferrando and Genís Parra Farré
Genome BioInformatics Research Lab (IMIM – UPF – CRG)
SUMMARY
Introduction
Visualization of Genomic Annotations
Comparative Genomics
Human and Mouse Genomes
Exon Structural Selection
BIOINFORMÀTICA UPF T23 – 2003/03/06 – J.F. Abril and G. Genome BioInformatics Lab – RGBI (IMIM-UPF-CRG)
Computational Analysis of Genomic Sequences
DNA SEQUENCE (sequencing)
ASSEMBLED SEQUENCE (assembling)
ANNOTATED SEQUENCE (analyzing)
From Genes to Genomes: Single Genes
From Genes to Genomes: Chromosomes
From Genes to Genomes: Whole Genomes
Comparative Genomics: Single Genes
Comparative Genomics: Syntenic Regions
Programming in PostScript (I)
Vector graphics language
Postfix notation
Stacks: exec, paths, dicts, ...
Dictionaries: identifier → object

    %!PS
    %
    % Variable Definition: $counter = 0
    /counter 0 def
    %
    % Function Definition: sub box(x,y) {...}
    /box {            %% y x box
      gsave           %
      20 mul          % y X
      0               % y X 0
      moveto          % y
      20 mul          % Y
      dup             % Y Y
      10 0            % Y Y 10 0
      rlineto         % Y Y
      0               % Y Y 0
      exch            % Y 0 Y
      rlineto         % Y
      -10 0           % Y -10 0   (operands reconstructed: the top edge that closes the 10-wide box)
      rlineto         % Y
      neg             % -Y
      0               % -Y 0
      exch            % 0 -Y
      rlineto         %
      closepath       %
      0 1 0           % 0 1 0     (assumed RGB values: the slide only says green)
      setrgbcolor     % "green-color"
      fill            %
      grestore        %
    } def
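To make the stack comments of `box` concrete, the following Python sketch replays the same operators on a list used as the operand stack and records the points of the path. It is a toy model covering only the operators on the slide, not a PostScript interpreter. The function name `run_box` is hypothetical, and the `-10 0` step for the top edge is an assumption, since the slide text does not preserve those operands.

```python
def run_box(y, x):
    """Replay the operators of the slide's `box` procedure.

    The operand stack is a Python list (top of stack = end of list).
    Returns the path points and the stack left over at the end.
    """
    stack = [y, x]   # stack on entry: y x
    path = []        # absolute points of the current path

    def moveto():
        py, px = stack.pop(), stack.pop()
        path.append((px, py))

    def rlineto():
        dy, dx = stack.pop(), stack.pop()
        last_x, last_y = path[-1]
        path.append((last_x + dx, last_y + dy))

    def exch():
        stack[-1], stack[-2] = stack[-2], stack[-1]

    stack.append(stack.pop() * 20)   # 20 mul -> y X
    stack.append(0)                  # 0      -> y X 0
    moveto()                         # moveto -> y
    stack.append(stack.pop() * 20)   # 20 mul -> Y
    stack.append(stack[-1])          # dup    -> Y Y
    stack.extend([10, 0])            # 10 0   -> Y Y 10 0
    rlineto()                        # bottom edge -> Y Y
    stack.append(0)                  # 0      -> Y Y 0
    exch()                           # exch   -> Y 0 Y
    rlineto()                        # right edge  -> Y
    stack.extend([-10, 0])           # assumed operands for the top edge
    rlineto()                        # top edge    -> Y
    stack.append(-stack.pop())       # neg    -> -Y
    stack.append(0)                  # 0      -> -Y 0
    exch()                           # exch   -> 0 -Y
    rlineto()                        # left edge   -> (empty)
    return path, stack


if __name__ == "__main__":
    points, leftover = run_box(0.25, 1)
    print(points)
    print(leftover)
```

Calling `run_box(0.25, 1)` walks the rectangle from (20, 0) to (30, 0), (30, 5), (20, 5) and back to (20, 0), and leaves the stack empty, which is what the trailing `%` comments on the slide indicate.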
Programming in PostScript (II)

    %
    % Initialization
    translate         % New Coords Origin (operands not preserved in this copy)
    2 5 scale         % Re-scaling x-axes*2
                      %            y-axes*5
    %
    % BaseLine
    gsave             %
    0 0 moveto        %
    90 0 lineto       %
    0 setgray         %
    1 setlinewidth    %
    stroke            %
    grestore          %
    %
    % Main Loop
    mark              % mark
                      % mark ...   (three values pushed here; only 0.25, the first one, is preserved)
    counttomark       % mark ...
    {                 %%%%%%% begin loop (x3)
      /counter        %
      counter         %
      1 add           %
      def             % $counter = $counter + 1
      counter         %
                      % 1st loop: mark ... counter==1
                      % 2nd loop: mark ... counter==2
                      % 3rd loop: mark 0.25 counter==3
      box             % mark...
    } repeat          %%%%%%% finish loop (x3)
    pop               % clean up stack (removes "mark")
    %
    showpage
    %EOF%
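The `mark` / `counttomark` / `repeat` idiom in the main loop can be modelled in a few lines of Python. This is only a sketch of the control flow: `MARK`, `counttomark` and `main_loop` are hypothetical names, and the heights `0.5` and `0.75` in the usage example are made up, since only `0.25` survives in the slide text.

```python
MARK = object()  # stands in for PostScript's mark object


def counttomark(stack):
    """Return how many items sit above the topmost mark."""
    for depth, item in enumerate(reversed(stack)):
        if item is MARK:
            return depth
    raise ValueError("no mark on the stack")


def main_loop(heights):
    """Model `mark h1 h2 ... counttomark { ... counter box } repeat pop`."""
    stack = [MARK, *heights]
    counter = 0
    box_calls = []
    for _ in range(counttomark(stack)):   # counttomark { ... } repeat
        counter += 1                      # /counter counter 1 add def
        stack.append(counter)             # counter
        x = stack.pop()                   # box consumes "y x":
        y = stack.pop()                   #   x is on top, y just below it
        box_calls.append((y, x))
    stack.pop()                           # pop: removes the mark
    return box_calls, stack


if __name__ == "__main__":
    calls, leftover = main_loop([0.25, 0.5, 0.75])  # hypothetical heights
    print(calls)
    print(leftover)
```

Because each iteration consumes the value on top of the stack, the heights are drawn in reverse order of how they were pushed: the last value pushed becomes the first box, and `0.25` is paired with `counter == 3`, as in the slide's comment for the third pass.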
GFF2PS and GFF2APLOT
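As a rough illustration of the idea behind turning GFF records into PostScript drawings, the sketch below parses tab-separated GFF lines and emits one filled rectangle per feature. It is not how gff2ps or gff2aplot are implemented. The function names `parse_gff` and `to_postscript`, the `scale` and `height` parameters, and the track positions (100 for the forward strand, 80 for the reverse) are all assumptions made for the example.

```python
def parse_gff(lines):
    """Yield (feature, start, end, strand) from tab-separated GFF records."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 8:
            continue                      # skip malformed records
        yield fields[2], int(fields[3]), int(fields[4]), fields[6]


def to_postscript(features, scale=0.1, height=10):
    """Return a PostScript program drawing one filled box per feature."""
    out = ["%!PS"]
    for feature, start, end, strand in features:
        x = (start - 1) * scale           # left edge in points
        w = (end - start + 1) * scale     # width in points
        y = 100 if strand == "+" else 80  # one track per strand (arbitrary)
        out.append(
            f"newpath {x:.1f} {y} moveto {w:.1f} 0 rlineto "
            f"0 {height} rlineto {-w:.1f} 0 rlineto closepath fill % {feature}"
        )
    out.append("showpage")
    return "\n".join(out)


if __name__ == "__main__":
    gff = [
        "##gff-version 2",
        "chr1\tgeneid\texon\t101\t200\t0.9\t+\t0\tgene_1",
        "chr1\tgeneid\texon\t401\t450\t0.8\t-\t1\tgene_2",
    ]
    print(to_postscript(parse_gff(gff)))
```

The output can be saved to a `.ps` file and opened with any PostScript viewer.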
Visualizing Genomic Annotations
J.F. Abril and R. Guigó. "gff2ps: visualizing genomic annotations". Bioinformatics 16(8) (2000).
M.G. Reese, G. Hartzell, N.L. Harris, U. Ohler, J.F. Abril and S.E. Lewis. "Genome Annotation Assessment in Drosophila melanogaster". Genome Research 10(4) (2000).
M.D. Adams et al. (including J.F. Abril). "The Genome Sequence of Drosophila melanogaster". Science 287(5461) (2000).
J.C. Venter et al. (including J.F. Abril and R. Guigó). "The Sequence of the Human Genome". Science 291(5507) (2001).
R.A. Holt et al. (including J.F. Abril and R. Guigó). "The Genome Sequence of the Malaria Mosquito Anopheles gambiae". Science 298(5591) (2002).
Whole Genome Gene-Finding
[Diagram labels: Homo sapiens; GENES; ab initio; DATABASE; homology]
Whole Genome Gene-Finding: Comparative Approach
[Diagram labels: Homo sapiens and Mus musculus; GENES for each genome; gene prediction; homology]
Whole Genome Gene-Finding Results Analysis
Human and Mouse Comparative Genomics
Mouse Genome Sequencing Consortium (including J.F. Abril, G. Parra and R. Guigó). "Initial sequencing and comparative analysis of the mouse genome". Nature 420(6915) (2002).
G. Parra, P. Agarwal, J.F. Abril, T. Wiehe, J.W. Fickett and R. Guigó. "Comparative gene prediction in human and mouse". Genome Research 13(1) (2003).
R. Guigó, E.T. Dermitzakis, P. Agarwal, C.P. Ponting, G. Parra, A. Reymond, J.F. Abril, E. Keibler, R. Lyle, C. Ucla, S.E. Antonarakis and M.R. Brent. "Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes". PNAS 100(3) (2003).
Predicting “Novel” Genes in the Mouse Genome (I)
[Figure tracks: golden path annotations; additional blastn matches to ENSEMBL + REFSEQ]
Predicting “Novel” Genes in the Mouse Genome (II)
[Figure tracks: tblastx; geneid exons; tblastx; sgp genes; additional blastn matches to ENSEMBL + REFSEQ]
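One simple way to read "novel" here is: a prediction that does not overlap any already known annotation on the same sequence. The sketch below filters predictions on that basis. The overlap criterion, the function names `overlaps` and `novel_predictions`, and the coordinates in the example are assumptions for illustration. The slides do not state the exact rule used.

```python
def overlaps(a, b):
    """True when two closed intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def novel_predictions(predictions, known):
    """Keep the predictions that overlap none of the known intervals.

    All intervals are assumed to lie on the same sequence and strand.
    """
    return [p for p in predictions if not any(overlaps(p, k) for k in known)]


if __name__ == "__main__":
    predicted = [(100, 200), (500, 600), (900, 950)]   # hypothetical predictions
    annotated = [(150, 300), (940, 1000)]              # hypothetical known genes
    print(novel_predictions(predicted, annotated))
```

The quadratic scan is fine for a handful of intervals. For whole-genome sets, sorting both lists and sweeping them once would be the usual refinement.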
Homology and Gene Structure Filtering
[Diagram labels: Homo sapiens Predictions; Mus musculus Predictions; Homology (Blastp); Structural Alignment (Exstral); Enriched Pool; GENES]
Exon Structure over an Alignment
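A minimal way to compare exon structure over an alignment is to project each sequence's exon boundaries onto alignment columns and see which columns coincide. The sketch below does this for a pairwise protein alignment with `-` as the gap character. It is not the Exstral algorithm. The function names, the convention that a boundary is given as the number of residues preceding it, and the toy sequences are all assumptions.

```python
def boundary_columns(aligned_seq, boundaries):
    """Map exon boundaries to alignment columns.

    Each boundary is the count of residues that precede it in the
    ungapped sequence; the returned column is that of the last residue
    before the boundary.
    """
    wanted = set(boundaries)
    columns = set()
    residues = 0
    for col, ch in enumerate(aligned_seq):
        if ch == "-":
            continue                 # gaps do not advance the residue count
        residues += 1
        if residues in wanted:
            columns.add(col)
    return columns


def shared_boundaries(aln_a, bounds_a, aln_b, bounds_b):
    """Return the alignment columns where both sequences have a boundary."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    cols_a = boundary_columns(aln_a, bounds_a)
    cols_b = boundary_columns(aln_b, bounds_b)
    return sorted(cols_a & cols_b)


if __name__ == "__main__":
    # Toy alignment; boundaries after residues 3 and 6 in both sequences.
    print(shared_boundaries("MKT-LLVA", [3, 6], "MKTALL-A", [3, 6]))
```

In the toy example only the first boundary falls in the same column of both sequences, because the gap shifts the second one. A real comparison would also tolerate small offsets and take intron phase into account.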
RT-PCR Validation
Results of the Experimental Validation
[Table: rows Enriched, Similar, Other; columns Number of predictions, Tested, Success Rate (%); values not preserved]
Example of a Bash Script