Research at Genome BioInformatics Lab
Josep F. Abril Ferrando and Genís Parra Farré
Genome BioInformatics Research Lab (IMIM – UPF – CRG)

SUMMARY
- Introduction
- Visualization of Genomic Annotations
- Comparative Genomics: Human and Mouse Genomes
- Exon Structural Selection

BIOINFORMÀTICA UPF T23 – 2003/03/06 – J.F. Abril and G. Parra – Genome BioInformatics Lab – RGBI (IMIM-UPF-CRG)

Computational Analysis of Genomic Sequences
- Sequencing → DNA sequence
- Assembling → assembled sequence
- Analyzing → annotated sequence

From Genes to Genomes: Single Genes

From Genes to Genomes: Chromosomes

From Genes to Genomes: Whole Genomes

Comparative Genomics: Single Genes

Comparative Genomics: Syntenic Regions

Programming in PostScript (I)
- Vector graphics language
- Postfix (reverse Polish) notation: operands come before their operator
- Stacks: exec, paths, dicts, ...
- Dictionaries: identifier → object

    %!PS
    %
    % Variable Definition: $counter = 0
    /counter 0 def
    %
    % Function Definition: sub box(x,y) {...}
    /box {                 %% y x box
      gsave                %
      20 mul               % y X
      0                    % y X 0
      moveto               % y
      20 mul               % Y
      dup                  % Y Y
      10 0                 % Y Y 10 0
      rlineto              % Y Y
      0                    % Y Y 0
      exch                 % Y 0 Y
      rlineto              % Y
      -10 0                % Y -10 0   (operands assumed; missing from the transcript)
      rlineto              % Y
      neg                  % -Y
      0                    % -Y 0
      exch                 % 0 -Y
      rlineto              %
      closepath            %
      0 1 0 setrgbcolor    % "green-color" (RGB values assumed)
      fill                 %
      grestore             %
    } def

Programming in PostScript (II)

    %
    % Initialization
    100 100 translate      % New Coords Origin (offset assumed; missing from the transcript)
    2 5 scale              % Re-scaling: x-axes*2
                           %             y-axes*5
    %
    % BaseLine
    gsave                  %
      0 0 moveto           %
      90 0 lineto          %
      0 setgray            %
      1 setlinewidth       %
      stroke               %
    grestore               %
    %
    % Main Loop
    mark                   % mark
    0.25 0.5 0.75          % mark 0.25 0.5 0.75   (0.5 and 0.75 assumed)
    counttomark            % mark 0.25 0.5 0.75 3
    {                      %%%%%%% begin loop (x3)
      /counter             %
      counter              %
      1 add                %
      def                  % $counter = $counter + 1
      counter              % 1st loop: mark 0.25 0.5 0.75 1   (counter==1)
                           % 2nd loop: mark 0.25 0.5 2        (counter==2)
                           % 3rd loop: mark 0.25 3            (counter==3)
      box                  % mark ...
    } repeat               %%%%%%% finish loop (x3)
    pop                    % clean up stack (removes "mark")
    %
    showpage
    %EOF
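The stack comments in the box procedure can be checked mechanically. The Python sketch below is a toy postfix evaluator that supports only the operators box uses (mul, dup, exch, neg, moveto, rlineto, closepath); it is not a PostScript interpreter and records corner points instead of drawing. The -10 0 operand pair, which does not survive in the transcript, is an assumption. Running it with y = 0.75 and x = 1 traces the rectangle and shows that box leaves the operand stack empty.

```python
# Toy postfix evaluator: supports only the operators used by /box.
# It is NOT a PostScript interpreter; it records path points instead of drawing.

def run(tokens, stack):
    """Execute tokens against `stack` (top = last element); return recorded path."""
    path = []
    x = y = 0.0  # current point
    for tok in tokens:
        if tok == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif tok == "dup":
            stack.append(stack[-1])
        elif tok == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif tok == "neg":
            stack.append(-stack.pop())
        elif tok == "moveto":
            y, x = stack.pop(), stack.pop()      # PostScript order: x y moveto
            path.append(("moveto", x, y))
        elif tok == "rlineto":
            dy, dx = stack.pop(), stack.pop()
            x, y = x + dx, y + dy                # relative move, recorded as absolute point
            path.append(("lineto", x, y))
        elif tok == "closepath":
            path.append(("closepath",))
        else:
            stack.append(float(tok))             # anything else is a number literal
    return path


# Body of /box without gsave/setrgbcolor/fill/grestore.
# The "-10 0" operands are assumed (they are missing from the transcript).
BOX = ("20 mul 0 moveto 20 mul dup 10 0 rlineto 0 exch rlineto "
       "-10 0 rlineto neg 0 exch rlineto closepath").split()

stack = [0.75, 1]        # "0.75 1 box": y = 0.75, x = 1 (x on top)
path = run(BOX, stack)
for step in path:
    print(step)
print(stack)             # → []
```

The rectangle starts at (20, 0), is 10 units wide and 0.75 × 20 = 15 units tall, in user units before the 2 5 scale is applied.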

GFF2PS and GFF2APLOT
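gff2ps turns GFF annotation records into PostScript figures. The Python sketch below illustrates only that basic idea under simplifying assumptions; it is not how gff2ps is implemented. It parses tab-separated GFF (version 2) lines, whose first eight fields are seqname, source, feature, start, end, score, strand and frame, plus an optional group field, and draws one filled box per feature with the PostScript Level 2 rectfill operator: forward-strand features above the sequence axis, reverse-strand features below it. The page layout (origin, width, track height) and the three input records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    group: str = ""


def parse_gff(lines):
    """Parse tab-separated GFF version 2 records, skipping blank and '#' lines."""
    feats = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
        feats.append(Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                             cols[5], cols[6], cols[7], cols[8] if len(cols) > 8 else ""))
    return feats


def to_postscript(feats, seq_len, width=500.0, track_h=10.0):
    """Draw each feature as a filled box; '+' strand above the axis, '-' below.

    Layout values (origin, width, track height) are hypothetical choices.
    """
    scale = width / seq_len                           # points per nucleotide
    out = ["%!PS", "50 400 translate", "0 setgray",
           f"0 0 moveto {width:.2f} 0 lineto stroke"]  # sequence axis
    for f in feats:
        x = (f.start - 1) * scale                     # GFF coordinates are 1-based
        w = (f.end - f.start + 1) * scale
        y = -5.0 - track_h if f.strand == "-" else 5.0
        out.append(f"{x:.2f} {y:.2f} {w:.2f} {track_h:.2f} rectfill")
    out.append("showpage")
    return "\n".join(out) + "\n"


# Hypothetical input records (not real annotations).
gff = [
    "# seqname source feature start end score strand frame group",
    "seq1\tgeneid\texon\t101\t250\t3.1\t+\t0\tgene_1",
    "seq1\tgeneid\texon\t401\t520\t2.4\t+\t0\tgene_1",
    "seq1\tgeneid\texon\t700\t900\t1.7\t-\t0\tgene_2",
]
feats = parse_gff(gff)
print(to_postscript(feats, seq_len=1000))
```

Emitting plain PostScript text keeps the drawing step independent of any graphics library; the output can be viewed with any PostScript viewer.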

Visualizing Genomic Annotations
- J.F. Abril and R. Guigó. "gff2ps: visualizing genomic annotations". Bioinformatics 16(8) (2000).
- M.G. Reese, G. Hartzell, N.L. Harris, U. Ohler, J.F. Abril and S.E. Lewis. "Genome Annotation Assessment in Drosophila melanogaster". Genome Research 10(4) (2000).
- M.D. Adams et al. (including J.F. Abril). "The Genome Sequence of Drosophila melanogaster". Science 287(5461) (2000).
- J.C. Venter et al. (including J.F. Abril and R. Guigó). "The Sequence of the Human Genome". Science 291(5507) (2001).
- R.A. Holt et al. (including J.F. Abril and R. Guigó). "The Genome Sequence of the Malaria Mosquito Anopheles gambiae". Science 298(5591) (2002).

Whole Genome Gene-Finding
- Homo sapiens genome: genes predicted ab initio
- Database sequences: genes predicted by homology

Whole Genome Gene-Finding: Comparative Approach

Whole Genome Gene-Finding: Comparative Approach
- Homo sapiens: gene prediction → genes
- Mus musculus: gene prediction → genes
- Homology between the two genomes informs both predictions

Whole Genome Gene-Finding Results Analysis

Human and Mouse Comparative Genomics
- Mouse Genome Sequencing Consortium (including J.F. Abril, G. Parra and R. Guigó). "Initial sequencing and comparative analysis of the mouse genome". Nature 420(6915) (2002).
- G. Parra, P. Agarwal, J.F. Abril, T. Wiehe, J.W. Fickett and R. Guigó. "Comparative gene prediction in human and mouse". Genome Research 13(1) (2003).
- R. Guigó, E.T. Dermitzakis, P. Agarwal, C.P. Ponting, G. Parra, A. Reymond, J.F. Abril, E. Keibler, R. Lyle, C. Ucla, S.E. Antonarakis and M.R. Brent. "Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes". PNAS 100(3) (2003).

Predicting “Novel” Genes in the Mouse Genome (I)
- Golden path annotations
- Additional blastn matches to ENSEMBL + REFSEQ
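One way to make this selection concrete: a prediction counts as "novel" if it overlaps none of the known annotations (golden path annotations plus blastn matches to ENSEMBL and REFSEQ). The Python sketch below assumes closed, 1-based intervals on a single sequence and ignores strand; it merges the known intervals once and then answers each overlap query with a binary search. The coordinates and prediction names are hypothetical.

```python
from bisect import bisect_right


def merge(intervals):
    """Merge overlapping closed intervals (start, end) into a sorted, disjoint list."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return merged


def novel(predictions, known):
    """Names of predictions (name, start, end) that overlap no known interval."""
    merged = merge(known)
    starts = [s for s, _ in merged]
    result = []
    for name, s, e in predictions:
        # Last merged interval starting at or before e; since intervals are
        # disjoint and sorted, it has the largest end among all candidates.
        i = bisect_right(starts, e) - 1
        if i < 0 or merged[i][1] < s:
            result.append(name)
    return result


# Hypothetical coordinates on one sequence.
known = [(100, 300), (250, 400), (1000, 1200)]   # annotations + blastn matches
predictions = [("p1", 350, 500), ("p2", 600, 900), ("p3", 1150, 1300), ("p4", 50, 99)]
print(novel(predictions, known))                 # → ['p2', 'p4']
```

Merging first keeps each query at O(log n); for a whole genome the same routine would run once per chromosome.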

Predicting “Novel” Genes in the Mouse Genome (II)
- tblastx and geneid exons
- tblastx and sgp genes
- Additional blastn matches to ENSEMBL + REFSEQ
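Roughly, comparative predictors such as sgp use tblastx alignments between the human and mouse genomes to reward geneid exons supported by cross-species similarity. The Python sketch below shows only that general idea and is not the sgp algorithm: the rescoring rule (add half the HSP score, prorated by the fraction of the exon it covers) is a hypothetical choice, and gene assembly is reduced to picking the highest-scoring set of non-overlapping exons (weighted interval scheduling), ignoring frame, splice-site and strand compatibility. With the made-up exons and HSP, the supporting HSP changes which first exon is chosen.

```python
from bisect import bisect_right


def boosted_score(exon, hsps, weight=0.5):
    """Exon score plus weight * (HSP score prorated by the fraction of the exon covered).

    The rule and the weight are hypothetical, chosen only for illustration.
    """
    s, e, score = exon
    length = e - s + 1
    bonus = 0.0
    for hs, he, hscore in hsps:
        ov = min(e, he) - max(s, hs) + 1        # overlap length (closed intervals)
        if ov > 0:
            bonus += hscore * ov / length
    return score + weight * bonus


def best_chain(exons):
    """Highest-scoring set of non-overlapping exons (weighted interval scheduling).

    Returns (total_score, chosen_exons). Frame, splice sites and strand are ignored.
    """
    exons = sorted(exons, key=lambda x: x[1])   # sort by end coordinate
    ends = [x[1] for x in exons]
    best = [0.0] * (len(exons) + 1)             # best[i]: optimum over first i exons
    prev = [0] * (len(exons) + 1)               # prev[i]: exons ending before exon i starts
    for i, (s, _, sc) in enumerate(exons, 1):
        prev[i] = bisect_right(ends, s - 1, 0, i - 1)
        best[i] = max(best[i - 1], best[prev[i]] + sc)
    chosen, i = [], len(exons)
    while i > 0:                                # trace back the optimal choices
        if best[prev[i]] + exons[i - 1][2] > best[i - 1]:
            chosen.append(exons[i - 1])
            i = prev[i]
        else:
            i -= 1
    return best[-1], chosen[::-1]


# Hypothetical geneid-like exons (start, end, score) and one tblastx HSP.
exons = [(100, 200, 1.0), (150, 300, 2.0), (320, 400, 0.5)]
hsps = [(100, 200, 4.0)]

plain = best_chain(exons)
rescored = best_chain([(s, e, boosted_score((s, e, sc), hsps)) for s, e, sc in exons])
print(plain)      # exons 150-300 and 320-400 win without homology
print(rescored)   # the HSP-supported exon 100-200 wins after rescoring
```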

Homology and Gene Structure Filtering
- Homo sapiens predictions and Mus musculus predictions
- Homology: Blastp
- Structural alignment: Exstral
- Result: enriched pool of genes

Exon Structure over an Alignment
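One simple way to compare exon structure over an alignment is to ask whether the exon boundaries of two homologous predictions fall at the same alignment columns. The Python sketch below projects exon boundaries (given as the residue count at the end of each exon) from two gapped protein sequences onto alignment columns and counts those that coincide within a tolerance. It is a hypothetical illustration, not the Exstral method; the sequences and boundary positions are made up, and intron phase is ignored.

```python
def residue_columns(aligned):
    """Alignment column of each residue (0-based) in a gapped sequence."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def project_boundaries(aligned, boundaries):
    """Columns holding the last residue of each exon.

    boundaries: residue counts at which exons end (1-based, excluding the final exon).
    """
    cols = residue_columns(aligned)
    return {cols[b - 1] for b in boundaries}


def shared_boundaries(aln_a, bnd_a, aln_b, bnd_b, tolerance=0):
    """Count boundaries of A lying within `tolerance` columns of a boundary of B."""
    pa = project_boundaries(aln_a, bnd_a)
    pb = project_boundaries(aln_b, bnd_b)
    return sum(1 for c in pa if any(abs(c - d) <= tolerance for d in pb))


# Hypothetical aligned protein predictions and exon boundaries.
human = "MKTAYIAKQR-QISFVKSHFSRQ"
mouse = "MKTAHIAKQRLQISF--SHFSRQ"
print(sorted(project_boundaries(human, [10, 15])))                      # → [9, 15]
print(sorted(project_boundaries(mouse, [10, 15])))                      # → [9, 14]
print(shared_boundaries(human, [10, 15], mouse, [10, 15]))              # → 1
print(shared_boundaries(human, [10, 15], mouse, [10, 15], tolerance=1)) # → 2
```

Both predictions end their second exon after residue 15, yet the gaps place those boundaries one column apart; a small tolerance absorbs such alignment shifts.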

RT-PCR Validation

Results of the Experimental Validation

                Number of predictions    Tested    Success rate
    Enriched                                       %
    Similar                                        %
    Other                                          %

Example of a Bash Script