Rhesus genome annotations
Rob Norgren
Department of Genetics, Cell Biology and Anatomy
University of Nebraska Medical Center


Conventional Approach to GeneChip Production
- Sequence millions of ESTs
- Obtain finished genomic sequences
- Cluster redundant ESTs
- Align EST clusters with genomic sequences
- Extract the last 571 bp of sequence from each transcript: the probe selection region (PSR)
- Choose 11 to 16 probes that tile across the PSR
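
The last two steps above (extracting the PSR and tiling probes across it) can be sketched in Python. This is a minimal illustration, not Affymetrix's actual probe selection procedure: the function names are hypothetical, the 25-base probe length is an assumption not stated in the slides, and probes are simply spaced evenly, whereas real probe selection also weighs hybridization behavior and uniqueness.

```python
from typing import List, Tuple


def extract_psr(transcript: str, psr_length: int = 571) -> str:
    """Return the 3'-most psr_length bases (the whole transcript if it is shorter)."""
    return transcript[-psr_length:]


def tile_probes(psr: str, n_probes: int = 11,
                probe_length: int = 25) -> List[Tuple[int, str]]:
    """Return (start, sequence) pairs for n_probes evenly spaced across the PSR.

    probe_length=25 is an assumption; the slides give only the probe count.
    """
    last_start = len(psr) - probe_length
    if last_start < 0:
        raise ValueError("PSR is shorter than one probe")
    if n_probes == 1:
        starts = [0]
    else:
        # Spread start positions from the first base to the last valid start.
        step = last_start / (n_probes - 1)
        starts = [round(i * step) for i in range(n_probes)]
    return [(s, psr[s:s + probe_length]) for s in starts]
```

Calling `tile_probes(extract_psr(transcript_sequence), n_probes=16)` would give the upper end of the 11 to 16 probe range mentioned above.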

Problems with the conventional approach for a rhesus macaque GeneChip
- Insufficient ESTs to cover most genes
- Little finished genomic sequence (in 2005)

Strategy for targeted amplification of rhesus genes
- Identify the terminal exon and flanking sequence for every human gene
- Design primers and amplify from monkey genomic DNA
- Obtain the rhesus PSR sequences
[Figure: terminal exon containing the PSR, flanked by the forward primer (F) and reverse primer (R), with the poly A site marked. PSR: probe selection region; F: forward primer; R: reverse primer.]
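
The primer design step can be illustrated with a short Python sketch. It assumes the target region's coordinates on the human genomic sequence are already known and simply takes fixed-length flanking windows; the function names, the 20-base primer length, and the absence of any melting-temperature or specificity checks are all simplifying assumptions, not the procedure used in this work.

```python
from typing import Tuple

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_primers(genomic: str, region_start: int, region_end: int,
                     primer_length: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers flanking genomic[region_start:region_end].

    Coordinates are 0-based, half-open, on the top strand.
    primer_length=20 is an illustrative assumption.
    """
    if region_start < primer_length or region_end + primer_length > len(genomic):
        raise ValueError("not enough flanking sequence for primers")
    # Forward primer: identical to the top strand just upstream of the region.
    forward = genomic[region_start - primer_length:region_start]
    # Reverse primer: reverse complement of the top strand just downstream.
    reverse = reverse_complement(genomic[region_end:region_end + primer_length])
    return forward, reverse
```

Because the primers are designed from human sequence but used on monkey genomic DNA, this approach depends on the flanking sequence being well conserved between the two species.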

Other sources for rhesus GeneChip PSRs
- Preliminary Baylor genomic sequences
- In silico approach: aligned human PSRs with preliminary rhesus genomic sequence
- ESTs

Rhesus GeneChip
- Available in March 2005
- Novel design
- Whole-genome expression array: 52,024 probesets for 47,000 transcripts
- Probesets include 17,093 well-annotated genes (16 probes/probeset)
- Probesets were designed for 1,099 well-annotated genes not present on the U human GeneChip

Rhesus Genome
- Draft published in Science on April 13, 2007
- “The rhesus macaque genome assembly is a draft DNA sequence, and it contains many gaps.”

What does a “draft” rhesus genome mean?
- 26,907 protein-coding genes for humans
- 24,038 protein-coding genes for rhesus macaques
- Sounds good, but is misleading:
- 19,450 well-annotated protein-coding genes for humans
- 8,744 well-annotated protein-coding genes for rhesus macaques
What does “well annotated” mean?
- No “hypothetical” genes
- Only genes with “good” gene symbols
- No “Locs” (LOC-prefixed placeholder symbols)
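
The “well annotated” criteria above can be expressed as a simple filter. The sketch below is one possible reading of those criteria, not the method used to produce the counts on this slide: the function name is hypothetical, the records are illustrative rather than taken from any annotation release, and `LOC000001` is a made-up placeholder identifier.

```python
import re

# Placeholder symbols consist of "LOC" followed by digits.
_LOC_SYMBOL = re.compile(r"^LOC\d+$")


def is_well_annotated(symbol: str, description: str) -> bool:
    """Apply the slide's criteria: a real gene symbol and no 'hypothetical' label."""
    if not symbol or _LOC_SYMBOL.match(symbol):
        return False
    return "hypothetical" not in description.lower()


# Illustrative (symbol, description) records only.
records = [
    ("TP53", "tumor protein p53"),
    ("DEFB1", "defensin beta 1"),
    ("LOC000001", "similar to zinc finger protein"),
    ("ABC1", "hypothetical protein"),
]

well_annotated = [r for r in records if is_well_annotated(*r)]
print(len(well_annotated), "of", len(records), "records pass the filter")
```

Applying a filter of this kind to a full gene table is what separates the headline gene counts from the much smaller well-annotated counts.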

Problems with GeneChip annotations
- Affymetrix relies on NCBI annotations; hence, many probesets are not annotated with “real” gene symbols
- Stopgap solution:
- Permanent solution: requires full and complete annotation of the rhesus genome at NCBI

What can go wrong at the genome sequencing center?
- Large gaps
- Small gaps
- Misassemblies
- Sequencing errors

What can go wrong with ab initio annotations?
- Incorrect assignment of pseudogene status
- Failure to identify genes
- Incorrect gene models (some exons right, some wrong)
- Incomplete gene models

Consequences of non-annotated genes
- A large number of databases depend on NCBI annotations for their own annotations. Example: Affymetrix GeneChips
- Errors and omissions are propagated to dependent databases
- Users are frustrated when they see “Locs” instead of a proper gene symbol
- Users can BLAST each probeset consensus sequence, or ask their bioinformatics personnel to establish gene identity, but this wastes time and effort

How to correct annotations
- Annotations must be acceptable to NCBI; if they are not, corrections will not propagate to dependent databases
- Some gene annotations can be corrected by manual inspection
- Some gene annotations can be corrected by using human ortholog-based gene models rather than ab initio approaches
- Some gene annotations can only be corrected by additional sequencing
- And some gene annotations require a trip to Hell...

Defensins: the gene family from Hell
- Large family of genes
- Orthologs poorly conserved (positive selection?)
- Will require focused sequencing and annotation
- May require publication before NCBI annotates most of the rhesus defensins

Acknowledgements
- Jeff Kittrell
- Joel Goodsell
- Audrey Gomel
- NCRR/NIH