Survey of Misannotations and Pseudogenes in the Arabidopsis Genome
Tanmay Prakash

Objectives
  Find possible misannotations
  Find possible pseudogenes

Why
  Misannotation can hinder research
  Pseudogenes can be used to study natural selection

Misannotations
[Figure: gene model showing CDS, intron, and UTR regions]
Many misannotations arise when a gene prediction program labels a region as an intron because the region contains a stop codon.

Pseudogenes
Pseudogenes are DNA sequences that no longer function but resemble the functional genes they once were.

There are two types:
  Processed: formed by retrotransposition; these make up most of the pseudogenes in mammals.
  Non-processed: products of the duplication of all or part of a segment of genes, followed by mutations. Because polyploidization (acquiring more than two complete sets of chromosomes) is common in plants, the majority of pseudogenes in plants are non-processed.

Common properties of pseudogenes:
  Stop codons
  Frameshift mutations
  Lack of selective pressure: measured using Ka/Ks, the ratio of the nonsynonymous substitution rate (Ka) to the synonymous substitution rate (Ks). Functional genes accumulate relatively more synonymous substitutions, so their Ka/Ks is significantly less than one. Pseudogenes are not under this constraint, so their Ka/Ks is closer to one.

Because pseudogenes have these stop codons and frameshift mutations, gene prediction programs often misannotate them.

Example (a single-base deletion shifts the reading frame and introduces stop codons, shown as "."):
  agtacatgcataggactcgatcgactc   STCIGLDRL
  agtacatgataggactcgatcgactc    ST..DSID
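The frameshift example above can be reproduced with a short translation sketch. This is only an illustration, not part of the original pipeline: it assumes the standard genetic code and reading frame 1, the function name `translate` is hypothetical, and stop codons are written as `*` rather than `.`.

```python
# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Trailing bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        first, second, third = (BASES.index(base) for base in dna[i:i + 3])
        protein.append(AMINO_ACIDS[16 * first + 4 * second + third])
    return "".join(protein)


functional = "agtacatgcataggactcgatcgactc"
frameshifted = "agtacatgataggactcgatcgactc"  # one 'c' deleted

print(translate(functional))    # STCIGLDRL
print(translate(frameshifted))  # ST**DSID
```

Deleting a single base changes every downstream codon, and here the shifted frame immediately produces two stop codons (TGA, TAG). A gene predictor that sees such a region may exclude it from the coding sequence.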

Pipeline
  Query: protein domains; subject: Arabidopsis introns; BLAST search -> genes with matches in introns
  Query: protein domains; subject: Arabidopsis CDS; HMMER search -> genes with matches in CDS
  Genes with matches in both introns and CDS -> possibly misannotated
  Check for stop codons and frameshifts, check Ka/Ks -> possible pseudogenes

[Diagram: protein domains (query) are searched with BLAST against Arabidopsis introns (subject) to find genes with matches in introns, and with HMMER against Arabidopsis CDS (subject) to find genes with matches in exons]
Each of the 8,296 protein domain families is searched with BLAST against the introns of the 25,000 genes of the Arabidopsis genome. This finds any introns that match a protein domain. The same is done for the coding sequence of the Arabidopsis genome, but using a HMMER search, which finds matches to any domains in the coding sequence. HMMER would also have been used for the intron search, but it would have taken far too long.

[Diagram: genes with matches in both introns and CDS are flagged as possibly misannotated]
Genes that do not have matches to the same domain in both the introns and the coding sequence are filtered out. The remaining genes are possibly misannotated. They were filtered further to keep only the genes that had matches in an intron and its flanking exons. These introns will be checked for stop codons and frameshift mutations, and their Ka/Ks values will also be checked. This information will be used to identify pseudogenes.
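The two filtering steps can be sketched as simple set operations. This is a minimal illustration under stated assumptions, not the scripts used in the project: search hits are assumed to be already parsed into (gene, intron or exon number, domain) tuples, intron i is assumed to lie between exon i and exon i + 1, a match in either flanking exon is assumed to be enough, and all gene and domain names are made up.

```python
from collections import defaultdict


def shared_domain_genes(intron_hits, cds_hits):
    """Map each gene to the domains matched in both its introns and its CDS."""
    intron_domains = defaultdict(set)
    for gene, _intron, domain in intron_hits:
        intron_domains[gene].add(domain)
    cds_domains = defaultdict(set)
    for gene, _exon, domain in cds_hits:
        cds_domains[gene].add(domain)

    shared = {}
    for gene, domains in intron_domains.items():
        common = domains & cds_domains.get(gene, set())
        if common:
            shared[gene] = common
    return shared


def flanked_intron_genes(intron_hits, cds_hits):
    """Genes where an intron and a flanking exon match the same domain."""
    exon_hits = set(cds_hits)
    genes = set()
    for gene, intron, domain in intron_hits:
        # Assumed numbering: intron i is flanked by exon i and exon i + 1.
        left = (gene, intron, domain)
        right = (gene, intron + 1, domain)
        if left in exon_hits or right in exon_hits:
            genes.add(gene)
    return genes


# Hypothetical parsed hits: (gene, intron/exon number, domain).
intron_hits = [("gene1", 1, "domA"), ("gene2", 2, "domB"), ("gene3", 1, "domC")]
cds_hits = [("gene1", 2, "domA"), ("gene2", 5, "domB"), ("gene3", 1, "domD")]

print(shared_domain_genes(intron_hits, cds_hits))
print(flanked_intron_genes(intron_hits, cds_hits))
```

In this toy data, gene1 and gene2 pass the first filter, but only gene1 passes the second, because the exon matching domB in gene2 does not flank the matching intron.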

Results
  There were 346 genes (different gene models not included) that had matches to the same domain in the introns and exons.
  There were 299 genes (different gene models not included) that had matches to the same domain in an intron and its flanking exons. These are most likely misannotations.

4 domains with the most possible misannotations

Future Research
  Identify pseudogenes by looking for stop codons and frameshift mutations in the introns and by checking the Ka/Ks value
  Use a more recent database of domains
  Follow the same process for the rice genome

Acknowledgement
  Dr. Shin-Han Shiu
  Dr. Kosuke Hanada
  Dr. Melissa Lehti-Shiu
  Dr. Gail Richmond
  HSHSP