EXPLORING DEAD GENES. Adrienne Manuel, I400.



What are they?
Dead genes are also called pseudogenes.
Pseudogenes are non-functioning copies of genes in DNA.
They result either from reverse transcription of an mRNA transcript or from gene duplication and subsequent disablement.

Expression of Pseudogenes
Evidently transcribed.
Expression of pseudogenes varies.
Snail (Lymnaea stagnalis): example of an organism that still has functioning

Pseudogenes, Good and Bad!
- Raised expression in tumor cells
+ Useful in studying molecular evolution
+ Helpful in determining rates of genomic DNA loss for an organism

Size and Distribution of Pseudogenes
DEFINING POPULATIONS AND SUBPOPULATIONS
G is the total population of confirmed and predicted protein-encoding genes.
ΨG is the estimated population of pseudogenes that correspond to G.

G_E is the set of genes with at least one verifying EST match.
G_M is a set of genes deemed to be highly expressed, derived from microarray expression data.
The corresponding predicted pool of pseudogenes is denoted ΨG_M.

Data Files
Sanger Centre FTP site (ftp://ftp.sanger.ac.uk): holds the complete sequences of the six worm chromosomes.
GFF data files: annotations for genes and other genomic features that correspond to wormpep18.
The pseudogene population was assembled in the form of a pipeline.
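GFF files are tab-separated text with one feature per line, so the annotations can be loaded with a few lines of code. The following is a minimal sketch, not the authors' tooling: the `Feature` type, the `parse_gff` helper, and the three example lines (sequence names, source labels, coordinates) are all made up for illustration. It assumes the standard GFF column order (sequence, source, feature, start, end, score, strand, frame, attributes).

```python
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    """One annotated genomic feature (hypothetical container type)."""
    seqid: str
    source: str
    kind: str
    start: int
    end: int
    strand: str


def parse_gff(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:  # not a complete GFF record
            continue
        yield Feature(cols[0], cols[1], cols[2],
                      int(cols[3]), int(cols[4]), cols[6])


# Invented example records, for illustration only.
example = [
    "##gff-version 2",
    "CHROMOSOME_I\tcurated\texon\t100\t250\t.\t+\t.\tSequence hypothetical.1",
    "CHROMOSOME_I\tPseudogene\texon\t900\t1100\t.\t-\t.\tSequence hypothetical.2",
]
features = list(parse_gff(example))
```

Coordinates are kept as the 1-based inclusive positions that GFF uses, which matters when testing features for overlap later in the pipeline.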

Pipelines
Step 1: Sanger Centre pseudogene annotations. Start with a list of 332 pseudogenes. The pseudogene population was derived by looking for gene disablement.
Step 2: FASTA matching to find potential pseudogenes.

Pipeline (continued)
Worm genes were masked for low-complexity regions with the program SEG.
TFASTX and TFASTY were then used to compare the complete wormpep18 protein set against the worm genome.
After the comparison, the pseudogene matches were refined in the following steps.

Pipeline (continued)
Step 3: Reduction for overlaps on the genomic DNA. Significant matches of protein sequences to the DNA were reduced for redundancy where homologs match the same segment of DNA. The matches were then sorted.
Step 4: Prevention of over-counting for adjacent matches. Initial matches may correspond to the same pseudogene; to avoid over-counting, matches were realigned.
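The redundancy reduction in Step 3 can be pictured as an interval problem: when several homologous proteins hit the same stretch of DNA, only one match should survive. The sketch below is one plausible way to do that, not the method used in the study. It assumes all matches lie on one chromosome, that each match is a `(start, end, evalue, protein_id)` tuple, and that the most significant match (lowest e-value) wins. The function names and example values are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def reduce_overlaps(matches):
    """Keep the best match (lowest e-value) for each overlapping group.

    matches: list of (start, end, evalue, protein_id) tuples.
    Returns the surviving matches sorted by start coordinate.
    """
    kept = []
    # Visit matches from most to least significant.
    for m in sorted(matches, key=lambda m: m[2]):
        if not any(overlaps(m[0], m[1], k[0], k[1]) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m[0])


# Invented example: proteins A and B hit the same segment, C hits another.
matches = [
    (100, 300, 1e-20, "A"),
    (150, 320, 1e-5, "B"),
    (500, 600, 1e-8, "C"),
]
reduced = reduce_overlaps(matches)
```

The greedy scan is quadratic in the number of matches, which is fine for a sketch; an interval tree would be the usual choice at genome scale.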

Pipeline (continued)
Step 5: Masking against Sanger Centre annotations and a transposon library. Potential pseudogenes were filtered for overlap with any other annotations in the Sanger Centre GFF files, e.g. exons of genes, tandem or inverted repeats.
Step 6: Reduction for possible additional repeat elements. At this point there is a set of 3814 pseudogenic fragments.

Pipeline (final step)
Step 7: Tightening the match threshold. The e-value match threshold was lowered from 0.01 to 0.001, which makes the matching more stringent.
Check the web! To explore the pseudogene population, the data can be viewed either by searching for a protein name or by viewing a specific range on a chromosome.
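A lower e-value cutoff admits fewer matches, so moving the threshold from 0.01 to 0.001 makes the filter stricter. The toy example below illustrates this; the hit names and e-values are invented, and `filter_by_evalue` is a hypothetical helper, not part of the FASTA package.

```python
def filter_by_evalue(hits, threshold):
    """Keep hits whose e-value is at or below the threshold."""
    return [(name, e) for name, e in hits if e <= threshold]


# Invented hits: (name, e-value).
hits = [("hitA", 5e-4), ("hitB", 5e-3), ("hitC", 2e-10)]

loose = filter_by_evalue(hits, 0.01)    # all three hits pass
strict = filter_by_evalue(hits, 0.001)  # hitB (5e-3) is dropped
```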

Size of Pseudogene Population
Composed of 2168 sequences, about 12% of the total gene complement.
Factors that affect the size estimate:
1. Dead copies of transposable elements
2. The size is underestimated because pseudogenes with less obvious disablements aren't included
3. Annotated genes might be pseudogenes whose disablement is undetectable
4. Pseudogenes that are still part of a functioning gene
5. Some apparent pseudogenes arise from sequencing errors
6. Possible genomic repeats
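As a quick sanity check on the two figures above, the gene count implied by "2168 pseudogenes is about 12% of the gene complement" can be worked out directly. This is only back-of-the-envelope arithmetic on the slide's rounded percentage, so the result is approximate.

```python
pseudogenes = 2168
fraction_of_genes = 0.12  # "about 12%", a rounded figure from the slide

# Gene complement implied by the two numbers above.
implied_genes = round(pseudogenes / fraction_of_genes)

# Equivalent statement: roughly one pseudogene per this many genes.
genes_per_pseudogene = implied_genes / pseudogenes

print(implied_genes)                    # 18067
print(round(genes_per_pseudogene, 1))   # 8.3
```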

SUBPOPULATIONS
Highly expressed genes have fewer dead gene copies.
The most reliable subset of the pseudogene population is about half the total for ΨG.
39% of pseudogenes are intronic; these kinds of pseudogenes aren't ailing families of proteins.

Chromosomal Distributions
Pseudogenes are more abundant near the ends of the chromosomes (the "arms").
A proportion of dead genes is calculated for each chromosome.

The data plot above indicates genome to genome over all age. The percentage composition for each of the 20 amino acids is graphed in decreasing order of the implied amino acid composition in the pseudogene set. In the bottom part of the figure, the G difference for each amino acid composition is indicated by a bar.
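A comparison of this kind can be reproduced from two sets of protein sequences. The sketch below computes the percentage composition of the 20 standard amino acids in each set, orders the residues by decreasing composition in the pseudogene set, and reports the difference for each residue. It is an illustration under assumptions: the direction of the difference (pseudogene set minus gene set) is a guess at what the figure plots, the function names are hypothetical, and the toy sequences are invented.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequences):
    """Percentage composition of each standard amino acid in a sequence set."""
    counts = Counter()
    for seq in sequences:
        # Ignore stop codons, frameshift markers, and ambiguous residues.
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def composition_difference(pseudo_seqs, gene_seqs):
    """Rows of (residue, % in pseudogene set, pseudogene % minus gene %),
    ordered by decreasing composition in the pseudogene set."""
    p = composition(pseudo_seqs)
    g = composition(gene_seqs)
    order = sorted(AMINO_ACIDS, key=lambda aa: p[aa], reverse=True)
    return [(aa, p[aa], p[aa] - g[aa]) for aa in order]


# Invented toy sequences, for illustration only.
rows = composition_difference(["AAAL"], ["ALLK"])
```

With real data, the first argument would be the conceptual translations of the pseudogene set and the second the annotated protein set.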

Listed are the largest sequence families in the worm, ranked by number of genes and by number of pseudogenes.
Each family is named for a particular representative.
Four of the top 10 paralogous gene families, ranked by number of genes, are functionally uncharacterized.
Three of the top 10 pseudogene families are also among the biggest families when ranked by number of genes.

Pseudofolds
These charts are ranked in terms of implied structural "pseudofolds".
Proteins encoded by the worm genome have been assigned to globular domain folds from the SCOP database.

Why was this studied again?
To provide an initial estimate of the size, distribution and characteristics of the pseudogene population in C. elegans, in an attempt to estimate the total number in humans.
Found few pseudogenes in the worm genome that are apparently due to processing.
Found a large uncharacterized gene family that makes up 2/3 of dead genes.
The arms of the chromosomes are unreliable for encoding genes but more likely to spawn new proteins.