Genes and Genomic Datasets


DNA compositional biases
Base composition of genomes:
– E. coli: 25% A, 25% C, 25% G, 25% T
– P. falciparum (malaria parasite): 82% A+T
Translation initiation: ATG is the near-universal motif indicating the start of translation in a DNA coding sequence.
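The base-composition figures and the ATG motif above can be illustrated with a small sketch. The function names (`base_composition`, `at_content`, `find_atg`) and the toy sequence are assumptions made for illustration; they are not part of the lecture.

```python
from collections import Counter


def base_composition(seq):
    """Return the fraction of each base (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    # Count only unambiguous bases; anything else (e.g. N) is ignored.
    counts = Counter(b for b in seq if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: counts[b] / total for b in "ACGT"}


def at_content(seq):
    """Return the combined A+T fraction of a DNA sequence."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


def find_atg(seq):
    """Return the 0-based start positions of every ATG in the sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


if __name__ == "__main__":
    toy = "CCATGAATGA"  # hypothetical toy sequence, not real genomic data
    print(base_composition(toy))
    print(at_content(toy))
    print(find_atg(toy))
```

Note that an ATG found this way is only a candidate start site: the scan ignores reading frame and strand, so real gene finding needs more context than the motif alone.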

Some facts about human genes
Comprise about 3% of the genome
Average gene length: ~8,000 bp
Average of 5-6 exons/gene
Average exon length: ~200 bp
Average intron length: ~2,000 bp
~8% of genes have a single exon
Some exons can be as small as 1 or 3 bp
HUMFMR1S is not atypical: 17 exons bp long, comprising 3% of a 67,000 bp gene

Genetic diseases
Many diseases run in families and result from genes that predispose family members to these illnesses.
Examples are Alzheimer’s disease, cystic fibrosis (CF), breast or colon cancer, and heart disease.
Some of these diseases can be caused by a problem within a single gene, as with CF.

Genetic diseases (Cont.)
For other illnesses, like heart disease, at least genes are thought to play a part, and it is still unknown which combinations of problems within which genes are responsible.
A “problem” within a gene means that a single nucleotide, or a combination of nucleotides, within the gene causes the disease (or keeps the body from fighting the disease sufficiently).
People with different combinations of these nucleotides could then be unaffected by these diseases.

Genetic diseases (Cont.)
Cystic fibrosis
Known since very early on (“Celtic gene”)
Inherited autosomal recessive condition (Chr. 7)
Symptoms:
– Clogging and infection of lungs (early death)
– Intestinal obstruction
– Reduced fertility and (male) anatomical anomalies
The CF gene, CFTR, has a 3-bp deletion leading to Del508 (Phe) in the 1480-aa protein (an epithelial Cl⁻ channel); the protein is degraded in the ER instead of being inserted into the cell membrane
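Why a 3-bp deletion removes exactly one amino acid, rather than scrambling the rest of the protein, can be shown with a toy example. This is a minimal sketch: the 18-bp sequence, the partial codon table and the function names are hypothetical and chosen for illustration only; this is not the CFTR sequence.

```python
# Partial codon table: only the codons that occur in this toy example.
CODONS = {
    "ATG": "M", "ATC": "I", "ATT": "I",
    "TTT": "F", "GGT": "G", "GTT": "V",
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon (toy table only)."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


def delete(dna, start, length):
    """Return the sequence with `length` bases removed from 0-based `start`."""
    return dna[:start] + dna[start + length:]


normal = "ATGATCATCTTTGGTGTT"   # hypothetical: codes for M I I F G V
mutant = delete(normal, 8, 3)    # remove 3 bp spanning two adjacent codons

if __name__ == "__main__":
    print(translate(normal))
    print(translate(mutant))
```

Because three bases are removed, the reading frame downstream is preserved: in this toy case the phenylalanine is lost and every other residue is unchanged.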

Genomic Data Sources
DNA/protein sequence
Expression (microarray)
Proteome (X-ray, NMR, mass spectrometry)
Metabolome
Physiome (spatial, temporal)
Integrative bioinformatics

Dinner discussion: Integrative Bioinformatics & Genomics VU
[Figure: Genomic Data Sources, “Vertical Genomics”. Labels: metabolome, proteome, genome, transcriptome, physiome]

A gene codes for a protein
DNA: CCTGAGCCAACTATTGATGAA
(transcription)
mRNA: CCUGAGCCAACUAUUGAUGAA
(translation)
Protein: PEPTIDE
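The DNA, mRNA and protein strings on this slide can be reproduced with a short sketch. It assumes the DNA shown is the coding strand read in frame from its first base, and it uses the standard genetic code; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA in frame from its first base, stopping at a stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    dna = "CCTGAGCCAACTATTGATGAA"
    mrna = transcribe(dna)
    print(mrna)
    print(translate(mrna))
```

A real gene would be translated from its start codon rather than from the first base, so this sketch only mirrors the slide's simplified picture.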

Humans have spliced genes…

DNA makes RNA makes Protein

Remark
The problem of identifying (annotating) human genes is considerably harder than the early success story for β-globin might suggest.
The human factor VIII gene (whose mutations cause hemophilia A) is spread over ~186,000 bp. It consists of 26 exons ranging in size from 69 to 3,106 bp, and its 25 introns range in size from 207 to 32,400 bp. The complete gene is thus ~9 kb of exon and ~177 kb of intron.
The largest human gene known so far is the dystrophin gene. It has > 30 exons and is spread over 2.4 million bp.
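The factor VIII figures can be checked with simple arithmetic. The sketch below uses only the rounded numbers quoted in the remark (~9 kb of exon, ~177 kb of intron); the function name is illustrative.

```python
def exon_fraction(exon_bp, intron_bp):
    """Return the fraction of a gene's length that is exon sequence."""
    return exon_bp / (exon_bp + intron_bp)


# Rounded values quoted for the human factor VIII gene.
EXON_BP = 9_000
INTRON_BP = 177_000

if __name__ == "__main__":
    total = EXON_BP + INTRON_BP
    print(f"total gene length: {total} bp")
    print(f"exon fraction: {exon_fraction(EXON_BP, INTRON_BP):.1%}")
```

With these numbers the exons add up to just under 5% of the gene, which is why locating them in raw genomic sequence is hard.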

DNA makes RNA makes Protein: Expression data
More copies of mRNA for a gene lead to more protein
mRNA can now be measured for all the genes in a cell at once through microarray technology
A single gene chip can have 60,000 spots (genes)
Colour change gives the intensity of gene expression (over- or under-expression)
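One common way to turn spot intensities into an over- or under-expression call is a log2 ratio of sample to reference signal. The sketch below assumes a two-channel setup; the pseudocount, the threshold of 1.0 (a two-fold change) and the function names are illustrative assumptions, not something specified in the lecture.

```python
import math


def log2_ratio(sample, reference, pseudocount=1.0):
    """Return log2(sample / reference), with a pseudocount to avoid log(0)."""
    return math.log2((sample + pseudocount) / (reference + pseudocount))


def classify(ratio, threshold=1.0):
    """Label a log2 ratio; the threshold is an assumed two-fold cut-off."""
    if ratio >= threshold:
        return "over-expressed"
    if ratio <= -threshold:
        return "under-expressed"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical intensities for three spots: (sample, reference).
    spots = {"geneA": (800, 200), "geneB": (150, 600), "geneC": (310, 300)}
    for name, (sample, reference) in spots.items():
        ratio = log2_ratio(sample, reference)
        print(name, round(ratio, 2), classify(ratio))
```

Working on a log scale makes a two-fold increase and a two-fold decrease symmetric (+1 and -1), which is the main reason ratios are usually log-transformed.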

Metabolic networks
Glycolysis and Gluconeogenesis
KEGG database (Japan)

High-throughput Biological Data
Enormous amounts of biological data are being generated by high-throughput capabilities; even more are coming
– genomic sequences
– gene expression data
– mass spec. data
– protein-protein interaction
– protein structures
– ...

Protein structural data explosion
Protein Data Bank (PDB): Structures (6 March 2001)
x-ray crystallography, 1810 NMR, 278 theoretical models, others...

Dickerson’s formula: equivalent to Moore’s law
On 27 March 2001 there were 12,123 3D protein structures in the PDB: Dickerson’s formula predicts 12,066 (within 0.5%)!
$n = e^{0.19\,(y - 1960)}$, with $y$ the year.

Sequence versus structural data
Despite structural genomics efforts, growth of PDB slowed down in (i.e. did not keep up with Dickerson’s formula)
More than 100 completely sequenced genomes
Increasing gap between structural and sequence data

Bioinformatics
[Figure: disciplines arranged along a scale from “Large - external (integrative)” to “Small - internal (individual)”, with columns headed Science and Human. Disciplines shown: Planetary Science, Cultural Anthropology, Population Biology, Sociology, Sociobiology, Psychology, Systems Biology, Biology, Medicine, Molecular Biology, Chemistry, Physics, Bioinformatics]

Bioinformatics offers an ever more essential input to
– Molecular Biology
– Pharmacology (drug design)
– Agriculture
– Biotechnology
– Clinical medicine
– Anthropology
– Forensic science
– Chemical industries (detergent industries, etc.)

Up to here 05/02/2003