Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA

PART I. Cellular Biology Macromolecules: DNA, mRNA, protein

Why Biology?

Human Genome Project
Begun in 1990, the U.S. Human Genome Project is a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project originally was planned to last 15 years, but effective resource and technological advances have accelerated the expected completion date.
Project goals are to
■ identify all of the approximately 30,000 genes in human DNA,
■ determine the sequences of the 3 billion chemical base pairs that make up human DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector, and
■ address the ethical, legal, and social issues (ELSI) that may arise from the project.
Recent Milestones:
■ June 2000: completion of a working draft of the entire human genome
■ February 2001: analyses of the working draft are published
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001

Future Challenges: What We Still Don’t Know
■ Gene number, exact locations, and functions
■ Gene regulation
■ DNA sequence organization
■ Chromosomal structure and organization
■ Noncoding DNA types, amount, distribution, information content, and functions
■ Coordination of gene expression, protein synthesis, and post-translational events
■ Interaction of proteins in complex molecular machines
■ Predicted vs experimentally determined gene function
■ Evolutionary conservation among organisms
■ Protein conservation (structure and function)
■ Proteomes (total protein content and function) in organisms
■ Correlation of SNPs (single-base DNA variations among individuals) with health and disease
■ Disease-susceptibility prediction based on gene sequence variation
■ Genes involved in complex traits and multigene diseases
■ Complex systems biology including microbial consortia useful for environmental restoration
■ Developmental genetics, genomics
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001

Medicine and the New Genomics
■ Gene Testing
■ Gene Therapy
■ Pharmacogenomics
Anticipated Benefits
■ improved diagnosis of disease
■ earlier detection of genetic predispositions to disease
■ rational drug design
■ gene therapy and control systems for drugs
■ personalized, custom drugs
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001

Molecular Medicine
■ improved diagnosis of disease
■ earlier detection of genetic predispositions to disease
■ rational drug design
■ gene therapy and control systems for drugs
■ pharmacogenomics "custom drugs"
Microbial Genomics
■ rapid detection and treatment of pathogens (disease-causing microbes) in medicine
■ new energy sources (biofuels)
■ environmental monitoring to detect pollutants
■ protection from biological and chemical warfare
■ safe, efficient toxic waste cleanup
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001

Anticipated Benefits
Agriculture, Livestock Breeding, and Bioprocessing
■ disease-, insect-, and drought-resistant crops
■ healthier, more productive, disease-resistant farm animals
■ more nutritious produce
■ biopesticides
■ edible vaccines incorporated into food products
■ new environmental cleanup uses for plants like tobacco
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001

What is a gene ?

SNP and Genetic Disease

Mitochondrial ATP Synthase; E. coli ATP Synthase
These images depicting models of ATP Synthase subunit structure were provided by John Walker. Some equivalent subunits from different organisms have different names.

PART II. Microarray Genome-wide expression profiling

Differential Gene expression: tissues, organs

Next Step in Genomics
■ Transcriptomics involves large-scale analysis of messenger RNAs (molecules that are transcribed from active genes) to follow when, where, and under what conditions genes are expressed.
■ Proteomics, the study of protein expression and function, can bring researchers closer than gene expression studies to what’s actually happening in the cell.
■ Structural genomics initiatives are being launched worldwide to generate the 3-D structures of one or more proteins from each protein family, thus offering clues to function and biological targets for drug design.
■ Knockout studies are one experimental method for understanding the function of DNA sequences and the proteins they encode. Researchers inactivate genes in living organisms and monitor any changes that could reveal the function of specific genes.
■ Comparative genomics, analyzing DNA sequence patterns of humans and well-studied model organisms side-by-side, has become one of the most powerful strategies for identifying human genes and interpreting their function.
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001

Microarray

MicroArray
Allows measuring the mRNA level of thousands of genes in one experiment (system-level response)
Data generation can be fully automated by robots
Common experimental themes:
– Time Course
– Mutation/Knockout Response

Reverse transcription
Color: Cy3 (green), Cy5 (red)

Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale
Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown

PART III. Statistics
– Low-level analysis
– Comparative expression
– Feature extraction
– Classification, clustering
– Pearson correlation
– Liquid association

Image analysis
Convert an image into a number representing the ratio of the expression levels between the red and green channels
– Color bias
– Spatial, tip, and spot effects
– Background noise
– cDNA, oligonucleotide arrays
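The conversion of a spot into a single red/green number can be sketched as follows. This is only an illustration: the background subtraction, the `floor` guard, and the base-2 logarithm are common conventions assumed here, not details taken from the slide, and `log_ratio` and its parameters are hypothetical names.

```python
import math


def log_ratio(red_fg, green_fg, red_bg=0.0, green_bg=0.0, floor=1.0):
    """Return log2(red / green) for one spot.

    red_fg / green_fg: foreground intensities of the two channels.
    red_bg / green_bg: local background estimates (assumed to be available).
    floor: smallest intensity allowed after subtraction, so the ratio
           stays defined when background exceeds foreground (an assumption).
    """
    red = max(red_fg - red_bg, floor)
    green = max(green_fg - green_bg, floor)
    return math.log2(red / green)


# Hypothetical spot: (1200 - 200) / (300 - 50) = 4, so the log ratio is 2.
print(log_ratio(1200.0, 300.0, red_bg=200.0, green_bg=50.0))
```

A log scale is used in this sketch so that a two-fold increase and a two-fold decrease are symmetric around zero.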

Genome-wide expression profile: a basic structure

          cond1   cond2   ...   condp
Gene1     x11     x12     ...   x1p
Gene2     x21     x22     ...   x2p
...       ...     ...     ...   ...
Genen     xn1     xn2     ...   xnp
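One minimal way to hold such a genes-by-conditions table in code is sketched below. The gene names, condition names, and values are made up for illustration, and `gene_profile` and `condition_column` are hypothetical helpers, not part of any particular package.

```python
# Column labels: the p conditions under which mRNA samples were taken.
conditions = ["cond1", "cond2", "cond3"]

# One row per gene; entry j of a row is the expression value x_ij
# measured under conditions[j]. All values here are invented.
expression = {
    "Gene1": [0.5, 1.2, -0.3],
    "Gene2": [2.0, 1.8, 2.1],
}


def gene_profile(gene):
    """Return the row for one gene: its expression across all conditions."""
    return expression[gene]


def condition_column(cond):
    """Return the column for one condition: every gene's value under it."""
    j = conditions.index(cond)
    return {gene: row[j] for gene, row in expression.items()}


print(gene_profile("Gene2"))
print(condition_column("cond2"))
```

Rows answer "how does this gene behave across conditions?", while columns answer "what does the whole genome look like under this condition?".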

Cond1, cond2, ..., condp denote the various environmental conditions, time points, cell types, etc. under which mRNA samples are taken.
Note: numerous cells are involved.
Data quality issues:
1. chip (manufacturer)
2. mRNA sample (user)
It is important to have a homogeneous sample so that cellular signals can be amplified.
– Yeast Cell Cycle data: ideally all cells are engaged in the same activities (synchronization)

Example 1. Comparative expression
Normal versus cancer cells
ALL versus AML

E. Lander’s group at MIT
Cancer classification (leukemia): ALL vs. AML (arising from lymphoid or myeloid precursors)
The two types require different treatments
Traditional methods: nuclear morphology; enzyme-based histochemical analysis (1960); antibodies (1970)
Genome-wide expression comparison

ALL (acute lymphoblastic leukemia)
AML (acute myeloid leukemia)

Gene selection
For each gene (row), compute a score:
score = (mean(X) - mean(Y)) / (SD(X) + SD(Y))
where X = the gene's expression values in the ALL samples, Y = its values in the AML samples, and mean and SD are the sample mean and sample standard deviation.
Genes (rows) with the highest scores are selected.
Does it work? Of 34 new leukemia samples, 29 are predicted with 100% accuracy; 5 are weak prediction cases.
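The score above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original analysis code: the gene names and expression values are invented, `selection_score` and `top_genes` are hypothetical helper names, and the standard deviation is taken to be the sample standard deviation (divisor n - 1).

```python
from statistics import mean, stdev


def selection_score(x, y):
    """Score for one gene: (mean(X) - mean(Y)) / (SD(X) + SD(Y)).

    x: the gene's expression values in the ALL samples.
    y: the gene's expression values in the AML samples.
    """
    return (mean(x) - mean(y)) / (stdev(x) + stdev(y))


def top_genes(data, k):
    """Return the k gene names with the highest scores.

    data maps a gene name to a pair (ALL values, AML values).
    """
    ranked = sorted(data, key=lambda g: selection_score(*data[g]), reverse=True)
    return ranked[:k]


# Invented values for three hypothetical genes.
data = {
    "geneA": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),  # higher in ALL
    "geneB": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # no difference
    "geneC": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),  # higher in AML
}

print(selection_score(*data["geneA"]))  # (6 - 2) / (1 + 1) = 2.0
print(top_genes(data, 2))
```

Ranking by highest score, as stated on the slide, favors genes that are higher in ALL; ranking by the lowest scores in the same way would pick out genes that are higher in AML.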