Gene expression

Gene Expression: DNA → RNA → protein

Gene Expression
[Figure: polyadenylated (AAAAAAA) mRNA transcripts of gene1, gene2 and gene3]

Studying Gene Expression
- cDNA microarrays (the first high-throughput gene expression experiments)
- DNA chips (high-density oligonucleotide microarrays)
- RNA-seq (high-throughput sequencing)

Classical versus modern technologies to study gene expression
Classical methods (microarrays)
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes
High-throughput RNA sequencing
- Does not require prior knowledge
- Good for discovering new transcripts

RNA-seq

What can we learn from RNA-seq?
- Comparing the expression of two genes in the same sample
- Comparing the expression of the same gene in different samples

What can we learn from RNA-seq? Comparing the expression of two genes in the same sample
Problem:
- Genes of different lengths are expected to have different numbers of reads
- The coverage depends strongly on the sequencing depth

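What can we learn from RNA-seq?
Possible solution: normalize by the transcript length and by the total number of reads mapped in the experiment.
RPKM (reads per kilobase of transcript per million mapped reads) = $\frac{10^9 \cdot C}{N \cdot L}$, where $C$ is the number of reads mapped to the transcript, $N$ is the total number of mapped reads in the experiment and $L$ is the transcript length in base pairs.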
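The normalization on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard RPKM definition (reads per kilobase of transcript per million mapped reads); the function name, parameter names and the example numbers are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Dividing by the length (in kb) corrects for longer genes collecting
    more reads; dividing by the total (in millions) corrects for depth.
    """
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript,
# in an experiment with 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
print(example_rpkm)
```

With these made-up numbers, two genes with the same read count but different lengths receive different RPKM values, which is the point of the length correction.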

Problems with Normalization
Warning: normalization by the total number of reads can lead to false detection of differentially expressed genes.
[Figure: two expression rankings, Gene B > Gene A > Gene C and Gene A > Gene B > Gene C]
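One way to see the warning is a toy example with made-up counts. The sketch below assumes that genes A, B and C are truly unchanged between the two samples and that only gene D is strongly induced in sample Y; all names and numbers are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw read counts by the total number of reads in the sample."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


# Hypothetical raw counts: A, B and C are identical in both samples,
# gene D alone is much more highly expressed in sample Y.
sample_x = {"A": 100, "B": 200, "C": 300, "D": 400}
sample_y = {"A": 100, "B": 200, "C": 300, "D": 3400}

cpm_x = counts_per_million(sample_x)
cpm_y = counts_per_million(sample_y)

# After total-count normalization A, B and C look down-regulated in Y.
apparent_ratio = {g: cpm_y[g] / cpm_x[g] for g in ("A", "B", "C")}
print(apparent_ratio)
```

Because gene D takes a larger share of the reads in sample Y, every other gene appears to drop after dividing by the total, even though its raw count did not change.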

What can we learn from RNA-seq? Comparing the expression of the same gene in different samples
Example: finding new markers for pluripotency
[Figure: expression in embryonic stem cells versus differentiated cells; scale from highly expressed to lowly expressed]

What can we learn from RNA-seq? Comparing the expression of the same gene in different samples
Fold change (FC) = the ratio of the expression of the gene in sample X to its expression in sample Y
Sample X: stem cells. Sample Y: fibroblasts.
Is fold change enough to evaluate the difference?
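The fold change defined here can be sketched as follows. The optional pseudocount is an assumption added for illustration (it avoids division by zero for unexpressed genes) and is not part of the slide's definition; the expression values are made up.

```python
import math


def fold_change(expr_x, expr_y, pseudocount=0.0):
    """Ratio of expression in sample X to expression in sample Y.

    pseudocount is a hypothetical convenience; with the default of 0
    this is exactly the ratio and expr_y must be non-zero.
    """
    return (expr_x + pseudocount) / (expr_y + pseudocount)


def log2_fold_change(expr_x, expr_y, pseudocount=0.0):
    """Log2 of the fold change: positive if higher in X, negative if lower."""
    return math.log2(fold_change(expr_x, expr_y, pseudocount))


# Made-up expression values: stem cells (X) versus fibroblasts (Y).
fc_low_counts = fold_change(2, 1)
fc_high_counts = fold_change(2000, 1000)
print(fc_low_counts, fc_high_counts, log2_fold_change(80, 10))
```

Both made-up genes have a fold change of 2, but a difference of 2 versus 1 is far less reliable than 2000 versus 1000, which is why fold change alone is not enough.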

Remember: we always need to evaluate the statistical significance of the results.
Standard measure: the q-value (the p-value corrected for multiple testing)
Finding new markers for pluripotency
[Figure: expression in stem cells versus fibroblasts, with possible candidates for pluripotency markers]
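The slide does not say which multiple-testing correction is used. As one common choice, the sketch below assumes the Benjamini-Hochberg procedure; the function name and the p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (one common form of q-value).

    Each p-value is multiplied by m / rank (m = number of tests, rank = its
    position when sorted ascending); a running minimum taken from the largest
    p-value downwards keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Made-up p-values for four genes.
example_q = bh_qvalues([0.01, 0.04, 0.03, 0.005])
print(example_q)
```

Genes would then be called significant when their q-value falls below a chosen threshold, rather than their raw p-value.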

Next: clustering the data according to expression profiles
[Figure: genes versus expression in different conditions; scale from highly expressed to lowly expressed]

WHY? What can we learn from the clusters?
Diagnostics and therapy
- A set of genes that differ in their expression can indicate a disease state
Identify gene function
- A set of genes with similar expression can suggest similar function

A molecular signature of metastasis in primary solid tumors (Ramaswamy et al., 2003, Nat Genet 33:49-54)
Samples were taken from patients with adenocarcinoma. Hundreds of genes that differentiate between cancer tissues at different stages of the tumor were found. The arrow shows an example of tumor cells that were not detected correctly by histological or other clinical parameters.

HOW? Different clustering approaches
Unsupervised methods
- Hierarchical clustering
- K-means
Supervised methods (supervised learning)
- Support Vector Machine (SVM)

Clustering
Clustering organizes things that are close into groups.
- What does it mean for two genes to be close?
- Once we know this, how do we define groups?

What does it mean for two genes to be close?
We need a mathematical definition of the distance between the expression patterns of two genes:
Gene1 = $(E_{11}, E_{12}, \ldots, E_{1N})'$
Gene2 = $(E_{21}, E_{22}, \ldots, E_{2N})'$

Calculating the distance between two expression patterns
Gene1 = $(E_{11}, E_{12}, \ldots, E_{1N})'$
Gene2 = $(E_{21}, E_{22}, \ldots, E_{2N})'$
Euclidean distance: $ED = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$
We can use many different distance measures.
[Figure: the distance between two points (X1, Y1) and (X2, Y2)]
When N is 100 we have to think abstractly.
Low Euclidean distance → high similarity
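The Euclidean distance between two expression patterns can be computed directly from the definition. This is a minimal sketch; the function name is hypothetical, and the two-dimensional example mirrors the points in the slide's figure.

```python
import math


def euclidean_distance(gene1, gene2):
    """Square root of the sum of squared differences over all N conditions."""
    if len(gene1) != len(gene2):
        raise ValueError("expression patterns must have the same length")
    return math.sqrt(sum((e1 - e2) ** 2 for e1, e2 in zip(gene1, gene2)))


# In two dimensions this is the familiar distance between two points;
# the same formula works unchanged when N is 100.
print(euclidean_distance([0, 0], [3, 4]))
```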

Calculating the distance between two expression patterns
Pearson correlation coefficient
High correlation coefficient → high similarity
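For reference, the standard definition of the Pearson correlation coefficient, written in the same notation as the expression vectors above, is given below. The symbols $\bar{E}_1$ and $\bar{E}_2$ (the mean expression of each gene over the N conditions) are introduced here and do not appear on the slide.

```latex
r = \frac{\sum_{i=1}^{N} (E_{1i} - \bar{E}_1)(E_{2i} - \bar{E}_2)}
         {\sqrt{\sum_{i=1}^{N} (E_{1i} - \bar{E}_1)^2}\,
          \sqrt{\sum_{i=1}^{N} (E_{2i} - \bar{E}_2)^2}},
\qquad
\bar{E}_k = \frac{1}{N} \sum_{i=1}^{N} E_{ki}
```

The coefficient ranges from -1 to 1 and depends only on the shape of the two profiles, not on their absolute levels.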

Distance and correlation can produce very different results
[Figure: read counts for two genes]
Euclidean distance = 1740 → low similarity
Pearson correlation = 0.9 → high similarity
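A small made-up example shows how the two measures can disagree: two profiles with the same shape but a tenfold difference in level are far apart in Euclidean distance yet perfectly correlated. The numbers below are hypothetical and are not the ones behind the slide's figure.

```python
import math


def euclidean_distance(g1, g2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))


def pearson_correlation(g1, g2):
    """Covariance of the two profiles divided by the product of their spreads."""
    n = len(g1)
    mean1, mean2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    spread1 = math.sqrt(sum((a - mean1) ** 2 for a in g1))
    spread2 = math.sqrt(sum((b - mean2) ** 2 for b in g2))
    return cov / (spread1 * spread2)


# Hypothetical counts: gene_high follows gene_low at ten times the level.
gene_low = [10, 20, 30, 40]
gene_high = [100, 200, 300, 400]

distance = euclidean_distance(gene_low, gene_high)
correlation = pearson_correlation(gene_low, gene_high)
print(round(distance, 1), round(correlation, 3))
```

Which measure is appropriate depends on whether absolute expression level or the pattern across conditions matters for the question being asked.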

Clustering the genes according to expression: hierarchical clustering
- Generate a tree based on the distances between genes (similar to a phylogenetic tree)
- Each gene is a leaf on the tree
- Distances reflect the similarity of their expression patterns
[Figure: genes versus expression in different conditions, with a gene cluster marked]
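The tree-building idea can be sketched as a simple agglomerative procedure: start with every gene as its own cluster and repeatedly merge the two closest clusters. The slide does not name a linkage rule, so single linkage (distance between the closest members) is an assumption here, as are the function names. The four profiles are genes a to d from the distance-table slide.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_linkage(profiles):
    """Return the merges as (cluster_1, cluster_2, distance), in order."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


genes = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}
merges = single_linkage(genes)
for left, right, dist in merges:
    print(left, right, round(dist, 2))
```

The order of the merges and their distances define the tree: genes merged early sit on short branches, and the last merge is the root.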

Clustering the genes according to gene expression
Genes
GENE a: 1, -1, 1, 1, 1, -1, -1, -1
GENE b: 1, 1, -1, 1, 1, 1, -1, 1
GENE c: 1, -1, 1, -1, 1, -1, -1, -1
GENE d: -1, 1, -1, 1, 1, 1, -1, -1
Distances (Euclidean distance)*
Dab = 4, Dac = 2, Dad = 4, Dbc = 4.47, Dbd = 2.83, Dcd = 4.47
Distance table
      a      b      c      d
a     0      4      2      4
b     4      0      4.47   2.83
c     2      4.47   0      4.47
d     4      2.83   4.47   0
*Can be calculated using different distance metrics
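The distances on this slide can be recomputed from the four gene profiles. The sketch below uses only the profiles given on the slide; the helper name is hypothetical.

```python
import math
from itertools import combinations

genes = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}


def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# One distance per unordered pair of genes; the table is symmetric.
distances = {
    (g1, g2): euclidean(genes[g1], genes[g2])
    for g1, g2 in combinations(genes, 2)
}
for (g1, g2), d in distances.items():
    print(f"D{g1}{g2} = {d:.2f}")
```

Genes a and c differ in a single condition, so they are the closest pair; b and d differ in two conditions, giving a distance of the square root of 8.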

Analyzing the clusters of genes
[Figure: gene clusters labeled Cluster 2, Cluster 3 and Cluster 4]

What can we learn from clusters with similar gene expression?
Similar expression between genes can mean that:
- The genes have similar function
- The genes work together in the same pathway or complex
- All the genes are controlled by common regulatory genes

Example: identifying genes that have similar function
hnRNP A1 and SRp40 are not clear homologs based on the BLAST E-value, but they have a very similar gene expression pattern in different tissues.

Are hnRNP A1 and SRp40 functional homologs? Yes.
[Figure labels: SF, SRp40, hnRNP A1]

What can we learn from clusters with similar gene expression?
Similar expression between genes can mean that:
- The genes have similar function
- The genes work together in the same pathway or complex
- All the genes are controlled by common regulatory genes

Example: genes work together in the same complex
[Figure: read counts for a transcription factor (TF) and a long non-coding RNA]

How can gene expression help in diagnostics?

How can gene expression help in diagnostics?
Research question: can we distinguish BRCA1 cancers from BRCA2 cancers based solely on their gene expression profiles?
Here we want to cluster the patients, not the genes.
[Figure: genes versus different patients (BRCA1 or BRCA2)]
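Clustering patients instead of genes only requires reading the expression matrix by column rather than by row. The sketch below uses a made-up matrix of four genes in five patients (all values hypothetical), builds one profile per patient and finds each patient's nearest neighbor by Euclidean distance.

```python
import math

# Hypothetical expression matrix: one row per gene, one column per patient.
patients = ["patient 1", "patient 2", "patient 3", "patient 4", "patient 5"]
expression = {
    "gene1": [9, 8, 1, 9, 2],
    "gene2": [8, 9, 2, 9, 1],
    "gene3": [1, 2, 9, 1, 8],
    "gene4": [2, 1, 8, 1, 9],
}

# Transpose: each patient becomes a vector of expression values over genes.
patient_profiles = {
    name: [row[k] for row in expression.values()]
    for k, name in enumerate(patients)
}


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


nearest = {}
for p in patients:
    others = [q for q in patients if q != p]
    nearest[p] = min(
        others, key=lambda q: euclidean(patient_profiles[p], patient_profiles[q])
    )
print(nearest)
```

Any of the clustering methods used for genes can then be applied to the patient profiles; clustering both the rows and the columns is what the following slides call two-way clustering.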

How can gene expression be applied for diagnostics?
[Table: expression of genes (rows) in breast cancer patients 1 to 5 (columns)]

How can gene expression be applied for diagnostics?
Two-way clustering = clustering both the patients and the genes
[Table: the same genes, with the patient columns reordered as patient 1, patient 2, patient 4, patient 3, patient 5 and labeled BRCA1 and BRCA2]

How can gene expression be applied for diagnostics?
Two-way clustering = clustering both the patients and the genes
[Table: the reordered patient columns (patient 1, patient 2, patient 4, patient 3, patient 5) labeled BRCA1 and BRCA2, with the informative genes marked]

Supervised approaches for diagnostics based on expression data: Support Vector Machine (SVM)

An SVM begins with a set of samples from patients who have been diagnosed as either BRCA1 (red dots) or BRCA2 (blue dots). Each dot represents the vector of the expression pattern taken from a patient's microarray experiment.

How do SVMs work with expression data?
The SVM is trained on data that were classified based on histology. After training the SVM to separate the BRCA1 tumors from the BRCA2 tumors given the expression data, we can apply it to diagnose an unknown tumor for which we have the equivalent expression data.
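The train-then-diagnose workflow can be sketched with a small linear SVM. This is an illustration only: it assumes a linear soft-margin SVM trained by sub-gradient descent on the hinge loss, which is one simple way to fit such a model and not necessarily what the studies behind these slides used. The two-gene expression values, the labels (+1 for BRCA1, -1 for BRCA2) and all function and parameter names are hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit weights w and bias b so that sign(w.x + b) separates the labels."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin or misclassified: move toward it.
                w = [wi + learning_rate * (y * xi - reg * wi)
                     for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Sample is fine: only apply the regularization shrinkage.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical training set: expression of two genes per patient.
training_samples = [[2.0, 0.5], [2.5, 0.3], [1.8, 0.7],   # diagnosed BRCA1
                    [0.4, 2.1], [0.6, 2.4], [0.3, 1.9]]   # diagnosed BRCA2
training_labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(training_samples, training_labels)

# Diagnose two unknown tumors from their expression vectors.
unknown_1 = predict(w, b, [2.2, 0.4])
unknown_2 = predict(w, b, [0.5, 2.0])
print(unknown_1, unknown_2)
```

In real data each patient vector has thousands of genes rather than two, but the workflow is the same: fit the separating boundary on labeled tumors, then apply it to new ones.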

Projects

Instructions for the final project (Introduction to Bioinformatics)
Key dates
- 7.12: lists of suggested projects published*
- 3.1: final date to choose a project
- 10.1: submission of the project overview (one page): title, main question, major tools you are planning to use to answer the questions
- 11.1 / 18.1: meetings on projects
- 9.3: poster submission
- 16.3: poster presentation
*You are highly encouraged to choose a project yourself or to find a relevant project that can help in your research.

2. Planning your research
After you have described the main question or questions of your project, you should carefully plan your next steps.
A. Make sure you understand the problem and read the necessary background to proceed.
B. Formulate your working plan, step by step.
C. Once you have a plan, start by extracting the necessary data and deciding on the relevant tools to use in the first step. When running a tool, make sure to summarize the results and extract the relevant information you need to answer your question. It is recommended to save the raw data for your records, but do not present raw data in your final project. Your initial results should guide you towards your next steps.
D. When you feel you have explored all the tools you can apply to answer your question, summarize your findings and draw conclusions. Remember that NO is also an answer, as long as you are sure it is NO. Also remember that this is a course project, not just a homework exercise.

3. Summarizing the final project in a poster (in pairs)
Prepare in PPT, poster size cm
- Title of the project
- Names and affiliation of the students presenting
The poster should include 5 sections:
- Background: a description of your question (a figure can be added)
- Goal and research plan: describe the main objective and the research plan
- Results (main section): present your results in 3-4 figures, describe each figure (figure legends) and give a title to each result
- Conclusions: summarize the conclusions of your project in points
- References: list the papers, databases and tools used for your project
Examples of posters will be presented in class.