Published by Jewel Cobb. Modified over 8 years ago.
1
Gene expression
2
Gene Expression: DNA → RNA → protein
3
Gene Expression
[Figure: polyadenylated (AAAAAAA) mRNAs from gene1, gene2 and gene3]
4
Studying Gene Expression, 1987-2013
- cDNA microarrays (the first high-throughput gene expression experiments)
- DNA chips (high-density oligonucleotide microarrays)
- RNA-seq (high-throughput sequencing)
5
Classical versus modern technologies to study gene expression
Classical methods (microarrays)
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes
High-throughput RNA sequencing
- Does not require prior knowledge
- Good for discovering new transcripts
6
RNA-seq
7
What can we learn from RNA-seq?
- Comparing the expression of two genes in the same sample
- Comparing the expression of the same gene in different samples
8
What can we learn from RNA-seq?
Comparing the expression of two genes in the same sample
PROBLEM:
- Genes of different lengths are expected to have different numbers of reads
- The coverage is strongly dependent on the sequencing depth
9
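What can we learn from RNA-seq?
Possible solution: normalize by transcript length and by the total number of reads mapped in the experiment.
RPKM (reads per kilobase of transcript per million mapped reads): $\mathrm{RPKM} = \frac{10^{9} \cdot C}{N \cdot L}$
where C is the number of reads mapped to the gene, N is the total number of mapped reads in the experiment, and L is the transcript length in base pairs.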
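The normalization described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a quantification tool; the function name `rpkm` and the read counts, lengths and library size below are hypothetical values chosen for the example.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Hypothetical numbers: a long gene collects twice the reads of a short one,
# but after length normalization both have the same expression level.
long_gene = rpkm(read_count=500, transcript_length_bp=2_000, total_mapped_reads=10_000_000)
short_gene = rpkm(read_count=250, transcript_length_bp=1_000, total_mapped_reads=10_000_000)
print(long_gene, short_gene)
```

Dividing by the length in kilobases corrects for longer genes collecting more reads, and dividing by the library size in millions corrects for sequencing depth.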
10
Problems with Normalization
[Figure: two rankings of the same genes, Gene B > Gene A > Gene C and Gene A > Gene B > Gene C]
Warning: normalization by the total number of reads can lead to false detection of differentially expressed genes.
11
What can we learn from RNA-seq?
Comparing the expression of the same gene in different samples
Example: finding new markers for pluripotency
[Figure: expression in embryonic stem cells versus differentiated cells; color scale: highly expressed to lowly expressed]
12
What can we learn from RNA-seq?
Comparing the expression of the same gene in different samples
Fold change (FC) = ratio of the expression of the gene in sample X to its expression in sample Y
Sample X: stem cells. Sample Y: fibroblasts.
Is fold change enough to evaluate the difference?
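As a small sketch of the fold-change definition, the Python below computes the plain ratio and a log2 variant. The log2 transform and the pseudocount are common conventions assumed here for illustration; they are not stated on the slide, and the expression values are hypothetical.

```python
import math


def fold_change(expr_x, expr_y):
    """Ratio of expression in sample X to expression in sample Y."""
    if expr_y == 0:
        raise ValueError("expression in sample Y is zero; fold change is undefined")
    return expr_x / expr_y


def log2_fold_change(expr_x, expr_y, pseudocount=1.0):
    """log2 ratio; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((expr_x + pseudocount) / (expr_y + pseudocount))


# Hypothetical expression values for one gene
stem_cell, fibroblast = 80.0, 10.0
print(fold_change(stem_cell, fibroblast))
print(log2_fold_change(7.0, 1.0))
```

A gene measured at 2 versus 1 reads and a gene measured at 2000 versus 1000 reads both have a fold change of 2, so the ratio alone says nothing about how reliable the difference is.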
13
Finding new markers for pluripotency
Expression in stem cells versus fibroblasts: possible candidates for pluripotency markers
Remember: we always need to evaluate the statistical significance of the results.
Standard measure = q-value (the p-value corrected for multiple testing)
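One common way to correct p-values for multiple testing is the Benjamini-Hochberg procedure. The slide does not say which correction is meant, so the sketch below is an assumption for illustration, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four genes
p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
print(q)
```

Each raw p-value is scaled by the number of tests divided by its rank, so a gene that looks significant on its own may no longer pass once thousands of genes are tested together.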
14
NEXT… Clustering the data according to expression profiles
[Figure: genes versus expression in different conditions; color scale: highly expressed to lowly expressed]
15
WHY? What can we learn from the clusters?
Diagnostics and therapy
- A set of genes whose expression differs can indicate a disease state
Identifying gene function
- A set of genes with similar expression can imply similar function
16
A molecular signature of metastasis in primary solid tumors
Ramaswamy et al., 2003, Nat Genet 33:49-54
Samples were taken from patients with adenocarcinoma. Hundreds of genes were found that differentiate between cancer tissues at different stages of the tumor. The arrow shows an example of tumor cells that were not detected correctly by histological or other clinical parameters.
17
HOW? Different clustering approaches
Unsupervised methods
- Hierarchical clustering
- K-means
Supervised methods (supervised learning)
- Support Vector Machine (SVM)
18
Clustering
Clustering organizes things that are close into groups.
- What does it mean for two genes to be close?
- Once we know this, how do we define groups?
19
What does it mean for two genes to be close?
We need a mathematical definition of the distance between the expression patterns of two genes.
Gene1 = $(E_{11}, E_{12}, \ldots, E_{1N})'$
Gene2 = $(E_{21}, E_{22}, \ldots, E_{2N})'$
[Figure: expression patterns of Gene 1 and Gene 2; x-axis labeled 01 to 22]
20
Calculating the distance between two expression patterns
Gene1 = $(E_{11}, E_{12}, \ldots, E_{1N})'$
Gene2 = $(E_{21}, E_{22}, \ldots, E_{2N})'$
Euclidean distance: $ED = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^{2}}$
[Figure: distance between the points (X1,Y1) and (X2,Y2)] When N is 100 we have to think abstractly.
Low Euclidean distance = high similarity
We can use many different distance measures.
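The Euclidean distance between two expression vectors of equal length can be computed as in the sketch below. The function name and the example vectors are hypothetical.

```python
import math


def euclidean_distance(gene1, gene2):
    """Square root of the summed squared differences across all N conditions."""
    if len(gene1) != len(gene2):
        raise ValueError("expression vectors must have the same length")
    return math.sqrt(sum((e1 - e2) ** 2 for e1, e2 in zip(gene1, gene2)))


# Two-dimensional case, matching the (X1,Y1) to (X2,Y2) picture
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
# The same function works unchanged for N = 100 conditions
print(euclidean_distance([1.0] * 100, [1.0] * 100))
```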
21
Calculating the distance between two expression patterns
Pearson correlation coefficient
High correlation coefficient = high similarity
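The Pearson correlation coefficient divides the covariance of the two expression vectors by the product of their standard deviations. A minimal sketch with hypothetical expression values:

```python
import math


def pearson_correlation(gene1, gene2):
    """Pearson correlation coefficient between two expression vectors."""
    if len(gene1) != len(gene2):
        raise ValueError("expression vectors must have the same length")
    n = len(gene1)
    mean1 = sum(gene1) / n
    mean2 = sum(gene2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(gene1, gene2))
    var1 = sum((a - mean1) ** 2 for a in gene1)
    var2 = sum((b - mean2) ** 2 for b in gene2)
    if var1 == 0 or var2 == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var1 * var2)


# Hypothetical genes with the same shape but very different magnitude
low = [1.0, 2.0, 3.0, 4.0]
high = [100.0, 200.0, 300.0, 400.0]
print(pearson_correlation(low, high))
print(pearson_correlation(low, list(reversed(low))))
```

Because the coefficient depends only on the shape of the profile, two genes whose expression differs a hundredfold in magnitude can still be perfectly correlated.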
22
Distance and correlations can produce very different results
[Figure: counts for two expression patterns]
Euclidean distance = 1740 (low similarity)
Pearson correlation = 0.9 (high similarity)
23
Clustering the genes according to expression: Hierarchical Clustering
- Generate a tree based on the distances between genes (similar to a phylogenetic tree)
- Each gene is a leaf on the tree
- Distances reflect the similarity of their expression patterns
[Figure: genes versus expression in different conditions, with a gene cluster marked]
24
Clustering the genes according to gene expression
Expression patterns
GENE a    1, -1,  1,  1,  1, -1, -1, -1
GENE b    1,  1, -1,  1,  1,  1, -1,  1
GENE c    1, -1,  1, -1,  1, -1, -1, -1
GENE d   -1,  1, -1,  1,  1,  1, -1, -1
Distance table: distances between genes (Euclidean distance*)
      a      b      c      d
a     0      4      2      4
b     4      0      4.47   2.83
c     2      4.47   0      4.47
d     4      2.83   4.47   0
Dab = 4, Dac = 2, Dad = 4, Dbc = 4.47, Dbd = 2.83, Dcd = 4.47
* Can be calculated using different distance metrics
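The distance table and the tree-building step can be reproduced with a short sketch using genes a to d from this slide. The slide does not say how distances between clusters are defined, so single linkage (the closest pair of members) is assumed here; the function names are hypothetical.

```python
import math
from itertools import combinations

# Expression patterns of genes a-d as given on the slide
genes = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}


def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def single_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur.

    ASSUMPTION: cluster-to-cluster distance is the minimum pairwise distance.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            d = min(euclidean(profiles[g1], profiles[g2]) for g1 in c1 for g2 in c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        # Replace the two closest clusters with their union
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, round(d, 2)))
    return merges


merges = single_linkage(genes)
for left, right, dist in merges:
    print(left, right, dist)
```

Under these assumptions a joins c at distance 2, b joins d at about 2.83 (the square root of 8), and the two pairs are joined last at distance 4.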
25
Analyzing the clusters of genes
[Figure: Cluster 2, Cluster 3, Cluster 4]
26
What can we learn from clusters with similar gene expression?
Similar expression between genes can indicate that:
- The genes have similar function
- The genes work together in the same pathway/complex
- All the genes are controlled by a common regulatory gene
28
Example: Identifying genes that have similar function
hnRNP A1 and SRp40 are not clear homologs based on the BLAST E-value, but they have very similar gene expression patterns in different tissues.
29
Are hnRNP A1 and SRp40 functional homologs? YES!
[Figure labels: SF, SRp40, hnRNP A1]
30
What can we learn from clusters with similar gene expression?
Similar expression between genes can indicate that:
- The genes have similar function
- The genes work together in the same pathway/complex
- All the genes are controlled by a common regulatory gene
31
Example: Genes work together in the same complex
[Figure: counts for a transcription factor (TF) and a long non-coding RNA]
32
How can gene expression help in diagnostics?
33
How can gene expression help in diagnostics?
RESEARCH QUESTION: Can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles?
Here we want to cluster the patients, not the genes!
[Figure: genes versus different patients (BRCA1 or BRCA2)]
34
How can gene expression be applied for diagnostics?
5 breast cancer patients
        Patient 1   Patient 2   Patient 3   Patient 4   Patient 5
Gen1        +           -           -           +           +
Gen2        +           +           -           +           -
Gen3        -           +           +           +           -
Gen4        +           +           +           -           -
Gen5        -           -           +           -           +
35
How can gene expression be applied for diagnostics?
Two-way clustering = clustering the patients and the genes
Patients are grouped as BRCA1 and BRCA2.
        Patient 1   Patient 2   Patient 4   Patient 3   Patient 5
Gen1        +           -           +           -           +
Gen3        -           +           +           +           -
Gen4        +           +           -           +           -
Gen2        +           +           +           -           -
Gen5        -           -           -           +           +
36
How can gene expression be applied for diagnostics?
Two-way clustering = clustering the patients and the genes
Patients are grouped as BRCA1 and BRCA2; the informative genes are marked on the slide.
        Patient 1   Patient 2   Patient 4   Patient 3   Patient 5
Gen1        +           -           +           -           +
Gen3        -           +           +           +           -
Gen4        +           +           -           +           -
Gen2        +           +           +           -           -
Gen5        -           -           -           +           +
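One way to read "informative genes" is as genes whose calls are uniform within each patient group and opposite between the groups. The sketch below applies that reading to the five-patient table. The split into patients 1, 2, 4 and patients 3, 5 is an assumption inferred from the reordered columns, and the function name is hypothetical.

```python
# +/- calls per gene for patients 1..5, as listed in the five-patient table
patients = [1, 2, 3, 4, 5]
profiles = {
    "Gen1": "+--++",
    "Gen2": "++-+-",
    "Gen3": "-+++-",
    "Gen4": "+++--",
    "Gen5": "--+-+",
}

# ASSUMPTION: the BRCA1/BRCA2 split follows the reordered patient columns
brca1 = [1, 2, 4]
brca2 = [3, 5]


def informative_genes(profiles, patients, group_a, group_b):
    """Genes with one call throughout group_a and the opposite call in group_b."""
    result = []
    for gene, calls in profiles.items():
        by_patient = dict(zip(patients, calls))
        calls_a = {by_patient[p] for p in group_a}
        calls_b = {by_patient[p] for p in group_b}
        if len(calls_a) == 1 and len(calls_b) == 1 and calls_a != calls_b:
            result.append(gene)
    return result


found = informative_genes(profiles, patients, brca1, brca2)
print(found)
```

Under this assumed grouping, Gen2 and Gen5 separate the two groups cleanly, while Gen1, Gen3 and Gen4 vary within a group.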
37
Supervised approaches for diagnostics based on expression data: Support Vector Machine (SVM)
38
An SVM would begin with a set of samples from patients who have been diagnosed as either BRCA1 (red dots) or BRCA2 (blue dots). Each dot represents a vector of the expression pattern taken from the microarray experiment of a patient.
39
How do SVMs work with expression data?
The SVM is trained on data that was classified based on histology.
After training the SVM to separate BRCA1 from BRCA2 tumors given the expression data, we can apply it to diagnose an unknown tumor (?) for which we have the equivalent expression data.
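The train-then-diagnose workflow can be sketched with a very small linear SVM trained by subgradient descent on the regularized hinge loss. This is a toy illustration under stated assumptions, not the method used in any study cited on the slides: the two-gene expression values are invented, the labels +1 (BRCA1) and -1 (BRCA2) are an arbitrary encoding, and the hyperparameters are hypothetical. In practice one would use an established SVM library.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Fit weights w and bias b by subgradient descent on the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin: move the boundary away from it
                w = [wi + learning_rate * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Sample is safely classified: apply only the regularization shrink
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical centered expression of two genes in six diagnosed tumors
samples = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
           [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
labels = [1, 1, 1, -1, -1, -1]  # +1 = BRCA1, -1 = BRCA2 (arbitrary encoding)

w, b = train_linear_svm(samples, labels)
unknown_tumor = [2.5, 2.0]
print("BRCA1" if predict(w, b, unknown_tumor) == 1 else "BRCA2")
```

The training samples play the role of the histologically classified tumors, and `unknown_tumor` plays the role of the undiagnosed sample marked on the slide.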
40
Projects 2015-16
41
Instructions for the final project
Introduction to Bioinformatics 2013-14
Key dates
7.12         List of suggested projects published*
3.1          Final date to choose a project
10.1         Submission of project overview (one page)
             - Title
             - Main question
             - Major tools you are planning to use to answer the questions
11.1 / 18.1  Meetings on projects
9.3          Poster submission
16.3         Poster presentation
* You are highly encouraged to choose a project yourself or to find a relevant project that can help in your research.
42
2. Planning your research
After you have described the main question or questions of your project, you should carefully plan your next steps.
A. Make sure you understand the problem and read the necessary background before you proceed.
B. Formulate your working plan, step by step.
C. Once you have a plan, start by extracting the necessary data and deciding which tools to use in the first step. When running a tool, make sure to summarize the results and extract the information you need to answer your question. It is recommended to save the raw data for your records, but do not present raw data in your final project. Your initial results should guide you towards your next steps.
D. When you feel you have explored all the tools you can apply to answer your question, summarize your findings and draw conclusions. Remember that NO is also an answer, as long as you are sure it is NO. Also remember that this is a course project, not just a homework exercise.
43
3. Summarizing the final project in a poster (in pairs)
Prepare the poster in PPT, poster size 90-120 cm.
- Title of the project
- Names and affiliations of the students presenting
The poster should include 5 sections:
- Background: a description of your question (a figure can be added)
- Goal and research plan: describe the main objective and the research plan
- Results (main section): present your results in 3-4 figures, describe each figure (figure legends) and give a title to each result
- Conclusions: summarize the conclusions of your project in points
- References: list the papers, databases and tools used for your project
Examples of posters will be presented in class.