1
From motif search to gene expression analysis
2
Finding TF targets using a bioinformatics approach
Scenario 1: Binding motif is known (easier case)
Scenario 2: Binding motif is unknown (hard case)
3
Are common motifs the right thing to search for ?
4
Solutions:
- Search for motifs which are enriched in one set but not in a random set
- Use experimental information to rank the sequences according to their binding affinity, and search for motifs enriched at the top of the list
5
ChIP-Seq: sequencing the regions in the genome to which a protein (e.g. a transcription factor) binds.
6
Finding the p53 binding motif in a set of p53 target sequences which are ranked according to binding affinity
[Figure: ChIP-Seq sequences ordered from best binders to weak binders]
7
A word search approach to search for enriched motifs in a ranked list
[Figure: ranked sequences list with occurrences of CTGTGA marked, alongside candidate k-mers such as CTACGC, ACTTGA, ACGTGA, ACGTGC, CTGTGC, CTGTGA, CTGTAC, ATGTGC, ATGTGA, CTATGC]
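The word search step can be sketched as follows: enumerate every k-mer that appears in the ranked sequences and record, for each k-mer, which sequences contain it. This is a minimal illustration only; the function name `kmer_occurrence` and the five toy sequences are assumptions made for the example, not data from the slides.

```python
def kmer_occurrence(ranked_sequences, k):
    """Map each k-mer to a 0/1 vector marking which ranked sequences contain it."""
    kmers = set()
    for seq in ranked_sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return {
        kmer: [1 if kmer in seq else 0 for seq in ranked_sequences]
        for kmer in sorted(kmers)
    }

# Hypothetical sequences, ordered from best binder to weakest binder.
ranked = ["AACTGTGATT", "GCTGTGACCA", "TTCTGTGAAG", "ACGTACGTAC", "GGGCCCAAAT"]
occurrence = kmer_occurrence(ranked, 6)
print(occurrence["CTGTGA"])
```

A k-mer whose occurrence vector has its 1s concentrated at the start of the list is a candidate for enrichment among the best binders.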
8
Uses the minimum hypergeometric (mHG) statistic to find enriched motifs
The statistic is computed from four counts:
- The total number of input sequences
- The number of sequences containing the motif
- The number of sequences at the top of the list
- The number of sequences containing the motif among the top sequences
[Figure: ranked sequences list with occurrences of CTGTGA marked]
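A minimal sketch of how the four counts combine, assuming the usual symbols N (total sequences), B (sequences containing the motif), n (sequences at the top of the list) and b (motif-containing sequences among the top n); these symbols do not appear in the slide text. For each cutoff n the hypergeometric tail probability of seeing at least b hits is computed, and the mHG score is the minimum over all cutoffs. The occurrence vector below is a toy example.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n sequences are drawn from N, of which B contain the motif."""
    total = comb(N, n)
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1))
    return hits / total

def mhg(occurrence):
    """Return (score, cutoff): the smallest tail probability over all cutoffs n."""
    N = len(occurrence)
    B = sum(occurrence)
    best_score, best_n = 1.0, 0
    b = 0
    for n in range(1, N + 1):
        b += occurrence[n - 1]  # motif hits among the top n sequences
        score = hypergeom_tail(N, B, n, b)
        if score < best_score:
            best_score, best_n = score, n
    return best_score, best_n

# Toy ranked list: the motif occurs in the top 4 of 8 sequences.
occurrence = [1, 1, 1, 1, 0, 0, 0, 0]
score, cutoff = mhg(occurrence)
print(score, cutoff)
```

Because the minimum is taken over many cutoffs, the mHG score is not itself a p-value and has to be corrected for multiple testing before it is interpreted as one.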
9
The enriched motifs are combined to get a position-specific scoring matrix (PSSM) which represents the binding motif
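One way to picture the combination step, as a sketch under stated assumptions: the enriched k-mers are treated as already aligned, base counts per position are smoothed with a pseudocount of 1, and scores are log2 odds against a uniform background of 0.25. The pseudocount, the background, and the choice of four k-mers are illustrative assumptions, not values from the slides.

```python
from math import log2

def build_pssm(kmers, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict of base scores per position) from aligned k-mers."""
    length = len(kmers[0])
    total = len(kmers) + 4 * pseudocount
    pssm = []
    for pos in range(length):
        column = [kmer[pos] for kmer in kmers]
        scores = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / total
            scores[base] = log2(freq / background)
        pssm.append(scores)
    return pssm

def consensus(pssm):
    """Highest-scoring base at each position."""
    return "".join(max(col, key=col.get) for col in pssm)

# Hypothetical set of enriched k-mers, assumed to be aligned.
enriched_kmers = ["CTGTGA", "CTGTGC", "ATGTGA", "CTGTGA"]
pssm = build_pssm(enriched_kmers)
print(consensus(pssm))
```

Positive scores mark bases that are more frequent than the background at that position; negative scores mark bases that are depleted.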
11
Protein Motifs
Protein motifs are usually 6-20 amino acids long and can be represented as a consensus/profile, e.g. P[ED]XK[RW][RK]X[ED], or as a position weight matrix (PWM)
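A consensus of this form maps directly onto a regular expression, assuming that X stands for any amino acid and that a bracketed group lists the residues allowed at that position. The helper name `consensus_to_regex` and the test peptide are made up for illustration.

```python
import re

def consensus_to_regex(consensus):
    """Translate a consensus such as P[ED]XK[RW][RK]X[ED] into a compiled regex."""
    out = []
    in_class = False
    for ch in consensus:
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        # Outside a bracketed group, X is a wildcard for any residue.
        out.append("." if (ch == "X" and not in_class) else ch)
    return re.compile("".join(out))

motif = consensus_to_regex("P[ED]XK[RW][RK]X[ED]")

# Hypothetical peptide containing one occurrence of the motif.
match = motif.search("MAPEAKRRGDLL")
print(match.start(), match.group())
```

A consensus is all-or-nothing: a sequence either matches or it does not. A PWM instead gives every position a graded score, so near-matches can still be ranked.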
12
Gene Expression Analysis
13
Gene Expression: DNA → RNA → protein
14
Gene Expression
[Figure: mRNA transcripts of gene1, gene2 and gene3, ending in AAAAAAA]
15
Studying Gene Expression 1987-2013
- Microarray (first high-throughput gene expression experiments)
- DNA chips
- RNA-seq (next-generation sequencing)
16
Classical versus modern technologies to study gene expression
Classical methods (spotted microarray, DNA chips)
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes
New-generation RNA sequencing
- Does not require prior knowledge
- Good for discovering new transcripts
17
Experimental Protocol Two channel cDNA arrays
18
One channel DNA chips
- Each sequence is represented by a probe set colored with one fluorescent dye
- Target hybridizes to complementary probes only
- The fluorescence intensity is indicative of the expression of the target sequence
19
Affymetrix Chip
20
RNA-seq
21
Clustering the data according to expression profiles
NEXT… Clustering the data according to expression profiles.
[Figure: genes versus expression in different conditions]
22
WHY? What can we learn from the clusters?
Identify gene function
- Similar expression can imply similar function
Diagnostics and therapy
- Differences in gene expression can indicate a disease state
- Genes which change expression in a disease can be good candidates for drug targets
23
A molecular signature of metastasis in primary solid tumors
Samples were taken from patients with adenocarcinoma. Hundreds of genes that differentiate between cancer tissues at different stages of the tumor were found. The arrow shows an example of tumor cells which were not detected correctly by histological or other clinical parameters. Ramaswamy et al., 2003, Nat Genet 33:49-54
24
HOW? Different clustering approaches
Unsupervised methods
- Hierarchical clustering
- K-means
Supervised methods
- Support Vector Machine (SVM)
25
Clustering
Clustering organizes things that are close into groups.
- What does it mean for two genes to be close?
- Once we know this, how do we define groups?
Notice we do this ourselves all the time: divide people by race, divide animals into families, etc.
26
What does it mean for two genes to be close?
We need a mathematical definition of the distance between the expression patterns of two genes. For example, the distance between gene 1 and gene 2:
$\mathrm{Gene1} = (E_{11}, E_{12}, \ldots, E_{1N})'$
$\mathrm{Gene2} = (E_{21}, E_{22}, \ldots, E_{2N})'$
Euclidean distance:
$$d(\mathrm{Gene1}, \mathrm{Gene2}) = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$$
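As a small illustration of why the choice of distance matters, the sketch below computes the Euclidean distance and, for contrast, a correlation-based distance (1 minus the Pearson correlation). The correlation-based metric is not named in the slides and is included here only as an assumed example of an alternative; the two expression vectors are invented.

```python
import math

def euclidean_distance(g1, g2):
    """Square root of the sum of squared differences across conditions."""
    if len(g1) != len(g2):
        raise ValueError("expression vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(g1, g2)))

def pearson_distance(g1, g2):
    """1 - Pearson correlation: small when two profiles have the same shape."""
    m1 = sum(g1) / len(g1)
    m2 = sum(g2) / len(g2)
    cov = sum((x - m1) * (y - m2) for x, y in zip(g1, g2))
    var1 = sum((x - m1) ** 2 for x in g1)
    var2 = sum((y - m2) ** 2 for y in g2)
    return 1.0 - cov / math.sqrt(var1 * var2)

# Hypothetical profiles: gene2 follows the same trend as gene1 at twice the level.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [2.0, 4.0, 6.0, 8.0]

d_euclid = euclidean_distance(gene1, gene2)
d_pearson = pearson_distance(gene1, gene2)
print(d_euclid, d_pearson)
```

The two genes are far apart in Euclidean terms but essentially identical by correlation, so the metric should be chosen according to whether absolute levels or the shape of the profile is of interest.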
27
Clustering the genes according to expression
Hierarchical Clustering
- Generate a tree based on the distances between genes (similar to a phylogenetic tree)
- Each gene is a leaf on the tree
- Distances reflect the similarity of their expression patterns
[Figure: gene cluster tree; genes versus expression in different conditions]
28
Clustering the genes according to gene expression
Distance Table
GENE a:  1, -1,  1,  1,  1, -1, -1, -1
GENE b:  1,  1, -1,  1,  1,  1, -1,  1
GENE c:  1, -1,  1, -1,  1, -1, -1, -1
GENE d: -1,  1, -1,  1,  1,  1, -1, -1

     a     b     c     d
a    0     4     2     4
b    4     0     4.47  2.83
c    2     4.47  0     4.47
d    4     2.83  4.47  0

Distances (Euclidean distance)*: Dab = 4, Dac = 2, Dad = 4, Dbc = 4.47, Dbd = 2.83, Dcd = 4.47
* Can be calculated using different distance metrics
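The distance table can be recomputed from the four expression vectors, and a tree can then be grown by repeatedly merging the two closest clusters. The sketch below assumes average linkage (the distance between two clusters is the mean of all pairwise gene distances); the slides do not say which linkage rule is used, so this is only one possible choice.

```python
import math
from itertools import combinations

genes = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cluster_distance(c1, c2):
    """Average linkage: mean distance over all gene pairs between two clusters."""
    pairs = [(x, y) for x in c1 for y in c2]
    return sum(euclidean(genes[x], genes[y]) for x, y in pairs) / len(pairs)

def hierarchical_clustering(names):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Pairwise distances, as in the table.
table = {(x, y): euclidean(genes[x], genes[y]) for x, y in combinations(genes, 2)}
for pair, dist in table.items():
    print(pair, round(dist, 2))

merges = hierarchical_clustering(list(genes))
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 2))
```

With these vectors the closest pair, a and c, is joined first, then b and d, and finally the two pairs are joined into a single tree.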
29
Analyzing the clusters of genes
30
What can we learn from clusters with similar gene expression ??
31
EXAMPLE: hnRNP A1 and SRp40
hnRNP A1 and SRp40 are not clear homologs based on BLAST E-value, but have a very similar gene expression pattern in different tissues
32
Are hnRNP A1 and SRp40 functional homologs?
[Figure: repeated "SF" labels and SRp40]
YES!
33
What else can we learn from clusters with similar gene expression ??
Similar expression between genes can mean:
- The genes have similar function
- One gene controls the other
- All genes are controlled by a common regulatory gene
34
How can gene expression help in diagnostics?
35
How can gene-expression help in diagnostics ?
[Figure: genes versus different patients (BRCA1 or BRCA2)]
RESEARCH QUESTION: Can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles?
Here we want to cluster the patients, not the genes!
36
How can gene expression be applied for diagnostics?
[Figure: expression (+ or -) of Gen1 to Gen5 in 5 breast cancer patients, patient 1 to patient 5]
37
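How can gene expression be applied for diagnostics?
[Figure: the matrix reordered, with patients in the order patient 1, patient 2, patient 4, patient 3, patient 5 (labeled BRCA1 and BRCA2) and genes in the order Gen1, Gen3, Gen4, Gen2, Gen5, marking the informative genes]
Two-way clustering = clustering both the patients and the genes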
38
Supervised approaches for diagnostic based on expression data
Support Vector Machine SVM
39
An SVM would begin with a set of samples from patients who have been diagnosed as either BRCA1 (red dots) or BRCA2 (blue dots). Each dot represents the vector of the expression pattern taken from a patient's microarray experiment.
40
How do SVMs work with expression data?
The SVM is trained on data which was classified based on histology. After training the SVM to separate the BRCA1 from the BRCA2 tumors given the expression data, we can apply it to diagnose an unknown tumor for which we have the equivalent expression data.
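The train-then-diagnose workflow can be sketched with a tiny linear SVM fitted by subgradient descent on the regularized hinge loss. Everything concrete here is an assumption for illustration: the two-gene expression values are invented, BRCA1 is encoded as +1 and BRCA2 as -1, and the learning rate, regularization strength, and number of epochs are arbitrary. In practice an established library implementation would normally be used instead.

```python
def train_linear_svm(samples, labels, lr=0.1, reg=0.01, epochs=200):
    """Fit weights w and bias b by batch subgradient descent on the hinge loss."""
    n_features = len(samples[0])
    n = len(samples)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # sample is inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical training set: each row is one patient's expression of two genes.
train_x = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
train_y = [1, 1, 1, -1, -1, -1]  # +1 = BRCA1, -1 = BRCA2 (labels from histology)

w, b = train_linear_svm(train_x, train_y)

names = {1: "BRCA1", -1: "BRCA2"}
unknown_tumor = [1.5, 2.5]
print(names[predict(w, b, unknown_tumor)])
```

Real expression data has thousands of genes per patient and far fewer patients, so feature selection and cross-validation matter much more than this toy example suggests.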
41
Projects
42
Instructions for the final project
Introduction to Bioinformatics
Key dates
- List of suggested projects published*
- 9.1 Submission of project overview (one page): title, main question, major tools you are planning to use to answer the questions
- Final week: meetings on projects
- 12.3 Poster submission
- 19.3 Poster presentation
*You are highly encouraged to choose a project yourself or find a relevant project which can help in your research
43
2. Planning your research
After you have described the main question or questions of your project, you should carefully plan your next steps.
A. Make sure you understand the problem and read the necessary background to proceed.
B. Formulate your working plan, step by step.
C. After you have a plan, start by extracting the necessary data and decide on the relevant tools to use at the first step. When running a tool, make sure to summarize the results and extract the relevant information you need to answer your question. It is recommended to save the raw data for your records, but don't present raw data in your final project. Your initial results should guide you towards your next steps.
D. When you feel you have explored all the tools you can apply to answer your question, you should summarize and draw conclusions. Remember, NO is also an answer as long as you are sure it is NO. Also remember this is a course project, not only a HW exercise.
44
Summarizing final project in a poster (in pairs)
Prepare in PPT, poster size cm
- Title of the project
- Names and affiliation of the students presenting
The poster should include 5 sections:
- Background: should include a description of your question (can add a figure)
- Goal and Research Plan: describe the main objective and the research plan
- Results (main section): present your results in 3-4 figures, describe each figure (figure legends) and give a title to each result
- Conclusions: summarize the conclusions of your project in points
- References: list the references of papers/databases/tools used for your project
Examples of posters will be presented in class