From motif search to gene expression analysis

1 From motif search to gene expression analysis

2 Finding TF targets using a bioinformatics approach
Scenario 1: the binding motif is known (easier case). Scenario 2: the binding motif is unknown (harder case).

3 Are common motifs the right thing to search for?

4 Solutions:
- Search for motifs that are enriched in the target set but not in a random (background) set
- Use experimental information to rank the sequences by binding affinity, and search for motifs enriched at the top of the list
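The first solution can be sketched as a simple enrichment test for one k-mer. This is a minimal sketch, not the lecture's method. It assumes that target and background sets are plain DNA strings and that a motif counts as present only on an exact match anywhere in a sequence (no mismatches, no reverse complement). The one-sided hypergeometric test is one common choice, and the function names and toy sequences are hypothetical.

```python
import math


def hypergeom_tail(k, total, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws)."""
    upper = min(successes, draws)
    return sum(
        math.comb(successes, i) * math.comb(total - successes, draws - i)
        for i in range(k, upper + 1)
    ) / math.comb(total, draws)


def kmer_enrichment(kmer, target, background):
    """Compare how many target vs. background sequences contain `kmer`.

    Returns (target hits, background hits, one-sided enrichment p-value).
    """
    hits_target = sum(kmer in seq for seq in target)
    hits_background = sum(kmer in seq for seq in background)
    p_value = hypergeom_tail(
        k=hits_target,
        total=len(target) + len(background),
        successes=hits_target + hits_background,
        draws=len(target),
    )
    return hits_target, hits_background, p_value


# Hypothetical toy data: CTGTGA is in every target but only one background sequence.
target = ["ACGTGCTGTGA", "TTCTGTGAAA", "CTGTGACC"]
background = ["AAAAAA", "CCCCCC", "GGGTTT", "CTGTGATT"]
print(kmer_enrichment("CTGTGA", target, background))
```

With so few sequences the p-value is weak (4/35 here). Real analyses test many k-mers over thousands of sequences, so a multiple-testing correction would also be needed.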

5 ChIP-Seq: sequencing the genomic regions to which a protein (e.g. a transcription factor) binds.

6 Finding the p53 binding motif in a set of p53 target sequences ranked by binding affinity (Figure: ChIP-Seq targets ordered from best binders to weak binders.)

7 A word-search approach to finding enriched motifs in a ranked list
(Figure: a ranked list of sequences in which the k-mer CTGTGA occurs repeatedly, next to candidate k-mers such as CTACGC, ACTTGA, ACGTGA, ACGTGC, CTGTGC, CTGTGA, CTGTAC, ATGTGC, ATGTGA and CTATGC.)
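The word-search idea can be sketched as follows. Every k-mer that appears in the ranked sequences is enumerated, and for each one we record which ranks contain it. This minimal sketch assumes exact matches on one strand only and uses k = 6 to match the 6-mers in the figure. The function name and toy sequences are hypothetical.

```python
from collections import defaultdict


def kmer_occurrence_vectors(ranked_seqs, k=6):
    """Map every k-mer in the ranked list to a 0/1 vector over the ranks.

    Entry i is 1 if the i-th ranked sequence (best binder first) contains
    the k-mer at least once, else 0.
    """
    vectors = defaultdict(lambda: [0] * len(ranked_seqs))
    for rank, seq in enumerate(ranked_seqs):
        for start in range(len(seq) - k + 1):
            vectors[seq[start:start + k]][rank] = 1
    return dict(vectors)


# Hypothetical ranked list, strongest binder first.
ranked = ["AACTGTGA", "CTGTGATT", "GGGGGGGG", "CTACGCAA"]
occ = kmer_occurrence_vectors(ranked)
print(occ["CTGTGA"])  # present in the two top-ranked sequences
```

Each 0/1 vector is exactly the input an enrichment statistic over the ranked list needs.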

8 The method uses the minimum hypergeometric (mHG) statistic to find enriched motifs. It is computed from:
- the total number of input sequences
- the number of sequences containing the motif
- the number of sequences at the top of the list
- the number of sequences containing the motif among the top sequences
(Figure: ranked sequence list with occurrences of CTGTGA.)
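Write N, B, n and b for the four quantities listed above, in that order. The hypergeometric tail for a cutoff n and the mHG score are then:

$$\mathrm{HGT}(b; N, B, n) = \sum_{i=b}^{\min(n,B)} \frac{\binom{n}{i}\binom{N-n}{B-i}}{\binom{N}{B}}, \qquad \mathrm{mHG} = \min_{1 \le n < N} \mathrm{HGT}(b_n; N, B, n)$$

Here b_n is the number of motif-containing sequences among the top n. Below is a minimal sketch of the score. It assumes the 0/1 occurrence vector of one k-mer is already available, and the function names are hypothetical. It returns only the raw mHG score. The exact p-value must also correct for having tried every cutoff, which is not shown here.

```python
import math


def hg_tail(N, B, n, b):
    """HGT(b; N, B, n): probability of >= b motif-containing sequences among
    the top n, when B of the N sequences contain the motif."""
    return sum(
        math.comb(n, i) * math.comb(N - n, B - i)
        for i in range(b, min(n, B) + 1)
    ) / math.comb(N, B)


def mhg_score(occurrences):
    """Minimum hypergeometric score over all cutoffs 1 <= n < N.

    occurrences: 0/1 list in rank order (best binder first).
    Returns (mHG score, the cutoff n that achieves it).
    """
    N, B = len(occurrences), sum(occurrences)
    best_score, best_n = 1.0, 0
    b = 0  # motif-containing sequences among the top n
    for n in range(1, N):
        b += occurrences[n - 1]
        score = hg_tail(N, B, n, b)
        if score < best_score:
            best_score, best_n = score, n
    return best_score, best_n


# Hypothetical occurrence vector: the k-mer occurs only in the 3 top-ranked sequences.
print(mhg_score([1, 1, 1, 0, 0, 0, 0, 0]))
```

Because the cutoff n is chosen by the data, the method does not need a fixed "top 10%" threshold. This is why it suits ranked ChIP-Seq lists.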

9 The enriched motifs are combined into a position-specific scoring matrix (PSSM) that represents the binding motif
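The slides do not say exactly how the enriched k-mers are combined. This minimal sketch assumes they are equal-length, already aligned variants. It adds a pseudocount of 1 and scores log-odds against a uniform background (0.25 per base). The same matrix can then scan a sequence, as in the "binding motif is known" scenario. Function names and example k-mers are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pssm(kmers, pseudocount=1.0, background=0.25):
    """Log-odds PSSM from equal-length, already aligned k-mers.

    Returns one dict per motif position: base -> log2(p(base) / background).
    """
    pssm = []
    for pos in range(len(kmers[0])):
        column = [kmer[pos] for kmer in kmers]
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pssm


def best_hit(pssm, seq):
    """Slide the PSSM along seq (uppercase ACGT only) and return (best score, start)."""
    width = len(pssm)
    return max(
        (sum(pssm[i][seq[start + i]] for i in range(width)), start)
        for start in range(len(seq) - width + 1)
    )


# Hypothetical enriched 6-mers (variants of CTGTGA, as in the ranked-list example).
pssm = build_pssm(["CTGTGA", "CTGTGA", "CTGTGC", "ATGTGA"])
print(best_hit(pssm, "AAACTGTGAAA"))
```

The pseudocount keeps bases never seen in the k-mers from receiving a score of minus infinity.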

11 Protein motifs
Protein motifs are usually 6-20 amino acids long and can be represented as a consensus pattern (profile), such as P[ED]XK[RW][RK]X[ED], or as a PWM (position weight matrix).
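A consensus pattern like P[ED]XK[RW][RK]X[ED] translates almost directly into a regular expression. Bracketed sets stay as character classes, and X (any amino acid) becomes ".". This minimal sketch assumes X never appears inside brackets. It uses a lookahead so that overlapping matches are reported. The function names and test sequences are hypothetical.

```python
import re


def consensus_to_regex(consensus):
    """Translate a simple consensus pattern into a Python regex.

    Assumes X (any residue) appears only outside brackets and that
    bracketed alternatives such as [ED] are already valid regex classes.
    """
    return consensus.replace("X", ".")


def find_motif(protein, consensus):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(protein)]


print(find_motif("MAPEAKRRGEQL", "P[ED]XK[RW][RK]X[ED]"))
```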

12 Gene Expression Analysis

13 Gene expression: DNA → RNA → protein

14 Gene expression (Figure: mRNAs of gene 1, gene 2 and gene 3, with poly(A) tails, AAAAAAA.)

15 Studying gene expression, 1987-2013
Microarrays (the first high-throughput gene expression experiments) → DNA chips → RNA-seq (next-generation sequencing)

16 Classical versus modern technologies to study gene expression
Classical methods (spotted microarrays, DNA chips):
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes
Next-generation RNA sequencing:
- Does not require prior knowledge
- Good for discovering new transcripts

17 Experimental protocol: two-channel cDNA arrays

18 One-channel DNA chips
Each sequence is represented by a probe set, and a single fluorescent dye is used. The target hybridizes only to complementary probes. The fluorescence intensity indicates the expression level of the target sequence.

19 Affymetrix Chip

20 RNA-seq

21 Clustering the data according to expression profiles
Next: clustering the genes according to their expression profiles. (Figure: genes × expression in different conditions.)

22 WHY? What can we learn from the clusters?
Identify gene function: similar expression can suggest similar function.
Diagnostics and therapy: differences in gene expression can indicate a disease state, and genes whose expression changes in a disease can be good candidates for drug targets.

23 A molecular signature of metastasis in primary solid tumors
Samples were taken from patients with adenocarcinoma. Hundreds of genes that differentiate between cancer tissues at different stages of the tumor were found. The arrow shows an example of tumor cells that were not identified correctly by histological or other clinical parameters. Ramaswamy et al., 2003, Nat Genet 33:49-54

24 HOW? Different clustering approaches
Unsupervised methods:
- Hierarchical clustering
- K-means
Supervised methods:
- Support Vector Machine (SVM)
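K-means, the second unsupervised method listed, can be sketched in a few lines using Lloyd's algorithm. This minimal sketch assumes Euclidean distance. For reproducibility it uses the first k profiles as starting centroids, whereas practical implementations typically use random or k-means++ initialization with several restarts. The example uses the expression profiles of genes a-d given in slide 28.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Returns (cluster label per point, centroids)."""
    # Deterministic start: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged -> converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Expression profiles of genes a-d (slide 28).
genes = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}
labels, _ = kmeans(list(genes.values()), k=2)
print(dict(zip(genes, labels)))
```

Unlike hierarchical clustering, k-means needs the number of clusters k in advance and returns a flat partition rather than a tree.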

25 Clustering Clustering organizes things that are close into groups.
- What does it mean for two genes to be close?
- Once we know this, how do we define groups?
Notice that we do this ourselves all the time: we divide people by race, animals into families, etc.

26 What does it mean for two genes to be close?
We need a mathematical definition of the distance between the expression patterns of two genes. For example, the distance between gene 1 and gene 2, measured in N conditions:
$\mathbf{g}_1 = (E_{11}, E_{12}, \dots, E_{1N})^\top$, $\mathbf{g}_2 = (E_{21}, E_{22}, \dots, E_{2N})^\top$
Euclidean distance: $d(\mathbf{g}_1, \mathbf{g}_2) = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$
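The Euclidean distance formula translates directly into code. This minimal sketch assumes each gene's profile is a list of N numeric expression values, measured in the same conditions and in the same order. The function name and example values are hypothetical.

```python
import math


def euclidean_distance(gene1, gene2):
    """sqrt(sum_i (E1i - E2i)^2) over the N conditions."""
    if len(gene1) != len(gene2):
        raise ValueError("expression vectors must cover the same N conditions")
    return math.sqrt(sum((e1 - e2) ** 2 for e1, e2 in zip(gene1, gene2)))


# Hypothetical expression levels of two genes in three conditions.
print(euclidean_distance([2.0, 0.5, 1.0], [1.0, 0.5, 3.0]))
```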

27 Clustering the genes according to expression
Hierarchical clustering:
- Generate a tree based on the distances between genes (similar to a phylogenetic tree)
- Each gene is a leaf on the tree
- Distances on the tree reflect the similarity of the genes' expression patterns
(Figure: gene clusters; genes × expression in different conditions.)
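Below is a minimal sketch of agglomerative (bottom-up) hierarchical clustering, applied to genes a-d from slide 28. The slides do not say how the distance between two clusters is defined. This sketch assumes average linkage (the mean of all pairwise Euclidean distances, as in UPGMA), which is only one of several common choices. The function name is hypothetical.

```python
import math
from itertools import combinations


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping gene name -> expression vector.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a sorted tuple of gene names.
    """
    # Every gene starts as its own cluster (a leaf of the tree).
    clusters = [(name,) for name in sorted(profiles)]
    merges = []

    def cluster_distance(c1, c2):
        # Mean of all pairwise Euclidean distances between members.
        dists = [math.dist(profiles[g1], profiles[g2]) for g1 in c1 for g2 in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Merge the closest pair of clusters into a new internal node.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, round(cluster_distance(c1, c2), 2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


genes = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}
for merge in average_linkage(genes):
    print(merge)
```

The merge history is the tree. Genes a and c join first (distance 2), then b and d, and finally the two pairs join.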

28 Clustering the genes according to gene expression
Expression profiles (8 conditions):
GENE a: 1, -1, 1, 1, 1, -1, -1, -1
GENE b: 1, 1, -1, 1, 1, 1, -1, 1
GENE c: 1, -1, 1, -1, 1, -1, -1, -1
GENE d: -1, 1, -1, 1, 1, 1, -1, -1
Distance table (Euclidean distance*):
Dab = 4, Dac = 2, Dad = 4, Dbc = 4.47, Dbd = 2.83, Dcd = 4.47
*Distances can be calculated using different distance metrics.
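The distance table can be reproduced with a few lines of code. The slide notes that other metrics can be used, so the sketch also computes a Pearson-correlation distance (1 − r), one common alternative for expression profiles. Values are rounded to two decimals, and the helper names are hypothetical.

```python
import math
from itertools import combinations

GENES = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for
    perfectly anti-correlated ones. Fails on constant (zero-variance) profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return 1 - cov / (sx * sy)


def distance_table(profiles, metric):
    """Pairwise distances {(g1, g2): distance} for all gene pairs."""
    return {
        (g1, g2): round(metric(profiles[g1], profiles[g2]), 2)
        for g1, g2 in combinations(sorted(profiles), 2)
    }


euclidean = distance_table(GENES, math.dist)
correlation = distance_table(GENES, pearson_distance)
for pair in euclidean:
    print(pair, euclidean[pair], correlation[pair])
```

Euclidean distance reacts to absolute expression levels. Correlation distance reacts only to the shape of the profile, so the two metrics can produce different clusters.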

29 Analyzing the clusters of genes

30 What can we learn from clusters with similar gene expression?

31 Example: hnRNP A1 and SRp40
hnRNP A1 and SRp40 are not clear homologs based on their BLAST E-value, but they have very similar gene expression patterns across different tissues.

32 Are hnRNP A1 and SRp40 functional homologs?
(Figure: SRp40 shown among a group of proteins labeled SF.) Yes!

33 What else can we learn from clusters with similar gene expression?
Similar expression between genes may indicate that:
- the genes have similar functions
- one gene controls the other
- all the genes are controlled by a common regulatory gene

34 How can gene expression help in diagnostics?

35 How can gene expression help in diagnostics?
(Figure: gene expression across different patients, BRCA1 or BRCA2.)
Research question: can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles? Here we want to cluster the patients, not the genes!

36 How can gene expression be applied for diagnostics?
(Figure: expression (+ / -) of genes Gen1-Gen5 in 5 breast cancer patients, patients 1-5.)

37 How can gene expression be applied for diagnostics?
(Figure: after clustering, the patients are reordered as 1, 2, 4, 3, 5 and grouped into BRCA1 and BRCA2, and the genes are reordered as Gen1, Gen3, Gen4, Gen2, Gen5, revealing the informative genes.)
Two-way clustering = clustering both the patients and the genes.
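Two-way clustering can be sketched by clustering the rows (genes) and the columns (patients) of the expression matrix separately, then reordering both. This minimal sketch assumes average-linkage agglomerative clustering with Euclidean distance, and it keeps only the resulting leaf order. The +1/-1 matrix is hypothetical toy data, not the data shown on the slide, and the function names are hypothetical.

```python
import math
from itertools import combinations


def leaf_order(vectors):
    """Order row indices by average-linkage agglomerative clustering.

    Each cluster is a list of indices. Merging concatenates the lists, so
    members of the same branch end up next to each other.
    """
    clusters = [[i] for i in range(len(vectors))]

    def avg_dist(c1, c2):
        total = sum(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return clusters[0]


def two_way_cluster(matrix):
    """Cluster genes (rows) and patients (columns); return both orders and the reordered matrix."""
    row_order = leaf_order(matrix)
    columns = [list(col) for col in zip(*matrix)]  # one vector per patient
    col_order = leaf_order(columns)
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return row_order, col_order, reordered


# Hypothetical +1/-1 expression matrix: 4 genes (rows) x 5 patients (columns).
matrix = [
    [1, 1, -1, 1, -1],
    [1, 1, -1, 1, -1],
    [-1, -1, 1, -1, 1],
    [-1, -1, 1, -1, 1],
]
rows, cols, _ = two_way_cluster(matrix)
print("gene order:", rows, "patient order:", cols)
```

Patients 0, 1 and 3 end up adjacent, as do patients 2 and 4. This block structure is what the reordered figure makes visible.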

38 Supervised approaches for diagnostics based on expression data
Support Vector Machine (SVM)

39 An SVM begins with a set of samples from patients who have been diagnosed as either BRCA1 (red dots) or BRCA2 (blue dots). Each dot represents a patient's expression profile (a vector) measured in a microarray experiment.

40 How do SVMs work with expression data?
The SVM is trained on data that were classified based on histology. After the SVM has been trained to separate BRCA1 from BRCA2 tumors using the expression data, we can apply it to diagnose an unknown tumor (marked "?") for which we have equivalent expression data.
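The training and prediction steps can be sketched with a linear SVM written from scratch. This is a minimal sketch with several assumptions. Labels are +1 (BRCA1) and -1 (BRCA2), and the model minimizes a soft-margin hinge loss by plain full-batch subgradient descent with hypothetical hyperparameters. Libraries such as scikit-learn use more sophisticated solvers and also offer non-linear kernels. The two-gene toy profiles are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss active: inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """+1 (BRCA1) or -1 (BRCA2), depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical 2-gene expression profiles of diagnosed patients.
X = [[2, 1], [3, 2], [2, 3], [-2, -1], [-3, -2], [-2, -3]]
y_labels = [1, 1, 1, -1, -1, -1]  # +1 = BRCA1, -1 = BRCA2
w, b = train_linear_svm(X, y_labels)
print(predict(w, b, [1.5, 1.0]))  # classify a new, undiagnosed tumor
```

Real expression vectors have thousands of genes but few patients. A regularized, maximum-margin classifier like this one helps limit overfitting in that setting.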

41 Projects

42 Instructions for the final project
Introduction to Bioinformatics: key dates
- Lists of suggested projects published*
- 9.1: submission of a project overview (one page) with the title, the main question, and the major tools you are planning to use to answer the question
- Final week: meetings on projects
- 12.3: poster submission
- 19.3: poster presentation
*You are highly encouraged to choose a project yourself, or to find a relevant project that can help in your research.

43 2. Planning your research
After you have described the main question (or questions) of your project, plan your next steps carefully:
A. Make sure you understand the problem, and read the necessary background before proceeding.
B. Formulate your working plan, step by step.
C. Once you have a plan, start by extracting the necessary data and deciding which tools to use in the first step. When running a tool, summarize the results and extract the information you need to answer your question. It is recommended to save the raw data for your records, but don't present raw data in your final project. Your initial results should guide your next steps.
D. When you feel you have explored all the tools you can apply to answer your question, summarize and draw conclusions. Remember that NO is also an answer, as long as you are sure it is NO. Also remember that this is a course project, not just a homework exercise.

44 Summarizing the final project in a poster (in pairs)
Prepare the poster in PPT (poster size: … cm), with the title of the project and the names and affiliations of the presenting students. The poster should include 5 sections:
- Background: describe your question (you can add a figure)
- Goal and research plan: describe the main objective and the research plan
- Results (main section): present your results in 3-4 figures, describe each figure (figure legends), and give each result a title
- Conclusions: summarize the conclusions of your project in points
- References: list the papers, databases and tools used in your project
Examples of posters will be presented in class.

