Projects 2016-17.


1 Projects

2 Instructions for the final project
Introduction to Bioinformatics
Key dates: lists of suggested projects published*. Final date to choose a project 23.1– meetings on projects (individual pairs with supervisor) 30.1- Submission project overview (one page) 15.3 Poster submission 22.3 Poster presentation (12-2PM)
*You are highly encouraged to choose a project yourself or to find a relevant project that can help in your research.

3 Working on your project: step by step
Projects are conducted in pairs!
1. Choose a topic (either from the list or your own idea).
2. Prepare for the first meeting. After you have chosen the topic, start planning the project:
-Make sure you understand the problem and read the necessary biological background.
-As most projects require working on a specific data set, search for the most suitable data for your project. At this stage you should download the data and explore it.
-Now that you have the information, formulate your working plan and think of the relevant tools you will need to use.
-Prepare questions.
Being prepared for the meeting is the first important step towards succeeding in the project!

4 3. Write a proposal
Following the meeting with your supervisor, you will need to write a short one-page proposal describing your project. The proposal should include:
-Title
-Main question
-Major tools you are planning to use to answer the question
4. Working on your project
Following the submission of the proposal and the feedback from your supervisor, you can start working on the project according to your proposed plan. Your initial results should guide you towards your next steps. During the work you are highly encouraged to consult with your supervisor.
Important! Make sure to summarize the results and extract the relevant information needed to answer your question. It is recommended to save the raw data for your records, but do not present raw data in your poster.

5 5. Summarize the project
When you feel you have explored all the tools you can apply to answer your question, summarize your results and draw conclusions. At this stage it is highly recommended to set up a meeting with your supervisor.
! Remember: NO is also an answer, as long as you are sure it is NO.

6 6. Summarizing the final project in a poster
Prepare in PPT, poster size cm
Title of the project
Names and affiliations of the students presenting
The poster should include 5 sections:
-Background: a description of your question (a figure can be added)
-Goal and Research Plan: describe the main objective and the research plan
-Results (main section): present your results in 3-4 figures, describe each figure (figure legends) and give a title to each result
-Conclusions: summarize the conclusions of your project as bullet points
-References: list the papers/databases/tools used for your project
Examples of posters will be presented in class.

7 Motif Search

8 What are motifs?
Motif (dictionary): a recurrent thematic element, a common theme.

9 Find a common motif in the text

10 Find a short common motif in the text

11 Motifs in biological sequences
A sequence motif is a short common sequence (length 4-20) that is highly represented in the data.

12 Motifs in biological sequences
What can we learn from these motifs?
-Regulatory motifs on DNA or RNA
-Functional sites in proteins

13 Regulatory Motifs on DNA
Transcription factors (TFs) are regulatory proteins that bind to regulatory motifs near a gene and act as a switch button (on/off).
TF binding motifs are usually 6-20 nucleotides long and are located near the target gene, mostly upstream of the transcription start site.
[Figure: Gene X with its Transcription Start Site; TF1 and TF2 bound at the TF1 motif and the TF2 motif]

14 What can we learn from these motifs?
About half of all cancer patients have a mutation in a gene called p53, which codes for a key transcription factor. The mutations are in the DNA-binding region and allow tumors to survive and continue growing even after chemotherapy severely damages their DNA.
[Figure: the p53 transcription factor at the binding sites (motifs) of a target gene]

15 Why is P53 involved in so many cancer types?
p53 regulates over 100 different genes (hub).
We are interested in identifying the genes regulated by p53.

16 Can we find TF targets using a bioinformatics approach?

17 Finding TF targets using a bioinformatics approach?
Scenario 1: Binding motif is known (easier case)
Scenario 2: Binding motif is unknown (hard case)

18 Scenario 1 : Binding motif is known
Given a motif find the binding sites in an input sequence

19 Challenges in biological sequences
Motifs are usually not exact words …….

20 How to represent non-exact motifs?

21 How to represent non-exact motifs?
Position Specific Scoring Matrix (PSSM): the probability of each base at each position.
Seq 1 AAAGCCC
Seq 2 CTATCCA
Seq 3 CTATCCC
Seq 4 CTATCCC
Seq 5 GTATCCC
Seq 6 CTATCCC
Seq 7 CTATCCC
Seq 8 CTATCCC
Seq 9 TTATCTG
[Table: rows A, T, G, C; columns are the motif positions; each cell gives the probability of that base at that position]
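As a rough illustration of how such a matrix can be derived, the sketch below counts each base in each column of the nine sequences listed on this slide and divides by the number of sequences. The function name `column_probabilities` is made up for this example, and no pseudocounts are applied, so a base never seen in a column gets probability 0.

```python
from collections import Counter

# The nine aligned sequences listed on the slide.
SEQS = [
    "AAAGCCC", "CTATCCA", "CTATCCC", "CTATCCC", "GTATCCC",
    "CTATCCC", "CTATCCC", "CTATCCC", "TTATCTG",
]
BASES = "ACGT"


def column_probabilities(seqs):
    """Return one dict per alignment column, mapping base -> relative frequency.

    Hypothetical helper for illustration; assumes all sequences have equal length.
    """
    n = len(seqs)
    matrix = []
    for column in zip(*seqs):  # iterate over alignment columns
        counts = Counter(column)
        matrix.append({base: counts[base] / n for base in BASES})
    return matrix


pssm = column_probabilities(SEQS)

# Print the matrix with one row per base and one column per position.
for base in BASES:
    print(base, " ".join(f"{col[base]:.2f}" for col in pssm))
```

For these sequences, C is the most frequent base at position 1 (6 of 9 sequences) and A is the only base observed at position 3.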

22 The PSSM can be also represented as a sequence logo
-A letter’s height indicates the information it contains

23 Presenting a sequence motif as a logo
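The PSSM is converted to a PWM (Position Weight Matrix) by dividing each score by the background probability (0.25).
Example sequences:
TTCACG
TACATG
TACAGG
TACAAG
In the PSSM for these sequences, for example, A has probability 0.75 at position 2 and 0.25 at position 5.
Letter height = $\log_2 S$, where $S$ is the PWM score of the letter:
T at position 1: $\log_2 4 = 2$
T at position 5: $\log_2 1 = 0$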
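The letter-height calculation on this slide can be sketched in a few lines, using the four example sequences and the uniform background of 0.25 given above. The function name `letter_heights` is an assumption made for this example; bases that never occur in a column are skipped, since the logarithm of 0 is undefined.

```python
import math

# The four example sequences from the slide.
SEQS = ["TTCACG", "TACATG", "TACAGG", "TACAAG"]
BACKGROUND = 0.25  # uniform background probability for DNA, as on the slide


def letter_heights(seqs, background=BACKGROUND):
    """For each column, map every observed base to log2(probability / background).

    Hypothetical helper for illustration only.
    """
    n = len(seqs)
    heights = []
    for column in zip(*seqs):
        col = {}
        for base in set(column):
            score = (column.count(base) / n) / background  # PWM score S
            col[base] = math.log2(score)
        heights.append(col)
    return heights


heights = letter_heights(SEQS)
print(heights[0]["T"])  # T at position 1: log2(4) = 2.0
print(heights[4]["T"])  # T at position 5: log2(1) = 0.0
```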

24 Riddle: What is the maximum height we can get in a logo that describes a motif derived from protein sequences?

25 How to search for a motif in a sequence given a PSSM:
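Given a string s of length l = 7, $s = s_1 s_2 \ldots s_l$:
$\Pr(s \mid W) = \prod_{k=1}^{l} W_{s_k k}$
where $W_{\beta k}$ is the probability of base $\beta$ in column $k$. W is built from the counts of each base in each column, which are converted to the probability of each base in each column.
Example: $\Pr(\mathrm{CTATCCG} \mid W) = 0.67 \times 0.89 \times 1 \times 0.89 \times 1 \times 0.89 \times 0.11$
[Tables: counts of each base (A, C, G, T) in each column, and the corresponding probability matrix W]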

26 How to search for a motif in a sequence given a PSSM:
Given a sequence S (e.g., 1000 base pairs long):
For each substring s of S (of the motif length), compute Pr(s|W).
Define s as a match if Pr(s|W) > threshold.
The threshold is calculated based on the probability of finding the motif at random, and can be different for each motif.
Open question: What do we do when searching for motifs in DNA?
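The scanning procedure described on this slide can be sketched as a sliding window. This is a minimal illustration rather than a reference implementation: the matrix is built from the nine sequences of slide 21, the input sequence and the threshold value 0.1 are arbitrary choices for the example, and the names `build_pssm`, `site_probability` and `scan` are hypothetical. Only the given strand is scanned.

```python
# Aligned binding sites from slide 21, used here to build W.
SITES = [
    "AAAGCCC", "CTATCCA", "CTATCCC", "CTATCCC", "GTATCCC",
    "CTATCCC", "CTATCCC", "CTATCCC", "TTATCTG",
]


def build_pssm(sites):
    """Return a list of dicts (one per column) mapping base -> probability."""
    n = len(sites)
    return [{b: col.count(b) / n for b in "ACGT"} for col in zip(*sites)]


def site_probability(s, pssm):
    """Pr(s | W): product over positions k of the probability of base s_k in column k."""
    p = 1.0
    for k, base in enumerate(s):
        p *= pssm[k][base]
    return p


def scan(sequence, pssm, threshold):
    """Slide a window of the motif width along the sequence; keep windows above threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        p = site_probability(window, pssm)
        if p > threshold:
            hits.append((i, window, p))
    return hits


pssm = build_pssm(SITES)
# Arbitrary example sequence and threshold (assumptions for this sketch).
hits = scan("GGACTATCCCTTAG", pssm, threshold=0.1)
for start, window, p in hits:
    print(start, window, round(p, 3))
```

In this sketch a single unseen base drives the whole product to 0, so only windows that agree with the alignment in every column can pass the threshold.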

27 Scenario 2 : Binding motif is unknown
“Ab initio motif finding” Why is it hard???

28 Are common motifs the right thing to search for ?

29 ?

30 Solutions:
-Search for motifs that are enriched in one set but not in a random set.
-Use experimental information to rank the sequences according to their binding affinity, and search for enriched motifs at the top of the list.

31 Chromatin Immunoprecipitation followed by sequencing
ChIP-Seq: Chromatin Immunoprecipitation followed by sequencing.
Finds regions in the genome to which a DNA-binding protein (transcription factor) binds.

32 Cross-linking immunoprecipitation
CLIP-Seq: Cross-linking immunoprecipitation.
Identifies the regions in the transcriptome to which an RNA-binding protein (e.g. a splicing factor) binds.

33 Finding the p53 binding motif in a set of p53 target sequences ranked according to binding affinity
[Figure: ChIP-seq target sequences ranked from best binders to weak binders]

34 A word search approach to search for enriched motifs in a ranked list
[Figure: ranked sequences list containing occurrences of CTGTGA; candidate k-mers include CTACGC, ACTTGA, ACGTGA, ACGTGC, CTGTGC, CTGTGA, CTGTAC, ATGTGC, ATGTGA, CTATGC]

35 The minimal hypergeometric (mHG) statistic is used to find enriched motifs
The statistic is based on four quantities:
-The total number of input sequences
-The number of sequences containing the motif
-The number of sequences at the top of the list
-The number of sequences containing the motif among the top sequences
[Figure: ranked sequences list with occurrences of CTGTGA]
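To make the four quantities concrete, here is a minimal sketch of a minimal hypergeometric score for one candidate k-mer. The symbols N, B, n and b (total sequences, sequences containing the motif, size of the top of the list, motif-containing sequences in the top) are notation assumed for this example, and the ranked sequences are invented. The score is the smallest hypergeometric tail probability over all cutoffs n; because it is a minimum over many cutoffs, it is not itself a p-value, and that correction is not shown here.

```python
from math import comb  # requires Python 3.8+


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when drawing n sequences out of N, of which B contain the motif."""
    upper = min(n, B)
    favorable = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return favorable / comb(N, n)


def mhg_score(occurrence):
    """Minimum hypergeometric tail over all cutoffs n of a ranked 0/1 occurrence list.

    Returns (score, best_n). Hypothetical helper for illustration only.
    """
    N = len(occurrence)
    B = sum(occurrence)
    best, best_n = 1.0, 0
    b = 0
    for n in range(1, N + 1):
        b += occurrence[n - 1]  # motif-containing sequences among the top n
        p = hypergeom_tail(N, B, n, b)
        if p < best:
            best, best_n = p, n
    return best, best_n


# Invented ranked list (best binders first); not taken from the slides.
ranked = [
    "ACTGTGAT", "CTGTGACC", "GGCTGTGA", "TTTTAAAA",
    "ACGTACGT", "GGGGCCCC", "ATATATAT", "CCCCGGGG",
]
occurrence = [1 if "CTGTGA" in seq else 0 for seq in ranked]
score, best_n = mhg_score(occurrence)
print(occurrence, score, best_n)
```

Here the k-mer occurs only in the top three of eight sequences, so the strongest enrichment is found at the cutoff n = 3.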

36 The enriched motifs are combined to get a PSSM which represents the binding motif

37

38 Protein Motifs
Protein motifs are usually 6-20 amino acids long and can be represented as a consensus/profile, e.g. P[ED]XK[RW][RK]X[ED], or as a PWM.
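A consensus pattern of this kind maps naturally onto a regular expression. The sketch below assumes that X stands for any amino acid and that bracketed groups list the allowed residues at a position; the helper name `consensus_to_regex` and the example protein sequence are invented for illustration.

```python
import re


def consensus_to_regex(pattern):
    """Translate a consensus string into a compiled regular expression.

    Assumes X means "any amino acid" and appears only outside brackets;
    bracket groups such as [ED] are already valid regex character classes.
    """
    return re.compile(pattern.replace("X", "."))


motif = consensus_to_regex("P[ED]XK[RW][RK]X[ED]")

# Invented protein sequence, used only to demonstrate the search.
protein = "MAPEAKRKGDLL"
matches = [(m.start(), m.group()) for m in motif.finditer(protein)]
print(matches)  # [(2, 'PEAKRKGD')]
```

Note that `finditer` reports non-overlapping matches only, which is enough for a quick check but may miss overlapping sites.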

