DNA Motif and protein domain discovery


1 DNA Motif and protein domain discovery
Presented by: Deeter Neumann, Peter St. Andre
Images: PDB, human enhancer binding protein; PDB, zinc finger 224

2 Outline
What are DNA motifs and protein domains?
Their importance and function
Motif algorithms
Locating domains/motifs experimentally
Available programs: PFAM and SMART
Image taken from wikimedia.org

3 What are DNA sequence motifs?
“Sequence motifs are short recurring patterns in DNA that are presumed to have biological function.” D’haeseleer, P. Nature Biotechnology 24, (2006). Image taken from bio.miami.edu

4 Why are DNA sequence motifs important to know?
They indicate common structural protein domains
They identify similar function
Other possible biological functions, e.g. transcription factors, mRNA processing

5 What is the function of DNA domains?
Specific and non-specific interactions
Permit binding of a transcription factor to its target gene
Sequence-specific recognition
Precise collection of sequence elements
Human Molecular Genetics 3; Strachan & Read

6 What are protein domains?
Protein sequences and structures that evolve, function, and exist independently from the rest of the protein They often form functional units, like metal binding domains Image of human zinc finger domain Taken from .ionchannels.org

7 Why are Protein Domains Important?
Bind to other molecules in the cell
Signal transduction pathways
Genetically engineering novel proteins
Pharmaceutical importance

8 Algorithmic Approaches for both DNA motifs and protein domain searches
Three general approaches are used:
Enumeration
Deterministic optimization
Probabilistic optimization

9 Enumeration
Employs the broadest approach
Looks at all possible motifs
Few limitations are placed on it

10 Enumeration, cont.
Key point: Covers all possible sequence motifs with few limitations
Pros: Does not get stuck in a local optimum
Cons: May overlook subtle patterns
Programs like WeederWeb and YMF use this type of algorithm
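The enumeration idea can be sketched in a few lines of Python. This is only an illustration, not how WeederWeb or YMF are implemented: it assumes exact matching over the alphabet ACGT, and the function name `enumerate_motifs` and the toy sequences are made up for the example. Scoring only exact words is also one way to see the "may overlook subtle patterns" drawback, since a degenerate site is split across several words.

```python
from collections import defaultdict
from itertools import product

def enumerate_motifs(sequences, k):
    """For every possible k-mer, count how many sequences contain it at least once."""
    support = defaultdict(int)
    for seq in sequences:
        # A set, so a k-mer repeated within one sequence is counted once.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in seen:
            support[kmer] += 1
    # Walk the full 4^k search space so that absent k-mers get a score of 0.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: support.get(kmer, 0) for kmer in all_kmers}

# Hypothetical input: three short sequences sharing the word TGACGT.
seqs = ["TTGACGTCAT", "GGTGACGTCA", "ATGACGTCCC"]
counts = enumerate_motifs(seqs, 6)
top = sorted(counts, key=counts.get, reverse=True)[:2]
print(top, counts["TGACGT"])
```

Because every word is scored, there is no starting point to get stuck on, but the table grows as 4^k, which is why exhaustive searches are normally restricted to short motifs.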

11 WeederWeb

12 WeederWeb Results

13 Deterministic optimization
Takes into account an Expectation Maximization model and a position weight matrix
MEME is one program that uses this approach
What does this mean?
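As a rough answer to "what does this mean", the sketch below runs expectation maximization on a position weight matrix. It is a simplified illustration and not MEME's actual algorithm: it assumes exactly one site per sequence, a uniform background (so the background terms cancel), and a seed word chosen by the user. The names `em_motif`, `start`, and `pseudocount` are hypothetical.

```python
import math

ALPHABET = "ACGT"

def em_motif(sequences, start, iterations=20, pseudocount=0.1):
    """Refine a position weight matrix by EM, one motif occurrence per sequence."""
    width = len(start)
    # Initial PWM: the seed letter gets most of the probability in each column.
    pwm = [{b: (0.7 if b == ch else 0.1) for b in ALPHABET} for ch in start]
    for _ in range(iterations):
        counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
        for seq in sequences:
            windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
            # E-step: likelihood of each window under the current PWM.
            likes = [math.prod(pwm[j][w[j]] for j in range(width)) for w in windows]
            total = sum(likes)
            # Expected letter counts, each window weighted by its posterior.
            for w, like in zip(windows, likes):
                for j, b in enumerate(w):
                    counts[j][b] += like / total
        # M-step: renormalise the expected counts into a new PWM.
        pwm = [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]
    return pwm

# Hypothetical input: three short sequences sharing the word TGACGT.
seqs = ["TTGACGTCAT", "GGTGACGTCA", "ATGACGTCCC"]
pwm = em_motif(seqs, start="TGACGT")
consensus = "".join(max(col, key=col.get) for col in pwm)
print(consensus)
```

The procedure is deterministic: the same seed always converges to the same matrix, which is also why the result depends on the starting point and can settle in a local optimum.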

14 Deterministic optimization, cont.

15 Deterministic optimization, cont.
Taken from ws.nbcr.net/app /meme.html

16 Probabilistic optimization
Uses a Gibbs sampling approach
Randomized implementation of the expectation maximization model
How is this applied?

17 Probabilistic optimization, cont.
Selects random sites, and each is weighted against known motifs
Allows the program to add or remove sequences and continuously update motifs
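A minimal Gibbs sampling sketch follows, assuming one site per sequence and a profile built from simple letter counts. It is not the AlignACE or MotifSampler implementation; the function name `gibbs_sample` and its parameters (`sweeps`, `seed`, `pseudocount`) are made up for illustration.

```python
import random

ALPHABET = "ACGT"

def gibbs_sample(sequences, width, sweeps=200, seed=0, pseudocount=0.5):
    """Return one sampled site start per sequence."""
    rng = random.Random(seed)
    # Start from one random site per sequence.
    sites = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(sweeps):
        for i, seq in enumerate(sequences):
            # Build a count profile from every sequence except the held-out one.
            profile = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
            for k, (other, pos) in enumerate(zip(sequences, sites)):
                if k != i:
                    for j, b in enumerate(other[pos:pos + width]):
                        profile[j][b] += 1
            # Weight every candidate site in the held-out sequence by the profile.
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for j, b in enumerate(seq[p:p + width]):
                    w *= profile[j][b]
                weights.append(w)
            # Draw the new site in proportion to its weight instead of taking the best.
            sites[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return sites

# Hypothetical input: three short sequences sharing the word TGACGT.
seqs = ["TTGACGTCAT", "GGTGACGTCA", "ATGACGTCCC"]
sites = gibbs_sample(seqs, 6, seed=1)
print([s[p:p + 6] for s, p in zip(seqs, sites)])
```

The random draw is what separates this from the deterministic EM update: a poor site can still be picked occasionally, which lets the sampler leave a local optimum, and it is also why different seeds can give different answers and reruns are worthwhile.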

18 AlignAce 3.0

19 Results

20 Which one to use?
Recent research showed that enumeration approaches worked very well
Generally accepted that no one approach is the best
Programs that incorporate several approaches work the best
Important to rerun programs

21 Examples of programs
WeederWeb is a web-based interface with an enumerative approach
YMF is another enumerative program
MEME is an online program that uses a deterministic optimization approach
MotifSampler is a program that combines Gibbs sampling and a third-order Markov model

22 YMF

23 YMF results

24 Measurements used to score sequence motifs
Three main statistics used:
Information content
Log likelihood
MAP score
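Of the three statistics, information content is the simplest to show. The sketch below uses the usual relative-entropy form, the sum over columns of p_b * log2(p_b / q_b), and assumes a uniform background unless one is supplied. The function name and the two example columns are hypothetical.

```python
import math

def information_content(columns, background=None):
    """Sum over columns of sum_b p_b * log2(p_b / q_b), in bits."""
    background = background or {b: 0.25 for b in "ACGT"}
    total = 0.0
    for col in columns:
        for base, p in col.items():
            if p > 0:  # a zero-probability letter contributes nothing
                total += p * math.log2(p / background[base])
    return total

conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}  # always A
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # no preference
print(information_content([conserved, uniform]))  # 2.0
```

A perfectly conserved DNA column contributes 2 bits against a uniform background and a column with no preference contributes 0, so the total rewards motifs that are both long and well conserved.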

25 Other measures of motif quality
Group specificity, or site specificity: probability of having a certain number of target sequences with the site in question
Sequence specificity: accounts for both the number of sequences with the sites in question and the number of sites per sequence
Positional bias, or uniformity: looks at how uniformly the sites in question are distributed with respect to the transcription start sites of the genes
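Group specificity can be illustrated as a tail probability. The slide does not name a distribution; the sketch below assumes the hypergeometric model, asking how likely it is to see at least the observed number of site-containing sequences in the target group by chance. The function and parameter names are hypothetical.

```python
from math import comb

def group_specificity(genome_total, genome_hits, group_total, group_hits):
    """P(at least group_hits of the group's sequences carry the site by chance)."""
    upper = min(group_total, genome_hits)
    tail = sum(
        comb(genome_hits, k) * comb(genome_total - genome_hits, group_total - k)
        for k in range(group_hits, upper + 1)
    )
    return tail / comb(genome_total, group_total)

# Hypothetical numbers: 6000 promoters, 50 contain the site,
# and 8 of the 20 promoters in the target group contain it.
print(group_specificity(6000, 50, 20, 8))
```

Smaller values mean the site is more specific to the target group than to the rest of the genome.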

26 Identification and preliminary characterization of a protein motif related to the zinc finger
Lovering et al. (1993)

27 What is a zinc finger?
Autonomously folding domain
Structural motif
Zinc required for folding and DNA interactions
Part of a protein that is used to regulate DNA
Image: PDB; single zinc finger in solution

28 Classic zinc finger
Conserved cysteines and histidines bind zinc
Tetrahedral structure
Antiparallel two-stranded β-sheet and an α-helix
C2H2 (classic): conserved Cys and His form a loop (finger), often tandemly repeated
2nd class (steroid/nuclear receptor): 2 Zn for a single-fold domain, 4 cysteines for each zinc
3rd class: GAL4 DNA binding domain
Common feature of the 3 classes: α-helix in the major groove for interaction
N-terminus of the α-helix: DNA interaction
Image from wikipedia

29 Figure 1A
377 amino acids
Glycine-rich region (27%)
Cysteine-rich region near the N terminus (residues 15-64, indicated by boxes)
Is there anything else you can tell me about this sequence?
Lovering et al.

30 Actual RING1 sequence
MTTPANAQNASKTWELSLYELHRTPQEAIMDGTEIAVSPRSLHSELMCPICLDMLKNTMTTKECLHRFCSDCIVTALRSGNKECPTCRKKLVSKRSLRPDPNFDALISKIYPSREEYEAHQDRVLIRLSRLHNQQALSSSIEEGLRMQAMHRAQRVRRPIPGSDQTTTMSGGEGEPGEGEGDGEDVSSDSAPDSAPGPAPKRPRGGGAGGSSVGTGGGGTGGVGGGAGSEDSGDRGGTLGGGTLGPPSPPGAPSPPEPGGEIELVFRPHPLLVEKGEYCQTRYVKTTGNATVDHLSKYLALRIALERRQQQEAGEPGGPGGGASDTGGPDGCGGEGGGAGGGDGPEEPALPSLEGVSEKQYTIYIAPGGGAFTTLNGSLTLELVNEKFWKVSRPLELCYAPTKDPK
406 Amino Acids (compared to 377)
Residues (not 15-64)

31 RING finger
Cys1-Xaa-hydrophobic aa-Cys2-Xaa9-27-Cys3-Xaa1-3-His-Xaa-hydrophobic aa-Cys4-Xaa2-Cys5-hydrophobic aa-Xaa5-47-Cys6-Xaa2-Cys7
Summary of extended motif:
Cys and His always conserved in the RING finger family
Conservation one residue before Cys 2, 3, 4 (Phe), 7
One residue after Cys 5, 6 (Pro), 7 generally hydrophobic
Important for secondary and tertiary structure
Positioning and orientation of metal ligand
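The consensus on this slide maps fairly directly onto a regular expression. The sketch below is one possible translation, not a validated RING finger detector: the set of residues treated as "hydrophobic" is an assumption (the slide does not list them), and the names `RING_PATTERN` and `fragment` are made up. The test fragment is copied from the cysteine-rich stretch of the RING1 sequence on slide 30.

```python
import re

H = "[AVLIMFWY]"  # assumed hydrophobic residues; the slide only says "hydrophobic aa"

RING_PATTERN = re.compile(
    "C." + H + "C"     # Cys1-Xaa-hydrophobic-Cys2
    + ".{9,27}C"       # Xaa9-27-Cys3
    + ".{1,3}H"        # Xaa1-3-His
    + "." + H + "C"    # Xaa-hydrophobic-Cys4
    + ".{2}C"          # Xaa2-Cys5
    + H                # hydrophobic residue after Cys5
    + ".{5,47}C"       # Xaa5-47-Cys6
    + ".{2}C"          # Xaa2-Cys7
)

# Cysteine-rich stretch of the RING1 sequence shown on slide 30.
fragment = "SELMCPICLDMLKNTMTTKECLHRFCSDCIVTALRSGNKECPTCRKK"
match = RING_PATTERN.search(fragment)
print(match.start(), match.group())
```

On this fragment the pattern matches the 40 residues from the first CPIC to the final CPTC. Patterns like this are rigid compared with the HMM profiles used by Pfam and SMART, so they are better suited to a quick scan than to sensitive detection.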

32 Figure 1B
Gene expression is similar in a variety of cell lines
Present as 1.6 kb, in agreement with the cDNA w/o poly-A tail
Fig. 1B Lovering et al.

33 Figure 2
DNA binding, regulation, recombination, repair
Avg length: 1st loop 11±1 aa (NOT CG30 and PE38); 2nd loop 10±3 aa (NOT T18)
Partially symmetric??? Common structural arrangement for protein-DNA binding motif
Regulation: Xenopus (XNF-7); Drosophila (sina, Psc, Su(z)2); DG17 (Dictyostelium); Rpt-1 (mouse), down regulation of IL2R; PAS4 (yeast)
Recombination: RAG-1
Repair: RAD18 (yeast)
Human Oncogenes: MEL18, BMI-1, RET, T18, CBL
Viral Early Genes: E110, VZ61, CG30, PE-38, EPO
Viral Genes: LCMV P11
Ribonucleoprotein Complex: SS-A/Ro
Lovering et al.

34 RING1 peptide
55 aa synthetic peptide (residues in RING1 seq)
RING finger metal binding: prefers zinc, then cobalt, cadmium, copper
No binding of iron, manganese, or magnesium
Metal binding was tested by titrating the synthetic peptide with different metals in the presence of cobalt

35 Figure 3A
Figure labels: S-Co(II); Co(II) d-d transitions. Legend: solid line, cobalt; dashed line, zinc
S-Co(II) shows that cysteine is involved in the binding
Co(II) d-d transitions indicate a tetrahedral configuration
With increasing zinc concentration, the characteristic cobalt transitions diminish
Zinc competes with cobalt and binds 10x more tightly than cobalt
Fig. 3A Lovering et al.

36 Figure 4A
Zinc-dependent binding
Gel mobility-shift assays
Forms a discrete band in excess zinc
DNA binding is Zn dependent!

37 RING1 function
At the time of the paper (1992, not published until 1993), RING1 had no known function
2004: found to inhibit transactivation of recombination signal binding protein-J (RBP-J) (Hongyan et al.)
Ubiquitin-protein ligases

38 Pfam database http://pfam.sanger.ac.uk/
Database that contains a large collection of protein domains and families
Represented as sequence alignments and HMMs
List of key features about each protein
New interface that combines other Pfam versions
New updates have made it more user-friendly

39 Pfam search of RING1

40 Pfam search

41 Pfam search results

42 Pfam search results

43 Pfam link out

44 HMM logo of sequence motif

45 SMART http://smart.embl-heidelberg.de/
Multiple sequence alignment of members
>400 domains in >54,000 different proteins
Searches database using HMMs
Facilitates study of domain evolution and multi-domain architecture by correlation with phyletic distributions
Detects domains on multiple sequence alignments of representative family members
Intended for eukaryotic signal transduction, DNA, RNA, chromatin, and actin cytoskeleton functions
HMMs used to improve sensitivity of domain and repeat detection (HMMer2)

46 SMART
2 different modes
Normal: Swiss-Prot, SP-TrEMBL, Ensembl
Genomic: proteomes of sequenced genomes

47 SMART SMART home page Normal mode Genome mode

48 SMART

49 SMART

50 SMART

51 SMART

52 SMART
Interactions of proteins that are associated with LMNA
Different colored lines represent different evidence for a relationship:
experiments: purple
databases: aqua
textmining: yellow
homology: light blue

53 SMART

54 More motif madness

55 PRINTS

56 PRINTS

57 PROSITE

58 PROSITE

59 Questions?

60 How primitive is this RING-finger motif?
The author only discusses genes containing this motif that come from eukaryotes. Is this motif found in prokaryotes as well?

