Presentation is loading. Please wait.

Presentation is loading. Please wait.

Promoter Analysis TFBS Detection Daniel Rico, PhD. Daniel Rico, PhD.

Similar presentations


Presentation on theme: "Promoter Analysis TFBS Detection Daniel Rico, PhD. Daniel Rico, PhD."— Presentation transcript:

1 Promoter Analysis TFBS Detection Daniel Rico, PhD. drico@cnio.es Daniel Rico, PhD. drico@cnio.es

2 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 2

3 Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.TFBS prediction using PWMs 5.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 3

4 4 Gene Enhancer TSS: Transcription Start Site “Proximal” promoter (100bp-2Kb 5’ Upstream)

5 P ROMOTERS Promoters are DNA segments upstream of transcripts that initiate transcription Promoter attracts RNA Polymerase to the transcription start site 5’5’ Promoter 3’3’ 5

6 GENES IN ENSEMBL 6 5’ Forward (+) strand 3’ Reverse (-) strand

7 7 Transcription Termination Site Transcription Start Site Transcription Start Site

8 P ROMOTER S TRUCTURE IN P ROKARYOTES (E.C OLI ) Transcription starts at offset 0. Pribnow Box (-10) Gilbert Box (-30) Ribosomal Binding Site (+10) 8

9 9 P ROMOTER S TRUCTURE IN E UKARYOTES

10 10 CAGE (Cap Analysis of Gene Expression)) detects the transcriptional activity of each promoter transcript. Experimental Transcription Start Sites (TSS) by CAGE

11 11 Representation of CAGE preparation protocol adapted to various platforms. Now Solexa and Illumina are preferred. 454 Life Sciences (FLX system) is not used any longer because concatenation requires additional PCR cycles and complicated manipulation. In the future, single- molecule sequencing technology will be preferred because PCR may not be required.

12 12 http://www.osc.riken.jp/english/activity/cage/basic/

13 13 http://fantom.gsc.riken.jp/4/edgeexpress/view/

14 http://www.epd.isb-sib.ch/ 14

15 15

16 S EQUENCE A NALYSIS : S EARCHING T RANSCRIPTION F ACTOR B INDING S ITES (TFBS) 16

17 TFBS: D ETECTION METHODS in vivo Functional analysis ChIP in vitro on cloned fragment Footprinting reactions Exonuclease digests Gel retardation (EMSA) UV Crosslinking in vitro on artificial DNA: SELEX: Systematic Evolution of Ligands by Exponential enrichment 17

18 18 Affinity Specificity Nat Rev Genet. 2010 Nov;11(11):751-60. Epub 2010 Sep 28. Determining the specificity of protein-DNA interactions. T RANSCRIPTION F ACTORS BIND TO TFBS IN DNA

19 19 TF B INDING S ITES Problems: often poorly defined consensus Sequences not conserved within species, and even worse between species Examples of enhancers functionally conserved but not sequence-conserved Most of the TFBS sequence data comes from just a few species Very often in vitro experiments 2 completely different binding sites could be merged in the same matrix/consensus 19

20 Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 20

21 Data collection Probabilities can be calculated and corrected for background Also called position-specific scoring matrices (PSSMs). In log scale. 21

22 F ROM PFM TO PWM/PSSM Transcription Factor Binding Sites 22

23 SEQUENCE LOGOS: The information content of a matrix column ranges from 0 (no base preference) and 2 (only 1 base used). http://weblogo.berkeley.edu/ http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html 23

24 AAGTTC AAGCTC AGGCTC AAGGTC A 430000 C 000204 G 014100 T 000140 Consensus: ARGBTC Summary 24

25 Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Obtain mouse and human fosB promoters and predict TFBS with Match and JASPAR 25

26 26 Transfac: not free, 848 matrices, loads of information and references, quality score based on methods used Jaspar: open sources, 123 matrices, minimal information, majority based on SELEX method (80%) 26

27 TRANSFAC® 27 http://www.gene-regulation.com/pub/databases.html

28 http://jaspar.cgb.ki.se/ http://jaspar.genereg.net/ 28

29 29 J ASPAR EXAMPLE : P AX 6 29

30 Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.Pattern Matching: TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 30

31 Click here to select all TFBS 31

32 Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.Pattern Matching: TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 32

33 33 P ATTERN DISCOVERY Reference Genome Seq. oligoexpected frequency AAAAAA0.00024 AAAAAC0.00030 AAAAAG0.00031 AAAAAT0.00024 AAAACC0.00028 … Sequences of interest Seq. oligoobserved frequency AAAAAA0.00023 AAAAAC0.00031 AAAAAG0.00125 AAAAAT0.00018 AAAACC0.00026 … *** 33

34 http://meme.sdsc.edu/meme/ 34

35 Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.Pattern Matching: TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 35

36 EXERCISE Step by step a.Download from UCSC or Ensembl the human NOS2 gene plus 5000 bases upstream. Select the “proximal promoter” first 1Kb: from -1000 to TSS (hint: there is no zero position!) b.Go to JASPAR and search for TFBS in promoter with the defaults. c.Do the same exercise with the mouse NOS2. d.Compare the results. 36

37 C HROMATIN A CCESSIBILITY Access to experimental information 37

38 http://www.nature.com/scitable/

39

40 E UCROMATINA Y H ETEROCROMATINA Replicatión tardía (late)Replicatión temprana (early)

41

42

43

44 Nat Rev Genet. 2011 Jul 12;12(8):554-64. doi: 10.1038/nrg3017. Determinants and dynamics of genome accessibility.

45

46

47 ENCODE: WWW. GENOME. GOV /10005107 ENCyclopedia of DNA Elements, NHGRI Consortium of international researchers UCSC is the Data Coordination Center 47 Slides from http://www.openhelix.com/ENCODE

48 ENCODE B ACKGROUND Pilot phase, or phase I: www.genome.gov/26525202 Selected regions of the genome: 1%, 30 MB 48

49 ENCODE P ILOT D ATA AND B EYOND ENCODE portal: http://genome.ucsc.edu/ENCODE/ Pilot ENCODE browser: genome.ucsc.edu/ENCODE/pilot.html 49

50 ENCODE N EXT P HASE : P RODUCTION P HASE UCSC is the DCC for human and mouse data The portal is available: genome.ucsc.edu/ENCODE/ New aspects of the Production Phase projects 50

51 ENCODE P RODUCTION P HASE F OCUS ENCODE is now genome-wide Specific cell types and new technologies being applied Project focus topics selected, then supplemented Copyright OpenHelix. No use or reproduction without express written consent 51 chromatin transcriptome/ genes promoters/ regulatory sites DNase sites

52 ENCODE D ATA IS F LOWING ! Data being submitted to UCSC DCC by data providers “Wranglers” ensure meta data is present Quality checks occur, data is released for use Copyright OpenHelix. No use or reproduction without express written consent 52

53 ENCODE D ATA T YPES Mapping data Genes Expression Regulation Variation 53 ENCODE Tracks identified with icon

54 R EGULATION D ATA Regulation data Structure: modifications, open vs. closed chromatin 54 Image from NIH

55 R EGULATION D ATA II Transcription factor binding sites, TFBS RNA binding proteins 55 TATA bound to DNA


Download ppt "Promoter Analysis TFBS Detection Daniel Rico, PhD. Daniel Rico, PhD."

Similar presentations


Ads by Google