Download presentation
Presentation is loading. Please wait.
Published byElwin Lewis Robertson Modified over 9 years ago
1
Promoter Analysis TFBS Detection Daniel Rico, PhD. drico@cnio.es Daniel Rico, PhD. drico@cnio.es
2
1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 2
3
Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.TFBS prediction using PWMs 5.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 3
4
4 Gene Enhancer TSS: Transcription Start Site “Proximal” promoter (100bp-2Kb 5’ Upstream)
5
P ROMOTERS Promoters are DNA segments upstream of transcripts that initiate transcription Promoter attracts RNA Polymerase to the transcription start site 5’5’ Promoter 3’3’ 5
6
GENES IN ENSEMBL 6 5’ Forward (+) strand 3’ Reverse (-) strand
7
7 Transcription Termination Site Transcription Start Site Transcription Start Site
8
P ROMOTER S TRUCTURE IN P ROKARYOTES (E.C OLI ) Transcription starts at offset 0. Pribnow Box (-10) Gilbert Box (-30) Ribosomal Binding Site (+10) 8
9
9 P ROMOTER S TRUCTURE IN E UKARYOTES
10
10 CAGE (Cap Analysis of Gene Expression)) detects the transcriptional activity of each promoter transcript. Experimental Transcription Start Sites (TSS) by CAGE
11
11 Representation of CAGE preparation protocol adapted to various platforms. Now Solexa and Illumina are preferred. 454 Life Sciences (FLX system) is not used any longer because concatenation requires additional PCR cycles and complicated manipulation. In the future, single- molecule sequencing technology will be preferred because PCR may not be required.
12
12 http://www.osc.riken.jp/english/activity/cage/basic/
13
13 http://fantom.gsc.riken.jp/4/edgeexpress/view/
14
http://www.epd.isb-sib.ch/ 14
15
15
16
S EQUENCE A NALYSIS : S EARCHING T RANSCRIPTION F ACTOR B INDING S ITES (TFBS) 16
17
TFBS: D ETECTION METHODS in vivo Functional analysis ChIP in vitro on cloned fragment Footprinting reactions Exonuclease digests Gel retardation (EMSA) UV Crosslinking in vitro on artificial DNA: SELEX: Systematic Evolution of Ligands by Exponential enrichment 17
18
18 Affinity Specificity Nat Rev Genet. 2010 Nov;11(11):751-60. Epub 2010 Sep 28. Determining the specificity of protein-DNA interactions. T RANSCRIPTION F ACTORS BIND TO TFBS IN DNA
19
19 TF B INDING S ITES Problems: often poorly defined consensus Sequences not conserved within species, and even worse between species Examples of enhancers functionally conserved but not sequence-conserved Most of the TFBS sequence data comes from just a few species Very often in vitro experiments 2 completely different binding sites could be merged in the same matrix/consensus 19
20
Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 20
21
Data collection Probabilities can be calculated and corrected for background Also called position-specific scoring matrices (PSSMs). In log scale. 21
22
F ROM PFM TO PWM/PSSM Transcription Factor Binding Sites 22
23
SEQUENCE LOGOS: The information content of a matrix column ranges from 0 (no base preference) and 2 (only 1 base used). http://weblogo.berkeley.edu/ http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html 23
24
AAGTTC AAGCTC AGGCTC AAGGTC A 430000 C 000204 G 014100 T 000140 Consensus: ARGBTC Summary 24
25
Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Obtain mouse and human fosB promoters and predict TFBS with Match and JASPAR 25
26
26 Transfac: not free, 848 matrices, loads of information and references, quality score based on methods used Jaspar: open sources, 123 matrices, minimal information, majority based on SELEX method (80%) 26
27
TRANSFAC® 27 http://www.gene-regulation.com/pub/databases.html
28
http://jaspar.cgb.ki.se/ http://jaspar.genereg.net/ 28
29
29 J ASPAR EXAMPLE : P AX 6 29
30
Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.Pattern Matching: TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 30
31
Click here to select all TFBS 31
32
Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.Pattern Matching: TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 32
33
33 P ATTERN DISCOVERY Reference Genome Seq. oligoexpected frequency AAAAAA0.00024 AAAAAC0.00030 AAAAAG0.00031 AAAAAT0.00024 AAAACC0.00028 … Sequences of interest Seq. oligoobserved frequency AAAAAA0.00023 AAAAAC0.00031 AAAAAG0.00125 AAAAAT0.00018 AAAACC0.00026 … *** 33
34
http://meme.sdsc.edu/meme/ 34
35
Transcription Factor Binding Sites 1.Promoters and gene regulation in Eukaryotes 2.Position Weight Matrices (PWM) 3.PWM Databases 4.Pattern Matching: TFBS prediction using PWMs 5.Pattern Discovery: Finding unknown motifs 6.Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR 35
36
EXERCISE Step by step a.Download from UCSC or Ensembl the human NOS2 gene plus 5000 bases upstream. Select the “proximal promoter” first 1Kb: from -1000 to TSS (hint: there is no zero position!) b.Go to JASPAR and search for TFBS in promoter with the defaults. c.Do the same exercise with the mouse NOS2. d.Compare the results. 36
37
C HROMATIN A CCESSIBILITY Access to experimental information 37
38
http://www.nature.com/scitable/
40
E UCROMATINA Y H ETEROCROMATINA Replicatión tardía (late)Replicatión temprana (early)
44
Nat Rev Genet. 2011 Jul 12;12(8):554-64. doi: 10.1038/nrg3017. Determinants and dynamics of genome accessibility.
47
ENCODE: WWW. GENOME. GOV /10005107 ENCyclopedia of DNA Elements, NHGRI Consortium of international researchers UCSC is the Data Coordination Center 47 Slides from http://www.openhelix.com/ENCODE
48
ENCODE B ACKGROUND Pilot phase, or phase I: www.genome.gov/26525202 Selected regions of the genome: 1%, 30 MB 48
49
ENCODE P ILOT D ATA AND B EYOND ENCODE portal: http://genome.ucsc.edu/ENCODE/ Pilot ENCODE browser: genome.ucsc.edu/ENCODE/pilot.html 49
50
ENCODE N EXT P HASE : P RODUCTION P HASE UCSC is the DCC for human and mouse data The portal is available: genome.ucsc.edu/ENCODE/ New aspects of the Production Phase projects 50
51
ENCODE P RODUCTION P HASE F OCUS ENCODE is now genome-wide Specific cell types and new technologies being applied Project focus topics selected, then supplemented Copyright OpenHelix. No use or reproduction without express written consent 51 chromatin transcriptome/ genes promoters/ regulatory sites DNase sites
52
ENCODE D ATA IS F LOWING ! Data being submitted to UCSC DCC by data providers “Wranglers” ensure meta data is present Quality checks occur, data is released for use Copyright OpenHelix. No use or reproduction without express written consent 52
53
ENCODE D ATA T YPES Mapping data Genes Expression Regulation Variation 53 ENCODE Tracks identified with icon
54
R EGULATION D ATA Regulation data Structure: modifications, open vs. closed chromatin 54 Image from NIH
55
R EGULATION D ATA II Transcription factor binding sites, TFBS RNA binding proteins 55 TATA bound to DNA
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.