Published by Cory Greer. Modified over 9 years ago.
1
From patterns to pathways: causal analysis of gene expression data. Alexander Kel, BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbuettel, Germany; ake@biobase.de; www.biobase.de
2
Databases and tools: TRANSCompel, TRANSFAC, TRANSPATH, PathoDB, S/MARt DB (mechanistic, semantic); Match, Patch, Catch, PathwayBuilder, ArrayAnalyzer, Cytomer, TRANSGenome, TRANSPLORER, CMFinder
3
The TRANSFAC® System comprises 7 databases (TRANSFAC® Professional Suite):
- TRANSFAC® Professional: transcription factor database
- TRANSCompel® Professional: composite elements database
- PathoDB® Professional: pathologically altered transcription factors
- TRANSPRO™ Professional: collection of human promoter sequences
- S/MARt DB™ Professional: scaffold or matrix attached regions database
- Cytomer®: ontology of cells, structures and organs
- TRANSPATH® Professional: signal transduction pathways
4
TRANSFAC® Professional: transcription factor database
5
… cis trans
7
Structure of regulatory regions of eukaryotic genes. [Figure: human GM-CSF gene (Homo sapiens) with a T-cell-specific inducible enhancer at -3500 bp and the promoter (TATTT box, transcription start +1). Labeled elements: several AP-1/NFAT composite elements (CE), NF-κB sites (p50/p65 and c-Rel/p65), HMG I(Y), the CD28 response element and CBF; positions -54, -88 and -114 are marked.]
8
Protein-DNA and protein-protein interactions in gene transcriptional regulation.
9
Transcription factors: sequence-specific DNA-binding factors and non-DNA-binding factors. [Figure: TF1-TF4, an adapter, a co-activator and a HAT assembled on DNA in three layers (Layer I, Layer II, Layer III).]
10
TRANSFAC: relational scheme. [Figure: entities and attributes SITE, FACTOR, GENE, CLASS, MATRIX, CELL, METHOD, SYNONYMS, FEATURES, SPECIES, SEQUENCE, Q and FUNCTIONAL ELEMENT; relations include interacting factor, coding region, regulatory region and gene expression.]
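The relational idea (genes have sites, sites bind factors) can be illustrated with a few SQL tables. This is a minimal sketch using Python's built-in `sqlite3`; the table and column names are hypothetical simplifications, not the actual TRANSFAC schema. The only real data are the IL-2 gene, the NF-ATp and AP-1 factors and the composite element sequence from the COMPEL:C00050 slide.

```python
import sqlite3

# Hypothetical, heavily simplified tables in the spirit of the TRANSFAC scheme;
# table and column names are illustrative assumptions, not the real schema.
SCHEMA = """
CREATE TABLE gene   (gene_id TEXT PRIMARY KEY, name TEXT, species TEXT);
CREATE TABLE factor (factor_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE site   (site_id TEXT PRIMARY KEY,
                     gene_id TEXT REFERENCES gene(gene_id), sequence TEXT);
-- many-to-many link: a site can bind several factors, a factor many sites
CREATE TABLE binds  (site_id TEXT REFERENCES site(site_id),
                     factor_id TEXT REFERENCES factor(factor_id));
"""

def factors_for_gene(conn, gene_name):
    """Names of factors with a binding site in the regulatory region of a gene."""
    rows = conn.execute(
        """SELECT DISTINCT f.name FROM gene g
           JOIN site s   ON s.gene_id = g.gene_id
           JOIN binds b  ON b.site_id = s.site_id
           JOIN factor f ON f.factor_id = b.factor_id
           WHERE g.name = ? ORDER BY f.name""",
        (gene_name,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO gene VALUES ('G1', 'IL-2', 'Mus musculus')")
conn.executemany("INSERT INTO factor VALUES (?, ?)", [("F1", "NF-ATp"), ("F2", "AP-1")])
conn.execute("INSERT INTO site VALUES ('S1', 'G1', 'TTGAAAATATGTGTAATA')")
conn.executemany("INSERT INTO binds VALUES (?, ?)", [("S1", "F1"), ("S1", "F2")])
print(factors_for_gene(conn, "IL-2"))  # ['AP-1', 'NF-ATp']
```

The link table `binds` is what lets one site carry two factors, which is exactly the situation of a composite element.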
11
Manual annotation of the databases: input client
12
TRANSFAC: GENE table
13
TRANSFAC: SITE table
14
Structure of transcription factors: USF-1 dimer
15
Structure of transcription factors: DNA-binding domain, activation domain, oligomerization domain, ligand-binding domain, protein-protein interaction domain.
16
TRANSFAC: FACTOR table, protein sequence
17
TRANSFAC: FACTOR table, protein domains
18
TRANSFAC: FACTOR table, structural and functional features
19
TRANSFAC: FACTOR table, links to other databases
20
TRANSFAC: classification of transcription factors
21
TRANSFAC: CLASS table
22
TRANSFAC 8.1 (2004-03-31): number of factor entries for different species
23
TRANSFAC 8.1 (2004-03-31): distribution of experimentally known TFBS in 5′ regions of genes.
24
TRANSFAC: FACTOR table, protein-DNA and protein-protein interactions
25
TRANSFAC: MATRIX table
26
TRANSCompel ® Professional Composite elements database
27
COMPEL:C00050: NF-ATp/AP-1 composite element (-96 to -79) in the mouse interleukin-2 gene promoter. Sequence: tgccacacaggtagactctt TTGAAAATA tg TGTAATA tgtaaaa catcgtgaca cccccatatt… (composite element in upper case). AP-1 consensus: TGAGTCA.
28
Composite elements: minimal functional units in which both protein-DNA and protein-protein interactions contribute to a highly specific pattern of gene expression and provide cross-coupling of different signal transduction pathways. [Figure: factor F1 alone or F2 alone gives a low level of transcription; F1 and F2 together give synergistic activation of transcription.]
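To make "synergistic activation" concrete, here is a toy model with made-up parameters (not taken from the slides): each factor alone adds a small increment to a basal level, while both factors bound together add a large cooperative term, so the combined output exceeds the sum of the single-factor outputs.

```python
def transcription_level(f1_bound: bool, f2_bound: bool,
                        basal: float = 1.0, a1: float = 0.5, a2: float = 0.5,
                        synergy: float = 8.0) -> float:
    """Toy model with hypothetical parameters: each factor alone adds a small
    term, while F1 and F2 bound together add a large cooperative term."""
    x1, x2 = float(f1_bound), float(f2_bound)
    return basal + a1 * x1 + a2 * x2 + synergy * x1 * x2

for f1 in (False, True):
    for f2 in (False, True):
        print(f"F1={f1!s:5} F2={f2!s:5} level={transcription_level(f1, f2)}")
```

The interaction term `synergy * x1 * x2` is the simplest way to express that the effect depends on both factors being present at once.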
29
Combinatorial regulation by composite elements. [Table: genes and their CE schemes: 1. IgH, Mus musculus; 2. IL-2, Homo sapiens (-283 to -268); 3. IL-2, Homo sapiens (-167 to -142); 4. IL-2, Mus musculus (-167 to -142); 5. IgH, Homo sapiens; 6. serum amyloid A1, Rattus norvegicus (-117 to -73); 7. IRF-1, Mus musculus (-123 to -113 and -49 to -40). Factors in the CE schemes: AP-1, Ets, NFAT, NF-κB, CBF, Oct-2, C/EBP and STAT-1.]
30
Ternary complex NFATp - AP1 - DNA
31
TRANSCompel flat file: description of the evidence (experiment, cell type, the two individual interactions), with links to the TRANSFAC GENE table, to EMBL and to the TRANSFAC FACTOR table.
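Flat files of this kind are line-oriented records. A minimal parser sketch, assuming an EMBL-like layout: each line starts with a two-letter code, `XX` lines are spacers and `//` ends a record. Only the accession `C00050` comes from the slides; the `DE` code and its text are placeholders, not the documented TRANSCompel field layout.

```python
from collections import defaultdict

def parse_flat_file(text):
    """Split flat-file text into records. Each line starts with a two-letter
    code; 'XX' lines are spacers and '//' ends a record. Returns a list of
    dicts mapping each code to the list of its values."""
    records, current = [], defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        code, value = line[:2], line[2:].strip()
        if code == "//":
            if current:
                records.append(dict(current))
            current = defaultdict(list)
        elif code != "XX":
            current[code].append(value)
    if current:                      # tolerate a missing final '//'
        records.append(dict(current))
    return records

# Hypothetical entry: AC holds the accession from slide 27; the DE line is a placeholder.
SAMPLE = """AC  C00050
XX
DE  NF-ATp/AP-1 composite element, mouse IL-2 promoter
XX
//
"""
print(parse_flat_file(SAMPLE))
```

Keeping every code as a list of values handles fields that repeat over several lines, which is common in this style of format.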
32
Cross-coupling of signal transduction pathways
33
[Figure: matrix of composite elements classified by the expression pattern of the two factors F1 and F2 (tissue-specific, inducible, cell-cycle-dependent, developmental-stage-dependent, ubiquitous constitutive); numbers shown: 32, 44, 119, 12, 3, 39, 60, 212, 2.]
Inducible/inducible: 19 ETS/AP-1 CEs, providing cross-coupling of Ras/Raf- and PKC-dependent signalling pathways; 15 NFATp/AP-1 CEs, providing cross-coupling of Ca2+- and PKC-dependent signalling pathways; 14 NF-κB/C/EBP CEs (NF-κB is inducible by IL-1 and TNF-α; C/EBP is inducible by IL-6).
34
[Figure: the same F1/F2 classification matrix of composite elements as on slide 33.]
Inducible/constitutive: 9 ETS/Sp1 CEs (ETS factors are inducible through the Ras/Raf-dependent signalling pathway); 5 Smad/TFE3 CEs (Smads are inducible by TGF-β signalling).
35
[Figure: the same F1/F2 classification matrix of composite elements as on slide 33.]
Inducible/tissue-restricted: Pit-1/AP-1 CEs. Pit-1 is a pituitary-restricted transcription factor, whereas AP-1 and Ets are ubiquitous inducible factors.
36
Mechanisms of functioning of synergistic composite elements
37
[Figure: factors F1 and F2 on sites s1 and s2.] 5) Relief of autoinhibition as a result of protein-protein interactions
38
Mechanisms of functioning of synergistic composite elements
39
Mechanisms of functioning of antagonistic composite elements
41
TRANSPATH ® Professional Database on signal transduction pathways
42
TRANSPATH: map of IFN pathway
43
TRANSPATH® and TRANSFAC®
44
TRANSPATH: molecules. Extracellular ligand → membrane receptor → adaptor → second messenger → kinase(s) → transcription factor → target gene
45
TRANSPATH: molecule hierarchy. [Figure: family level (IL-1/Toll receptor family, TLRs), basic molecules (TLR4, TLR5), orthologs (TLR4(h), TLR4(m), TLR5(h)), isoforms (TLR4a(h), TLR4b(m)), modified forms (phosphorylated TLR4(h)) and complexes (TLR4(h):MyD88(h)).]
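The molecule hierarchy can be stored as a tree in which each entry points to its more general parent. This is a minimal sketch; the class, field and level names are hypothetical, not the TRANSPATH schema. Complexes such as TLR4(h):MyD88(h) combine several entries and would need separate links, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Molecule:
    name: str
    level: str  # e.g. "family", "basic", "ortholog", "isoform", "modified form"
    parent: Optional["Molecule"] = field(default=None, repr=False, compare=False)
    children: List["Molecule"] = field(default_factory=list, repr=False, compare=False)

    def add(self, name: str, level: str) -> "Molecule":
        """Create a more specific child entry (e.g. an ortholog of a basic molecule)."""
        child = Molecule(name, level, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Names from the most general ancestor down to this entry."""
        path, node = [], self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]

family = Molecule("IL-1/Toll receptor family", "family")
tlr4 = family.add("TLRs", "family").add("TLR4", "basic")
tlr4_h = tlr4.add("TLR4(h)", "ortholog")
isoform = tlr4_h.add("TLR4a(h)", "isoform")
phospho = tlr4_h.add("TLR4(h)p", "modified form")
print(isoform.lineage())
# ['IL-1/Toll receptor family', 'TLRs', 'TLR4', 'TLR4(h)', 'TLR4a(h)']
```

`parent` and `children` are excluded from `repr` and comparison so that printing or comparing entries does not recurse through the whole tree.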
46
TRANSPATH: reactions. Reaction types: binding, phosphorylation, dephosphorylation, degradation, acetylation, dissociation, transregulation, expression, activation, ... Each reaction lists educts, products and the enzyme.
47
The elementary reaction step: reaction R, catalyzed by catalyst C, converts substance A into substance B.
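The elementary step can be written as a small record of educts, products and an optional enzyme. This is a sketch with hypothetical class and field names; `apply` simply fires the step on a set of species that are present, with no kinetics.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Reaction:
    rid: str
    kind: str                     # e.g. "phosphorylation", "binding"
    educts: FrozenSet[str]
    products: FrozenSet[str]
    enzyme: Optional[str] = None  # catalyst C, if any

    def apply(self, state: set) -> set:
        """If all educts (and the enzyme, when required) are present, replace the
        educts by the products; otherwise return the state unchanged."""
        if not self.educts <= state or (self.enzyme and self.enzyme not in state):
            return set(state)
        return (set(state) - self.educts) | self.products

# Reaction R, catalyzed by C, converts A into B
r = Reaction("R", "conversion", frozenset({"A"}), frozenset({"B"}), enzyme="C")
print(sorted(r.apply({"A", "C"})))  # ['B', 'C']
print(sorted(r.apply({"A"})))       # ['A']  (no catalyst present)
```

The enzyme is checked but not consumed, which is what distinguishes it from an educt.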
48
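Pathway steps depict signaling in a more biochemical way. [Figure: mechanistic TGF-β pathway in steps R1 to R5, involving TGF-β1, TGF-βRII, TGF-βRI, the complexes T:TR2p and T:TR2p:TR1p, Smad2, phosphorylated Smad2, Smad4, the complex S2p:S4, NTP and NDP, and transcription (tc) of the target gene.]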
49
In a semantic reaction, only the individual key molecules are given. Semantic: TGF-β1 → TGF-βRII → TGF-βRI → Smad2 → Smad4 → gene (steps R1 to R5).
50
Information about a specific molecule (parts of a molecule entry): many synonyms make sure that you find your protein, and external database links make it easy to identify proteins.
51
Specific molecule (cont.), parts of a molecule entry: links that open the data entry of a specific reaction; disease information and GO terminology; localization of human APP.
52
Specific reaction of APP(h), part of a reaction entry: the evaluation of this reaction is based on experimental evidence.
53
Signal transduction pathways: extracellular ligand → membrane receptor → adaptor → second messenger → kinase(s) → transcription factor → target gene
54
Connecting paths between two molecules, and between one specific molecule (magenta) and a group of molecules (transcription factors, in blue).
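Finding a connecting path between a molecule and a group of molecules is a graph search. A minimal sketch, assuming the network is a directed adjacency dict and using breadth-first search (not necessarily what the TRANSPATH tools do). The graph is the semantic TGF-β chain from slide 49 with ASCII names; treating Smad2 and Smad4 as the target group is an illustrative choice.

```python
from collections import deque

def shortest_path(graph, source, targets):
    """Breadth-first search from `source`; stops at the first node in `targets`
    and returns the path as a list, or None if no target is reachable."""
    targets = set(targets)
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:   # walk back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

graph = {"TGF-beta1": ["TGFbetaR-II"], "TGFbetaR-II": ["TGFbetaR-I"],
         "TGFbetaR-I": ["Smad2"], "Smad2": ["Smad4"], "Smad4": ["gene"]}
print(shortest_path(graph, "TGF-beta1", {"Smad2", "Smad4"}))
# ['TGF-beta1', 'TGFbetaR-II', 'TGFbetaR-I', 'Smad2']
```

Because BFS visits nodes in order of distance, the first target reached is a nearest member of the group.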
55
Oncostatin M pathway B-cell antigen receptor pathway PDGF pathway Insulin pathway
56
Overview of a pathway – hand-drawn map
57
TRANSPATH: number of entries
58
Statistics: TRANSPATH® 5.1 and NetPro 1.1
Main tables (TRANSPATH + NetPro): Molecule 18029 + 7333; Reaction 20199 + 30316; Reference 8258 + 9582.
Molecules of mammalian origin: human 2503 and 3521; mouse 1653 and 2025; rat 810 and 1224.
Prediction: of 26 588 predicted human gene products, 30.8% (~9000) seem to be relevant to signal transduction (Venter et al., 2001), which corresponds to 28% coverage of the predicted proteins in TRANSPATH®.
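The coverage figure can be checked with a few lines of arithmetic. With ~9000 signal-transduction-relevant products as the denominator, the 2503 human molecules give 28%; note that 30.8% of 26 588 is about 8190, which would give about 31%.

```python
predicted = 26_588          # predicted human gene products (Venter et al., 2001)
relevant_share = 0.308      # share that seems relevant to signal transduction
human_in_transpath = 2_503  # human molecules in the TRANSPATH main tables

relevant = predicted * relevant_share
print(round(relevant))                               # 8189
print(round(human_in_transpath / 9_000 * 100))       # 28  (denominator ~9000)
print(round(human_in_transpath / relevant * 100))    # 31  (denominator 30.8% of 26 588)
```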
59
TRANSFAC ® System From patterns to pathways
60
Array analysis. The starting point: a set of induced genes from microarray experiments.
61
Array analysis. The conventional analysis: deduce the gene products and map them onto the network of metabolic pathways (KEGG) to infer biochemical effects.
62
Array analysis. Extension of the conventional analysis: map the induced gene products onto the network of regulatory pathways (TRANSPATH) to infer biological effects.
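Mapping induced gene products onto pathways raises the question of whether a pathway contains more induced genes than expected by chance. The slides do not say how this is judged; a common choice, swapped in here as an assumption, is a one-sided hypergeometric over-representation test. The gene counts in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(n_genome, n_pathway, n_induced, n_overlap):
    """One-sided hypergeometric p-value: probability of seeing at least
    `n_overlap` pathway members among `n_induced` genes drawn from a genome
    of `n_genome` genes, of which `n_pathway` belong to the pathway."""
    total = comb(n_genome, n_induced)
    upper = min(n_pathway, n_induced)
    hits = sum(comb(n_pathway, k) * comb(n_genome - n_pathway, n_induced - k)
               for k in range(n_overlap, upper + 1))
    return hits / total

# Hypothetical: 200 induced genes, 10 of them in a 100-gene pathway
p = enrichment_p_value(n_genome=20_000, n_pathway=100, n_induced=200, n_overlap=10)
print(f"{p:.2e}")
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow in the intermediate binomial coefficients; only the final ratio is a float.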
63
Explaining the experimental findings: promoter analysis of the induced genes, connected to network mapping (KEGG, TRANSPATH), leads to the identification of new targets.
64
Array analysis: promoter analysis identifies additional target genes and extends the affected network. [Figure: a promoter model applied to the TRANSGENOME database yields additional predicted genes and an extended predicted network.]
65
Array analysis: causes and effects. [Figure: from the microarray set of induced genes, the "causes" branch retrieves upstream sequences for promoter analysis (TRANSFAC, TRANSGENOME) and network analysis (TRANSPATH), giving indirect hints on causes and new targets; the "effects" branch assigns gene products and models effects by metabolic network mapping (KEGG) and regulatory network mapping (TRANSPATH).]
66
… cis trans
67
? …
68
TRANSPLORER (TRANScription exPLORER) is a software package for the analysis of transcription regulatory sequences. Currently, the TRANSPLORER site prediction tool uses collections of position weight matrices (PWMs). It can use several matrix sources: the largest and most up-to-date library of matrices, derived from the TRANSFAC® Professional database, other matrix libraries, and any user-developed matrix libraries. It can therefore search for a great variety of transcription factor binding sites. A search can use all matrices from the libraries or subsets of them.
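A PWM search slides a matrix along the sequence and scores every window. The sketch below uses a generic log-odds scheme (pseudocounts, uniform background, both strands); it is an assumption for illustration, not necessarily the scoring TRANSPLORER uses. The count matrix is a hypothetical one built around the AP-1 consensus TGASTCA.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert a count matrix (one {base: count} dict per position) into log2-odds weights."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def revcomp(seq):
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def scan(seq, pwm, threshold):
    """Return (position, strand, score) for every window scoring >= threshold."""
    w, hits, seq = len(pwm), [], seq.upper()
    for i in range(len(seq) - w + 1):
        for strand, word in (("+", seq[i:i + w]), ("-", revcomp(seq[i:i + w]))):
            if set(word) <= set(BASES):          # skip windows containing N etc.
                score = sum(pwm[j][word[j]] for j in range(w))
                if score >= threshold:
                    hits.append((i, strand, round(score, 2)))
    return hits

# Hypothetical counts for TGASTCA (S = C or G)
counts = [{"T": 10}, {"G": 10}, {"A": 10}, {"C": 5, "G": 5},
          {"T": 10}, {"C": 10}, {"A": 10}]
pwm = log_odds_matrix(counts)
hits = scan("ccTGAGTCAgg", pwm, threshold=8.0)
print(hits)
```

Using a subset of matrices, as the text describes, just means calling `scan` with each selected PWM in turn.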
69
Search for most probable binding sites regulating gene expression
70
Search for binding sites coinciding with SNPs
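Once sites have been predicted, finding those that coincide with SNPs is an interval-overlap check. A minimal sketch, assuming 0-based half-open site coordinates and a list of SNP positions; the sites and SNPs are hypothetical.

```python
from bisect import bisect_left

def sites_overlapping_snps(sites, snps):
    """sites: (start, end, name) tuples, 0-based, end exclusive.
    snps: SNP positions (0-based). Returns (start, end, name, snp) for each
    site that contains at least one SNP, reporting the first such SNP."""
    positions = sorted(snps)
    hits = []
    for start, end, name in sites:
        i = bisect_left(positions, start)  # first SNP at or after start
        if i < len(positions) and positions[i] < end:
            hits.append((start, end, name, positions[i]))
    return hits

# Hypothetical predicted sites and SNP positions in one promoter
sites = [(10, 17, "AP-1"), (40, 49, "NFAT")]
print(sites_overlapping_snps(sites, snps=[5, 44]))  # [(40, 49, 'NFAT', 44)]
```

Sorting the SNPs once and using binary search keeps the check fast even for long promoter sets with many SNPs.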
71
Matrix search for TF binding sites in the mouse c-fos promoter and in the exon 2 sequence of the human thyroid transcription factor-1 (TTF-1) gene (HS198161).
72
Enhanceosome. Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically, with the W, X, X2 and Y sequences conserved in all MHC-II, Ii and HLA-DM promoters. RFX, X2BP, NF-Y and an as yet undefined W-binding protein bind cooperatively to these sequences and assemble into a stable higher-order nucleoprotein complex, referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W-, X-, X2- and Y-binding factors. The octamer site found in the HLA-DRA promoter (O) and its cognate activators (Oct and OBF-1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery. (Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66)
73
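Recognition method for T-cell-specific NFAT/AP-1 composite elements. The NFATp and AP-1 site scores are transformed as NFAT = -log(1 - score_NFAT) and AP-1 = -log(1 - score_AP-1) and combined into a composite score. [Figure: transformed NFATp vs AP-1 scores (axis ticks from 0.7 to 6.7 and from 0.7 to 4.7) for NFAT/AP-1 training sequences and random sequences.]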
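The score transformation from the slide is easy to compute. In the sketch below the natural logarithm is assumed, and adding the two transformed scores is an assumption too: the slide shows the transform but not how the composite score combines the two values.

```python
import math

def transformed(score):
    """-log(1 - score), as on the slide: stretches scores close to 1.
    Natural log is assumed; scores must lie in [0, 1)."""
    if not 0.0 <= score < 1.0:
        raise ValueError("score must be in [0, 1)")
    return -math.log(1.0 - score)

def composite_score(score_nfat, score_ap1):
    """Assumption: the two transformed scores are added. The slide does not say
    how they are combined into the composite score."""
    return transformed(score_nfat) + transformed(score_ap1)

for s_nfat, s_ap1 in [(0.5, 0.5), (0.9, 0.8), (0.99, 0.95)]:
    print(s_nfat, s_ap1, round(composite_score(s_nfat, s_ap1), 2))
```

One way to read the sum: -log(1 - s1) - log(1 - s2) = -log((1 - s1)(1 - s2)), so the composite grows when both sites are close to a perfect match at the same time.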
74
Selection of motifs with high frequency in a window. [Figure: example site TTTGGCGCGAAA; a WSG motif is counted within a window, and its frequency in the window is compared between promoters of cell-cycle genes and exon 2 sequences.]
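The motif-in-window statistic can be sketched with IUPAC codes (W = A or T, S = C or G) and overlapping regular-expression matches. The window center and width, and the example sequences, are hypothetical; the slides do not give the exact window definition.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "S": "[CG]",
         "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

def motif_regex(motif):
    """Regex for an IUPAC motif; the lookahead makes overlapping matches count."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in motif) + "))")

def window_frequency(sequences, motif, center, half_width):
    """Average number of motif occurrences per sequence inside the window
    [center - half_width, center + half_width)."""
    rx = motif_regex(motif)
    counts = []
    for seq in sequences:
        window = seq[max(0, center - half_width): center + half_width].upper()
        counts.append(len(rx.findall(window)))
    return sum(counts) / len(counts)

# Hypothetical sequences standing in for promoters and exon 2 sequences
promoters = ["TCGACG", "AGGTCCAGC"]
exon2 = ["CCCCCC", "GGGGTA"]
print(window_frequency(promoters, "WSG", center=3, half_width=10),
      window_frequency(exon2, "WSG", center=3, half_width=10))
```

Motifs whose window frequency in promoters clearly exceeds that in the exon 2 set are the candidates kept for the context model.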
75
Motifs found in the local context of E2F sites in promoters of cell cycle-related genes. [Figure: E2F sites and context score along the human uracil DNA-glycosylase gene (coordinates from -1000 to +9000); known site ttTTTGCCGCGAAAag, q = 0.92, d = 2.8.]
76
SITEVIDEO system Building of E2F site recognition program (step 2)
77
SITEVIDEO system Building of E2F site recognition program (step 3)
78
Composite modules. [Figure: composite module (w) positioned relative to the start of transcription.] Parameters of the model to be estimated: K, the number of TF matrices.
79
Composite modules: the parameters of the model are estimated with genetic algorithms. [Figure: composite module (w) positioned relative to the start of transcription.]
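The slides name genetic algorithms for estimating the module parameters but give no details. The sketch below is a simplified, hypothetical version: each individual is a 0/1 mask choosing which of a few matrices enter the module, and fitness rewards matrices that have hits more often in positive sequences (e.g. cell cycle-related promoters) than in negative ones (e.g. exon 2 sequences), minus a size penalty. The hit tables are invented.

```python
import random

def fitness(mask, positives, negatives, penalty=0.1):
    """Mean number of selected matrices with a hit in positive sequences, minus the
    same mean in negative sequences, minus a penalty per selected matrix."""
    def mean_hits(rows):
        return sum(sum(m * h for m, h in zip(mask, row)) for row in rows) / len(rows)
    return mean_hits(positives) - mean_hits(negatives) - penalty * sum(mask)

def evolve(positives, negatives, n_matrices, pop_size=30, generations=50,
           mutation_rate=0.05, seed=1):
    """Truncation selection with elitism, one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)

    def score(ind):
        return fitness(ind, positives, negatives)

    pop = [[rng.randint(0, 1) for _ in range(n_matrices)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]          # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_matrices)  # one-point crossover
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=score)

# Hypothetical 0/1 hit tables: rows are sequences, columns are 4 matrices.
positives = [[1, 0, 1, 0], [1, 1, 1, 0], [1, 0, 1, 1], [1, 0, 0, 0]]
negatives = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
best = evolve(positives, negatives, n_matrices=4)
print(best, round(fitness(best, positives, negatives), 2))
```

A real module model would also evolve continuous parameters such as cut-offs and the window size; the binary mask only illustrates the selection, crossover and mutation loop.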
80
Composite module in promoters of cell cycle-related genes (cell cycle-related promoters compared with exon 2 sequences).
81
Cell cycle composite module in the mouse c-fos promoter.
82
Computationally predicted E2F target genes confirmed by in vivo footprinting: chromatin crosslinking, immunoprecipitation and PCR.
83
[Figure: expression over the cell cycle (G1, G1/S, S, G2) for the G1/S-cycle and G1/S-growth gene groups.]
84
Results of the selection of specific combinations of sites that distinguish G1/S-cycle and G1/S-growth promoters (microarray data). E2F and a set of additional factors can distinguish these two sets of promoters. AP-4 is a ubiquitous factor whose DNA-binding domain has a structure similar to those of E2F and Myc, the main cell cycle regulators. IK3: Ik-1 to Ik-5 form a family of zinc finger TFs that play a role in the development of lymphocytes. Pax-2 is known to be involved in regulating the cell cycle by inhibiting p53 transcription. Oct-3 is known to be differentially phosphorylated during the cell cycle and may have a role in the regulation of G1/S-growth promoters. As for the Cup site, it has already been speculated that the structure of the basal promoter may play an important role in differentiating gene expression during the cell cycle.
85
AP-1 site TGASTCA (S = C or G), bound by Jun and Fos.
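A degenerate consensus such as TGASTCA stands for several concrete words. This short sketch expands IUPAC codes with `itertools.product`; only a subset of the IUPAC alphabet is included.

```python
from itertools import product

IUPAC_CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "W": "AT",
               "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "N": "ACGT"}

def expand(consensus):
    """List every concrete DNA word matched by an IUPAC consensus string."""
    return ["".join(p) for p in product(*(IUPAC_CODES[c] for c in consensus))]

print(expand("TGASTCA"))  # ['TGACTCA', 'TGAGTCA']
```

Expansion grows multiplicatively with the number of degenerate positions, so for long, degenerate motifs a regex or a matrix is the more practical representation.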
86
[Figure: human TNF promoter, region -107 to -74, with sites for NFAT, AP-1, NF-κB, C/EBP, AP-1 and VDR; cell types shown: mast cells, T cells and dendritic cells (one assignment marked "+ ?").]
87
Fuzzy puzzle hypothesis of the multipurpose structure of eukaryotic promoters: multiple regulatory messages are encoded in the same DNA sequence. A, B, C and D, E, F: two sets of TFs; 1, 2: two sites in DNA; BC: basal complex.
88
There's More Than One Way To Do It (convergent evolution)
89
AXX list of genes
90
Extract promoters using TRANSGENOME: AXX promoter set
91
Composite module found in the AXX promoters:
Importance  Core cut-off  Matrix cut-off  AC      Matrix      Factor
0.917751    0.877000      0.930000        M00062  V$IRF1_01   Interferon regulatory factor 1
0.323077    1.000000      0.948000        M00339  V$ETS1_B    Ets factors
0.640828    0.989000      0.982000        M00199  V$AP1_C     AP-1
0.276923    0.840000      0.853000        M00037  V$NFE2_01   NF-E2, an erythroid-specific factor
1.000000    0.756000      0.760000        M00481  V$AR_01     Androgen receptor
0.159172    0.869000      0.866000        M00699  V$ICSBP_Q6  Interferon consensus sequence binding protein
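The core and matrix cut-offs above are thresholds on similarity scores between 0 and 1. Below is a sketch of a Match-style score as we understand the published approach: each position is weighted by its information content, and the score is normalized between the worst and best possible word. Treat details such as the 5-position core as assumptions; the frequency matrix is hypothetical.

```python
import math

def information_vector(freqs):
    """I(i) = sum over bases of f(i,B) * ln(4 f(i,B)), with 0 * ln 0 taken as 0."""
    return [sum(f * math.log(4 * f) for f in col.values() if f > 0) for col in freqs]

def similarity(freqs, word, positions=None):
    """Information-weighted score over the given positions (all by default),
    scaled so that the worst possible word gives 0 and the best gives 1."""
    info = information_vector(freqs)
    idx = range(len(freqs)) if positions is None else positions
    current = sum(info[i] * freqs[i].get(word[i], 0.0) for i in idx)
    best = sum(info[i] * max(freqs[i].values()) for i in idx)
    worst = sum(info[i] * min(freqs[i].get(b, 0.0) for b in "ACGT") for i in idx)
    return (current - worst) / (best - worst)

def core_positions(freqs, core_len=5):
    """The core_len consecutive positions with the highest total information."""
    info = information_vector(freqs)
    start = max(range(len(freqs) - core_len + 1),
                key=lambda s: sum(info[s:s + core_len]))
    return list(range(start, start + core_len))

def is_site(freqs, word, core_cutoff, matrix_cutoff):
    """A word is reported only if both the core and the full-matrix score pass."""
    return (similarity(freqs, word, core_positions(freqs)) >= core_cutoff
            and similarity(freqs, word) >= matrix_cutoff)

def col(**f):  # hypothetical column: given frequencies, 0.05 for the other bases
    return {b: f.get(b, 0.05) for b in "ACGT"}

freqs = [col(T=0.85), col(G=0.85), col(A=0.85), col(C=0.45, G=0.45),
         col(T=0.85), col(C=0.85), col(A=0.85)]
print(round(similarity(freqs, "TGAGTCA"), 3), is_site(freqs, "TGACTCA", 0.9, 0.9))
```

The normalization divides by zero for a completely uninformative matrix; real matrices used for site search always have some informative positions.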
94
Insulin pathway ? InsR
95
Signaling network analysis: part of the insulin signaling network in TRANSPATH (insulin, InsR, Ras, STAT1).
96
[Figure: expression of AhR targets, shown as log(experiment/control).]
97
Composite model correlates with the expression level.
Model S41: distance = 0.417599, D2 = 0.658627, SIG = 0.000000, MIN_LENGTH = 300, 0.000000
Weight      Core cut-off  Matrix cut-off  AC      Matrix
3.581248    1.000000      0.933000        M00026  V$AHR_Q5
2.942371    1.000000      0.917000        M00639  V$HNF6_Q6
0.798865    0.844000      0.900000        M00220  V$SREBP1_01
0.409376    0.962000      0.926000        M00173  V$AP1_Q2
0.055716    0.959000      0.989000        M00726  V$USF2_Q6
-1.329975   1.000000      0.959000        M00235  V$AHRARNT_01
-0.713625   1.000000      0.918000        M00156  V$RORA1_01
-0.668375   0.903000      0.854000        M00201  V$CEBP_C
[Figure: V$AHR_Q5 and V$AHRARNT_01 sites in the region from -1000 to +1000 around the TSS.]
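The slide states that a composite model correlates with the expression level, but the model's exact form is not given. A hypothetical linear version: weight the number of sites per matrix, sum, and correlate the resulting scores with log(experiment/control). Three weights are copied from the model listed above; the gene site counts and expression values are invented.

```python
def module_score(site_counts, weights):
    """Linear composite score: sum of weight * number of sites for each matrix."""
    return sum(weights[m] * n for m, n in site_counts.items() if m in weights)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

weights = {"V$AHR_Q5": 3.581248, "V$HNF6_Q6": 2.942371, "V$AHRARNT_01": -1.329975}
genes = {  # hypothetical: (site counts per matrix, log(experiment/control))
    "gene_a": ({"V$AHR_Q5": 2}, 1.9),
    "gene_b": ({"V$AHR_Q5": 1, "V$HNF6_Q6": 1}, 1.7),
    "gene_c": ({"V$AHRARNT_01": 1}, -0.4),
    "gene_d": ({}, 0.1),
}
scores = [module_score(counts, weights) for counts, _ in genes.values()]
expr = [e for _, e in genes.values()]
print(round(pearson(scores, expr), 3))
```

Negative weights, as for V$AHRARNT_01 in the model above, let a site type pull the predicted response down rather than up.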
98
Composite module found in promoters of differentially expressed genes in the liver of growth hormone-deficient mice (Sma1), compared with non-changed genes (weight * matrix(cut-off); pairs of matrices are joined by -50-):
0.0983 * V$TCF11MAFG_01(0.821)
0.0471 * V$FOXO4_01(0.961)
0.0301 * V$IPF1_Q4(0.852)
0.0410 * V$AR_01(0.851)
0.0766 * V$GR_Q6(0.971)
0.0482 * V$STAT1_02(0.995)
0.0508 * V$CEBPB_01(0.98)
0.0281 * V$STAT5A_02(0.826)
0.1040 * V$CETS1P54_02(0.949) -50- V$TCF4_Q5(0.908)
0.0751 * V$TCF1P_Q6(0.726) -50- V$STAT6_01(0.861)
0.0728 * V$SF1_Q6(0.684) -50- V$SMAD3_Q6(0.833)
0.0419 * V$ELK1_02(0.862) -50- V$GRE_C(0.842)
[Figure: module scores for differentially expressed genes vs non-changed genes.]
99
Results of the ArrayAnalyzer™ search upstream of the TFs: growth hormone (GH) and receptor tyrosine kinases (RTKs) were identified as potential key molecules involved in the differential expression of genes in the liver of growth hormone-deficient mice (Sma1).
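One way to read a "search upstream from TFs" is to rank each network node by how many of the TFs found in the promoter analysis lie downstream of it. This is a sketch under that assumption, not the ArrayAnalyzer algorithm; the toy network and its edges are hypothetical, and only the TF names are taken from the composite module on slide 98.

```python
from collections import deque

def downstream(graph, start):
    """All nodes reachable from `start` by following signaling edges."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

def rank_upstream(graph, tfs):
    """(node, number of given TFs downstream of it), best first, ties by name."""
    tfs = set(tfs)
    counts = {node: len(downstream(graph, node) & tfs) for node in graph}
    ranked = [(n, c) for n, c in counts.items() if c > 0]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))

# Hypothetical toy network; edges are illustrative only
graph = {"GH": ["GHR"], "GHR": ["JAK2"], "JAK2": ["STAT5A", "STAT1"],
         "RTK": ["Ras", "STAT1"], "Ras": ["ELK1"]}
print(rank_upstream(graph, {"STAT5A", "STAT1", "ELK1"}))
```

Nodes at the top of such a ranking are candidates for molecules that could coordinate the regulation of many of the affected genes.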
100
TRANSPATH and tools: ArrayAnalyzer and PathwayBuilder. 4. At the next step, the transcription factors found in the previous step can be mapped onto the TRANSPATH signaling network. If these factors belong to the same cascades that were suggested in step 1, it becomes more likely that they are responsible for the coordinated gene regulation.
101
Feedback loops in activating immune cells through NF-AT/AP-1
102
Network controlling S phase entry in response to a proliferative signal
103
TFBS identification via pattern search. Phylogenetic footprint of the promoter regions of nucleolin genes: HSNUCLEO, Homo sapiens; CSNUCLEO, Cricetulus griseus; MMNUCLEO, Mus musculus; RNNUCIA1, Rattus norvegicus.
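Phylogenetic footprinting looks for promoter segments that are conserved across species. A simple sketch: given an already computed alignment, report runs of columns where every species has the same base and no gap. Real footprinting uses alignment tools and statistical thresholds; the alignment below is invented and is not the nucleolin promoters.

```python
def conserved_blocks(alignment, min_len=6):
    """Return (start, end) column ranges (0-based, end exclusive) in which all
    aligned sequences have the same base and no gap, for at least min_len columns."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    blocks, start = [], None
    for i in range(length + 1):  # the extra step closes a block at the end
        column = {s[i].upper() for s in alignment} if i < length else set()
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks

# Hypothetical three-species alignment
aln = ["ACGTTTGCGCGAAAT-C",
       "TCGATTGCGCGAAATGC",
       "ACGTTTGCGCGAAATGA"]
print(conserved_blocks(aln))  # [(4, 15)]
```

Conserved blocks found this way are then scanned with matrices, since a binding site inside a conserved block is more likely to be functional.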
104
[Figure: nucleotide matrix (A, T, G, C).]
105
[Figure: three nucleotide matrices (A, T, G, C), labeled 1) to 3).]
106
Comparison of four pattern discovery programs on sets of simulated sequences with implanted binding sites for one TF matrix. y-axis: the averaged sum of squared differences between the revealed matrix and the original one; x-axis: the probability of the "consensus nucleotide" at each position of the matrix.
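The y-axis measure can be computed directly. A sketch, assuming frequency matrices stored as one {base: frequency} dict per position; `consensus_matrix` is a hypothetical helper that builds a matrix in which the consensus base has probability p, matching the x-axis parameter. Averaging over repeated simulations is just the mean of these values.

```python
def ssd(m1, m2):
    """Sum of squared differences between two frequency matrices of equal length."""
    if len(m1) != len(m2):
        raise ValueError("matrices must have the same length")
    return sum((c1.get(b, 0.0) - c2.get(b, 0.0)) ** 2
               for c1, c2 in zip(m1, m2) for b in "ACGT")

def consensus_matrix(consensus, p):
    """Matrix in which the consensus base has probability p and the
    other three bases share 1 - p equally."""
    return [{b: p if b == c else (1 - p) / 3 for b in "ACGT"} for c in consensus]

original = consensus_matrix("TGACTCA", 0.9)
revealed = consensus_matrix("TGACTCA", 0.8)  # a hypothetical, less specific result
print(round(ssd(original, revealed), 4))
```

A program that recovers the implanted matrix exactly scores 0; larger values mean the revealed matrix drifts further from the original.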
107
Three mechanisms of biopolymer evolution: gradual evolution by fixation of multiple substitutions (protein functional centres); editing of the biopolymer by fixation of a small number of substitutions (protein folding); evolution at once by fixation of single substitutions (regulatory regions of eukaryotic genes).
108
Thank you! www.biobase.de