From patterns to pathways. Causal analysis of gene expression data Alexander Kel BIOBASE GmbH Halchtersche Strasse 33 D-38304 Wolfenbuettel Germany

1 From patterns to pathways. Causal analysis of gene expression data Alexander Kel BIOBASE GmbH Halchtersche Strasse 33 D-38304 Wolfenbuettel Germany ake@biobase.de www.biobase.de

2 TRANSCompel TRANSFAC TRANSPATH Patho DB S/MARt DB - mechanistic - semantic Match Patch Catch Pathway builder Array analyser Cytomer TRANSGenome TRANSPLORER CMFinder

3 TRANSFAC ® Professional Suite. The TRANSFAC ® System comprises 7 databases: TRANSFAC ® Professional (transcription factor database); TRANSCompel ® Professional (composite elements database); PathoDB ® Professional (pathologically altered transcription factors); TRANSPRO™ Professional (collection of human promoter sequences); S/MARt DB™ Professional (scaffold or matrix attached regions database); Cytomer ® (ontology of cells, structures, organs); TRANSPATH ® Professional (signal transduction pathways)

4 TRANSFAC ® Professional Transcription factor database

5 … cis trans

6

7 Structure of regulatory regions of eukaryotic genes. Example: GM-CSF, Homo sapiens. Diagram labels: ST (+1); promoter; TATTT; AP-1/NFAT composite elements (CE); NF-κB p50/p65; NF-κB c-Rel/p65; HMG Y(I); CD28 response element; CBF; positions marked at -54, -88 and -114; T-cell specific inducible enhancer at -3500 bp

8 Protein-DNA and protein-protein interactions in gene transcriptional regulation.

9 Transcription factors Sequence- specific DNA binding Non-DNA binding TF1 TF2 TF3 TF4 adapter Co-activator HAT DNA Layer I Layer III Layer II

10 TRANSFAC: relational scheme. Tables: SITE, FACTOR, GENE, SYNONYMS, FEATURES, CLASS, SPECIES, MATRIX, SEQUENCE, METHOD, CELL, Q, FUNCTIONAL ELEMENT. Diagram labels: interacting factor, coding region, regulatory region, gene expression

11 Manual annotation of the databases: input client

12 TRANSFAC: GENE table

13 TRANSFAC: SITE table

14 Structure of transcription factors USF-1, dimer

15 DNA binding domain Activation domain oligomerization domain Ligand- binding domain Protein-protein interaction domain Structure of transcription factors

16 TRANSFAC: FACTOR table, protein sequence

17 TRANSFAC: FACTOR table, protein domains

18 TRANSFAC: FACTOR table, structural and functional features

19 TRANSFAC: FACTOR table, links to other databases

20 TRANSFAC: classification of transcription factors

21 TRANSFAC: CLASS table

22 TRANSFAC 8.1 (2004-03-31): number of factor entries for different species

23 TRANSFAC 8.1 (2004-03-31): distribution of experimentally known TFBS in 5′ regions of genes.

24 TRANSFAC: FACTOR table, protein-DNA and protein-protein interactions

25 TRANSFAC: MATRIX table

26 TRANSCompel ® Professional Composite elements database

27 Mouse Interleukin-2 gene promoter, COMPEL:C00050 (NF-ATp / AP-1), positions -96 and -79 marked relative to ST: tgccacacaggtagactctt TTGAAAATA tg TGTAATA tgtaaaa catcgtgaca cccccatatt… AP-1 consensus: TGAGTCA

28 Composite elements: minimal functional units where both protein-DNA and protein-protein interactions contribute to a highly specific pattern of gene expression and provide cross-coupling of different signal transduction pathways. Diagram: F1 alone or F2 alone gives a low level of transcription; F1 and F2 together give synergistic activation of transcription.

29 Combinatorial regulation by the composite elements. Table columns: N, Gene, Scheme of CE. Genes listed: 1. IgH **, Mus musculus; 2. IL-2, Homo sapiens (-283, -268); 3. IL-2, Homo sapiens (-167, -142); 4. Il-2, Mus musculus (-167, -142); 5. IgH **, Homo sapiens; 6. Serum amyloid A1, Rattus norv (-117, -73); 7. IRF-1, Mus musculus (-123, -113, -49, -40). Factor labels in the schemes: AP-1, Ets, NFAT, NF-κB, CBF, Oct-2, C/EBP, STAT-1

30 Ternary complex NFATp - AP1 - DNA

31 Description of an evidence (experiment, cell type, two individual interactions) flat files Link to the TRANSFAC GENE table Link to EMBL Link to the TRANSFAC FACTOR table

32 Cross-coupling of signal transduction pathways

33 Inducible/inducible composite elements. Matrix of CE counts by functional class of factors F1 and F2 (tissue-specific, inducible, cell-cycle dependent, developmental-stage dependent, ubiquitous constitutive); cell values as extracted: 32, 44, 119, 12, 3, 39, 60, 212, 2. 19 CEs ETS / AP-1, providing cross-coupling of Ras/Raf- and PKC-dependent signalling pathways; 15 CEs NFATp / AP-1, providing cross-coupling of Ca2+- and PKC-dependent signalling pathways; 14 CEs NF-κB / C/EBPβ: NF-κB is inducible by IL-1 and TNF-α; C/EBPβ is inducible by IL-6.

34 Inducible/constitutive composite elements. Matrix of CE counts by functional class of factors F1 and F2 (tissue-specific, inducible, cell-cycle dependent, developmental-stage dependent, ubiquitous constitutive); cell values as extracted: 32, 44, 119, 12, 3, 39, 60, 212, 2. 9 CEs ETS / Sp1: ETS factors are inducible through the Ras/Raf-dependent signalling pathway; 5 CEs Smad / TEF3: Smads are inducible by TGF-β signalling.

35 Inducible/tissue-restricted composite elements. Matrix of CE counts by functional class of factors F1 and F2 (tissue-specific, inducible, cell-cycle dependent, developmental-stage dependent, ubiquitous constitutive); cell values as extracted: 32, 44, 119, 12, 3, 39, 60, 212, 2. CEs Pit-1 / AP-1: Pit-1 is a pituitary-restricted transcription factor, whereas AP-1 and Ets are ubiquitous inducible factors.

36 Mechanisms of functioning of synergistic composite elements

37 5) Relief of autoinhibition as a result of protein-protein interactions. Diagram: factors F1 and F2 at sites s1 and s2

38 Mechanisms of functioning of synergistic composite elements

39 Mechanisms of functioning of antagonistic composite elements

40

41 TRANSPATH ® Professional Database on signal transduction pathways

42 TRANSPATH: map of IFN pathway

43 TRANSPATH ® TRANSFAC ®

44 TRANSPATH: molecules. Extracellular ligand, membrane receptor, adaptor, second messenger, kinase(s), transcription factor, target gene

45 TRANSPATH: molecule hierarchy. Family: IL-1/Toll receptor family, TLRs; ortholog: TLR4, TLR5; basic: TLR4(h), TLR4(m), TLR5(h); isoform: TLR4a(h), TLR4b(m); modified form: TLR4(h) p; complexes: TLR4(h):MyD88(h)

46 TRANSPATH: reactions. Types: binding, phosphorylation, dephosphorylation, degradation, acetylation, dissociation, transregulation, expression, activation, ... Each reaction has educts, products and an enzyme

47 The elementary reaction step: reaction R, catalyzed by catalyst C, converts substance A into substance B.
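
The elementary step on this slide can be held in a small record with educts, products and an optional catalyst. The sketch below is only an illustration: the class name `Reaction` and its fields are assumptions for this example and are not the TRANSPATH schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Reaction:
    """One elementary reaction step (hypothetical structure, not the TRANSPATH schema)."""
    name: str
    educts: Tuple[str, ...]
    products: Tuple[str, ...]
    catalyst: Optional[str] = None

    def describe(self) -> str:
        # Render the step as "name: educts -> products (catalyzed by ...)".
        left = " + ".join(self.educts)
        right = " + ".join(self.products)
        via = f" (catalyzed by {self.catalyst})" if self.catalyst else ""
        return f"{self.name}: {left} -> {right}{via}"


# Reaction R, catalyzed by C, converts substance A into substance B.
step = Reaction(name="R", educts=("A",), products=("B",), catalyst="C")
print(step.describe())
```

Keeping educts and products as tuples allows the same record to describe binding (two educts, one product) or dissociation (one educt, two products).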

48 Pathway steps: pathway steps depict the signaling in a more biochemical way. Diagram (TGFβ signaling, reactions R1 to R5): TGFβ1, TGFβR-II, TGFβR-I, Smad2, Smad2 p, Smad4, NTP, NDP, gene, tc; complexes T:TR2 p, T:TR2 p:TR1 p, S2 p:S4

49 In a semantic reaction, just individual key molecules are given. Semantic: TGFβ1 → TGFβ-RII → TGFβ-RI → Smad2 → Smad4 → gene (steps R1, R2, R3, R4, R5)

50 Info about a specific molecule. Parts of a molecule entry: many synonyms make sure that you find your protein; external database links allow easy identification of proteins

51 Specific molecule (cont.) Opens data entry of a specific reaction Parts of a molecule entry Disease information and GO terminology localization of human APP

52 Specific reaction of APP(h). Part of a reaction entry: the evaluation of this reaction is based on experimental evidence

53 Signal transduction pathways. Extracellular ligand, membrane receptor, adaptor, second messenger, kinase(s), transcription factor, target gene

54 Connecting path between two molecules Connection between one specific molecule (magenta) and a group of molecules (transcription factors in blue)
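
A connecting path of this kind can be found with a breadth-first search over a directed signaling graph. The sketch below is a minimal illustration under stated assumptions: the toy network, the molecule names and the function names are hypothetical, and the slides do not say which search method the TRANSPATH tools use.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy network: edges point from upstream to downstream molecule.
NETWORK: Dict[str, List[str]] = {
    "ligand": ["receptor"],
    "receptor": ["adaptor"],
    "adaptor": ["kinase1", "kinase2"],
    "kinase1": ["TF_A"],
    "kinase2": ["TF_B"],
    "TF_A": [],
    "TF_B": [],
}


def connecting_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return a shortest directed path from start to goal, or None if there is none."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessor links to rebuild the path.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def paths_to_group(graph: Dict[str, List[str]], start: str, group: List[str]) -> Dict[str, List[str]]:
    """Connect one molecule to every reachable member of a group (e.g. transcription factors)."""
    result: Dict[str, List[str]] = {}
    for target in group:
        path = connecting_path(graph, start, target)
        if path is not None:
            result[target] = path
    return result


print(paths_to_group(NETWORK, "ligand", ["TF_A", "TF_B"]))
```

Breadth-first search returns a path with the fewest reaction steps; weighting edges by evidence quality would need a different algorithm.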

55 Oncostatin M pathway B-cell antigen receptor pathway PDGF pathway Insulin pathway

56 Overview of a pathway – hand-drawn map

57 TRANSPATH: number of entries

58 Statistics: TRANSPATH ® 5.1 and NetPro 1.1
Main tables (TRANSPATH + NetPro):
  Molecule    18029 + 7333
  Reaction    20199 + 30316
  Reference    8258 + 9582
Molecules of mammalian origin:
  Human   2503   3521
  Mouse   1653   2025
  Rat      810   1224
Prediction: 26 588 predicted human gene products, of which 30.8% (~9000) seem to be signal transduction relevant (Venter et al., 2001) => 28% coverage of predicted proteins in TRANSPATH ®

59 TRANSFAC ® System From patterns to pathways

60 The starting point: A set of induced genes from microarray experiments Array analysis

61 The conventional analysis: deduce the gene products and map them to the network of metabolic pathways KEGG biochemical effects Array analysis

62 Extension of conventional analysis: map the induced gene products to the network of regulatory pathways biological effects TRANSPATH Array analysis

63 Reasoning of experimental findings: promoter analysis of induced genes connected to network mapping KEGG TRANSPATH Identification of new targets

64 Array analysis promoter model TRANSGENOME database additional predicted genes extended predicted network Promoter analysis identifies additional target genes and extends the affected network

65 Array analysis: causes and effects. Starting point: microarray, set of induced genes. Causes (indirect hints on causes): retrieval of upstream sequences (TRANSGENOME), promoter analysis (TRANSFAC), network analysis (TRANSPATH), new target. Effects: assignment of gene products, modeling of effects, metabolic network mapping (KEGG), regulatory network mapping (TRANSPATH)

66 … cis trans

67 ? …

68 TRANSPLORER (TRANScription exPLORER) is a software package for the analysis of transcription regulatory sequences. Currently, the TRANSPLORER site prediction tool uses collections of position weight matrices (PWMs). It can use several matrix sources: the largest and most up-to-date library of matrices, derived from the TRANSFAC ® Professional database; other matrix libraries; and any user-developed matrix libraries. It can therefore search for a great variety of transcription factor binding sites. A search can be made using all matrices from the libraries or subsets of them.
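
To make the idea of a PWM search concrete, here is a minimal sketch of scanning a sequence with a log-odds matrix. It is not TRANSPLORER's scoring scheme, which the slide does not describe: the count matrix, the pseudocount, the uniform background and the threshold are all assumptions chosen for illustration, and only the forward strand is scanned.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical 4-position count matrix (one dict per position); not a TRANSFAC matrix.
COUNTS: List[Dict[str, int]] = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]


def log_odds(counts: List[Dict[str, int]], pseudocount: float = 1.0,
             background: float = 0.25) -> List[Dict[str, float]]:
    """Convert counts to log2 odds against a uniform background (assumed here)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2(((column[b] + pseudocount) / total) / background)
                    for b in "ACGT"})
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (offset, site, score) for every forward-strand window scoring >= threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous symbols
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = log_odds(COUNTS)
hits = scan("ttgaACGTccacgtaa", PWM, threshold=4.0)
for offset, site, score in hits:
    print(offset, site, round(score, 3))
```

With this toy matrix the scan reports the two ACGT occurrences at offsets 4 and 10. Restricting a search to a subset of matrices, as the slide mentions, would amount to calling `scan` with only the selected matrices.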

69 Search for most probable binding sites regulating gene expression

70 Search for binding sites coinciding with SNPs

71 Mouse c-fos promoter (Matrix search for TF binding sites) Exon 2 sequence of human thyroid transcription factor-1 (TTF-1) gene (HS198161) (Matrix search for TF binding sites)

72 Enhanceosome. Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined W-binding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF-1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery. Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66

73 Recognition method for T-cell specific composite elements NFAT/AP-1. Plot of the composite score for NFAT/AP-1 composite elements (training) versus random sequences; the plotted variables for NFATp and AP-1 are the transformed matrix scores -log(1 - score NFAT) and -log(1 - score AP-1)
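
The transformation -log(1 - score) stretches the upper end of the matrix-score range, so near-perfect matches stand out. The sketch below only illustrates that transformation: the natural logarithm is an assumption (the slide does not state the base), the example scores are hypothetical, and how the two transformed values are combined into the composite score is not given on the slide and is not modeled here.

```python
import math


def transformed(score: float) -> float:
    """Return -log(1 - score) for a matrix score in [0, 1); natural log is assumed."""
    if not 0.0 <= score < 1.0:
        raise ValueError("score must lie in [0, 1)")
    return -math.log(1.0 - score)


# Hypothetical matrix scores for an NFAT site and an AP-1 site in three candidates.
candidates = [(0.80, 0.75), (0.95, 0.90), (0.99, 0.99)]
for score_nfat, score_ap1 in candidates:
    print(score_nfat, score_ap1,
          round(transformed(score_nfat), 3), round(transformed(score_ap1), 3))
```

A score of 0.9 maps to about 2.3 and a score of 0.99 to about 4.6, so equal steps in the raw score near 1 become increasingly large steps after the transformation.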

74 Selection of motifs with high frequency in a window. Example site: TTTGGCGCGAAA; example motif: WSG; window: [ ]. The frequency of the motifs in the window is computed for promoters of cell-cycle genes and for exon 2 sequences.
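
The frequency of a degenerate motif in a positional window can be computed as below. This is a sketch under assumptions: the IUPAC table is the standard one, but the function names, the toy sequences and the definition of frequency (fraction of sequences with at least one occurrence starting inside the window) are choices made for this example and are not taken from the slides.

```python
import re
from typing import List

# Standard IUPAC nucleotide codes used by degenerate motifs such as WSG.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_regex(motif: str) -> "re.Pattern[str]":
    """Compile a motif into a lookahead regex so overlapping occurrences are found."""
    return re.compile("(?=" + "".join(IUPAC[c] for c in motif.upper()) + ")")


def window_frequency(sequences: List[str], motif: str, start: int, end: int) -> float:
    """Fraction of sequences with a motif occurrence starting at an offset in [start, end)."""
    pattern = motif_regex(motif)
    with_hit = sum(
        1 for seq in sequences
        if any(start <= m.start() < end for m in pattern.finditer(seq.upper()))
    )
    return with_hit / len(sequences)


# Hypothetical toy sets standing in for promoters and exon 2 sequences.
foreground = ["AAAATCGAAAA", "CCCCAGGCCCC", "TTTTTTTTTTT"]
background = ["AAAAAAAAAAA", "TCGAAAAAAAA"]

print(window_frequency(foreground, "WSG", 3, 6))
print(window_frequency(background, "WSG", 3, 6))
```

Motifs whose window frequency is much higher in the promoter set than in the exon set would be the candidates for selection.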

75 Motifs found in the local context of E2F sites in promoters of cell cycle-related genes. Plot: score of context along the sequence (positions -1000, +1, 1000, 3000, 5000, 7000, 9000). Example: human uracil DNA-glycosylase, E2F sites + score of context: ttTTTGCCGCGAAAag, q=0.92, d=2.8 (known site)

76 SITEVIDEO system Building of E2F site recognition program (step 2)

77 SITEVIDEO system Building of E2F site recognition program (step 3)

78 Composite modules w... Start of transcription... Parameters of the model to be estimated K - number of TF matrices

79 Composite modules w... Start of transcription... Parameters of the model to be estimated Genetic Algorithms

80 Exon-2 sequences Cell cycle-related promoters Composite module in promoters of cell cycle-related genes

81 Mouse c-fos promoter Cell cycle composite module

82 Computationally predicted E2F target genes confirmed by in vivo footprint Chromatin crosslinking Immunoprecipitation PCR

83 Cell-cycle phases G1, G1/S, S, G2 (two panels): G1/S-cycle and G1/S-growth

84 Results of selecting specific combinations of sites that distinguish G1/S cycle and G1/S growth promoters (microarray data). E2F and a set of additional factors can distinguish these two sets of promoters. AP-4 is a ubiquitous factor whose DNA-binding domain is similar in structure to those of E2F and Myc, the main cell cycle regulators. IK3 (Ik-1...Ik-5) belongs to a family of zinc finger TFs that play a role in the development of lymphocytes. Pax-2 is known to be involved in regulating the cell cycle by inhibiting p53 transcription. Oct-3 is known to be differentially phosphorylated during the cell cycle and may have a role in the regulation of the G1/S growth promoters. As for the Cap site, it has already been speculated that the structure of the basal promoter may play an important role in differentiating gene expression during the cell cycle.

85 TGASTCA AP-1... Jun Fos

86 Human TNFα promoter, region -107 to -74. Site labels: NFAT, AP-1, NF-kB, C/EBP, AP-1, VDR. Cell types indicated: mast cells, T-cells, dendritic cells (marked +, ?, -)

87 Fuzzy puzzle hypothesis of the multipurpose structure of eukaryotic promoters: coding of multiple regulatory messages in the same DNA sequence. A, B, C and D, E, F: two sets of TFs; 1, 2: two sites in DNA; BC: basal complex.

88 There's More Than One Way To Do It (Convergent evolution)

89 AXX list of genes

90 Extract promoters using TRANSGENOME AXX promoter set

91 Composite module found in the AXX promoters
Importance  Core cut-off  Matr. cut-off  AC      Matrix       Factor
0.917751    0.877000      0.930000       M00062  V$IRF1_01    Interferon regulatory factor 1
0.323077    1.000000      0.948000       M00339  V$ETS1_B     Ets factors
0.640828    0.989000      0.982000       M00199  V$AP1_C      AP-1
0.276923    0.840000      0.853000       M00037  V$NFE2_01    NF-E2, an erythroid-specific factor
1.000000    0.756000      0.760000       M00481  V$AR_01      Androgen receptor
0.159172    0.869000      0.866000       M00699  V$ICSBP_Q6   Interferon Consensus Sequence binding protein

92

93

94 Insulin pathway ? InsR

95 Insulin Part of the insulin signaling network in TRANSPATH STAT1 Ras InsR Signaling network analysis

96 AhR targets Gene expression Log(Experiment/Control)

97 Composite model correlates with the expression level
S41 distance = 0.417599 D2:0.658627 SIG:0.000000 MIN_LENGTH 300 0.000000
 3.581248  1.000000  0.933000  M00026  V$AHR_Q5
 2.942371  1.000000  0.917000  M00639  V$HNF6_Q6
 0.798865  0.844000  0.900000  M00220  V$SREBP1_01
 0.409376  0.962000  0.926000  M00173  V$AP1_Q2
 0.055716  0.959000  0.989000  M00726  V$USF2_Q6
-1.329975  1.000000  0.959000  M00235  V$AHRARNT_01
-0.713625  1.000000  0.918000  M00156  V$RORA1_01
-0.668375  0.903000  0.854000  M00201  V$CEBP_C
Diagram: V$AHR_Q5 and V$AHRARNT_01 sites relative to the TSS, from -1000 to +1000

98 Composite module found in promoters of differentially expressed genes in liver of growth hormone-deficient mice (Sma1); differentially expressed genes versus non-changed genes.
0.0983 * V$TCF11MAFG_01(0.821)
0.0471 * V$FOXO4_01(0.961)
0.0301 * V$IPF1_Q4(0.852)
0.0410 * V$AR_01(0.851)
0.0766 * V$GR_Q6(0.971)
0.0482 * V$STAT1_02(0.995)
0.0508 * V$CEBPB_01(0.98)
0.0281 * V$STAT5A_02(0.826)
0.1040 * V$CETS1P54_02(0.949) -50- V$TCF4_Q5(0.908)
0.0751 * V$TCF1P_Q6(0.726) -50- V$STAT6_01(0.861)
0.0728 * V$SF1_Q6(0.684) -50- V$SMAD3_Q6(0.833)
0.0419 * V$ELK1_02(0.862) -50- V$GRE_C(0.842)
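
One plausible reading of the "weight * matrix(value)" notation is a weighted sum over matrices whose best match in a promoter passes a cut-off. The sketch below illustrates that reading only. It is an assumption, not the published scoring function: the value in parentheses is treated as a cut-off, paired terms (the "-50-" entries) are left out, and the promoter scores are hypothetical.

```python
from typing import Dict, List, Tuple

# (matrix name, weight, assumed cut-off); three single-matrix terms copied from the slide.
MODULE: List[Tuple[str, float, float]] = [
    ("V$TCF11MAFG_01", 0.0983, 0.821),
    ("V$FOXO4_01", 0.0471, 0.961),
    ("V$STAT1_02", 0.0482, 0.995),
]


def module_score(best_scores: Dict[str, float],
                 module: List[Tuple[str, float, float]]) -> float:
    """Weighted sum of best match scores, counting only matches at or above the cut-off."""
    total = 0.0
    for name, weight, cutoff in module:
        score = best_scores.get(name, 0.0)
        if score >= cutoff:
            total += weight * score
    return total


# Hypothetical best match scores of each matrix in one promoter.
promoter = {"V$TCF11MAFG_01": 0.90, "V$FOXO4_01": 0.95, "V$STAT1_02": 1.0}
print(round(module_score(promoter, MODULE), 5))
```

In this toy promoter the FOXO4 match falls below its cut-off and contributes nothing, so only two of the three terms add to the score.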

99 Results of the ArrayAnalyzer ™ search upstream from the TFs, which identified growth hormone (GH) and receptor tyrosine kinases (RTK) as potential key molecules involved in the differential expression of genes in the liver of growth hormone-deficient mice (Sma1).

100 TRANSPATH and tools, ArrayAnalyzer and PathwayBuilder. Step 4: at the next step, one can map the transcription factors found at the previous step onto the signaling network of TRANSPATH. If the factors found are parts of the same cascades that were suggested at step 1, the probability increases that those factors are responsible for the coordinated gene regulation.
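
Mapping factors onto a network and asking whether they share cascades can be sketched as an upstream search that counts how many of the factors each molecule can reach. Everything in the sketch is an assumption made for illustration: the toy network, the names, the step limit and the ranking by count are not described in the slides and are not the ArrayAnalyzer algorithm.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical toy network: molecule -> its direct upstream regulators.
UPSTREAM_OF: Dict[str, List[str]] = {
    "TF_1": ["kinase_A"],
    "TF_2": ["kinase_B"],
    "TF_3": ["kinase_C"],
    "kinase_A": ["receptor_X"],
    "kinase_B": ["receptor_X"],
    "kinase_C": ["receptor_Y"],
    "receptor_X": ["ligand_X"],
    "receptor_Y": [],
    "ligand_X": [],
}


def upstream_within(graph: Dict[str, List[str]], start: str, max_steps: int) -> Set[str]:
    """All molecules reachable upstream of start in at most max_steps steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for parent in graph.get(node, []):
            if parent not in distance:
                distance[parent] = distance[node] + 1
                queue.append(parent)
    return set(distance) - {start}


def rank_key_molecules(graph: Dict[str, List[str]], factors: List[str],
                       max_steps: int) -> List[Tuple[str, int]]:
    """Rank upstream molecules by the number of input factors they lie upstream of."""
    counts: Dict[str, int] = {}
    for factor in factors:
        for molecule in upstream_within(graph, factor, max_steps):
            counts[molecule] = counts.get(molecule, 0) + 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_key_molecules(UPSTREAM_OF, ["TF_1", "TF_2", "TF_3"], max_steps=3)
print(ranked)
```

In this toy network `ligand_X` and `receptor_X` rank first because two of the three factors sit downstream of them, which is the kind of convergence the slide treats as supporting evidence.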

101 Feedback loops in activating immune cells through NF-AT/AP-1

102 Network controlling S phase entry in response to a proliferative signal

103 Phylogenetic footprint of promoter regions of nucleolin genes HSNUCLEO - Homo sapiens; CSNUCLEO - Cricetulus griseus; MMNUCLEO - Mus musculus; RNNUCIA1 – Rattus norvegicus TFBS identification via pattern search

104 A T G C

105 A T G C A T G C A T G C 1) 2)3)

106 Comparison of four different pattern discovery programs on sets of simulated sequences with implanted TF binding sites for one matrix; y-axis: the averaged sum of squared differences between the revealed matrix and the original one; x-axis: the probability of the “consensus nucleotide” in each position of the matrix.
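
The y-axis quantity can be computed as the sum of squared differences between two frequency matrices, averaged over runs. The sketch below is a minimal illustration: the function names are hypothetical, and spreading the remaining probability equally over the three non-consensus nucleotides is an assumption, since the slide only defines the consensus probability.

```python
from statistics import mean
from typing import Dict, List

Matrix = List[Dict[str, float]]


def implanted_matrix(consensus: str, p: float) -> Matrix:
    """Frequency matrix with the consensus nucleotide at probability p in each position.

    The remaining 1 - p is split equally over the other three nucleotides (assumed).
    """
    rest = (1.0 - p) / 3.0
    return [{b: (p if b == c else rest) for b in "ACGT"} for c in consensus.upper()]


def squared_difference(found: Matrix, original: Matrix) -> float:
    """Sum over positions and nucleotides of squared frequency differences."""
    if len(found) != len(original):
        raise ValueError("matrices must have the same width")
    return sum((f[b] - o[b]) ** 2 for f, o in zip(found, original) for b in "ACGT")


def averaged_error(found_matrices: List[Matrix], original: Matrix) -> float:
    """Average the squared difference over several runs of a discovery program."""
    return mean(squared_difference(m, original) for m in found_matrices)


original = implanted_matrix("ACGT", 0.7)
runs = [implanted_matrix("ACGT", 0.7), implanted_matrix("ACGT", 0.9)]
print(round(averaged_error(runs, original), 4))
```

A program that recovers the implanted matrix exactly scores 0, and larger values mean a poorer reconstruction.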

107 Three mechanisms of biopolymer evolution: gradual evolution by fixation of multiple substitutions (protein functional centres); edited biopolymer by fixation of a small number of substitutions (protein folding); evolution at once by fixation of single substitutions (regulatory regions of eukaryotic genes)

108 Thank you ! www.biobase.de

