TRANSFAC Project Roadmap Discussion
Transcription Factors: Structure
  - DNA-binding domain (DBD): the portion (domain) of the transcription factor that binds DNA
  - Trans-activating domain (TAD)
  - Signal sensing domain (SSD), optional
  - DNA-binding domains fall into families
Transcription Factors: Classes
  - Classified by (1) mechanism of action, (2) regulatory function, or (3) sequence homology
TRANSFAC Revisited: TRANSFAC efforts
  - Thanks to Kent and Ricky's efforts, we have a good view of the tables related to TFBSs
  - TRANSFAC information can be queried with customized SQL commands
  - TFBS data can be extracted according to our needs
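To make the point about customized SQL queries concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `factor` and `site` tables, their columns, and the rows in them are hypothetical placeholders invented for illustration. They are not the real TRANSFAC schema or real TRANSFAC records, so the names would need to be swapped for the actual tables.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a HYPOTHETICAL, simplified schema.

    The table and column names are assumptions for illustration only;
    they do not reproduce the actual TRANSFAC table layout.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE factor (
            factor_id INTEGER PRIMARY KEY,
            name      TEXT,
            species   TEXT
        );
        CREATE TABLE site (
            site_id   INTEGER PRIMARY KEY,
            factor_id INTEGER REFERENCES factor(factor_id),
            sequence  TEXT
        );
        """
    )
    # Made-up demo rows, not taken from TRANSFAC.
    conn.executemany(
        "INSERT INTO factor VALUES (?, ?, ?)",
        [(1, "FactorA", "human"), (2, "FactorB", "mouse")],
    )
    conn.executemany(
        "INSERT INTO site VALUES (?, ?, ?)",
        [(10, 1, "TGACTCA"), (11, 1, "TGAGTCA"), (12, 2, "GGGACTTTCC")],
    )
    conn.commit()
    return conn


def sites_for_species(conn, species):
    """Return (factor name, binding-site sequence) pairs for one species."""
    cursor = conn.execute(
        """
        SELECT f.name, s.sequence
        FROM site AS s
        JOIN factor AS f ON f.factor_id = s.factor_id
        WHERE f.species = ?
        ORDER BY s.site_id
        """,
        (species,),  # parameter binding avoids building SQL by string pasting
    )
    return cursor.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for name, sequence in sites_for_species(demo, "human"):
        print(name, sequence)
```

The same join-and-filter pattern would apply to extracting TFBS data "according to our needs", for example by tissue or by factor class, once the real table names are known.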
TRANSFAC Revisited: TRANSFAC suite
  - TRANSCompel ® - specialized data on composite regulatory elements
  - TRANSPro ™ - a substantial collection of human, mouse and rat promoter sequences
  - PathoDB ® - a comprehensive database on pathologically relevant mutations in transcription factors or their binding sites
  - S/MARt DB ™ - widely-researched data on the scaffold or matrix attached regions (S/MARs) of eukaryotic genomes and their binding proteins
TRANSFAC Revisited: TRANSFAC suite
  - TRANSCompel ®
  - TRANSPro ™
  - PathoDB ®
  - S/MARt DB ™
  X X
What TRANSFAC has additionally
  - TRANSPRO
      - Promoters!
  - MATCH (the searching algorithm)
      - Matrix-based search
      - With tissue- (or state-) specific profiles
  - GENE
      - (Direct) links between factor and target gene
      - Between gene and encoded factor
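As a rough illustration of what a matrix-based search does, the sketch below scores every window of a sequence against a position weight matrix and reports windows above a threshold. This is a generic log-odds scan, not the MATCH algorithm itself, whose scoring and profile cut-offs are not described here. The count matrix, pseudocount, uniform background, and threshold are all assumptions chosen for the example.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds scores.

    counts: list of dicts mapping base -> observed count, one per position.
    The pseudocount and the uniform background are illustrative assumptions.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        matrix.append(
            {
                b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
                for b in BASES
            }
        )
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for each window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing ambiguous symbols such as N
        score = sum(matrix[i][ch] for i, ch in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Made-up counts for a 3-base motif; not a TRANSFAC matrix.
    demo_counts = [{"T": 8}, {"G": 8}, {"A": 8}]
    pwm = log_odds_matrix(demo_counts)
    for start, window, score in scan("CCTGACC", pwm, threshold=3.0):
        print(start, window, round(score, 3))
```

Tissue- or state-specific profiles could be modelled on top of this as named sets of matrices with their own thresholds, though that layer is left out of the sketch.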
Beyond TRANSFAC
  - CYTOMER (with links to TRANSFAC)
      - Anatomical structures: organs, cells
      - Developmental stages: physiological systems
      - Description of expression patterns based on ESTs (1d)
  - TRANSPATH
      - Pathways (gene regulation)
      - Many external links to other DBs
Preliminary Tasks
  - Data collection
      - Mastering the TRANSFAC data organization and structures
      - Collection of external data, e.g. genomes and ESTs, from GenBank, Ensembl, SCPD, etc.
      - Data generation for different purposes, e.g. modeling, data mining, datasets
Preliminary Tasks
  - Data mining/analysis
      - Statistics related to TFBSs, or more
      - Approximate matching, n-grams involved
      - Global features discovery
      - Understanding of the TFBS mechanisms
  - Database curation
      - Data cleansing
      - Species specific
      - Novel TFBSs retrieval
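The n-gram statistics and approximate matching mentioned above could start from something as simple as the sketch below. It counts overlapping n-grams across a set of site sequences and finds windows within a given Hamming distance of a pattern. Hamming distance is only one possible notion of approximate matching, picked here as an assumption. The sequences are made-up examples.

```python
from collections import Counter


def ngram_counts(sequences, n):
    """Count overlapping n-grams over a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def approximate_matches(pattern, text, max_mismatches):
    """Return (start, window) pairs within max_mismatches of the pattern."""
    k = len(pattern)
    return [
        (i, text[i:i + k])
        for i in range(len(text) - k + 1)
        if hamming(pattern, text[i:i + k]) <= max_mismatches
    ]


if __name__ == "__main__":
    sites = ["TGACTCA", "TGAGTCA"]  # illustrative sequences only
    print(ngram_counts(sites, 3).most_common(2))
    print(approximate_matches("TGACTCA", "AATGAGTCATT", 1))
```

Per-species or per-factor n-gram tables built this way would be one input to the global feature discovery task.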
Preliminary Tasks
  - Motif discovery
      - New models (statistical)
      - New motif discovery algorithms
  - Gene network analysis
      - Making use of the links between FACTOR and GENE, as well as the expression information in CYTOMER
  - More …
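One way the FACTOR and GENE links could feed a gene network is sketched below. Factor-to-target-gene links and gene-to-encoded-factor links are combined into a factor-to-factor regulatory graph, which can then be traversed. The input format (plain pairs of names) and all identifiers are hypothetical. The CYTOMER expression information is not modelled in this sketch.

```python
from collections import defaultdict


def build_regulatory_graph(factor_targets, gene_products):
    """Combine two link types into a factor -> factor graph.

    factor_targets: iterable of (factor, target_gene) pairs.
    gene_products:  iterable of (gene, encoded_factor) pairs.
    Both input shapes are assumptions, not the TRANSFAC export format.
    """
    encodes = defaultdict(set)
    for gene, factor in gene_products:
        encodes[gene].add(factor)

    graph = defaultdict(set)
    for factor, gene in factor_targets:
        # A factor regulates every factor encoded by one of its target genes.
        for downstream in encodes.get(gene, ()):
            graph[factor].add(downstream)
    return {factor: sorted(targets) for factor, targets in graph.items()}


def reachable(graph, start):
    """All factors reachable from start by following regulatory edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


if __name__ == "__main__":
    # Made-up identifiers for illustration.
    targets = [("F1", "geneB"), ("F2", "geneC"), ("F1", "geneX")]
    products = [("geneB", "F2"), ("geneC", "F3")]
    network = build_regulatory_graph(targets, products)
    print(network)
    print(sorted(reachable(network, "F1")))
```

Expression data could later be used to restrict the edges to factors and genes active in the same tissue or developmental stage.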
Task assignment (very preliminary)

  Tasks             | Remarks                                                                                             | Members
  Data Collection   | TRANSFAC structure; external data (bio knowledge); dataset generation                               | Ricky, Shaoke, Cyrus
  Data Mining       | Statistics (largely); approximate matching                                                          | Peter, Ricky, David, Ni Bing
  Database Curation | Bio knowledge involved: species, tissue-specific, …; indexing involved; application development     | Shaoke, David, Ricky
  Motif Discovery   | Modeling; representations; algorithms                                                               | Peter, Li Gang, Cyrus
  Gene Networks     | Incorporating TRANSFAC knowledge                                                                    | Li Gang, Cyrus
  More              | e.g. PathoDB about SNPs                                                                             | Phoenix
Discussions
  - Assignment of tasks

Thanks