Department of Plant Systems Biology
Research at the Bioinformatics & Computational Biology research groups
Yvan Saeys, Donostia 2004
Department of Plant Systems Biology
Headed by Prof. Dirk Inzé
–203 people (179 research staff, 24 technical/administrative staff)
6 Research Divisions
–Biology (146)
    Molecular Genetics Division (87)
    Functional Genomics Division (19)
    Plant-Microbe Division (19)
    Genome Dynamics and Gene Regulation Division (19)
–(Bio)Informatics (33)
    Bioinformatics and Evolutionary Genomics Division (24)
    Computational Biology Division (9)
“Computational” research groups
Bioinformatics and Evolutionary Genomics (BEG)
–Mainly deal with sequence data
    Comparative Genomics (Yves Van de Peer)
    Gene prediction & Annotation (Pierre Rouzé)
Computational Biology Division (CBD)
–Explore biological systems (networks)
    Headed by Martin Kuiper
Group Leaders
–Dr. Martin Kuiper
–Prof. Yves Van de Peer
–Dr. Pierre Rouzé
Research activities
–Comparative Genomics
–Gene Prediction & Genome Annotation
–Annotation of genomes
–Machine Learning
–Ancient large-scale gene duplications
–Functional divergence of duplicated genes
–Promoters and regulatory elements
–Transcription factors
–Bacterial comparative genomics
–Non coding RNAs
–Gene network modelling
–Heterosis
Ancient large-scale gene duplications
Investigate major events during evolutionary past of genomes:
–Large scale gene duplications
–Genome duplications
Research
–Algorithms to detect colinear regions
–Compare intra and inter species
–Arabidopsis: 3 whole genome duplications
–Comparisons between Arabidopsis and Rice
–Duplications in vertebrate genomes
Klaas Vandepoele, Cedric Simillion
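A minimal sketch of what detecting colinear regions can look like, not the group's actual algorithm. It assumes each segment is a list of genes indexed by position and that homologous gene pairs are already known. The names `colinear_chains`, `max_gap` and `min_size` are hypothetical. It greedily chains pairs that advance along both segments in the same orientation and ignores inverted regions.

```python
from typing import List, Tuple

# (gene index on segment 1, gene index on segment 2) for one homologous pair
Pair = Tuple[int, int]


def colinear_chains(pairs: List[Pair], max_gap: int = 3,
                    min_size: int = 3) -> List[List[Pair]]:
    """Greedily chain homologous pairs that keep the same gene order.

    A pair extends a chain when it lies at most `max_gap` genes further
    along on both segments. Chains shorter than `min_size` are dropped.
    Assumption: forward orientation only (no inversions).
    """
    chains: List[List[Pair]] = []
    for pair in sorted(set(pairs)):
        best = None
        for chain in chains:
            last_x, last_y = chain[-1]
            dx, dy = pair[0] - last_x, pair[1] - last_y
            if 0 < dx <= max_gap and 0 < dy <= max_gap:
                # prefer the longest chain this pair can extend
                if best is None or len(chain) > len(best):
                    best = chain
        if best is not None:
            best.append(pair)
        else:
            chains.append([pair])
    return [chain for chain in chains if len(chain) >= min_size]


if __name__ == "__main__":
    # toy data: four pairs on a diagonal plus two isolated hits
    hits = [(1, 10), (2, 11), (4, 12), (5, 14), (20, 3), (9, 30)]
    for chain in colinear_chains(hits):
        print(chain)
```

Isolated hits are filtered out by `min_size`. A real detector would also assess whether a chain is statistically significant, which this sketch does not attempt.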
Large-scale duplications
[Figure labels: synteny; ancient duplication; HsaC1; HsaC9; recent duplication; C2; C4; colinearity]
Ancient large-scale gene duplications
[Figure: building genomic profiles; segment labels A, B, C; annotation "Not significant!" on C]
Functional divergence of duplicated genes
Duplications stimulate biological novelties
–Investigate what happens to duplicated genes
–Study of models for gene evolution
–Genes are not individual entities, but members of gene families
Research
–Up to 65% of the genes in Arabidopsis belong to a gene family
–Divergence at the regulatory/expression level
–Divergence at the coding level
Tine Casneuf, Jeroen Raes
Functional divergence of duplicated genes
Bacterial comparative genomics
Investigation of multiple bacterial genomes
–Genomes evolve over time, changing in subtle or radical ways, constantly adapting to the surrounding environment
–Genomes can evolve gradually through vertical transmission of mutations, gene duplications, deletions, and rearrangements
–Alternatively, they can evolve more suddenly and sporadically via horizontal transfer of genetic information between different microbial species
Research
–Assess the contribution of gene duplications to genome evolution in prokaryotes
Dirk Gevers
Bacterial comparative genomics
Functional Landscape of the Paranome (FLOP):
–Linking functional information to the paranome information
–Allows us to determine whether paralog retention is biased towards specific functional classes for each of the bacterial strains
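One common way to ask whether paralog retention is biased towards a functional class is an over-representation test. The slides do not say which statistic FLOP uses, so the hypergeometric test below is an assumption. The function name and the gene counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """Return P(X >= k) for a hypergeometric variable X.

    population: genes in the genome
    successes:  genes in the genome that belong to the functional class
    draws:      genes in the paranome (genes with at least one paralog)
    k:          paranome genes that belong to the functional class
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # hypothetical strain: 4000 genes, 800 in the paranome,
    # 200 genes in one functional class, 70 of them in the paranome
    p_value = hypergeom_sf(70, population=4000, successes=200, draws=800)
    print(f"enrichment p-value: {p_value:.3g}")
```

When many functional classes are tested per strain, the resulting p-values would normally be corrected for multiple testing.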
Transcription factors
Towards a better understanding of the link between evolution and development (evo-devo)
–Transcription factors play a major role in the regulation of gene expression
–Study the evolutionary and functional divergence of genes belonging to large transcription factor gene families
Research
–Structural and phylogenetic analyses of the MADS-box gene family
–Comprehensive view on the regulatory role of MADS-box genes in plant development
–Phylogenetic footprinting
Stefanie De Bodt
Transcription factors
Genome Annotation
Structural annotation of genes/genomes
–Locate genes in genomes
–Find the exact gene structures
–Investigation of particular gene families
Research
–Development of an automatic annotation platform that can be applied to different genomes
–Genomes: Arabidopsis, Poplar, Medicago, Ostreococcus tauri
Stephane Rombauts, Lieven Sterck, Steven Robbens
Genome Annotation platform
[Diagram: RepeatMasker, coding potential search, SplicePredictor, Netstart, NetGene2, Blastn and Blastx feed EuGene as intrinsic and extrinsic approaches; output: predicted genes (structural annotation)]
Dataset construction for Poplar
[Flowchart, labels in slide order:]
–Let EuGene make prediction based on extrinsic data
–EuGene framework, extrinsic approaches: Blastn, RepeatMasker, Blastx
–EuGene framework, intrinsic approaches: IMM; Splicing: WAM; Start: const
–Blast against Arabidopsis proteins with full length, discard cDNAs that have no hit
–Training set of mapped cDNAs
–Poplar IMM, SpliceMachine, Start prediction
–Select predicted genes covered by FL cDNA
–Final prediction of EuGene
Annotation of core cell cycle genes in Ostreococcus tauri
The CDK gene family
Machine Learning (applied to genome annotation)
Computational techniques to identify structural elements
–Supervised classification methods
–Support Vector Machines
–Feature selection for knowledge extraction
Research
–New splice site prediction models
–New feature selection techniques for gene prediction
–Leads to more accurate gene models
Sven Degroeve, Yvan Saeys
Splice Machine
Feature selection for acceptor prediction
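To make the idea of feature selection for acceptor prediction concrete, here is a small sketch. It is not the method behind SpliceMachine. It assumes candidate sites are given as equal-length sequence windows, encodes each window as position-specific nucleotide features, and ranks features with a simple filter score (the absolute difference in frequency between true and false sites). The function names and the toy windows are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Feature = Tuple[int, str]  # (position in window, nucleotide)


def positional_features(window: str) -> Dict[Feature, int]:
    """Encode a sequence window as binary position-specific features."""
    return {(i, base): 1 for i, base in enumerate(window)}


def feature_frequencies(windows: List[str]) -> Dict[Feature, float]:
    """Fraction of windows in which each feature is present."""
    counts = Counter(f for w in windows for f in positional_features(w))
    return {f: c / len(windows) for f, c in counts.items()}


def rank_features(positives: List[str],
                  negatives: List[str]) -> List[Tuple[Feature, float]]:
    """Rank features by how differently they occur in the two classes."""
    freq_pos = feature_frequencies(positives)
    freq_neg = feature_frequencies(negatives)
    scores = {
        f: abs(freq_pos.get(f, 0.0) - freq_neg.get(f, 0.0))
        for f in set(freq_pos) | set(freq_neg)
    }
    # highest score first; ties broken by feature for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    true_sites = ["CAGG", "TAGG"]    # toy windows containing the AG dinucleotide
    false_sites = ["CTCG", "TCTG"]   # toy windows without it
    for feature, score in rank_features(true_sites, false_sites)[:4]:
        print(feature, round(score, 2))
```

A filter score like this looks at one feature at a time. A wrapper or embedded approach would instead judge features by how much they help the classifier, for example a Support Vector Machine.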
Promoter prediction
Computational identification of promoter regions
–Signal elements
–Structural features
–Still many false positives
Research
–Develop new tools and approaches for the automatic delineation of promoters
–Motif detection
–Detecting cis-regulatory elements
–Phylogenetic footprinting
Kobe Florquin
Promoter prediction
Non coding RNAs
Many RNA molecules are not protein coding but instead function through their RNA form
–Known for a long time: transfer RNAs (tRNA), ribosomal RNAs (rRNA)
–Only recently discovered: small interfering RNAs (siRNA), micro RNAs (miRNA), …
–Regulate gene expression at the post-transcriptional level
Research
–Developing different computational tools and techniques to detect and characterize non-coding RNAs in Arabidopsis and other plant genomes
Jan Wuyts, Eric Bonnet
Non coding RNAs: MIRfinder
Comparison between plant species
Genetic networks
Integrate functional genomics data of all types in a global network that reflects the regulatory wiring and modularity of an organism
–Micro-array data from perturbation experiments
–Leaf development
Research
–Novel methods, based on combinatorial statistics and graph theory
–Unsupervised classification techniques (k-core clustering, Kohonen maps)
Steven Maere, Steven Vercruysse
Genetic networks
[Pipeline figure: gene profiles across experiments; comb. p-value < 0.01; k-core clustering; GO labeling & visualization]
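The k-core step can be sketched as follows. It assumes an undirected graph in which an edge links two genes whose profiles are significantly related, for instance pairs passing a combined p-value threshold such as the one in the figure. How the group builds the graph is not given in the slides, and the helper names and gene labels are hypothetical. The k-core is the largest subgraph in which every node has at least k neighbours inside the subgraph.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build a symmetric adjacency map from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_core(adj: Graph, k: int) -> Set[Hashable]:
    """Return the node set of the k-core by repeatedly peeling low-degree nodes."""
    alive = set(adj)
    degree = {v: len(adj[v] & alive) for v in alive}
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already removed via an earlier stack entry
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive


if __name__ == "__main__":
    # toy network: a triangle of genes plus one loosely attached gene
    network = build_graph([("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g1", "g4")])
    print(sorted(k_core(network, 2)))
```

Raising k yields progressively denser cores, which is one way to read clusters off such a network.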
Genetic networks
Goal: getting information about:
- Protein function (same profile => same biol. process?)
- Regulatory interactions
Clustering algorithms: hierarchical clustering, self-organizing map, many other algorithms…
Heterosis
Modeling of “hybrid vigour”
–Improved performance of F1 hybrids with respect to the parents
–Dominance Model
–Over-dominance Model
–Epistatic Model
–biometrics versus soft-computing approach
Research
–Additive versus dominance effects
–Estimation of the molecular phenotype of the hybrid
Jeroen Meeus, Elena Tsiporkova
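The additive and dominance effects mentioned above can be illustrated with the textbook single-trait decomposition. This is a sketch of the standard quantitative-genetics definitions, not the group's model. It assumes positive trait values, and the names and numbers are hypothetical. The additive effect is half the difference between the parents, the dominance deviation is the distance of the F1 from the mid-parent value, and heterosis is reported relative to the mid-parent and to the best parent.

```python
from dataclasses import dataclass


@dataclass
class HeterosisSummary:
    mid_parent: float                  # mean of the two parental values
    additive: float                    # half the parental difference
    dominance: float                   # F1 deviation from the mid-parent value
    mid_parent_heterosis_pct: float    # F1 gain over the mid-parent, in percent
    best_parent_heterosis_pct: float   # F1 gain over the better parent, in percent


def summarize(parent1: float, parent2: float, f1: float) -> HeterosisSummary:
    """Decompose one trait (e.g. biomass) measured on two parents and their F1."""
    mid = (parent1 + parent2) / 2
    best = max(parent1, parent2)
    return HeterosisSummary(
        mid_parent=mid,
        additive=abs(parent1 - parent2) / 2,
        dominance=f1 - mid,
        mid_parent_heterosis_pct=100 * (f1 - mid) / mid,
        best_parent_heterosis_pct=100 * (f1 - best) / best,
    )


if __name__ == "__main__":
    # hypothetical biomass values for two inbred parents and their hybrid
    print(summarize(parent1=10.0, parent2=14.0, f1=15.0))
```

When the dominance deviation exceeds the additive effect in absolute value, the hybrid lies outside the parental range, which is the pattern the over-dominance model describes.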
Heterosis: Biometrics Approach
[Figure: molecular phenotypes (genes) and morphological phenotypes (biomass, leaf size, …) for 10 parents and 45 hybrids; hybrids labelled heterotic / non-heterotic]
–Step 1: correlation hybrid-parents
–Step 2: correlation morphological-molecular phenotypes
–Step 3: prediction
Heterosis: Soft-Computing Approach
[Figure: molecular phenotypes (genes) and morphological phenotypes (biomass, leaf size, …) for 10 parents and 45 hybrids; hybrids labelled heterotic / non-heterotic; arrows labelled direct classification, simulation, association]
Databases
–European ribosomal RNA database
–European Plant Promoter database (PlantCARE): PlantCARE/index.html
–European Federated Plant Database Network (Planet)
Software
–Tree construction: TreeCon
–Tools: ForCon, SPADS, ZT, AFLPinSilico
–Large-scale duplications: Adhore, i-Adhore, ASaturA
Website
Francis Dierick: databases, webmaster, support
Gert Sclep: CATMA and CAGE databases
“Part-time” PhD students
–Guy Baele: Modelling the covarion hypothesis
–Dirk Vandycke: Extrinsic gene prediction approaches
Secretary
–Ann Bostyn
Thanks to…