Bioinformatics and Biostatistics in Limagrain / Biogemma. JOBIM Conference, July 2015.
An international agricultural cooperative group: 4th largest seed company worldwide; nearly 2,000 farmer members; sales of nearly 2 billion euros; nearly 9,000 employees; subsidiaries in 42 countries; 13.5% of turnover re-invested in research; a portfolio of strong brands.
A group that specializes in seeds and cereal products. Limagrain Coop: Field Seeds; Vegetable Seeds; Cereal Products; Bakery Products; Garden Products; Cereal Ingredients.
A European group open to the world: nearly 9,000 employees; 66 nationalities; 69% of sales achieved outside France; subsidiaries in 42 countries. By region: Europe, 64% of sales and 64% of workforce; Americas, 23% of sales and 16% of workforce; Asia & Pacific, 7% of sales and 12% of workforce; Africa & Middle East, 6% of sales and 8% of workforce.
An innovative group: 13.5% of turnover invested in research (200 M€ with collaborations). Share of turnover invested in research: Limagrain 13.5%; pharmaceutical industry 10.2%*; automobile industry 5.4%*; average industry 2.25%*. * Source: Leem, April 2013.
BIOGEMMA, a research partnership. Biotechnologies; Field Seeds. Partner shares shown: 9.5%, 16%, 55%, 10%.
Biogemma: identification of genes associated with agronomic traits; development of GM varieties in cereals; development of tools and knowledge; BIOINFORMATICS.
Bioinformatics for breeding (Molecular Breeding). Biostatistics: discover associations. Bioanalysis: explain associations. Tools and Bioinformatics db: analyse NGS-based data; develop databases and tools to store and analyse biological data.
Omics analysis. Biological material: DNA (genes, genomes); RNA (mRNA, rRNA; transcriptome); enzyme (proteome); metabolome; phenome. Steps: transcription, translation, expression. Regulation of expression: chromatin silencing; regulation of transcription; miRNA, siRNA; RNA stability; regulation of translation; protein modification, interaction, turnover. What we measure: markers; mRNA transcription levels, DGE; protein quantity and activity levels; trait. How we measure: genotyping, sequencing; RNA-Seq, microarrays; HPLC, crystallography; IA, NIR, HPLC, eyeball. Also shown: environment, phenotype; LD mapping, GWAS, GS.
A great deal of complex information to correlate: environment, genotype, phenotype. Data processing tools are getting more and more sophisticated.
Data Life Cycle. Phases: data production & acquisition; data analysis & processing; results interpretation & decision support. Activities shown: field trials; genotyping; sequencing; genomics; LIMS, databases; data retrieval; quality control; statistical analyses; building predictive model; evaluation of individuals; predicting cross value.
Data production & acquisition. Sequencing, NGS-based: whole genome, targeted sequencing, transcriptome. Deliverables: SNPs, structural variations, gene expression levels, genomes. Genotyping: high-density chips, 10^3 to 10^5 SNPs, 10^5 samples; automate calling / quality control. Figure sample labels: Steem_Z30_rep1, Steem_Z30_rep2, Steem_Z32_rep1, Steem_Z32_rep2, Steem_Z65_rep1, Steem_Z65_rep2.
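As an illustration of what automated genotype quality control can look like, the sketch below filters markers on call rate and minor allele frequency. It is a minimal example under stated assumptions, not the pipeline used at Limagrain or Biogemma: the marker names, the 0/1/2 dosage encoding with `None` for failed calls, and both thresholds are hypothetical.

```python
# Hypothetical genotype calls per marker across 10 samples:
# 0, 1 or 2 copies of the alternate allele, None for a failed call.
calls_by_marker = {
    "snp_a": [0, 1, 2, 1, 0, 2, 1, 1, 0, 2],
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic
    "snp_c": [1, None, None, 2, 0, None, 1, 1, 0, 2],  # many failed calls
}


def call_rate(calls):
    """Fraction of samples with a successful call for one marker."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """Frequency of the rarer allele among successfully called samples."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))  # diploid: 2 alleles per sample
    return min(alt_freq, 1.0 - alt_freq)


def passing_markers(markers, min_call_rate=0.9, min_maf=0.05):
    """Names of markers that meet both (assumed) quality thresholds."""
    return [
        name
        for name, calls in markers.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    ]


kept = passing_markers(calls_by_marker)
```

With these toy data only the marker that is both well called and polymorphic is kept; in practice thresholds would be tuned per chip and per species.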
Data production & acquisition. Phenotypic data: automate data collection (sensors, images, NIR spectrometry…); adjustments/corrections by geostatistical methods; extraction of relevant information.
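To give a feel for spatial correction of field-plot data, here is a deliberately simple nearest-neighbour adjustment. It stands in for the geostatistical methods mentioned on the slide and is not one of them (no variogram or kriging is involved); the grid layout and the function name are assumptions made for illustration.

```python
def neighbour_adjust(grid):
    """Remove local field trends from a rectangular grid of plot values.

    Each plot is corrected by the gap between the mean of its direct
    neighbours (up, down, left, right) and the overall trial mean.
    """
    rows, cols = len(grid), len(grid[0])
    overall = sum(sum(row) for row in grid) / (rows * cols)
    adjusted = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            neighbours = [
                grid[a][b]
                for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if 0 <= a < rows and 0 <= b < cols
            ]
            if neighbours:
                local = sum(neighbours) / len(neighbours)
                out_row.append(grid[i][j] - (local - overall))
            else:
                # A single isolated plot has no neighbours: leave it as is.
                out_row.append(grid[i][j])
        adjusted.append(out_row)
    return adjusted


# Hypothetical trial with a left-to-right fertility gradient.
field = [[1.0, 2.0, 3.0],
         [1.0, 2.0, 3.0]]
corrected = neighbour_adjust(field)
```

On this toy field the left-to-right gradient is damped but not removed, which is one reason model-based spatial methods are usually preferred.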
Data production & acquisition. Environmental data. Local / internal: sensors, airborne imagery, … Global / external: databases, internet, satellite images, … Precise description of the growing conditions. Variables shown: air temperature, relative humidity, dew point.
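The three variables on this slide are related: dew point can be estimated from air temperature and relative humidity. The sketch below uses the Magnus approximation with one commonly used coefficient pair; the function name and the choice of coefficients are assumptions, and nothing here is taken from the speakers' own tooling.

```python
import math

# Magnus coefficients for saturation over liquid water. This is one common
# parameterisation; other published coefficient sets differ slightly.
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degrees Celsius


def dew_point_c(air_temp_c, relative_humidity_pct):
    """Approximate dew point (deg C) from air temperature and humidity."""
    if not 0.0 < relative_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100] percent")
    gamma = (math.log(relative_humidity_pct / 100.0)
             + MAGNUS_B * air_temp_c / (MAGNUS_C + air_temp_c))
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)
```

At 100% relative humidity the formula returns the air temperature itself, which is a quick sanity check on sensor-derived values.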
Modelling. Molecular data: cost, availability. Predict: genotype → phenotype. QTL/GWAS: identify the genomic regions involved. Genomic selection: "black box" approach.
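The "black box" idea behind genomic selection can be sketched as a regression of phenotype on all markers at once, with shrinkage. The example below uses ridge regression as a simple stand-in; it is not the model used by the authors, and the genotypes, phenotypes, penalty value and function names are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(genotypes, phenotypes, lam=1.0):
    """Estimate shrunken marker effects from centred dosages and phenotypes."""
    n, p = len(genotypes), len(genotypes[0])
    mean_y = sum(phenotypes) / n
    col_means = [sum(g[j] for g in genotypes) / n for j in range(p)]
    xc = [[g[j] - col_means[j] for j in range(p)] for g in genotypes]
    yc = [y - mean_y for y in phenotypes]
    # Normal equations with the ridge penalty added on the diagonal.
    xtx = [[sum(xc[i][a] * xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    xty = [sum(xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return mean_y, col_means, solve(xtx, xty)


def predict(model, genotype):
    """Predicted phenotype of a new individual from its marker dosages."""
    mean_y, col_means, effects = model
    return mean_y + sum(e * (g - m) for e, g, m in zip(effects, genotype, col_means))


# Hypothetical training set: 6 lines, 3 markers coded 0/1/2.
train_genotypes = [[0, 0, 2], [1, 2, 0], [2, 1, 1], [0, 2, 1], [1, 0, 0], [2, 1, 2]]
train_phenotypes = [8.0, 12.0, 13.0, 9.0, 12.0, 12.0]
model = fit_ridge(train_genotypes, train_phenotypes, lam=1.0)
candidate_value = predict(model, [2, 0, 0])
```

Unlike a QTL or GWAS scan, nothing here says which genomic region matters; the model only ranks candidates by predicted value, which is why the slide calls it a black box.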
Modelling. Statistical methods: linear mixed models; Bayesian approaches. More and more complex models: G×E, epistasis. Computationally intensive methods. (Figure from Van Eeuwijk et al., 2010.)
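For readers unfamiliar with linear mixed models, the generic textbook form is written out below. The notation is an assumption introduced here, not taken from the slides: y is the vector of phenotypes, X and Z are design matrices for fixed effects β (for example trial or environment) and random effects u (for example genotypes or markers), and G is a relationship matrix among the random effects.

```latex
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
  \qquad
  \mathbf{u} \sim \mathcal{N}\!\left(\mathbf{0}, \mathbf{G}\,\sigma_u^2\right),
  \quad
  \boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0}, \mathbf{I}\,\sigma_e^2\right)
\]
\[
  \begin{bmatrix}
    \mathbf{X}^{\top}\mathbf{X} & \mathbf{X}^{\top}\mathbf{Z} \\
    \mathbf{Z}^{\top}\mathbf{X} & \mathbf{Z}^{\top}\mathbf{Z} + \lambda\,\mathbf{G}^{-1}
  \end{bmatrix}
  \begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}} \end{bmatrix}
  =
  \begin{bmatrix} \mathbf{X}^{\top}\mathbf{y} \\ \mathbf{Z}^{\top}\mathbf{y} \end{bmatrix},
  \qquad
  \lambda = \frac{\sigma_e^2}{\sigma_u^2}
\]
```

The second display is Henderson's mixed model equations, whose solution gives estimates of the fixed effects and predictions of the random effects. Terms for G×E or epistasis add further random effects with their own covariance structures, which is a large part of why these models become computationally intensive.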
Data management: integrative viewer for genomic data; databases; BIG DATA: large volumes of structured and unstructured data.
Infrastructure. Local (on-premises) computing: "data-centric computing"; central enterprise resources; security; NGS data analysis on the BIOGEMMA HPC (912 cores). Elastic (cloud) computing: flexibility; low cost per CPU hour.
Take-home messages. Bioinformatics: a major activity supporting a large range of applications in Limagrain: genomics; phenomics; enviromics; biostatistics, modelling and prediction; Big Data (HPC, data management). Both R&D and applied, in a highly competitive and challenging research area.
More information…
Thank you