Introduction to Bioinformatics
February 13, 2017
Dr. ir. Perry Moerland
Bioinformatics Laboratory, Academic Medical Center
p.d.moerland@amc.uva.nl
Graduate School ‘Bioinformatics’

Aim of course
- Get acquainted with the basic principles and algorithms of commonly used bioinformatics tools
- Gain sufficient theoretical knowledge and practical skills to be able to apply bioinformatics adequately in your own work

Topics (I)
- Possibilities and limitations of public biological databases
- Statistical concepts for ‘omics data analysis
- DNA microarray analysis
- Proteomics data analysis
- Metabolomics data analysis
- Pathways and networks
- Genetical genomics
- Capita selecta: DNA methylation array analysis, sample misannotation, …

Topics (II)
Methods for the analysis of data generated with high-throughput technologies
- Microarrays
- Mass spectrometry
- Next generation sequencing (course: Bioinformatics Sequence Analysis)

What has been left out
- Almost anything sequence-based
- Phylogenetics
- Construction of evolutionary trees
- Modeling of intra-tumour heterogeneity
- Comparative genomics
- Protein modeling, protein docking
- Systems biology
- Information management
- Programming
- e-Science
- Multi-omic approaches (exception: eQTL)
Image from Florian Markowetz’s blog: https://scientificbsides.wordpress.com/2014/12/08/the-biggest-problem-in-cancer-evolution-that-mostly-people-like-me-are-doing-it/

Related AMC Graduate School courses
- Computing in R
- Unix
- e-Science (Big Data)
- Bioinformatics Sequence Analysis
- Systems Medicine
- Practical Biostatistics
- Advanced Biostatistics
- Genetic Epidemiology
- BioSB Research School: http://biosb.nl/education/course-portfolio-2/
- Pattern Recognition (Machine Learning)
- DNA Technology
- Mass Spectrometry, Proteomics and Protein Research

Possibilities and limitations of public biological databases
- Most high-throughput data is publicly available
  - Often enforced by journals
- Possibilities
- Limitations
  - Errors in databases: GPL11012 (Gene Expression Omnibus)
Zeeberg et al., BMC Bioinformatics, 5:80, 2004

Statistical concepts for ‘omics data analysis
- High-dimensional data
  - 10,000s of genes, transcript variants, proteins, metabolites
  - 100,000s of single nucleotide polymorphisms, epigenetic markers
  - In general, far fewer samples: ~100s
- Experimental design
- Quality control
- Pre-processing: normalization
- Differential expression: statistical tests, multiple testing
- Unsupervised: clustering
- Supervised: classification, prediction
- Widely applicable: next-generation sequence analysis, for example

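To make the multiple testing item concrete: when thousands of features are tested at once, raw p-values have to be adjusted. The sketch below shows one common adjustment, the Benjamini-Hochberg false discovery rate procedure, in plain Python. The slide does not say which correction the course uses, so this choice is an assumption, and the function name and the ten p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone: adj_(k) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for ten features (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adj = benjamini_hochberg(pvals)

# Features called significant at a false discovery rate of 5%.
significant = [i for i, q in enumerate(adj) if q <= 0.05]
print(significant)
```

Five of these raw p-values are below 0.05, but only the first two features remain significant after adjustment, which is the kind of shrinkage to expect when many tests are run with few samples.
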
‘Omics technologies
- Microarrays
  - mRNA
  - Single nucleotide polymorphisms
  - Methylation
  - Transcription factor binding
  - Chromosomal aberrations: aCGH (array comparative genomic hybridization)
- Mass spectrometry
  - Proteins: identification
  - Metabolites: pre-processing

Pathways and networks
- Activated pathways
- Interorgan coordination of the murine adaptive response to fasting: 5 tissues, 5 timepoints, 5 mice per timepoint
Hakvoort et al., J Biol Chem, 286(18):16332-43, 2011

[Figure: transcriptional network in the fasting challenge (‘to serve and protect’). Labels: central controller; metabolic regulators; cMyc, Sp1, p53, EGF, AP-1, HNF4α, FoxOs, NRs; lipid, steroid, carbohydrate and amino acid metabolism; cell turnover; immune response; oxidative stress defense]

Genetical genomics
- Locus SNP X modulates expression of gene Y = expression quantitative trait locus (eQTL)
[Figure: SNP X in the regulatory region of gene Y. Labels: distal promoter/enhancer, proximal promoter, core promoter; TF binding sites; DNA looping; TATA; INR; TBP; TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH; RNA polymerase II. Plot: expression of gene Y against genotype of SNP X]
Source: Michiel Adriaens

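The relation between the genotype of SNP X and the expression of gene Y can be sketched as a simple linear regression. The example below is a minimal illustration, not the analysis method of the course: it assumes an additive coding of genotypes as 0, 1 or 2 copies of the minor allele, and the nine expression values are invented toy data.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: genotype of SNP X (0, 1 or 2 minor alleles) and
# expression level of gene Y for nine samples (toy values).
genotype = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expression = [5.1, 4.9, 5.0, 6.0, 6.2, 5.8, 7.1, 6.9, 7.0]

intercept, slope = fit_line(genotype, expression)
# The slope estimates the change in expression per extra copy of the allele.
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

A slope clearly different from zero suggests that SNP X is an eQTL for gene Y. In a genome-wide setting this test is repeated for very many SNP-gene pairs, so the resulting p-values need a multiple testing correction.
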
Bioinformatics Laboratory
Department of Clinical Epidemiology, Biostatistics and Bioinformatics
- You are welcome if you need bioinformatics expertise
- The earlier, the better!
- wiki.bioinformaticslaboratory.nl

Practical things
- Certificate
  - Attend all sessions (half a day can be skipped; ask about the possibility of self-study)
  - Active participation
- Other things
  - Lunch is not included
  - Coffee, tea, … are available at the machines (with your AMC badge)
  - Slides and exercises will be made available on http://wiki.bioinformaticslaboratory.nl/ under ‘Education’

Schedule