The significance of the structure of data on PLS predictions of protein involving both natural and human experimental design
Åsmund Rinnan, Lars Munck


Three data sets of barley
A: Natural. B: Simulated. C: DoE.
B + C: The major substances (protein, starch, cellulose, beta-glucan, fat and water) are weighted to represent biological composition.
All measured on a NIR 6500 from […] nm at 2 nm intervals.
Sample classes: normal barley, protein mutants, carbohydrate mutants.
Outline: Rinnan (Dataset, Preprocessing, PCA, iPCA, PLS, Biology, PLS again, Summary); Munck (Permutation, Mutants, Diff spec, Data structure, Genetics, Conclusion).

Pre-processing of spectra
Moving-window SNV with a 130 nm window.
The […] nm spectral region shows the smallest differences between the three data sets.
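As a rough illustration of the moving-window SNV step named on this slide, the sketch below centers and scales each spectral point by the mean and standard deviation of its local window. The function name, the 65-point default (130 nm at the 2 nm spacing mentioned earlier), and the truncated windows at the spectrum edges are assumptions, not details taken from the slides.

```python
import numpy as np

def moving_window_snv(spectra, window_pts=65):
    """Moving-window SNV sketch (hypothetical helper).

    spectra: array (n_samples, n_wavelengths).
    window_pts: window width in points; 65 is an assumed stand-in
    for a 130 nm window sampled every 2 nm.
    """
    if window_pts < 3:
        raise ValueError("window_pts must be at least 3")
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    half = window_pts // 2
    n_wl = spectra.shape[1]
    out = np.empty_like(spectra)
    for j in range(n_wl):
        # Assumption: windows are simply truncated at the spectrum edges.
        lo = max(0, j - half)
        hi = min(n_wl, j + half + 1)
        local = spectra[:, lo:hi]
        mu = local.mean(axis=1)
        sd = local.std(axis=1, ddof=1)
        sd[sd == 0] = 1.0  # avoid division by zero on flat segments
        out[:, j] = (spectra[:, j] - mu) / sd
    return out
```

Because every point is standardized against its own neighborhood, a constant offset or a multiplicative scaling of a whole spectrum leaves the result unchanged.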

PCA ([…] nm)

Interval PCA (iPCA) selects the […] nm region, which gives the smallest differences between the data sets.
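The general idea of interval PCA can be sketched as follows: split the wavelength axis into contiguous blocks and run a separate PCA on each. This is only a minimal illustration; the function name, the equal-width intervals, and the returned fields are assumptions, and the criterion the authors used to rank intervals is not given in the transcript.

```python
import numpy as np

def interval_pca(X, n_intervals=10, n_pc=2):
    """PCA on each contiguous wavelength block of X (hypothetical helper).

    X: array (n_samples, n_wavelengths); n_intervals must not exceed
    the number of wavelengths.
    """
    X = np.asarray(X, dtype=float)
    edges = np.linspace(0, X.shape[1], n_intervals + 1).astype(int)
    results = []
    for start, stop in zip(edges[:-1], edges[1:]):
        block = X[:, start:stop]
        block = block - block.mean(axis=0)  # column-center the block
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        var = s ** 2
        total = var.sum()
        explained = var[:n_pc] / total if total > 0 else np.zeros_like(var[:n_pc])
        results.append({
            "start": int(start),
            "stop": int(stop),
            "scores": U[:, :n_pc] * s[:n_pc],  # sample scores on the first PCs
            "explained": explained,            # variance fraction per PC
        })
    return results
```

The per-interval scores can then be plotted with one color per data set to see in which interval the sets overlap most.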

Predicting protein using the three data sets
Reported per data set (Nat, Sim, DoE): RMSE, r², r, nLV, intercept, slope. nLV: 5, 2, 5.
Figure: regression coefficients.
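For readers who want to see where such figures come from, here is a minimal PLS1 sketch (NIPALS-style, single response) together with the diagnostics listed on the slide. It is not the authors' software or settings; the function names and the choice to regress predicted on measured values for slope and intercept are assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 via NIPALS-style deflation; returns (coefficients, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weight vector
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        q_a = (yr @ t) / tt           # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - q_a * t             # deflate y
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients
    b0 = y_mean - x_mean @ b
    return b, b0

def diagnostics(y_true, y_pred):
    """RMSE, r, r^2, and slope/intercept of predicted vs. measured."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    r = np.corrcoef(y_true, y_pred)[0, 1]
    slope, intercept = np.polyfit(y_true, y_pred, 1)
    return {"rmse": rmse, "r": r, "r2": r ** 2,
            "slope": slope, "intercept": intercept}
```

The number of latent variables (`n_lv`, the slide's nLV) would normally be chosen by validation rather than fixed in advance.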

PLS diagnostics (to protein)
A. Simple correlation coefficients: wavelength absorption to protein content.
B. PLS regression coefficients.
Shown for: Natural, Simulated, DoE.
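Diagnostic A amounts to one Pearson correlation per wavelength between absorbance and protein content. A minimal sketch, with a hypothetical function name and array layout (samples in rows, wavelengths in columns):

```python
import numpy as np

def wavelength_correlations(X, y):
    """Pearson r between each column of X (one wavelength) and y (protein)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns give 0/0 and are returned as NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (Xc.T @ yc) / denom
```

Plotting this vector against wavelength next to the PLS regression coefficients gives the kind of side-by-side comparison the slide describes.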

Isolating the chemical and biological components of the data sets
A: Natural. B: Simulated. C: DoE.
Components: Chemistry, SimBiology, RestBiology.
SimBiology = B - C
RestBiology = (A - C) - (B - C)
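The two formulas can be written out directly. The sketch below assumes that A, B and C are arrays of matching shape (for example mean spectra, or spectra matched sample by sample) and that the Chemistry component is the DoE set C itself; the slide's diagram suggests both but states neither, so treat them as assumptions.

```python
import numpy as np

def split_components(A, B, C):
    """Split Natural (A), Simulated (B), DoE (C) into three components."""
    A, B, C = (np.asarray(m, dtype=float) for m in (A, B, C))
    chemistry = C                      # assumption: DoE carries chemistry only
    sim_biology = B - C                # SimBiology = B - C
    rest_biology = (A - C) - (B - C)   # RestBiology, algebraically A - B
    return chemistry, sim_biology, rest_biology
```

Under these assumptions the three components add back up to the Natural set, and RestBiology reduces to A - B.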

Predicting protein by PLS
Chemistry and non-simulated (rest) biology show high contributions, while that of simulated biology is low.
Reported per component (Chemistry, SimBio, RestBio): RMSE, R, nLV, intercept, slope. nLV: 3, 1, 3.

Normalized regression coefficients

Back to data: selected wavelengths
Panels: Full PLS; Correlation-PLS (wavelengths, abs to protein); Assignment PLS (Phil Williams).

Quick comparison

Results: Summary

Interpretation: We are working by "Permutation science":
1. By mathematical validation of models → permutation of data in chemometrics, i.e. cross-validation.

"Permutation science":
2. Design of Experiments (DoE) → permutation of data through experiments by human design.

"Permutation science":
1. By mathematical validation of models → permutation of data in chemometrics, i.e. cross-validation.
2. Design of Experiments (DoE) → permutation of data through experiments by human design.
3. Natural design → permutation by selection of unique natural states where nature reveals its principles in data.
Question: In chemometrics, why not combine them all rather than focusing on mathematical permutation alone?
All three permutation approaches are at the heart of chemometric validation of models. Why not use them together, as we have done here? They are complementary.
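The first, mathematical kind of permutation can be illustrated with k-fold cross-validation. The sketch below is generic and not the authors' validation scheme: the function names are hypothetical, and ordinary least squares stands in for the chemometric model only to keep the example self-contained.

```python
import numpy as np

def kfold_rmsecv(X, y, fit, predict, n_splits=5, seed=0):
    """RMSE of cross-validation over randomly permuted folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(y))
    residuals = np.empty(len(y))
    for test_idx in np.array_split(order, n_splits):
        train_idx = np.setdiff1d(order, test_idx)
        model = fit(X[train_idx], y[train_idx])          # fit without the fold
        residuals[test_idx] = y[test_idx] - predict(model, X[test_idx])
    return np.sqrt(np.mean(residuals ** 2))

def ols_fit(X, y):
    """Stand-in model: least squares with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]
```

Running the same routine on a randomly shuffled `y` gives a baseline error; a model that captures real structure should score clearly better than that baseline.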

Principles of natural processes are reflected in data
The solar eclipse reveals solar eruptions.
The NIR barley endosperm mutant model has been developed since 1965, with expression control of genetics and environment.
Two types of mutants: regulative protein mutants (P) and carbohydrate (starch) mutants (C); normal barley is N. *)
*) J. Chemometrics 24 (2010)

How were the mutants found?
By a bivariate plot of % protein against mmol DBC (dye-binding capacity, measured with Acilane Orange).
The dye-binding capacity (DBC) instrument measures basic amino acids (lysine).
Background: development of screening methods for improving lysine and nutritional quality in barley by LM at the nutritional laboratory of the Swedish Seed Association, Svalöf, in […].
Plot labels: high-lysine mutation, mutation recombinants, normal recombinants. Axes: DBC, % protein.

Selecting endosperm mutants (J. Chemometrics 24 (2010))
Panel labels: No data; Vitamin E profile; A/P vs. β-glucan.
Any chemical (bi-)plot can select any mutant.
Conclusion: Each mutant produces a unique chemical fingerprint for each individual gene in a controlled genetic background (Bomi). The fingerprint is summarized at the level of chemical bonds by NIR spectroscopy. Cellular computation is soft, like a PCA.

There are deterministic differential NIR spectra for each mutant relative to the gene background Bomi. They reveal a spectral absorption reproducibility as high as […] MSC log 1/R for the P mutant lys3.a (blue) and the C mutant lys5.g (brown).
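A differential spectrum of this kind can be sketched as multiplicative scatter correction (MSC) followed by subtraction of the background mean spectrum. This is an illustration under assumptions: the function names are hypothetical, and the choice of reference spectrum and averaging is not specified in the slides.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is fitted as x ~ offset + slope * reference and
    corrected to (x - offset) / slope. The default reference (the mean
    of the input spectra) is an assumption.
    """
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        slope, offset = np.polyfit(ref, x, 1)
        corrected[i] = (x - offset) / slope
    return corrected

def difference_spectrum(mutant, background):
    """Mean mutant spectrum minus mean background (e.g. Bomi) spectrum."""
    return np.mean(mutant, axis=0) - np.mean(background, axis=0)
```

Both groups should be corrected against the same reference before subtraction; otherwise the difference also contains the mismatch between the two references.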

Data structure is super-ordinate to chemometric analysis
The 3a and 3c P mutants are differentiated in this PCA. However, spectral differences in the […] nm region represent a much more finely tuned and informative change in β-glucan, from 3.1% in 3a to 6.4% in 3c.
Plot labels: 3.2, 3c, 3a.

How is the chemical composition of the cell decided?
Through soft modeling of the intercellular dynamics of the whole cell by quantum and chemical cross-talk, as revealed by the movements of chromosomes at mitosis.
Cell emergence is like music, directed by the whole chemical orchestra of the cell.

Biological macro data are basically deterministic, calculated in situ by "set probability" controlled by the whole cell.
Holistic analysis is limited by uncertainty, specified as irreducibility "top down" and indeterminacy "bottom up".
The structure of data is the king that rules mathematical modeling by data inspection.
Because of the determinism demonstrated here, the development of gentle data models (such as MSC) and of data inspection software is essential to avoid a reduction of information.
Chemometrics is excellent for overviews, but the results have to be checked by data inspection.