Laboratory and clinical data for therapy outcome prediction: an overview of presently available tools Francesca Incardona Russian Congress of Laboratory Medicine, Moscow, October 2016
A foreword: The importance of sharing data Evidence-based medicine Research results: higher number of studies Funding from companies Research results: different kinds of studies when large amounts of data are needed: bioinformatics Integrating laboratory and clinical data
Problems with sharing data (at least from a European perspective) Will: data is power, data is money Organisation: plethora of local IT systems, codes, permission rules, bureaucracy Technical issues: how to integrate different data sets Privacy, ethical issues PAPER! According to a study in the Tuscany region (Italy), 60% of infectious diseases centres still collect data on paper and 60% of the labs are not connected with clinics Similar situation in the UK
Solutions for sharing data Will: Transparency of the governance, data ownership and authorship agreements Technical issues: physical versus virtual integration Ex. EuResist Integrated DataBase: 66,000 patients Ex. EuroCoord, COHERE: federative DBs: 300,000 patients (now closed) Ontology-based data integration, use of metadata STANDARDS: HL7, ICD10, HICDEP
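A minimal sketch of what physical integration can look like in practice: records from two sources are translated from their local test codes into one shared vocabulary and copied into a single per-patient store. The source names, local codes, shared codes and records below are all hypothetical and are not taken from HL7, ICD10, HICDEP or the EuResist database; they only illustrate the mapping step.

```python
# Hypothetical local-to-shared code maps, one per contributing source.
# A real project would map onto an agreed standard rather than these labels.
LOCAL_TO_SHARED = {
    "clinic_a": {"VL": "HIV_RNA", "CD4ABS": "CD4_COUNT"},
    "lab_b": {"hiv-rna": "HIV_RNA", "cd4": "CD4_COUNT"},
}


def harmonise(source, records):
    """Translate each record's local test code into the shared code."""
    mapping = LOCAL_TO_SHARED[source]
    harmonised = []
    for rec in records:
        shared_code = mapping.get(rec["test"])
        if shared_code is None:
            continue  # unmapped codes are dropped here; a real pipeline would log them
        harmonised.append({
            "patient_id": rec["patient_id"],
            "date": rec["date"],  # ISO dates sort correctly as strings
            "test": shared_code,
            "value": rec["value"],
            "source": source,
        })
    return harmonised


def integrate(sources):
    """Physical integration: copy harmonised records into one per-patient store."""
    merged = {}
    for source, records in sources.items():
        for rec in harmonise(source, records):
            merged.setdefault(rec["patient_id"], []).append(rec)
    for recs in merged.values():
        recs.sort(key=lambda r: r["date"])
    return merged


# Made-up example records from two sources.
SOURCES = {
    "clinic_a": [
        {"patient_id": "P001", "date": "2016-03-01", "test": "VL", "value": 50},
        {"patient_id": "P001", "date": "2016-03-01", "test": "ALT", "value": 30},
    ],
    "lab_b": [
        {"patient_id": "P001", "date": "2016-01-15", "test": "cd4", "value": 420},
        {"patient_id": "P002", "date": "2016-02-10", "test": "hiv-rna", "value": 12000},
    ],
}

db = integrate(SOURCES)
for patient_id, recs in sorted(db.items()):
    print(patient_id, [(r["date"], r["test"], r["value"]) for r in recs])
```

In a virtual (federated) setup the same mapping would run at query time at each site instead of copying the records into one store.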
Solution for sharing data Adoption of the same system at regional or national level with clinical health records integrated with laboratory and administrative IT platforms Possibly a validated Good Practice Ex. InfCare HIV: developed at Karolinska Institute, now the Swedish HIV registry + adopted in the whole Baltic region + in centres in India and Africa
InfCare HIV and Hepatitis
InfCare HIV
The strength of integrated data “Data need to be made more widely and transparently available to people in a form that they are able to interpret and use. Information drives change, and data drive decisions and accountability. The missing piece is often the data.” ACTIONS FOR THE FUTURE - UNAIDS. How AIDS changed everything — August 2015 Retrospective - observational - epidemiological studies Bioinformatics studies Ex. Geno2Pheno Ex. EuResist Engine Ex. Phylogeotool
Background: HIV resistance and computer-assisted therapy Lifelong treatment + extraordinary HIV variability → Resistance + toxicity, adherence → Choosing the best treatment can be problematic With expanded access to therapy, resistance has recently become more important in LMIC too [Hamers 2011, Hamers 2012, Gupta 2012] High-dimensional problem (mutations in HIV genome, number of drugs in different combined regimens, patients’ characteristics) Today, HIV genotype analysis with computerised rule-based algorithms is state-of-the-art in clinical practice: ex. HIVDB, ANRS, Rega
HIVDB https://hivdb.stanford.edu
www.hivfrenchresistance.org Known as the “ANRS” system for predicting drug resistance associated with a sequence: only for Protease and Reverse Transcriptase https://rega.kuleuven.be Rega algorithm
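To make the idea of a rule-based interpretation concrete, here is a minimal sketch in which each mutation carries a per-drug penalty, penalties are summed, and the total is mapped to a resistance level. The drug names, penalty values and cut-offs are invented for illustration and are NOT taken from HIVDB, ANRS or Rega; the mutation labels only show the usual notation.

```python
# Illustrative penalty table: values are made up, not from any real algorithm.
PENALTIES = {
    "DRUG_A": {"M184V": 60, "K65R": 15},
    "DRUG_B": {"K65R": 45, "M184V": -10},
}

# Illustrative cut-offs as (lower bound of score, label), highest first.
LEVELS = [(60, "resistant"), (30, "intermediate"), (0, "susceptible")]


def interpret(mutations):
    """Return {drug: (score, level)} for a set of detected mutations."""
    result = {}
    for drug, table in PENALTIES.items():
        # Sum the penalties of the mutations that have a rule for this drug.
        score = sum(table.get(m, 0) for m in mutations)
        # Pick the first level whose lower bound is reached; negative totals
        # fall through to "susceptible".
        level = next((name for bound, name in LEVELS if score >= bound), "susceptible")
        result[drug] = (score, level)
    return result


print(interpret({"M184V", "K65R"}))
print(interpret({"M184V"}))
```

Rules of this kind are written and maintained by expert panels, which is the main difference from the data-driven systems described next.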
The bioinformatics approach Machine learning techniques to provide objective, clinically relevant resistance data interpretation Need for a large repository of data to train the models Historically, bioinformatics systems were trained on genotypes that, when cultured in vitro, showed certain phenotypes Since 2008, EuResist Engine: Integration of viral genomics with clinical data to predict response to anti-HIV treatment
Integrating DBs from different sources (diagram) Labels: Large data set, Individual engines, Combined predictive system, Web interface, End users Legend: Connections used during project life and then for system updates; Connections used by the final users
Geno2Pheno - Max Planck Institut Informatik The importance of sharing data The situation in Europe The two main solutions: physical integration and federation with examples The bioinformatics use of data in HIV: Geno2Pheno EuResist old EuResist new
Geno2pheno [resistance] 3.4
HIV tropism In 2008 Pfizer released a new antiviral, maraviroc, acting on a different region of the virus: the protein (gp120) binding to the CD4 receptor This binding also needs a coreceptor, which can be of two types: CCR5 or CXCR4 The virus's preference for one type of coreceptor or the other is called tropism Maraviroc is active only on CCR5-tropic virus Tropism testing is very expensive
HIV tropism
Geno2pheno [coreceptor] 2.5
Geno2Pheno [coreceptor] Uses a statistical method called Support Vector Machines (SVM) Has been trained on 1100 V3 sequences from 332 patients Has become a zero-cost substitute for genotypic tropism tests Geno2Pheno [integrase], [HCV], [HBV] are rule-based
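As a rough illustration of the SVM idea, the sketch below one-hot encodes short amino-acid strings and trains a linear SVM by stochastic subgradient descent on the L2-regularised hinge loss. It does not reproduce Geno2Pheno: the five-residue sequences, the labels (+1 for "X4-like", -1 for "R5-like"), the learning rate and the regularisation strength are all made up, and the real tool's kernel and features are not specified in this talk.

```python
def encode(seq):
    """One-hot encoding, stored sparsely as the list of active (position, residue) features."""
    return [(pos, residue) for pos, residue in enumerate(seq)]


def decision_value(w, b, seq):
    """Linear decision function w.x + b for a sparsely encoded sequence."""
    return sum(w.get(f, 0.0) for f in encode(seq)) + b


def train_linear_svm(samples, lr=0.1, lam=0.001, epochs=20):
    """Stochastic subgradient descent on lam/2*|w|^2 + hinge loss."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for seq, label in samples:
            margin = label * decision_value(w, b, seq)
            # Gradient of the regulariser: shrink every weight slightly.
            for f in w:
                w[f] *= 1.0 - lr * lam
            # Subgradient of the hinge loss: only active when the margin is below 1.
            if margin < 1.0:
                for f in encode(seq):
                    w[f] = w.get(f, 0.0) + lr * label
                b += lr * label
    return w, b


def predict(w, b, seq):
    return 1 if decision_value(w, b, seq) >= 0 else -1


# Synthetic toy data: NOT real V3 loop sequences.
TRAINING = [
    ("GRKAR", 1), ("GSGAY", -1),
    ("GKRAY", 1), ("GEGAF", -1),
    ("GRRAF", 1), ("GSDAR", -1),
]

w, b = train_linear_svm(TRAINING)
print([predict(w, b, seq) for seq, _ in TRAINING])
print(predict(w, b, "GKKAF"), predict(w, b, "GEDAY"))
```

A production system would use a tested SVM library, far more data and proper cross-validation; the point here is only how a sequence becomes a feature vector and how the margin drives the updates.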
Rega HIV-1 subtyping tool Uses phylogenetic methods to identify the subtype of a given sequence, for HIV-1 and HIV-2 The gold standard for subtyping, adopted by Stanford and several others regatools.med.kuleuven.be/typing
EuResist Network GEIE A European non-profit grouping Founded by Informa, Karolinska Institute, Max Planck Institute, University of Cologne, University of Siena, now has 16 partners worldwide It manages the EuResist Integrated Database (EIDB), which contains more than 66,000 patients and is open for research All partners are members of the Scientific Board, which evaluates the research proposals that require access to the EIDB
The bioinformatics approach
EuResist Integrated DataBase More than 66,000 patients
The EuResist prediction Engine It is not rule-based It is formed by three independent statistical models Each model uses SVM combined with specific feature extraction and/or a resistance tree model Each model is trained on the EIDB, where sequences are linked to the clinical follow-up of the patient (initially 8 weeks) The three models are combined to form one robust prediction engine
The Prediction Engine (in English and Russian)
EuResist Engine 2015
EuResist versus expert: EVE study 25 HAART cases randomly selected from the EuResist db: Obsolete therapies excluded Wild type genotype excluded All clinical and virological information available 12 experts enrolled, response obtained from 10: On-line anonymous rating Only European vs. non-European setting traceable Use of any interpretation system allowed (and declared) Zazzi et al. HIVmed 2010
Performance Both the single engines and the combined engine significantly outperform Stanford HIVDB The AVERAGE is the best option for engine combination Rosen-Zvi Bioinformatics 2008 Altmann PLoS ONE 2008 Prosperi Antivir Ther 2009
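The averaging step can be sketched in a few lines. This assumes, hypothetically, that each of the three engines returns a probability of treatment success between 0 and 1 for every candidate regimen; the regimen names and numbers below are invented and the real engine's interface is not shown in this talk.

```python
from statistics import mean


def combine(engine_probabilities):
    """Combine the individual engines' outputs by taking their plain average."""
    return mean(engine_probabilities)


def rank_regimens(engine_outputs):
    """Return (regimen, combined probability) pairs, best regimen first."""
    combined = {name: combine(probs) for name, probs in engine_outputs.items()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up outputs of three engines for two hypothetical regimens.
ENGINE_OUTPUTS = {
    "REGIMEN_A": [0.75, 0.5, 0.25],
    "REGIMEN_B": [0.875, 0.75, 0.625],
}

ranking = rank_regimens(ENGINE_OUTPUTS)
print(ranking)
```

Averaging needs no extra training data, which makes it a simple baseline against which weighted or learned combinations can be compared.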
Engine update 2015 Updated on the larger data set Time of clinical follow-up extended to 24 weeks Performance remains similar
New Engine 2016 New definition of therapy success: number of aviremic semesters Presented at ACM Conference on Bioinformatics and Computational Biology 2016 - Seattle, 2-7 October
Phylogeotool Visualisation tools can help interpret large datasets E.g. geographic visualisation of viral strains can be applied to surveillance for epidemics and outbreaks of viral pathogens Tracking of individual variants with specific characteristics (e.g. risk group, drug resistance, …) can clarify their relation to geographic or phylogenetic spread
Phylogeotool
Contact: f.incardona@informacro.info “A computer makes as many mistakes in two seconds as 20 men working 20 years make” Arthur Bloch, Murphy's Laws for Technology Thank you http://www.euresist.org http://engine.euresist.org