Importance of Semantics in Precision Oncology at NCI


1 Importance of Semantics in Precision Oncology at NCI
4/25/2017
National Cancer Institute, U.S. Department of Health and Human Services, National Institutes of Health
Sherri de Coronado, MS, MBA, NCI CBIIT
May 15, 2015

2 Mind Map of Precision Oncology Space
If precision oncology is the goal, good semantics is important. This is a crude mind map of the complex, multi-dimensional space, used to explore areas where we are concerned about semantics.
Branches of the mind map:
- Basic ingredients underlying all big data science
- Calls for precision medicine
- Science and compute capability
- Clinical research and care
- Semantics-related challenges

3 + Reusable +BD2K
Precision medicine: prevention and treatment strategies that take individual variability into account.
Several of the talks have been directly or indirectly related to precision medicine. The mind map shows some major NCI efforts towards precision medicine, and how important semantics is, and will be, to their success.
- Open Science: BD2K activities apply here, as Phil Bourne discussed, and to other areas of this mind map as well. A research commons ties together data description and access, software description and access, and description with standards.
- Interoperability through Metadata: NCI has worked in that space for many years with caDSR. Off the graph, examples of other important efforts are the NIH CDE portal, PROMIS, NeuroQOL, and NLM’s VSAC (Value Set Authority Center).
- Semantic Interoperability through Integration of Ontologies: an active area and also a challenge area, with needs to reuse existing ontologies, integrate existing ontologies, and integrate research and clinical data streams.
- Data Sharing: critical. Towards that end, this talk will describe the new Genomic Data Sharing Policy.
- Resources: of course! Data sharing is hard and takes a lot of resources (people, time, and money for IRBs, clearances, consenting, publishing, ...).
- Tools: a challenge area. Of recent note is the Pistoia Alliance mapping effort to make it easier to map ontologies.
- Calls for Precision Medicine: see "A New Initiative on Precision Medicine", Francis S. Collins, M.D., Ph.D., and Harold Varmus, M.D., N Engl J Med 2015; 372: February 26, 2015. DOI: /NEJMp

4 +BD2K
- TCGA = The Cancer Genome Atlas: the project that needed to be done to move precision medicine forward. It set out to comprehensively characterize the genomic and molecular features of ovarian cancer and GBM, and expanded to over 20 tumor types, with systematic protocols to generate the data.
- TARGET = Therapeutically Applicable Research To Generate Effective Treatments: a consortium effort with a comprehensive approach to studying the genomic drivers of childhood cancers, identifying therapeutic targets and prognostic markers.
- ALCHEMIST = Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trials: studying treatments for certain genetic changes in two genes, ALK and EGFR.
- COSMIC = Catalogue of Somatic Mutations in Cancer
- MedDRA = Medical Dictionary for Regulatory Activities
- CTCAE = Common Terminology Criteria for Adverse Events
- MATCH = Molecular Analysis for Therapy Choice
- ICGC = International Cancer Genome Consortium

5 Semantics Related Opportunities

6 New Genomic Data Sharing Policy
- September 2013: the new Genomic Data Sharing (GDS) Policy was released in draft form (NOT-OD ). The draft policy was put out in the Federal Register for a 60-day public comment period.
- November 2013: public comments collected by the Office of Science Policy.
- Policy modified with feedback from the IC Directors and the NIH GWAS data sharing governance committees (TSDS, PPDM, SOC).
- August: the final Genomic Data Sharing (GDS) Policy was released (NOT-OD ).
- NOT-HG : Notice on Development of Data Sharing Policy for Sequence and Related Genomic Data

7 Trans-NCI Data Sharing WG
Responsible for the activities necessary for the Institute to implement and maintain the GDS policy framework:
- Develop a plan and recommend any resources needed
- Propose governance needs
- Develop and disseminate materials for implementation
Focus Areas
- Data Standards: Define baseline expectations, including data types and timelines.
- Process: Develop processes and resources to facilitate implementation and compliance.
- Resources: Consider all resource needs to implement and oversee policy expectations.
- Governance: Consider governance needs and procedures for adjudication of implementation issues, and oversight.

8 Extending Genomic Data Sharing Policies
Comparison of the GWAS Policy and the GDS Policy

Scope
  GWAS Policy: Applies to human GWAS data.
  GDS Policy: Applies to all genomic data types, human and non-human.
Consent Standard, Existing* Collections (*before the effective date of the GDS policy)
  GWAS Policy: If research consent exists, IRB reviews for consistency. If no research consent exists, data may still be submitted to NIH databases.
  GDS Policy: Same.
Consent Standard, Future* Collections (*after the effective date of the GDS policy)
  GWAS Policy: N/A
  GDS Policy: Samples or cell lines should be consented for research use and broad data sharing. Exceptions can be requested.
Data Submission
  GWAS Policy: Data submitted as soon as quality control procedures are completed.
  GDS Policy: Timelines vary by data type, but generally as soon as quality control procedures are complete.
Data Release
  GWAS Policy: Immediate data release. 12-month publication embargo.
  GDS Policy: 6-month deferral of data release. No publication embargo.

Source: Elizabeth Gillanders, Ph.D., for NCAB Informatics WG, September 26th, 2014

Notes:
- The GDS Policy expects (with exceptions) explicit consent for research use for materials collected after the policy’s effective date.
- The GDS Policy is applicable to any NIH-funded research project involving non-human organisms or human specimens that generates genomic, metagenomic, epigenomic, or transcriptomic data.
- Quality control procedures include data cleaning, so the submission clock runs from the date data cleaning is completed.

9 New NCI MATCH Trial
"Precision Medicine uses genetic information from a person’s cancer to determine a patient’s treatment with a treatment targeted to that particular genetic abnormality."
MATCH is one of several NCI precision medicine initiatives. The initial set of trials will focus on different questions:
(1) Exceptional Responders Initiative: why do a minority of patients with solid tumors or lymphoma respond very well to some drugs even if the majority do not?
(2) NCI MATCH trial: can molecular markers predict response to targeted therapies in patients with advanced cancer resistant to standard treatment?
(3) ALCHEMIST trial: will targeted epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) inhibitors improve survival for adenocarcinoma of the lung in the adjuvant setting?

10 NCI MATCH Trial
Question: Can molecular markers predict response to targeted therapies in patients with advanced cancer resistant to standard treatment?
- Biopsies from tumors from up to 3,000 patients to undergo DNA/RNA extraction; assay workflow to identify actionable mutations.
- ECOG-ACRIN leading the study with NCI.
- Multiple arms, matching a particular molecular profile to specific available drugs.
- Objectives: assess response and time to progression based on tumor profile, regardless of tumor origin.
See: Seminars in Oncology, Vol 41 No 3, June 2014, pp
Notes: Up to 3,000 patients from sites participating in the NCTN network. Need to process and sequence many tumor biopsies, and match people to multiple trials/arms. The assay workflow takes 11-14 days. Tumors from a patient may need to be sequenced multiple times to study genomic changes at progression. At progression, a patient could move to a different arm. Pediatric MATCH is still in development (to be led by the Children’s Oncology Group).
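The arm-matching idea on this slide (assign by molecular profile rather than tumor origin, and allow a move to a different arm at progression) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Arm` class, the arm names, the alteration labels, and both functions are hypothetical, and they do not describe the actual NCI MATCH arms or assay logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """A hypothetical treatment arm keyed on one gene alteration."""
    name: str
    gene: str
    alteration: str


# Illustrative arms only; these are NOT the actual NCI MATCH arms.
ARMS = [
    Arm("ARM-1", "EGFR", "activating mutation"),
    Arm("ARM-2", "ALK", "rearrangement"),
]


def eligible_arms(profile, arms=ARMS):
    """Return the names of arms whose (gene, alteration) is in the tumor profile.

    `profile` is a set of (gene, alteration) tuples reported by the assay.
    Tumor origin is deliberately not an input: matching uses the profile alone.
    """
    return [a.name for a in arms if (a.gene, a.alteration) in profile]


def reassign_at_progression(current_arm, new_profile, arms=ARMS):
    """After re-sequencing at progression, pick a different matching arm, if any."""
    for name in eligible_arms(new_profile, arms):
        if name != current_arm:
            return name
    return None
```

Keeping the profile as a set of (gene, alteration) pairs is one possible design; it only works across sites if every site names genes and alterations the same way, which is the semantic point of the talk.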

11 TCGA History
About three years after the Human Genome Project: large-scale tumor profiling in a systematic way.
- Initiated in 2005, pilots in 2006, extended in 2009.
- Collaboration of NHGRI and NCI to examine GBM, lung, and ovarian cancer using genomic techniques in 2006.
- Expanded to 20+ tumor types.
Notes: Began with ovarian, lung, and GBM, then extended to more tumor types (33?). Most have sequenced exomes at this point. Also changed: our ability to understand the many mechanisms of gene regulation and protein maturation, and the ability to have data to support systems biology. The tools, techniques, and our ability to analyze these data have changed immensely since the beginning of TCGA. Our knowledge of cancer, and the kinds of questions we want to ask and can ask, have also changed.

12 TCGA Drivers
- Provide high-quality reference sets for 20+ tissue types.
- Provide a platform for systems biology and hypothesis generation.
- Provide a test bed for understanding the real-world implications of consent and data access policies on genomic and clinical data.
Notes: Data collection is now over, but there are MANY users and many pan-cancer and other papers (>2700). The kinds of questions we want to ask, and CAN ask, have changed and grown.

13 We now understand more of the underlying basic and cancer biology because of the Human Genome Project (HGP) and the technologies emerging from it. TCGA activities and analyses were built upon the success of the HGP.

14 Genomic Data Commons (GDC)
- In transition from The Cancer Genome Atlas (TCGA) to the GDC, a commons to host TCGA, TARGET, and other future genomic data sets.
- University of Chicago and NCI are collaborating to initiate the Genomic Data Commons (GDC) (Robert Grossman, Dir.).
- Goal: to enable any researcher to test their ideas and to bring their analytics to the data.
From: "Transforming Cancer Research: The Genomic Data Commons", posted on December 2, 2014 by Kevin Jiang in At the Bench.
Now, a wealth of data: “However, this wealth of data has come with limitations. These data are gathered by different research groups, with different technologies and protocols. They’re stored in different locations, using different software and management systems. They’re complex and just plain huge. A cancer researcher would need millions of dollars, several years and a dedicated team to set up the infrastructure necessary to analyze these datasets. Just downloading can take months. This has impeded research at all but the largest groups and institutions, and has stymied collaboration.”

15 NCI Cancer Genomics Data Commons
[Figure: diagram with the labels "Cancer information donor", "Genomic + clinical data", "GDC", and "NCI Genomics Data Commons".]

16 NCI Genomic Data Commons
- Unified repository for cancer genomics data.
- Accepts data from both the NCI Center for Cancer Genomics (CCG) and external projects, including submissions from small laboratories.
- Runs reproducible, consistent bioinformatics pipelines to generate standard higher-level data (e.g., tumor variant calls).
- Pipelines are designed and updated with community input to represent the best practices of the field.
- The availability of genomic data will make it possible for researchers to better classify disease.
Notes:
NCI News Note, "NCI establishes Genomic Data Commons to facilitate identification of molecular subtypes of cancer and potential drug targets", posted December 2, 2014: The GDC will facilitate access to data generated by many existing and forthcoming NCI programs. The GDC will be built out over a number of years to ensure that results of individual projects can be combined to create broadly useful and accessible datasets, and will be operated with funding from NCI to the University of Chicago under a subcontract from Leidos Biomedical Research at the Frederick National Laboratory for Cancer Research. NCI’s Center for Cancer Genomics is establishing the data service with the assistance of NCI’s bioinformatics and cloud research program in the NCI Center for Biomedical Informatics and Information Technology.
Re: consistent pipelines. It is important to be able to specify all the software tools, data, parameters, and compute environment, so that a pipeline runs the same way every time for each set of input data and yields a particular output product.
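One way to make the "specify everything" point about consistent pipelines concrete is to record tools, parameters, inputs, and environment in a single specification and derive a fingerprint from it, so two runs can be checked for having the same declared setup. The Python sketch below is a minimal illustration, not the GDC's method; the function name `pipeline_fingerprint` and the field names are assumptions made for this example.

```python
import hashlib
import json


def pipeline_fingerprint(tools, parameters, input_checksums, environment):
    """Hash everything that can change a pipeline's output into one hex digest.

    tools:            dict of tool name -> version string (hypothetical names)
    parameters:       dict of parameter name -> value
    input_checksums:  iterable of checksums of the input files
    environment:      string naming the compute environment, e.g. an image tag
    """
    spec = {
        "tools": tools,
        "parameters": parameters,
        "inputs": sorted(input_checksums),  # input order should not matter
        "environment": environment,
    }
    # sort_keys and fixed separators give one canonical text form per spec
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Matching fingerprints show that two runs declared the same setup; they do not prove the outputs are identical if a tool is itself nondeterministic.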

17 GDC Context From: Mark Jensen, GDC

18 GDC ConOps From: Mark Jensen, GDC

19 Clinical Data at GDC
Key issues:
- Low barriers to data submission
- Minimal number of required data elements
- Ongoing curation and semantic assignment
- Balance acceptance of submitter-provided semantic information with GDC curation
- Provide cross-project searches over clinical data elements to filter genomic data
- Allow users to acquire data intuitively, but also provide semantic sources and IDs as available
Ideal: Expose clinical data intuitively, but manage with rigorous semantic information
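The "intuitive on the surface, rigorous underneath" idea can be sketched as a clinical value that carries a human-readable label plus an optional terminology source and concept ID. A cross-project search then matches on the ID where one is available and falls back to the submitter's label otherwise. This Python sketch is purely illustrative: the class, the functions, the "EXAMPLE" terminology, and the "EX:" IDs are all made up for this example and do not reflect the GDC data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinicalValue:
    label: str                        # what the user sees
    source: Optional[str] = None      # terminology name, when available
    concept_id: Optional[str] = None  # concept ID in that terminology, when available


def same_meaning(a, b):
    """Compare by concept ID when both sides have one from the same source;
    otherwise fall back to a case-insensitive label comparison."""
    if a.concept_id and b.concept_id and a.source == b.source:
        return a.concept_id == b.concept_id
    return a.label.strip().lower() == b.label.strip().lower()


def filter_cases(cases, element, wanted):
    """Return case IDs, across all projects, whose `element` matches `wanted`.

    Each case is a dict: {"project": str, "case_id": str,
                          "clinical": {element_name: ClinicalValue}}.
    """
    return [
        c["case_id"]
        for c in cases
        if element in c["clinical"] and same_meaning(c["clinical"][element], wanted)
    ]
```

With this layout, two projects that label the same diagnosis differently still match when curation has assigned both the same ID, while uncurated submissions remain searchable by label.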

20 Cancer Genome Cloud Pilots
Three pilots, initiated Fall 2014, to be public "cancer knowledge clouds" in which data repositories would be co-located with advanced computing resources:
- Broad Institute, UCSC, UC Berkeley
- ISB-led team, Google, SRA
- Seven Bridges Genomics
Begin piloting components and gathering feedback: required by Jan 2016.
Could be a template (or templates) for hosting public multi-omics data.
To host TCGA plus other optional data (e.g. Genomes).

21 Cancer Genome Cloud Pilots
Goals:
- democratize access to large-scale data repositories and computational infrastructure
- co-locate data and compute to minimize unnecessary data transfer
- integrate public and private datasets
- allow web-based exploration of hosted data
- transform and accelerate collaborative cancer research
Teams:
- Broad Team: (1) The Broad; (2) UCSC; (3) UC Berkeley
- ISB (Institute for Systems Biology) Team: (1) ISB; (2) Google; (3) SRA
- Seven Bridges Genomics Team: 90-person team with HQ in Cambridge, MA, plus London and Belgrade. Developing an open standard for reproducible genomic pipelines (rabix.org).

22 Cancer Genome Cloud Pilots
People can register at any or all of these sites, if they are interested in getting involved:
- Seven Bridges: cancergenomicscloud.org
- Broad: Firecloud.org
- Institute for Systems Biology: cgc.systemsbiology.net


24 Precision Medicine Opportunities involve Semantics
The era of precision medicine and precision oncology is predicated on the integration of research, care, and molecular medicine and the availability of data for modeling, risk analysis, and optimal care. (Warren Kibbe)
The promise of precision medicine will only be fully realized if the research community can adapt its clinical trials methodology to study molecularly characterized tumors instead of the traditional histologic classification. (Abrams et al., National Cancer Institute's Precision Medicine Initiatives for the New National Clinical Trials Network, 2014)

25 Semantic Opportunities: Heard from this meeting and beyond
- Imaging: pathology imaging ontology gaps (terms and formal definitions to characterize histopathology images and algorithms). NLP effort to automate image annotation with ontologies, creating metadata for large image collections by training classifiers. QHIO: terms and relationships for the whole lifecycle of images.
- Proteomics (Chris Kinsinger, CPTAC): better clinical biospecimen annotation.
- Cancer phenotypes: cohorts / finding patients; cancer pathology; protocol changes.
- Modeling tumor microenvironments: integration of multiscale cancer data; effort to model cancer state as an ecological problem.
- Cancer classification: data needs vs ontological classification. Pan-cancer analyses can be improved using DO (Hive).
Notes:
Rebecca Crowley: Precise phenotype information is needed to advance translational cancer research, particularly to unravel the effects of genetic, epigenetic, and other factors on tumor behavior and responsiveness. Examples of phenotypic variables in cancer include: tumor morphology (e.g. histopathologic diagnosis), co-morbid conditions (e.g. associated immune disease), laboratory findings (e.g. gene amplification status), specific tumor behaviors (e.g. metastasis) and response to treatment (e.g. effect of a chemotherapeutic agent on tumor).
eMERGE: a national network organized and funded by the National Human Genome Research Institute (NHGRI) that combines DNA biorepositories with electronic medical record (EMR) systems for large scale, high-throughput genetic research in support of implementing genomic medicine.
Ilya Golderge: “urgent need to provide set of terms and formal definitions necessary to characterize both the histopathological images and the algorithms that operate on them.”

26 Semantic Opportunities (2): Heard from this meeting and beyond
Tools / Resources / Standards
- Getting usable, effective, efficient software into people's hands will increase uptake of semantically well-described metadata, terms, and ontologies, and improve integration of metadata and terminology.
- Integrated use of a variety of ontologies.
- Ways to manage and bridge research and clinical data streams.
- Tools to help harmonize and use metadata and terminology.
- Provenance: use of checklists early on. Bottom up.
- Research Commons.

27 Thank you
Sherri de Coronado, decorons@mail.nih.gov
Thanks to content contributors: Gilberto Fragoso, Mark Jensen, Warren Kibbe, Juli Klemm, Elizabeth Gillanders and others.
