An Integrated and Comprehensive Data Mining System for Studying Environmental Impact of Nanomaterials: NEIMiner Nano Working Group Presentation 10/13/2011.

Presentation transcript:

Kaizhi Tang, Ph.D., David Mihalcik, Thomas Wavering, Roger Xu (Intelligent Automation Inc)
Prof. Stacey Harper, OSU
Sue Pan, SAIC
Sponsor Agency: Dr. Jeff Steevens, Army ERDC

Outline
- Motivation and proposed approach
- NEI modeling framework
- Design of NEIMiner information system
- NEIMiner

Motivation and proposed approach of NEIMiner

NEED: To reduce the risk of nanomaterials in military use, nanomaterial (NM) environmental impact analysis requires a comprehensive NEI modeling framework, a centralized NEI database, a powerful model discovery tool, and an integrated model composition strategy.

KEY COMPONENTS OF THE PROPOSED APPROACH
- Flexible data integration based on the ETL (Extract, Transform, Load) strategy of data warehousing
- Integrated and collaborative data management using a modern content management system
- Optimized data mining process that searches over many algorithms and parameters, which carries a heavy computational burden
- Flexible model composition based on a unified model abstraction, reusing FRAMES

DELIVERABLES
- Conceptual framework of NEI analysis
- Collaborative NEI information system with model discovery and composition capability

VALUE TO THE CUSTOMER / TRANSITION CUSTOMER
- Environmental impact estimation tool for nanomaterials
- Easy access to a large amount of NEI data in a centralized data warehouse, together with the available model generation tool
- Potentially useful evaluation models of NEI

Scope of NEI Modeling
[Figure: diagram with the labels "Collaboratory of Structural Nanobiology", "NEI Data", and "NEI Data Mining Models"]

NEIMiner System Architecture
[Figure: architecture diagram with the labels "NEI Data" and "NEI Data Mining Models"]

Available NEI Data and Schemas
- Nanomaterial-Biological Interactions Knowledgebase
- Cancer Nanotechnology Laboratory portal (caNanoLab): NCI
- ICON: International Council on Nanotechnology, Rice University
- Nano-Tab: tab-delimited spreadsheet type based on EBI and ISA-TAB
- NanoParticle Ontology (NPO): implemented in OWL

Slide callouts:
- Most complete characterization capture
- Largest number of publications, limited characterization capture
- Wide range of characterization and health impact data

Other Data and Schemas
- OECD Database on Research into Safety of Manufactured Nanomaterials
- National Institute for Occupational Safety and Health (NIOSH)
- SAFENANO, Institute of Occupational Health (UK)
- University of Wisconsin - Madison: Nanoscale Science and Engineering Center
- National Reference Center for Bioethics Literature, Georgetown University, Kennedy Institute of Ethics
- Nanomedicine Research Portal
- Center on Nanotechnology and Society (Chicago-Kent College of Law in the Illinois Institute of Technology)

Data Extraction Methods
- Data extraction via web services
  - Example: caNanoLab
- Data extraction via web scraping
  - Examples: ICON, NBI
  - Approaches:
    - Human copy-and-paste
    - HTTP programming
    - Text grepping and regular expression matching
    - HTML parsers
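The "HTML parsers" approach listed above could look roughly like the following sketch. It uses only Python's standard `html.parser` module; the sample HTML, the column names, and the particle names are made up for illustration and do not reflect the actual layout of ICON, NBI, or any other source named in the slides.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment; real pages would be fetched over HTTP first.
SAMPLE_HTML = """
<table>
  <tr><th>material</th><th>size_nm</th></tr>
  <tr><td>hypothetical-particle-A</td><td>12.5</td></tr>
  <tr><td>hypothetical-particle-B</td><td>40</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
header, records = rows[0], rows[1:]
```

A parser-based extractor of this kind tends to survive minor markup changes better than regular expressions, at the cost of slightly more code.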

Design philosophy of the NEI data warehouse
- Centralized data from multiple data sources for analysis => multiple nano-risk-related data sources with different formats
- Consists of an ETL tool, a database, a reporting tool, and data modeling => tools useful for NM data integration and mining
- Subject-oriented data organization => risk assessment for nanomaterials
- Multi-dimensional => various nanomaterial properties
- Star schema => extensible schema design
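As a rough illustration of the star schema idea above, the sketch below builds one fact table surrounded by dimension tables in an in-memory SQLite database. All table names, column names, and rows are hypothetical; the slides do not give the actual NEIMiner schema.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing three dimensions.
SCHEMA = """
CREATE TABLE dim_material (material_id INTEGER PRIMARY KEY, name TEXT, core_type TEXT);
CREATE TABLE dim_source   (source_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_property (property_id INTEGER PRIMARY KEY, name TEXT, unit TEXT);
CREATE TABLE fact_measurement (
    material_id INTEGER REFERENCES dim_material(material_id),
    source_id   INTEGER REFERENCES dim_source(source_id),
    property_id INTEGER REFERENCES dim_property(property_id),
    value       REAL
);
"""


def build_warehouse():
    """Create an empty in-memory warehouse with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_toy_rows(conn):
    """Load made-up rows: one material measured by two different sources."""
    conn.execute("INSERT INTO dim_material VALUES (1, 'toy-particle-A', 'metal oxide')")
    conn.executemany("INSERT INTO dim_source VALUES (?, ?)",
                     [(1, "source-X"), (2, "source-Y")])
    conn.execute("INSERT INTO dim_property VALUES (1, 'size', 'nm')")
    conn.executemany("INSERT INTO fact_measurement VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 10.0), (1, 2, 1, 14.0)])
    conn.commit()


def mean_by_material_and_property(conn):
    """Average each property per material across all sources."""
    query = """
        SELECT m.name, p.name, AVG(f.value)
        FROM fact_measurement AS f
        JOIN dim_material AS m ON m.material_id = f.material_id
        JOIN dim_property AS p ON p.property_id = f.property_id
        GROUP BY m.name, p.name
        ORDER BY m.name, p.name
    """
    return conn.execute(query).fetchall()


conn = build_warehouse()
load_toy_rows(conn)
summary = mean_by_material_and_property(conn)
```

Under this layout, supporting a new nanomaterial property means adding a row to `dim_property` rather than altering the fact table, which is one way to read the "extensible schema design" point.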

NEI Model Discovery

Physical properties
- Material type
- Particle size distribution
- PDI
- Shape
- Structure

Chemical properties
- Surface reactivity
- Surface charge
- Water solubility

Exposure and study scenario
- Duration
- Continuity
- Exposure route
- Number of nanoparticles
- Number of ligands

Biological properties
- Species, age, gender, weight

Environmental ecosystem response
- Fate and transport
- Bioavailability and uptake
- Biomagnification

Biological response
- Genomic response
- Cell death

Correlation? Prediction?

Interesting Mining Problems and Solutions
- How to handle missing data
  - Median on numerical values
  - Median-frequency categories
  - Classification or regression using existing data
- How to determine attribute significance
  - Compare gain ratio for classification
  - Compare relief ratio for numerical prediction
- How to select algorithms and their parameters for training
  - Meta-optimization on algorithms and parameters
- How to split the data sets for high-quality models
  - Comparing various splitting strategies
  - Clustering as a preprocessing step
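Two of the ideas above, median imputation and gain ratio, can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the NEIMiner implementation. It assumes missing values are encoded as `None`, and it fills categorical gaps with the most frequent category, which is only one possible reading of the slide's "median-frequency categories". The gain ratio follows the usual definition: information gain divided by the split information of the attribute.

```python
import math
from collections import Counter
from statistics import median


def impute(values):
    """Fill None entries: median for numeric columns, most frequent value otherwise."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn a fill value from
    if all(isinstance(v, (int, float)) for v in present):
        fill = median(present)
    else:
        # Assumption: use the most frequent category as the fill value.
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(attribute, labels):
    """Information gain of `attribute` on `labels`, divided by its split information."""
    total = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute)
    # An attribute with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy, made-up columns for illustration only.
sizes = impute([1.0, None, 3.0, 10.0])
shapes = impute(["rod", None, "sphere", "rod"])
informative = gain_ratio(["a", "a", "b", "b"], ["x", "x", "y", "y"])
uninformative = gain_ratio(["a", "b", "a", "b"], ["x", "x", "y", "y"])
```

Ranking attributes by `gain_ratio` gives a simple first pass at attribute significance for classification targets; numerical targets would need a different measure, as the slide notes.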

Demonstration of NEIMiner