Deep Learning with Botanical Specimen Images

Slides:



Advertisements
Similar presentations
Nathan Wiebe, Ashish Kapoor and Krysta Svore Microsoft Research ASCR Workshop Washington DC Quantum Deep Learning.
Advertisements

Artificial Neural Networks (ANNs)
Redaction: redaction: PANAKOS ANDREAS. An Interactive Tool for Color Segmentation. An Interactive Tool for Color Segmentation. What is color segmentation?
Introduction to machine learning
LLNL-PRES This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Case Studies Dr Lee Nung Kion Faculty of Cognitive Sciences and Human Development UNIVERSITI MALAYSIA SARAWAK.
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
School of Engineering and Computer Science Victoria University of Wellington Copyright: Peter Andreae, VUW Image Recognition COMP # 18.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
CSE 5331/7331 F'07© Prentice Hall1 CSE 5331/7331 Fall 2007 Machine Learning Margaret H. Dunham Department of Computer Science and Engineering Southern.
Artificial Intelligence, Expert Systems, and Neural Networks Group 10 Cameron Kinard Leaundre Zeno Heath Carley Megan Wiedmaier.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
Kim HS Introduction considering that the amount of MRI data to analyze in present-day clinical trials is often on the order of hundreds or.
Network Management Lecture 13. MACHINE LEARNING TECHNIQUES 2 Dr. Atiq Ahmed Université de Balouchistan.
DATA MINING and VISUALIZATION Instructor: Dr. Matthew Iklé, Adams State University Remote Instructor: Dr. Hong Liu, Embry-Riddle Aeronautical University.
Introduction to Machine Learning, its potential usage in network area,
Comparing TensorFlow Deep Learning Performance Using CPUs, GPUs, Local PCs and Cloud Pace University, Research Day, May 5, 2017 John Lawrence, Jonas Malmsten,
CSE 4705 Artificial Intelligence
Big data classification using neural network
CS 9633 Machine Learning Support Vector Machines
Machine Learning for Computer Security
Semi-Supervised Clustering
Convolutional Neural Network
Applying Deep Neural Network to Enhance EMPI Searching
Artificial Intelligence, P.II
The Relationship between Deep Learning and Brain Function
Histograms CSE 6363 – Machine Learning Vassilis Athitsos
Deep Learning Amin Sobhani.
Sentence Modeling Representation of sentences is the heart of Natural Language Processing A sentence model is a representation and analysis of semantic.
An Artificial Intelligence Approach to Precision Oncology
Intro to Machine Learning
School of Computer Science & Engineering
DeepCount Mark Lenson.
Goodfellow: Chap 1 Introduction
Classification: Logistic Regression
Alan D. Mead, Talent Algorithms Inc. Jialin Huang, Amazon
Deep Learning Insights and Open-ended Questions
Applications of Deep Learning and how to get started with implementation of deep learning Presentation By : Manaswi Advisor : Dr.Chinmay.
Machine Learning I & II.
Introduction to Data Science Lecture 7 Machine Learning Overview
Introductory Seminar on Research: Fall 2017
Machine Learning Basics
Bringing Organism Observations Into Bioinformatics Networks
Artificial Intelligence in Healthcare
Deep Belief Networks Psychology 209 February 22, 2013.
Goodfellow: Chap 1 Introduction
Machine Learning & Data Science
Using Tensorflow to Detect Objects in an Image
State-of-the-art face recognition systems
What is Pattern Recognition?
Basic Intro Tutorial on Machine Learning and Data Mining
Bird-species Recognition Using Convolutional Neural Network
Distributed Representation of Words, Sentences and Paragraphs
Team 2: Graham Leech, Austin Woods, Cory Smith, Brent Niemerski
Big Data.
Similarity based on Shape and Appearance
Pose Estimation for non-cooperative Spacecraft Rendevous using CNN
network of simple neuron-like computing elements
Machine Learning 101 Intro to AI, ML, Deep Learning
Using Tensorflow to Detect Objects in an Image
The Big Health Data–Intelligent Machine Paradox
Zip Codes and Neural Networks: Machine Learning for
Intro to Machine Learning
Presentation By: Eryk Helenowski PURE Mentor: Vincent Bindschaedler
Deep Learning Authors: Yann LeCun, Yoshua Bengio, Geoffrey Hinton
Machine Learning for Space Systems: Are We Ready?
Machine Learning in Business John C. Hull
Active AI Projects at WIPO
Patterson: Chap 1 A Review of Machine Learning
Presentation transcript:

Deep Learning with Botanical Specimen Images A Voyage into Neural Networks Sylvia Orli orlis@si.edu @sylviaorli Department of Botany National Museum of Natural History

Digitization at US Herbarium 1970 – First digitization initiative 2001 – Images included in digital record 2015 - Digitization through conveyor Summer 2017 - 2.2 million inventory records, 1.4 million specimen images total

What information does a specimen image hold? Image Metadata Collection information Taxonomic determinations Plant material Paper

Partnerships NVIDIA - American technology company based in Santa Clara, California. It designs graphics processing units. Motivated by a desire to apply modern analytical techniques to the large datasets being produced by researchers and staff at various Smithsonian entities, a collaboration among the Botany department at NMNH, the Digitization Program Office, and NVIDIA formed around using deep learning on their collection of digital herbarium sheets. The data science team was invited to participate. The Botany team came up with several potential projects that could benefit from deep learning and we settled on a pilot project aimed at identifying mercury contaminated herbarium specimens. Early results are extremely promising with our best models accurately classifying these images at over 95%. HIPPA SECURE Service 2 Service 3 Service 4

Artificial Neural Networks “a computer system modeled on the human brain and nervous system.” Computational model used in machine learning Used for common every day applications GPU and image resource requirements A “deep learning” complex model is constructed between raw inputs and the resulting outputs Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Deep learning: is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks. Conventional machine-learning techniques are limited in their ability to process natural data in their raw form. For decades, constructing a pattern-recognition or machine-learning system required careful engineering and considerable domain expertise. The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure. Uses for deep learning / neural nets: Automatic Colorization of Black and White Images, adding sound to silent movies, object classification and detection in photographs, image caption generation, NGA Graphics Processing Unit and large amounts of data! -construct larger neural networks and train them with more and more data, their performance continues to increase. Graphic by Andrew Ng

Deep Learning Can botanical specimen images be analyzed using this model? Bear Bird Bee In deep machine learning, a complex model is constructed between raw inputs and the resulting outputs. The most standard process of performing deep learning is to curate a data set of inputs on which to train the model, say of images of different animals. Once the network is refined and trained, it will give a series of outputs that predict the image category. Often the most difficult part of the process is to gather enough good training data to train the neural net. With the influx of ”big data,” this has become easier, particularly for companies who have access to large amounts of data. What can we learn using neural nets with the botanical specimen images? HIPPA SECURE Service 2 Service 3 Service 4

Mercury contaminants Mercury added to specimens as insecticide Visibility markings show in older specimens Estimated 2-5% of specimens affected Hot spots in herbarium Mercury salt solutions have been used to preserve specimens from as early as 1687 right up until the 1980s. 

Challenges in identifying contaminated samples Requires large sample size Contaminated vs. non-contaminated samples need to be randomly selected Contamination can be unevenly dispersed on specimen Contamination is not always contamination Contamination not always oxidized and visible

NMNH Deep Learning Approach Create neural net model using Mathematica software (Wolfram language) Convolutional neural net Supervised learning Use highly stratified clean / contaminated image set Image set: 70% training, 20% validation and 10% test Convolutional - “inspired by the organization of the animal visual cortex” – emphasize certain layers of the image. In supervised learning, the output datasets are provided which are used to train the machine and get the desired outputs whereas in unsupervised learning no datasets are provided, instead the data is clustered into different classes .

Initial Approach – 2000 images Partition the high-res images into 128x128 px tiles to inflate training dataset Abel Brown – NVIDIA deep learning guru

Entropy Distribution of Labelled Tiles Entropy is usually synonymous with “information content” Entropy distributions for clean and contaminated tiles statistically significant which means there exists separable content • Separation of entropy distributions for labeled tiles is likely good indicator/predictor of successful classifier training

Classifying Tile Samples Alternative strategy: rather than use explicit category from classifier lets use the probability produced by the last softmax layer of the network

Probability of “Clean”

Distribution of mean probability (clean) over all test images Accuracy: 95% Contaminated Accuracy: 92% Creating a model on highly stratified images doesn’t work perfectly with more unclear specimens

Gathering more images 9380 images of contaminated sheets 9383 images of clean sheets Reduce Full Image dimensions to 256 x 256 Use full image instead of tile probability Images dimensions 1000x1400 pixels, 600dpi HIPPA SECURE Service 2 Service 3 Service 4

Entropy Distribution of Full Specimen Images

Image classifier (Confusion Matrix) 92% accuracy in detecting clean vs. contaminated specimens Higher accuracy with further tweaking of neural nets

Distribution of probability (clean) over all test images

Misclassified specimens

Further Deep Learning Uses Plant family differences Species Identification Transcription of specimen labels Collaborations?