Exploring Strategies For Optimizing Knowledge Derivation From Imagery


Dan Getman, Geospatial Big Data Solutions, DigitalGlobe
9/20/2016, Location Powers

Big Data, Take One
- 70 petabytes of satellite imagery
- ~30 gigabytes per satellite image
- Billion dollar satellites
- Covering the globe annually
- Expensive… Slow… Heavy… Big… Data

70 PB is a lot, although big tech companies like Facebook have an order of magnitude more data. But our data is HEAVY: 30 gigs per image. We're not talking about tweets that you can map and reduce all over the place with traditional big data systems. We'll need new approaches. This is the opposite of real-time, fast, scalable, and cheap. This is old school DG.
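
To put the two figures on this slide side by side, here is a back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 PB = 10^15 bytes) and that every image is exactly 30 GB, neither of which the talk states.

```python
# Rough sizing from the slide's figures.
# Assumptions (not from the talk): decimal units, uniform image size.
PETABYTE = 10**15
GIGABYTE = 10**9

archive_bytes = 70 * PETABYTE   # "70 petabytes of satellite imagery"
image_bytes = 30 * GIGABYTE     # "~30 gigabytes per satellite image"

# Whole images that fit in the archive under these assumptions.
image_count = archive_bytes // image_bytes
print(f"Roughly {image_count:,} images")
```

Under these assumptions the archive holds on the order of 2.3 million images: few records by big data standards, but each one is very large.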

Big Data, Take Two
- All the Buildings, Planes, Cars, Boats, Roads, and Things
- All the Changes
- What does it all Mean…
- It means coordinating between Heavy/Slow Big Data and Light/Fast Big Data
- It means making image analysis faster, more dynamic, and less intrusive to Knowledge Discovery

All that being said, we need to be ready to support the rest of the big data folks: folks who don't really care about imagery and just want the knowledge. This is new school DG. It is important to note that we are dealing with many of the problems that plague big data analysis.

Analysis Use Cases
- I care about spectral integrity vs. I would be just as happy with a JPEG
- I need the whole image strip vs. Why would I want more than a chip?
- I want to have the image forever and ever vs. I don't want to see the image at all, just the answer please

All Trying to Accomplish the Same Thing:
- Make it fast
- Make it flexible
- Make it integrate with Data Science
Image: http://www.biografiasyvidas.com/monografia/einstein/fotos/einstein_rel.jpg

Analysis Paradigms (a few anyway)
Diagram labels:
- Data Provider
- Cloud Based Organized Image Store
- Cloud Based Scalable Compute
- Cloud Based WMS or Tile Store
- Cloud Based Scalable Compute

Analysis Paradigms at DigitalGlobe
Diagram labels:
- Cloud Based Image Processing Framework: Image Catalog, Provider Defined Processing, User Defined Processing, Scalable Compute
- Cloud Based Dynamic Tile Creation: Object Based Image Store, Provider Defined Processing, User Defined Processing, Cloud Based Scalable Compute

Analysis Paradigms: Whole Image
Diagram labels: Cloud Based Image Processing Framework (Image Catalog, Provider Defined Processing, User Defined Processing, Scalable Compute)
- Analyst never touches or purchases imagery, just information
- Analyst can run their own algorithms or anyone else's
- Leverages compute size needed for each process
- Parallelized on nodes and through data distribution across nodes
- Configured for processing at the state, national, continental scale
- Configured for processing all imagery that meets certain specifications as it is collected

Analysis Paradigms: Whole Image
Diagram (processing chain): Raw Image, Orthorectify, Compensate for Atmosphere, Pan Sharpen, Other Standard Function, User Defined Function, Create Output, Rest Endpoint
- Starts with “Raw” Tile
- User Defined Process Specified Through the API
- Output can be imagery, vector, tabular data
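
The chain on this slide can be pictured as an ordered list of named steps, some standard and some supplied by the user. The sketch below is illustrative only: `make_step`, `STANDARD_STEPS`, and `run_process` are hypothetical names, not DigitalGlobe's API, and each step is a placeholder that records its name rather than touching pixels.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

Tile = dict                      # hypothetical stand-in for a raw tile plus metadata
Step = Callable[[Tile], Tile]


def make_step(name: str) -> Step:
    """Build a placeholder step that only appends its name to the tile history."""
    def step(tile: Tile) -> Tile:
        return {**tile, "history": tile["history"] + [name]}
    return step


# Standard functions named on the slide; the bodies are placeholders.
STANDARD_STEPS: Dict[str, Step] = {
    "orthorectify": make_step("orthorectify"),
    "compensate_atmosphere": make_step("compensate_atmosphere"),
    "pan_sharpen": make_step("pan_sharpen"),
}


def run_process(tile: Tile, step_names: List[str],
                user_steps: Optional[Dict[str, Step]] = None) -> Tile:
    """Apply the named steps in order, looking up user-defined steps as well."""
    registry = {**STANDARD_STEPS, **(user_steps or {})}
    steps = [registry[name] for name in step_names]
    return reduce(lambda current, step: step(current), steps, tile)


raw_tile = {"id": "tile-001", "history": ["raw"]}
result = run_process(
    raw_tile,
    ["orthorectify", "compensate_atmosphere", "pan_sharpen", "count_pixels"],
    user_steps={"count_pixels": make_step("count_pixels")},  # hypothetical user function
)
print(result["history"])
```

Describing the process as data (a list of step names) is one way a user-defined process could be specified through an API and run next to the provider's standard functions.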

We Get Captured at Different Times

We All Run In Parallel As Data Arrives
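
A minimal sketch of the "run in parallel as data arrives" idea, using Python's standard thread pool. `process_image` is a hypothetical placeholder for whatever provider-defined or user-defined process runs on each image; the real system distributes work across cloud nodes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterable, List


def process_image(image_id: str) -> str:
    """Placeholder for the per-image process (hypothetical)."""
    return f"{image_id}:processed"


def process_as_they_arrive(arrivals: Iterable[str], max_workers: int = 4) -> List[str]:
    """Submit each image as soon as it is yielded; collect results as they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_image, image_id) for image_id in arrivals]
        for future in as_completed(futures):
            results.append(future.result())
    return results


# Images captured at different times are modelled here as an iterator.
finished = process_as_they_arrive(iter(["img-1", "img-2", "img-3"]))
print(sorted(finished))
```

Results come back in completion order, not arrival order, so the example sorts them before printing.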

Analysis Paradigms: Parts and Pieces
Diagram labels: Cloud Based Dynamic Tile Creation (Object Based Image Store, Provider Defined Processing, User Defined Processing, Cloud Based Scalable Compute)
- Chips are sized to significantly reduce compute costs
- No attached storage, just memory
- Object store => Highly Parallel data access (no compute needed)
- Deferred Processing => Highly flexible analysis and significantly reduced storage
- Restful => Easily integrated into Data Science/Big Data Paradigms
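
To make the chip idea concrete, the sketch below cuts an image extent into fixed-size chip windows and yields them lazily, so only the chips actually requested are ever computed. The function name, the window layout, and the 256-pixel chip size are assumptions for illustration; the talk does not give the actual chip size.

```python
from typing import Iterator, Tuple

# (column offset, row offset, width, height), all in pixels
Window = Tuple[int, int, int, int]


def chip_windows(width: int, height: int, chip_size: int) -> Iterator[Window]:
    """Yield chip windows row by row; edge chips are clipped to the image."""
    for row in range(0, height, chip_size):
        for col in range(0, width, chip_size):
            yield (col, row,
                   min(chip_size, width - col),
                   min(chip_size, height - row))


# A small 500 x 300 pixel extent cut into 256-pixel chips.
windows = list(chip_windows(500, 300, 256))
for window in windows:
    print(window)
```

Because `chip_windows` is a generator, a caller that needs only the first few chips never pays for the rest, which is the spirit of deferred processing.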

Analysis Paradigms: Parts and Pieces
Diagram (processing chain): Raw Tile Obj, Orthorectify, Compensate for Atmosphere, Pan Sharpen, Other Standard Function, User Defined Function, Create Output, Rest Endpoint
- Starts with “Raw” Tile
- User Defined Process Specified Through the API
- Output can be imagery, vector, tabular data

Analysis Paradigms: Parts and Pieces
Diagram (processing chain): Raw Tile Obj, Orthorectify, Compensate for Atmosphere, Pan Sharpen, Other Standard Function, User Defined Function, Create Output, Rest Endpoint
- Highly Available
- Super-high Bandwidth
- Super-high Parallelization
- Accessible at rest without compute/indexing in front
- Cost Effective
- Low/No Storage
- Processing on Very Small Instances
- Random Access
- Full-fidelity data

Analysis Paradigms: Parts and Pieces and ML
Diagram labels: Vector Store, Model Training, API For Streaming Compute, Object Store (Imagery)

Analysis Paradigms: Parts and Pieces and ML
Diagram labels: Vector Store, Model Training, API For Streaming Compute, Object Store (Imagery)
- Visualize Imagery (format optimized for interpretation)

Analysis Paradigms: Parts and Pieces and ML
Diagram labels: Vector Store, Model Training, API For Streaming Compute, Object Store (Imagery)
- Select Training Areas

Analysis Paradigms: Parts and Pieces and ML
Diagram labels: Vector Store, Model Training, API For Streaming Compute, Object Store (Imagery)
- Access Imagery (format optimized for model training)
- Train Samples (user defined model)

Analysis Paradigms: Parts and Pieces and ML
Diagram labels: Vector Store, Model Training, API For Streaming Compute, Object Store (Imagery)
- Compute and Visualize Detections in Selected Imagery. Repeat.

Analysis Paradigms: Parts and Pieces and ML
- This paradigm allows feature extraction and machine learning on imagery to be more easily integrated into existing big data and data science methodologies.
- Now that the data is smaller and more targeted, we can map and reduce it all we want.
- We can call imagery, and knowledge from imagery, directly as a service, rather than determining which strips to order, calling a sales rep, etc.
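
Once detections are small records rather than 30 GB images, ordinary map and reduce applies. The sketch below counts detected object classes per chip (map) and merges the counts (reduce). The record layout and the sample detections are invented for illustration and do not come from the talk.

```python
from collections import Counter
from functools import reduce
from typing import Dict, List

Detection = Dict[str, str]  # hypothetical record, e.g. {"label": "car"}


def map_chip(detections: List[Detection]) -> Counter:
    """Map step: count object classes detected within one chip."""
    return Counter(detection["label"] for detection in detections)


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two per-class tallies."""
    return left + right


# Made-up detections for two chips.
chips = [
    [{"label": "building"}, {"label": "car"}],
    [{"label": "car"}, {"label": "boat"}, {"label": "car"}],
]

totals = reduce(reduce_counts, map(map_chip, chips), Counter())
print(dict(totals))
```

Because each chip is mapped independently, the same two functions could be handed to a distributed map/reduce framework without change.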

Summary of Investigations:
- Looking at the future of data science and big data analysis to determine where we can meet it halfway
- Finding ways to ensure that getting data out of imagery is a fundamental tool of big data analysis without slowing that science down
- Balancing traditional image science and big data science so each can feed the other
- SpaceNet is an excellent example of this (https://aws.amazon.com/public-data-sets/spacenet/)

Any Questions So Far?