Object Recognition & Detection
An empFinesse™ Fundamentals Solution
Key Terms
- Deep Learning
- Object Recognition
- Convolutional Neural Network (CNN)
- Object Detection
- Custom Vision API
- Regions of Interest (ROI)
- Object Classification
- MS CNTK
Detect like a Human Eye?
- Locate objects?
- Recognize objects / text in an image / video?
- Memorize and train?
- Recollect / remember?
Sample Use Cases
- Detect the components of a motherboard of any device
- Recognize the processor type
- Recognize missing parts
- Recognize individual employees in an office/ODC
- Detect resources entering and leaving, just like a human eye
- Detect non-employee vehicles in a parking lot
- Recognize a vehicle without any external tags
- Trace a car by tracing its entry/exit at each toll
- Identify the defective stage of a product
- Early detection of defects
Object Detection Approach
About Solution
ObjecTell
- Process an Image
- Extract ROIs
- Classify Regions
Faster CNN flow
http://hcltonazure.cloudapp.net/imagerecognition/Detect.aspx
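The three steps on this slide (process an image, extract ROIs, classify regions) can be sketched in plain Python. This is an illustrative sketch only, not the solution's actual implementation: the sliding-window ROI proposal is a deliberately simple stand-in for whatever region proposal the CNN flow uses, and `extract_rois`, `crop`, `classify_region`, `detect`, and the 0.5 brightness threshold are all hypothetical names and values chosen for the example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Image = List[List[int]]          # grayscale pixel values 0..255, row-major


def extract_rois(image: Image, size: int, stride: int) -> List[Box]:
    """Propose square candidate regions with a sliding window (assumed method)."""
    height, width = len(image), len(image[0])
    return [
        (x, y, size, size)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


def crop(image: Image, box: Box) -> Image:
    """Cut the pixels covered by a box out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def classify_region(region: Image) -> Tuple[str, float]:
    """Hypothetical stand-in for a trained model: bright regions count as 'object'."""
    pixels = [p for row in region for p in row]
    score = sum(pixels) / (255.0 * len(pixels))
    return ("object" if score >= 0.5 else "background", score)


def detect(image: Image, size: int = 2, stride: int = 2, classifier=classify_region):
    """Process an image: extract ROIs, classify each one, keep non-background hits."""
    detections = []
    for box in extract_rois(image, size, stride):
        label, score = classifier(crop(image, box))
        if label != "background":
            detections.append((box, label, score))
    return detections


# A 4x4 test image with one bright 2x2 patch in the top-right corner.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
detections = detect(image)
print(detections)
```

Passing the classifier as a parameter keeps the flow independent of the model, so a real trained network could replace the stub without changing the ROI loop.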
Technology Perspectives
Components shown in the architecture diagram:
- MongoDB
- Detection UI
- Search UI
- Admin UI
- CNTK Python scripts
- ObjectTell WCF Service
- Azure Vision API Connector
- DAL Connector
- Objects Recognizer
- Search Component
- Dataset Manager
- Metadata Manager
- User Query Manager
- Resource Data Retriever
- Azure Computer Vision API Service
- Azure Custom Vision API Service
- ObjectTell Dataset
- Train UI
- Object Marker
- Usage Dashboard
- Character Recognizer
- NLP Core
- Video Parser
- Usage Analytics
Layers: UI Layer, Service Layer, Data Access Layer
Legend: Yet to be implemented
Current & Future State
Products / Libraries Used
- MS CNTK 2.3 Library
- Anaconda3 4.1.1
- Python 3.5
- MS Custom Vision API
- Azure Computer Vision API for OCR
- OpenALPR API
- Stanford NLP core (yet to be integrated)
Maturity Level
- First stage: Training / learning phase. A decent model is currently available for the cars dataset and the grocery dataset.
- Second stage: Object recognition and detection. An Azure-hosted web application is available now. The solution can be extended into a mobile app so that a user can snap a picture directly in the app and recognize the objects in it.
- Third stage: Self-learning phase. The solution has to be enhanced to learn from user corrections and suggestions.
Future State
- Recognize objects from other domains / industries
- Parse video into individual frames and recognize objects / text
- Support English-like queries through NLP
- Embed this feature within a camera
- Generate a report on the number of detections and their accuracy
Business Model
Business Relevance
38.92 B$ business by 2021 (image processing)
Scalability
# | Scalability Need | Current State | Resource Needed
1 | Process more images (10,000s) in a few hours | It takes around 6-9 hours to process 100+ images | Deep Learning VM / GPU-optimized VMs
2 | Improve object recognition and detection accuracy | Recognizes fully visible objects. If objects overlap, the accuracy and probability of detecting more objects is 50% | -
3 | Cater to more domains | Trained on the grocery and vehicles datasets; can be extended to other domains as well | VMs/servers with more hard disk space (>500 GB)
4 | Cater to more users | Tested with 5 users; a larger number of parallel users increases detection time | Deep Learning VM / GPU-optimized VMs, plus a design-level change to create separate threads for multiple parallel users
5 | Cater to multiple customers at the same time | Can cater to one customer's users per Azure site instance | More HDD space and optimized VMs
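As a rough worked example for the first scalability need, the quoted figures (around 6-9 hours for 100+ images) can be extrapolated to the 10,000-image target. This is a back-of-the-envelope sketch, not a measurement: it assumes processing time scales linearly with image count, takes 100 as the batch size, and uses 3 hours as a stand-in for "a few hours". The helper names `estimate_hours` and `required_speedup` are hypothetical.

```python
def estimate_hours(target_images, batch_images=100, batch_hours=(6.0, 9.0)):
    """Linearly extrapolate (low, high) processing hours from the quoted batch figures."""
    low, high = batch_hours
    scale = target_images / batch_images
    return (low * scale, high * scale)


def required_speedup(target_images, target_hours):
    """Speedup factor range needed to finish target_images within target_hours."""
    low, high = estimate_hours(target_images)
    return (low / target_hours, high / target_hours)


# Assumed target: 10,000 images within 3 hours ("a few hours" is not quantified on the slide).
hours = estimate_hours(10000)
speedup = required_speedup(10000, 3.0)
print(hours)    # (600.0, 900.0)
print(speedup)  # (200.0, 300.0)
```

Under these assumptions the current setup would need roughly 600-900 hours for 10,000 images, which suggests why the table lists GPU-optimized VMs as the resource needed.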
Snapshots
Thank You