Project Midterm Presentation


STAT 157 - Project #4: Image Captioning
Project Midterm Presentation
Group Members: Zabin Bashar, Jilin Cao, Mike Jin, Daniel Kim
March 5, 2019

Flickr8k Dataset

The dataset contains 8,000 images from Flickr, an image and video hosting service, each of which has 5 captions written by different people. For example, one image is given these 5 captions:

1. A boy runs as others play on a homemade slip and slide.
2. Children in swimming clothes in a field.
3. Little kids are playing outside with a water hose and are sliding down a water slide.
4. Several children are playing outside with a wet tarp on the ground.
5. Several children playing on a homemade water slide.

As we can see, there are some differences among them:

- Caption 1 focuses on a boy running.
- The captions vary in word choice: “children” vs. “kids”.
- Caption 2 is not a grammatically complete sentence.

Having different captions helps a model catch these subtleties and generalize better.
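One way to hold the "one image, five captions" structure in memory is a mapping from image id to a list of captions. The sketch below is only illustrative: the file name `example.jpg` and the helper `group_captions` are hypothetical, and it does not assume any particular on-disk format for the dataset.

```python
from collections import defaultdict

# Hypothetical (image_id, caption) pairs. The image file name is made up;
# the captions are the five quoted on the slide.
pairs = [
    ("example.jpg", "A boy runs as others play on a homemade slip and slide."),
    ("example.jpg", "Children in swimming clothes in a field."),
    ("example.jpg", "Little kids are playing outside with a water hose "
                    "and are sliding down a water slide."),
    ("example.jpg", "Several children are playing outside with a wet tarp "
                    "on the ground."),
    ("example.jpg", "Several children playing on a homemade water slide."),
]


def group_captions(pairs):
    """Map each image id to the list of captions written for it."""
    captions = defaultdict(list)
    for image_id, caption in pairs:
        captions[image_id].append(caption)
    return dict(captions)


captions_by_image = group_captions(pairs)
```

With this layout, each training example can be formed by pairing an image with any one of its captions, so a single image contributes several differently worded targets.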

Project Procedure

Problem Statement
The problem we want to solve is: given an image, find the most probable sequence of words (sentence) describing the image.

Data Preprocessing
- Convert each image to a 3-dimensional (height, width, color) array.
- Convert words into numbers, e.g. a = 1, and = 2, pen = 3, boy = 4, etc.

Image Captioning Model Architecture
- CNN to extract high-level features (objects, background, etc.) from images.
- Multi-layered Long Short-Term Memory networks (a type of RNN) to embed words.

Training Phase
- Loss function, optimization, batch training, etc.
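The word-to-number conversion in the preprocessing step can be sketched as follows. This is a minimal illustration, not the group's actual code: the helper names (`tokenize`, `build_vocab`, `encode`), the lowercase alphabetic tokenization, and the reserved `<pad>` and `<unk>` indices are all assumptions, so the resulting numbers differ from the "a = 1, and = 2" example on the slide.

```python
import re
from collections import Counter


def tokenize(caption):
    """Lowercase a caption and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", caption.lower())


def build_vocab(captions, min_count=1):
    """Assign an integer index to every word seen at least min_count times.

    Index 0 is reserved for padding and index 1 for unknown words
    (an assumption made for this sketch).
    """
    counts = Counter(tok for caption in captions for tok in tokenize(caption))
    vocab = {"<pad>": 0, "<unk>": 1}
    for word in sorted(w for w, n in counts.items() if n >= min_count):
        vocab[word] = len(vocab)
    return vocab


def encode(caption, vocab):
    """Convert a caption into a list of word indices."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(caption)]


captions = ["A boy runs.", "A boy and a pen."]
vocab = build_vocab(captions)
encoded = encode("A boy runs.", vocab)
```

Words that never appeared when the vocabulary was built map to the `<unk>` index, which keeps encoding well defined for captions seen only at test time.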