Project Midterm Presentation

STAT 157 - Project #4: Image Captioning
Group Members: Zabin Bashar, Jilin Cao, Mike Jin, Daniel Kim
March 5, 2019

Flickr8k Dataset
The dataset contains 8,000 images from Flickr, an image and video hosting service. Each image has 5 captions written by different people.
For example, one image is given these 5 captions:
1. A boy runs as others play on a homemade slip and slide.
2. Children in swimming clothes in a field.
3. Little kids are playing outside with a water hose and are sliding down a water slide.
4. Several children are playing outside with a wet tarp on the ground.
5. Several children playing on a homemade water slide.
As we can see, there are some differences among them:
- Caption 1 focuses on a boy running.
- "Children" vs "kids".
- Caption 2 is not a grammatically complete sentence.
Having different captions helps a model catch these subtleties and generalize better.

Project Procedure
Problem Statement
The problem we want to solve is: given an image, find the most probable sequence of words (sentence) describing the image.
Data Preprocessing
- Convert each image to a 3-dimensional (height, width, color) array.
- Convert words into numbers, e.g. a = 1, and = 2, pen = 3, boy = 4, etc.
Image captioning model architecture
- A CNN to extract high-level features (objects, background, etc.) from images.
- Multi-layered Long Short-Term Memory (LSTM) networks, a type of RNN, to embed words.
Training phase
- Loss function, optimization, batch training, etc.
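
The word-to-number step in the preprocessing could look roughly like the sketch below. It is only an illustration, not the group's actual code: the helper names (`tokenize`, `build_vocab`, `encode`) and the special tokens `<pad>`, `<start>`, `<end>` and `<unk>` are assumptions, and the resulting indices will differ from the "a = 1, and = 2" example on the slide.

```python
import re
from collections import Counter

# Example captions taken from the Flickr8k slide.
captions = [
    "A boy runs as others play on a homemade slip and slide.",
    "Children in swimming clothes in a field.",
    "Several children playing on a homemade water slide.",
]

def tokenize(caption):
    """Lowercase a caption and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", caption.lower())

def build_vocab(captions, min_count=1):
    """Map each word to an integer index.

    Indices 0-3 are reserved for special tokens (an assumed convention,
    not something stated in the slides).
    """
    counts = Counter(w for c in captions for w in tokenize(c))
    words = sorted(w for w, n in counts.items() if n >= min_count)
    specials = ["<pad>", "<start>", "<end>", "<unk>"]
    return {w: i for i, w in enumerate(specials + words)}

def encode(caption, vocab):
    """Turn a caption into a list of word indices with start/end markers."""
    unk = vocab["<unk>"]
    ids = [vocab.get(w, unk) for w in tokenize(caption)]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]

vocab = build_vocab(captions)
encoded = encode("A boy runs on a water slide.", vocab)
print(encoded)
```

Words that never appeared in the training captions fall back to `<unk>`, so the model always receives a valid index. The `<start>` and `<end>` markers give a sequence model a fixed place to begin and stop generating a sentence.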