Attention Is All You Need
Speaker: Yin Hsiang Liao
Advisor: Jia Ling Koh
Date: April 29th, 2019
Source: NIPS 2017
Outline
1. Introduction
2. Model Architecture
  2.1 Overview
  2.2 Encoder & Decoder (1)
  2.3 Attention
  2.4 Encoder & Decoder (2)
3. Result & Conclusion
1. Introduction
Let’s start with the RNN-based structure.
Seq2Seq
1. No long-term dependency
2. Less parallelizable
Seq2Seq & Attention
1. No long-term dependency
2. Less parallelizable: slower, constrained by RAM
Transformer
1. Long-term dependency
2. Parallelizable
https://jalammar.github.io/illustrated-transformer/
2. Model Architecture
A top-down approach
2.1 Overview
http://nlp.seas.harvard.edu/2018/04/03/attention.html
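The overview slides follow The Annotated Transformer and, presumably, the paper's architecture figure: a stack of N = 6 identical encoder layers, a stack of N = 6 decoder layers that attend to the encoder output, and a final linear projection with softmax. As a rough companion to that figure, the sketch below traces only the top-level data flow in NumPy; the layer bodies are identity placeholders (their sub-layers are the subject of the following slides), and all names are illustrative rather than the speaker's code.

```python
# Top-level data flow only; encoder_layer / decoder_layer are placeholders.
import numpy as np

N = 6  # the paper stacks 6 identical layers on each side


def encoder_layer(x):
    # Placeholder: in the paper this is self-attention + feed-forward,
    # each with a residual connection and layer normalization.
    return x


def decoder_layer(y, memory):
    # Placeholder: masked self-attention, encoder-decoder attention, feed-forward.
    return y


def transformer(src_emb, tgt_emb, W_vocab):
    """src_emb / tgt_emb: (seq_len, d_model) embeddings (+ positional encodings)."""
    memory = src_emb
    for _ in range(N):                 # encoder stack
        memory = encoder_layer(memory)
    y = tgt_emb
    for _ in range(N):                 # decoder stack, conditioned on encoder output
        y = decoder_layer(y, memory)
    return y @ W_vocab                 # final linear layer; softmax gives probabilities


rng = np.random.default_rng(0)
src = rng.standard_normal((7, 512))    # 7 source tokens, d_model = 512
tgt = rng.standard_normal((5, 512))    # 5 target tokens
print(transformer(src, tgt, rng.standard_normal((512, 1000))).shape)  # (5, 1000)
```

The output has one row of vocabulary logits per target position, which is what the decoder would pass through softmax to predict the next token.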
2.2 Encoder & Decoder (1)
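This part of the paper describes each encoder layer as two sub-layers, multi-head self-attention and a position-wise feed-forward network FFN(x) = max(0, xW1 + b1)W2 + b2, with each sub-layer wrapped in a residual connection followed by layer normalization, LayerNorm(x + Sublayer(x)). Below is a minimal NumPy sketch of that wiring, with the attention sub-layer passed in as an identity placeholder and random matrices standing in for learned weights.

```python
# One encoder layer: LayerNorm(x + Sublayer(x)) around two sub-layers.
import numpy as np


def layer_norm(x, eps=1e-6):
    # Simplified: no learned gain/bias, which the real model has.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)


def position_wise_ffn(x, W1, b1, W2, b2):
    # FFN(x) = max(0, x W1 + b1) W2 + b2, applied to every position independently.
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2


def encoder_layer(x, self_attention, ffn_params):
    x = layer_norm(x + self_attention(x))                   # sub-layer 1 + residual + norm
    x = layer_norm(x + position_wise_ffn(x, *ffn_params))   # sub-layer 2 + residual + norm
    return x


d_model, d_ff = 512, 2048                               # dimensions from the paper
rng = np.random.default_rng(0)
ffn_params = (rng.standard_normal((d_model, d_ff)) * 0.02, np.zeros(d_ff),
              rng.standard_normal((d_ff, d_model)) * 0.02, np.zeros(d_model))
x = rng.standard_normal((5, d_model))                   # 5 tokens
out = encoder_layer(x, self_attention=lambda t: t,      # identity stand-in for attention
                    ffn_params=ffn_params)
print(out.shape)                                        # (5, 512)
```

The paper also applies dropout to each sub-layer output before the residual addition; it is omitted here for brevity.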
2.3 Self (Multi-head) Attention
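The step-by-step build-up on these slides covers the core of the paper: scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, and its multi-head form, in which the input is projected h = 8 times with different learned matrices W^Q, W^K, W^V, the heads attend in parallel, and the concatenated results go through an output projection W^O. The NumPy sketch below follows those formulas; the random matrices are stand-ins for the learned projections, and the function names are mine, not the paper's or the speaker's.

```python
# Scaled dot-product attention and its multi-head (self-attention) form.
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)             # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)      # (..., len_q, len_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)           # blocked positions get ~0 weight
    return softmax(scores, axis=-1) @ V                 # (..., len_q, d_v)


def multi_head_self_attention(x, W_q, W_k, W_v, W_o, num_heads=8):
    """Self-attention: queries, keys, and values are all projections of the same x."""
    seq_len, d_model = x.shape
    d_k = d_model // num_heads

    def split_heads(t):                                 # (seq, d_model) -> (heads, seq, d_k)
        return t.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)

    Q, K, V = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    heads = scaled_dot_product_attention(Q, K, V)       # (heads, seq, d_k)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                 # output projection


rng = np.random.default_rng(0)
d_model = 512
W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(4))
x = rng.standard_normal((5, d_model))                   # 5 tokens
print(multi_head_self_attention(x, W_q, W_k, W_v, W_o).shape)  # (5, 512)
```

Dividing by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients; this is the paper's stated motivation for the scaling.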
Overview
2.4 Encoder & Decoder (2)
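Which decoder-side details Encoder & Decoder (2) covers is not recoverable from the titles alone, but the remaining pieces of the paper's architecture are the masked self-attention that stops a position from attending to later positions, and the sinusoidal positional encodings PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)) added to the embeddings. Treat the sketch below as an illustrative companion rather than the slides' content.

```python
# Sinusoidal positional encoding and a causal (look-ahead) mask for the decoder.
import numpy as np


def positional_encoding(max_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(same angle)."""
    pos = np.arange(max_len)[:, None]                   # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]               # even feature indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))                   # assumes d_model is even
    pe[:, 0::2] = np.sin(angles)                        # even dimensions
    pe[:, 1::2] = np.cos(angles)                        # odd dimensions
    return pe                                           # added to the token embeddings


def causal_mask(size):
    """True where attention is allowed: each position sees itself and earlier positions only."""
    return np.tril(np.ones((size, size), dtype=bool))


print(positional_encoding(10, 512).shape)               # (10, 512)
print(causal_mask(4).astype(int))                       # lower-triangular 0/1 matrix
```

In the decoder, this mask is applied to the attention scores before the softmax (for example by setting blocked positions to a large negative value), exactly as the mask argument in the earlier attention sketch does.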
3. Result & Conclusion
Result
Thank you!