Attention Is All You Need


Attention Is All You Need
Speaker: Yin Hsiang Liao
Advisor: Jia Ling Koh
Date: April 29, 2019
Source: NIPS 2017

Outline
1. Introduction
2. Model Architecture
  2.1 Overview
  2.2 Encoder & Decoder (1)
  2.3 Attention
  2.4 Encoder & Decoder (2)
3. Result & Conclusion

1. Introduction
Let's start with the RNN-based structure.

Seq2Seq
1. No long-term dependency: distant context must survive many recurrent steps.
2. Less parallelizable: each step waits on the previous hidden state.

Seq2Seq & Attention
1. Long-term dependency addressed: attention gives the decoder direct access to every encoder state.
2. Still less parallelizable: slower, and constrained by RAM.
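Why the recurrence resists parallelization: each hidden state depends on the previous one, so the time steps must run one by one. A minimal sketch of that bottleneck (all names and sizes here are illustrative, not from the slides):

```python
import torch

# An RNN must be unrolled one step at a time: h_t depends on h_{t-1},
# so the T time steps below cannot run in parallel.
T, d = 8, 16                          # sequence length and hidden size (illustrative)
W, U = torch.randn(d, d), torch.randn(d, d)
x = torch.randn(T, d)                 # input sequence, one row per time step
h = torch.zeros(d)
for t in range(T):                    # the parallelism bottleneck
    h = torch.tanh(x[t] @ W + h @ U)  # step t consumes the state from step t-1
```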

Transformer
1. Long-term dependency handled
2. Parallelizable
https://jalammar.github.io/illustrated-transformer/

2. Model Architecture
A top-down approach

Overview
http://nlp.seas.harvard.edu/2018/04/03/attention.html
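The overview diagram shows an encoder stack feeding a decoder stack. For orientation, PyTorch packages this encoder-decoder architecture as nn.Transformer; a minimal sketch with the paper's base-model hyperparameters, where the random tensors stand in for already-embedded tokens:

```python
import torch
import torch.nn as nn

# The paper's encoder-decoder stack as packaged by PyTorch, with the
# base-model hyperparameters (d_model=512, 8 heads, 6+6 layers, d_ff=2048).
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6,
                       dim_feedforward=2048, batch_first=True)
src = torch.randn(2, 10, 512)  # stand-in for already-embedded source tokens
tgt = torch.randn(2, 7, 512)   # stand-in for already-embedded target tokens
out = model(src, tgt)          # -> shape (2, 7, 512)
```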

Encoder & Decoder (1)
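These slides present the encoder and decoder stacks pictorially. As a rough companion, here is a minimal PyTorch sketch of one encoder layer as the paper defines it: multi-head self-attention followed by a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. The class is illustrative, not the presenter's code.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention + position-wise FFN, each followed
    by a residual connection and layer norm (post-norm, as in the paper)."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        attn_out, _ = self.self_attn(x, x, x)    # queries = keys = values = x
        x = self.norm1(x + self.drop(attn_out))  # residual + layer norm
        x = self.norm2(x + self.drop(self.ffn(x)))
        return x

x = torch.randn(2, 10, 512)      # (batch, seq_len, d_model)
print(EncoderLayer()(x).shape)   # torch.Size([2, 10, 512])
```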

Self (Multi-head) Attention
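These slides step through the attention computation figure by figure. The underlying formula is the paper's: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, and multi-head attention runs it in several parallel subspaces before re-concatenating. A from-scratch sketch of that computation; the helper names are mine, and the learned projections W_Q, W_K, W_V, W_O are omitted for brevity:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (..., len_q, len_k)
    if mask is not None:                               # True = may not attend
        scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def multi_head(q, k, v, n_heads=8):
    """Split d_model into n_heads subspaces, attend in each, re-concatenate."""
    B, L, d_model = q.shape
    def split(t):  # (B, L, d_model) -> (B, n_heads, L, d_model // n_heads)
        return t.view(B, -1, n_heads, d_model // n_heads).transpose(1, 2)
    out = scaled_dot_product_attention(split(q), split(k), split(v))
    return out.transpose(1, 2).contiguous().view(B, L, d_model)

x = torch.randn(2, 10, 512)
print(multi_head(x, x, x).shape)  # torch.Size([2, 10, 512])
```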

Overview (revisited)

Encoder & Decoder (2)
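The decoder layer differs in two ways: its self-attention is masked so position i can attend only to earlier positions (preserving the auto-regressive property), and a second attention sublayer attends over the encoder output. A minimal sketch of the causal mask, assuming PyTorch's convention that True marks a position that may not be attended:

```python
import torch

def subsequent_mask(sz):
    """Causal mask for decoder self-attention: True marks the future
    positions that may NOT be attended to."""
    return torch.triu(torch.ones(sz, sz, dtype=torch.bool), diagonal=1)

print(subsequent_mask(4))
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
```

The same mask can be passed as attn_mask to nn.MultiheadAttention, or used with the masked_fill in the attention sketch above.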

3. Result & Conclusion

Result

Thank you!