Neural machine translation by jointly learning to align and translate

cleanUrl: /papers/seq2seq-with-attention
share: true

원본

복사본

이전에 리뷰했던 seq2seq 논문 의 1 저자이신 조경현 교수님의 또다른 논문 입니다. 사실 seq2seq 의 단점을 극복하겠다 하면서 나온 논문이였는데, 그 방식으로 쓰인 attention mechanism 가 최근 NLP sota 에서는 거의 필수적으로 사용되고있어 매우 중요한 논문이 되었다. 이번 논문에서는 seq2seq 의 단점을 attention 으로 어떻게 극복하였고, attention 이 어떤것인지 알아봅시다.

이 논문에서 중심적으로 다룬 문제는 바로 context vector 그 자체 이다, seq2seq 에서 encoder 와 decoder 를 이어주는 context vector 가 fixed size vector 이고, 가장 마지막 hidden state 만 사용하므로 이 하나의 vector 에 time 에 따른 dynamic 한 정보를 담을 수 없다 라고 한다.

생각해보면 맞는 말이다.

예를들어, I love apple. 이란 문장이 있다고 치자. 한국어로 나는 사과를 사랑합니다. 로 번역할 경우, 나는 을 generate 할 때 집중해야 하는 단어는 I 이고, 사과를 을 generate 할때 apple 에 집중해야합니다.

하지만 seq2seq 에서는 이 모든 정보를 context vector 라는 fixed size vector 에 담아두고 전체 time 의 generate 에 사용합니다.

이러한 문제점을 해결하고자, dynamic context vector 를 구현하는것이 attention mechanism 의 목표 입니다.

논문에서는 이렇게 말하고 있습니다.

encoder–decoders and encode a source sentence into a fixed-length vector from which a decoder generates a translation.

potential issue with this encoder–decoder approach is that a neural network needs to be able to compress all the necessary information of a source sentence into a fixed-length vector.

그에 따른 해결책으로 다음과같이 말합니다.

Each time the proposed model generates a word in a translation, it (soft-)searches for a set of positions in a source sentence where the most relevant information is concentrated.