Attention Implementation

·118 words·1 min
Paper Reading Neural Network attention mechanism

It is clearer to use rows to denote the input (sequence) dimension and columns to denote the channel dimension.

Cross Attention #

In the example of UNINEXT, we have features extracted from the visual image and the prompt, say $F_v$ and $F_p$.

  • $F_v$ is of shape $N \times m$.
  • $F_p$ is of shape $L \times d$.

We want to use cross attention to update $F_v$ with $F_p$ by the following formula: $$ F_v' = F_v + CrossAttention(F_v, F_p) $$ Here $F_v$ serves as the query and $F_p$ as the key and value.

The final output is an $N \times d_{output}$ matrix, where $N$ is the input sequence length and $d_{output}$ is the output dimension. If $d_{output}$ matches the channel dimension of $F_v$, the attention output can be added directly; if not, a linear layer can map it back to that dimension. $$ Attn(F_v, F_p) = softmax(\frac{Q_{F_v} K_{F_p}^T}{\sqrt{d}})V_{F_p} $$
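A minimal single-head PyTorch sketch of this update (an illustration, not UNINEXT's actual implementation; the module name and dimension values are assumptions):

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Single-head cross attention: queries from x, keys/values from y."""
    def __init__(self, d_x: int, d_y: int, d_out: int):
        super().__init__()
        self.scale = d_out ** -0.5
        self.to_q = nn.Linear(d_x, d_out)  # Q from the query stream
        self.to_k = nn.Linear(d_y, d_out)  # K from the context stream
        self.to_v = nn.Linear(d_y, d_out)  # V from the context stream

    def forward(self, x, y):
        # x: (N, d_x) queries; y: (L, d_y) keys/values
        q, k, v = self.to_q(x), self.to_k(y), self.to_v(y)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)  # (N, L)
        return attn @ v  # (N, d_out)

# Toy shapes: F_v is (N, m), F_p is (L, d).
N, m, L, d = 100, 256, 20, 512
F_v, F_p = torch.randn(N, m), torch.randn(L, d)
attn = CrossAttention(d_x=m, d_y=d, d_out=m)  # d_out = m so the residual add works
F_v_new = F_v + attn(F_v, F_p)                # F_v' = F_v + CrossAttention(F_v, F_p)
```

Setting `d_out = m` plays the role of the linear projection mentioned above: the attention output already lands in $F_v$'s channel dimension, so the residual add is well-defined.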

$$ F_v' = F_v + CrossAttention(F_v, F_p) $$ For bi-directional attention, the same update is also applied in the reverse direction: $$ F_p' = F_p + CrossAttention(F_p, F_v) $$
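Under the same assumptions, the bi-directional variant is just two symmetric calls with independent weights per direction (reusing the hypothetical `CrossAttention` module above):

```python
# One module per direction; weights are not shared.
v_from_p = CrossAttention(d_x=m, d_y=d, d_out=m)
p_from_v = CrossAttention(d_x=d, d_y=m, d_out=d)

F_v_new = F_v + v_from_p(F_v, F_p)  # F_v' = F_v + CrossAttention(F_v, F_p)
F_p_new = F_p + p_from_v(F_p, F_v)  # F_p' = F_p + CrossAttention(F_p, F_v)
```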
