Attentions

attention module

class mindnlp.modules.attentions.AdditiveAttention(hidden_dims, dropout=0.9)[source]

Bases: Cell

Additive attention, proposed in the paper “Neural Machine Translation by Jointly Learning to Align and Translate”.

\[Attention(Q,K,V) = W_v^T \tanh(W_q Q + W_k K)\]
Parameters:
  • hidden_dims (int) – The dimension of the hidden state vector.

  • dropout (float) – The keep rate, greater than 0 and less than or equal to 1. E.g. dropout=0.9 drops 10% of the input units. Default: 0.9.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import AdditiveAttention
>>> model = AdditiveAttention(hidden_dims=512, dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = model(q, k, v, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
construct(query, key, value, mask: Optional[Tensor] = None)[source]

Additive attention network construction.

Parameters:
  • query (mindspore.Tensor) – The query vector. [batch_size, query_size, hidden_size]

  • key (mindspore.Tensor) – The key vector. [batch_size, key_size, hidden_size]

  • value (mindspore.Tensor) – The value vector. [batch_size, seq_len, value_hidden_size]

  • mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, query_size, key_size]

Returns:

  • output (mindspore.Tensor) - The output of additive attention. [batch_size, query_size, value_hidden_size]

  • attn (mindspore.Tensor) - The last layer of attention weights. [batch_size, query_size, key_size]
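
The additive scoring formula can be illustrated with a small NumPy sketch. This is not the mindnlp implementation: the weight shapes, the softmax normalization, the convention that True in the mask marks an attendable position, and the omission of dropout are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(query, key, value, w_q, w_k, w_v, mask=None):
    """query: [B, Lq, H], key: [B, Lk, H], value: [B, Lk, Dv]."""
    q = query @ w_q                                        # [B, Lq, H]
    k = key @ w_k                                          # [B, Lk, H]
    # Pair every query position with every key position via broadcasting.
    hidden = np.tanh(q[:, :, None, :] + k[:, None, :, :])  # [B, Lq, Lk, H]
    scores = hidden @ w_v                                  # [B, Lq, Lk]
    if mask is not None:
        # Assumption: True marks positions that may be attended to.
        scores = np.where(mask, scores, -1e9)
    attn = softmax(scores, axis=-1)
    return attn @ value, attn

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
B, Lq, Lk, H = 2, 4, 3, 8
w_q = rng.normal(size=(H, H))
w_k = rng.normal(size=(H, H))
w_v = rng.normal(size=H)
query = rng.normal(size=(B, Lq, H))
key = rng.normal(size=(B, Lk, H))
value = rng.normal(size=(B, Lk, H))
output, attn = additive_attention(query, key, value, w_q, w_k, w_v)
print(output.shape, attn.shape)  # (2, 4, 8) (2, 4, 3)
```

Each row of attn sums to 1, so every output vector is a weighted average of the value vectors.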

class mindnlp.modules.attentions.BinaryAttention[source]

Bases: Cell

Binary attention. Given two sequences of vectors, x_i and y_j, the BinaryAttention module computes the attended representation of each sequence with respect to the other using the following equations:

\[\begin{split} \begin{array}{ll} e_{ij} = {x}^{\mathrm{T}}_{i}{y}_{j} \\ {\hat{x}}_{i} = \sum_{j=1}^{\ell_{y}}{\frac{ \mathrm{exp}(e_{ij})}{\sum_{k=1}^{\ell_{y}}{\mathrm{exp}(e_{ik})}}}{y}_{j} \\ {\hat{y}}_{j} = \sum_{i=1}^{\ell_{x}}{\frac{ \mathrm{exp}(e_{ij})}{\sum_{k=1}^{\ell_{x}}{\mathrm{exp}(e_{kj})}}}{x}_{i} \\ \end{array}\end{split}\]

Examples

>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import BinaryAttention
>>> model = BinaryAttention()
>>> standard_normal = ops.StandardNormal(seed=0)
>>> x = standard_normal((2, 30, 512))
>>> y = standard_normal((2, 20, 512))
>>> x_mask = Tensor(np.zeros(x.shape[:-1]), mindspore.float32)
>>> y_mask = Tensor(np.zeros(y.shape[:-1]), mindspore.float32)
>>> output_x, output_y = model(x, x_mask, y, y_mask)
>>> print(output_x.shape, output_y.shape)
(2, 30, 512) (2, 20, 512)
construct(x_batch, x_mask, y_batch, y_mask)[source]

Compute the attention result

Parameters:
  • x_batch (mindspore.Tensor) – [batch_size, x_seq_len, hidden_size]

  • x_mask (mindspore.Tensor) – [batch_size, x_seq_len]

  • y_batch (mindspore.Tensor) – [batch_size, y_seq_len, hidden_size]

  • y_mask (mindspore.Tensor) – [batch_size, y_seq_len]

Returns:

  • attended_x (mindspore.Tensor) - The output of the attention_x.

  • attended_y (mindspore.Tensor) - The output of the attention_y.
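
The two soft alignments in the equations above can be sketched in NumPy as follows. The mask convention (1.0 for real tokens, 0.0 for padding) is an assumption of this sketch and may differ from what the mindnlp class expects.

```python
import numpy as np

def masked_softmax(scores, mask, axis):
    # Assumption: mask is 1.0 for real tokens and 0.0 for padding.
    scores = np.where(mask > 0, scores, -1e9)
    e = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def binary_attention(x, x_mask, y, y_mask):
    """x: [B, Lx, H], x_mask: [B, Lx], y: [B, Ly, H], y_mask: [B, Ly]."""
    e = x @ y.transpose(0, 2, 1)                          # e_ij, [B, Lx, Ly]
    # Normalize over y positions to get the weights for x_hat.
    w_x = masked_softmax(e, y_mask[:, None, :], axis=2)
    # Normalize over x positions to get the weights for y_hat.
    w_y = masked_softmax(e, x_mask[:, :, None], axis=1)
    x_hat = w_x @ y                                       # [B, Lx, H]
    y_hat = w_y.transpose(0, 2, 1) @ x                    # [B, Ly, H]
    return x_hat, y_hat

# Hypothetical inputs, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))
y = rng.normal(size=(2, 3, 8))
x_mask = np.ones((2, 5))
y_mask = np.ones((2, 3))
y_mask[:, -1] = 0.0  # treat the last y position as padding
x_hat, y_hat = binary_attention(x, x_mask, y, y_mask)
print(x_hat.shape, y_hat.shape)  # (2, 5, 8) (2, 3, 8)
```

Because the last y position is masked, changing its content leaves x_hat unchanged.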

class mindnlp.modules.attentions.CosineAttention(dropout=0.9)[source]

Bases: Cell

Cosine attention, proposed in the paper “Neural Turing Machines”.

\[Sim(Q, K) = \frac{QK^T}{|Q| \, |K|}, \qquad Attention(Q,K,V) = softmax(Sim(Q, K)) V\]
Parameters:
  • dropout (float) – The keep rate, greater than 0 and less than or equal to 1. E.g. dropout=0.9 drops 10% of the input units. Default: 0.9.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import CosineAttention
>>> model = CosineAttention(dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = model(q, k, v, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
construct(query, key, value, mask: Optional[Tensor] = None)[source]

Cosine attention network construction.

Parameters:
  • query (mindspore.Tensor) – The query vector. [batch_size, query_size, hidden_size]

  • key (mindspore.Tensor) – The key vector. [batch_size, key_size, hidden_size]

  • value (mindspore.Tensor) – The value vector. [batch_size, seq_len, value_hidden_size]

  • mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, query_size, key_size]

Returns:

  • output (mindspore.Tensor) - The output of cosine attention. [batch_size, query_size, value_hidden_size]

  • attn (mindspore.Tensor) - The last layer of attention weights. [batch_size, query_size, key_size]
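
As a rough illustration of the cosine similarity formula, the NumPy sketch below normalizes the dot products by the vector norms before the softmax. The eps term, the mask convention (True means attendable), and the absence of dropout are assumptions of the sketch rather than documented behavior of the class.

```python
import numpy as np

def cosine_attention(query, key, value, mask=None, eps=1e-8):
    """query: [B, Lq, H], key: [B, Lk, H], value: [B, Lk, Dv]."""
    q_norm = np.linalg.norm(query, axis=-1, keepdims=True)   # [B, Lq, 1]
    k_norm = np.linalg.norm(key, axis=-1, keepdims=True)     # [B, Lk, 1]
    # Sim(Q, K) = Q K^T / (|Q| |K|); eps guards against zero-length vectors.
    sim = (query @ key.transpose(0, 2, 1)) / (q_norm * k_norm.transpose(0, 2, 1) + eps)
    if mask is not None:
        # Assumption: True marks positions that may be attended to.
        sim = np.where(mask, sim, -1e9)
    weights = np.exp(sim - sim.max(axis=-1, keepdims=True))
    attn = weights / weights.sum(axis=-1, keepdims=True)
    return attn @ value, attn

# All-ones queries and keys are perfectly aligned, so attention is uniform.
q = np.ones((2, 4, 8))
k = np.ones((2, 3, 8))
v = np.arange(2 * 3 * 5, dtype=float).reshape(2, 3, 5)
output, attn = cosine_attention(q, k, v)
print(output.shape, attn.shape)  # (2, 4, 5) (2, 4, 3)
```

With identical directions for every query and key, each output row is simply the mean of the value vectors in that batch element.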

class mindnlp.modules.attentions.LinearAttention(query_dim, key_dim, hidden_dim, dropout=0.9)[source]

Bases: Cell

Linear attention computes attention by concatenating the query and key vectors.

Parameters:
  • query_dim (int) – The sentence length of the query. Usually query.shape[-2].

  • key_dim (int) – The sentence length of the key. Usually key.shape[-2].

  • hidden_dim (int) – The dimension of the hidden vector.

  • dropout (float) – The keep rate, greater than 0 and less than or equal to 1. Default: 0.9.

Examples

>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import LinearAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((2, 32, 512))
>>> key = standard_normal((2, 20, 512))
>>> value = standard_normal((2, 20, 512))
>>> net = LinearAttention(query_dim=32, key_dim=20, hidden_dim=512)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = net(query, key, value, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
construct(query, key, value, mask: Optional[Tensor] = None)[source]

Linear attention network construction, using concatenation of query and key.

Parameters:
  • query (mindspore.Tensor) – The query vector. [batch_size, query_size, hidden_size]

  • key (mindspore.Tensor) – The key vector. [batch_size, key_size, hidden_size]

  • value (mindspore.Tensor) – The value vector. [batch_size, seq_len, value_hidden_size]

  • mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, query_size, key_size]

Returns:

  • output (mindspore.Tensor) - The output of linear attention. [batch_size, query_size, value_hidden_size]

  • attn (mindspore.Tensor) - The last layer of attention weights. [batch_size, query_size, key_size]

class mindnlp.modules.attentions.LocationAwareAttention(hidden_dim, smoothing=False)[source]

Bases: Cell

Location-aware attention, proposed in the paper “Attention-Based Models for Speech Recognition”.

Parameters:
  • hidden_dim (int) – The dimension of the hidden states

  • smoothing (bool) – Whether to apply the smoothing described in “Attention-Based Models for Speech Recognition”. Default: False.

Examples

>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops, Tensor
>>> from mindnlp.modules.attentions import LocationAwareAttention
>>> batch_size, seq_len, hidden_dim = 2, 40, 20
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((batch_size, 1, hidden_dim))
>>> value = standard_normal((batch_size, seq_len, hidden_dim))
>>> last_attn = standard_normal((batch_size, seq_len))
>>> net = LocationAwareAttention(
...     hidden_dim=hidden_dim,
...     smoothing=False)
>>> mask_shape = (batch_size, seq_len)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> net.set_mask(mask)
>>> cont, attn = net(query, value, last_attn)
>>> print(cont.shape, attn.shape)
(2, 1, 20) (2, 40)
construct(query, value, last_attn=None)[source]

Location aware attention network construction.

Parameters:
  • query (mindspore.Tensor) – Decoder hidden states, Shape=(batch_size, 1, decoder_dim).

  • value (mindspore.Tensor) – Encoder outputs, Shape=(batch_size, seq_len, encoder_dim).

  • last_attn (mindspore.Tensor) – Attention weight of previous step, Shape=(batch_size, seq_len).

Returns:

  • context (mindspore.Tensor) - The context vector, Shape=(batch_size, 1, decoder_dim).

  • attn (mindspore.Tensor) - Attention weight of this step, Shape=(batch_size, seq_len).

set_mask(mask)[source]

Set the mask

Parameters:
  • mask (mindspore.Tensor[bool]) – The mask vector.
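
The idea behind location-aware attention can be sketched in NumPy: the previous attention weights are convolved with a few 1-D filters, and the resulting location features enter the additive score alongside the query and the encoder outputs. Everything below is a hedged illustration based on the cited paper, not the mindnlp code. The parameter names, filter count, kernel size, and mask convention (True for valid positions) are hypothetical.

```python
import numpy as np

def location_aware_attention(query, value, last_attn, params, mask=None, smoothing=False):
    """query: [B, 1, H], value: [B, T, H], last_attn: [B, T]."""
    w_query, w_value, w_loc, w_score, bias, filters = params
    # Location features: slide each 1-D filter over the previous attention weights.
    loc = np.array([[np.convolve(a, f, mode="same") for f in filters]
                    for a in last_attn])                   # [B, C, T]
    loc = loc.transpose(0, 2, 1)                           # [B, T, C]
    hidden = np.tanh(query @ w_query + value @ w_value + loc @ w_loc + bias)
    energy = hidden @ w_score                              # [B, T]
    if smoothing:
        # Smoothing replaces exp with a sigmoid before normalizing.
        scores = 1.0 / (1.0 + np.exp(-energy))
    else:
        scores = np.exp(energy - energy.max(axis=-1, keepdims=True))
    if mask is not None:
        # Assumption: True marks valid (non-padding) encoder positions.
        scores = scores * mask
    attn = scores / scores.sum(axis=-1, keepdims=True)
    context = attn[:, None, :] @ value                     # [B, 1, H]
    return context, attn

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
B, T, H, C, K = 2, 6, 4, 3, 3
params = (rng.normal(size=(H, H)), rng.normal(size=(H, H)), rng.normal(size=(C, H)),
          rng.normal(size=H), np.zeros(H), rng.normal(size=(C, K)))
query = rng.normal(size=(B, 1, H))
value = rng.normal(size=(B, T, H))
last_attn = np.full((B, T), 1.0 / T)
mask = np.ones((B, T), dtype=bool)
mask[1, 4:] = False
context, attn = location_aware_attention(query, value, last_attn, params, mask=mask)
print(context.shape, attn.shape)  # (2, 1, 4) (2, 6)
```

In a decoding loop, the attn returned at one step would be passed as last_attn at the next step.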

class mindnlp.modules.attentions.ScaledDotAttention(dropout=0.9)[source]

Bases: Cell

Scaled dot-product attention, proposed in the paper “Attention Is All You Need”.

\[Attention(Q,K,V)=softmax(\frac{QK^T}{\sqrt{d_k}})V\]
Parameters:
  • dropout (float) – The keep rate, greater than 0 and less than or equal to 1. E.g. dropout=0.9 drops 10% of the input units. Default: 0.9.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import ScaledDotAttention
>>> model = ScaledDotAttention(dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 400)), mindspore.float32)
>>> output, att = model(q, k, v)
>>> print(output.shape)
(2, 32, 400)
>>> print(att.shape)
(2, 32, 20)
construct(query, key, value, mask: Optional[Tensor] = None)[source]

Scaled dot-product attention network construction.

Parameters:
  • query (mindspore.Tensor) – The query vector. [batch_size, query_size, hidden_size]

  • key (mindspore.Tensor) – The key vector. [batch_size, key_size, hidden_size]

  • value (mindspore.Tensor) – The value vector. [batch_size, seq_len, value_hidden_size]

  • mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, query_size, key_size]

Returns:

  • output (mindspore.Tensor) - The output of scaled dot-product attention. [batch_size, query_size, value_hidden_size]

  • attn (mindspore.Tensor) - The last layer of attention weights. [batch_size, query_size, key_size]
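
The shapes listed above follow directly from the formula. The NumPy sketch below reproduces them and also shows one possible way to build a [batch_size, query_size, key_size] padding mask from key lengths. The key_lengths array and the convention that True marks an attendable position are assumptions of the sketch, and dropout is omitted.

```python
import numpy as np

def scaled_dot_attention(query, key, value, mask=None):
    """query: [B, Lq, H], key: [B, Lk, H], value: [B, Lk, Dv]."""
    d_k = query.shape[-1]
    scores = query @ key.transpose(0, 2, 1) / np.sqrt(d_k)   # [B, Lq, Lk]
    if mask is not None:
        # Assumption: True marks positions that may be attended to.
        scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = weights / weights.sum(axis=-1, keepdims=True)
    return attn @ value, attn

q = np.ones((2, 32, 512))
k = np.ones((2, 20, 512))
v = np.ones((2, 20, 400))

# Hypothetical key lengths: the second sequence has 5 padded positions.
key_lengths = np.array([20, 15])
mask = np.arange(20)[None, None, :] < key_lengths[:, None, None]   # [2, 1, 20]
mask = np.broadcast_to(mask, (2, 32, 20))

output, attn = scaled_dot_attention(q, k, v, mask)
print(output.shape, attn.shape)  # (2, 32, 400) (2, 32, 20)
```

The padded key positions of the second sequence receive zero attention weight, and the remaining weights still sum to 1.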

class mindnlp.modules.attentions.SelfAttention(d_model=512, dropout_rate=0.1, bias=False, attention_mode='dot')[source]

Bases: Cell

Self-attention, from the paper “Attention Is All You Need”.

Parameters:
  • d_model (int) – The query, key and value vectors dimensions. Default: 512.

  • dropout_rate (float) – The dropout rate. Default: 0.1.

  • bias (bool) – Whether to use a bias vector. Default: False.

  • attention_mode (str) – attention mode. Default: “dot”.

Examples

>>> import mindspore
>>> import mindspore.numpy as np
>>> from mindspore import ops
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import SelfAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((2, 32, 512))
>>> key = standard_normal((2, 20, 512))
>>> value = standard_normal((2, 20, 512))
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> net = SelfAttention()
>>> output, attn = net(query, key, value, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
construct(query, key, value, mask: Optional[Tensor] = None)[source]

Get self-attention output and attention weights.

Parameters:
  • query (mindspore.Tensor) – The query vector.

  • key (mindspore.Tensor) – The key vector.

  • value (mindspore.Tensor) – The value vector. [batch_size, seq_len, d_model]

  • mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, seq_len, seq_len]

Returns:

  • output (mindspore.Tensor) - The output of self attention.

  • attn (mindspore.Tensor) - The last layer of attention weights
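
A [batch_size, seq_len, seq_len] mask is commonly used to make self-attention causal. The NumPy sketch below assumes that query, key and value are linear projections of the same input x, that the "dot" mode means scaled dot-product scoring, and that True in the mask marks an attendable position. None of these details are confirmed by the documentation above, and the weight matrices are hypothetical.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v, mask=None):
    """x: [B, L, d_model]; mask: [B, L, L] boolean."""
    query, key, value = x @ w_q, x @ w_k, x @ w_v
    scores = query @ key.transpose(0, 2, 1) / np.sqrt(x.shape[-1])   # [B, L, L]
    if mask is not None:
        # Assumption: True marks positions that may be attended to.
        scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = weights / weights.sum(axis=-1, keepdims=True)
    return attn @ value, attn

# Hypothetical sizes and random projection weights, for illustration only.
rng = np.random.default_rng(0)
B, L, d_model = 2, 5, 16
x = rng.normal(size=(B, L, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# Causal mask: position i may attend to positions 0..i only.
mask = np.broadcast_to(np.tril(np.ones((L, L), dtype=bool)), (B, L, L))

output, attn = self_attention(x, w_q, w_k, w_v, mask)
print(output.shape, attn.shape)  # (2, 5, 16) (2, 5, 5)
```

With the causal mask, the first position can only attend to itself, so its output equals its own projected value vector.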