Attentions
attention module
- class mindnlp.modules.attentions.AdditiveAttention(hidden_dims, dropout=0.9)
Bases: Cell
Additive Attention, proposed in the “Neural Machine Translation by Jointly Learning to Align and Translate” paper.
\[\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(W_v^{\mathrm{T}} \tanh(W_q Q + W_k K)\right) V\]
- Parameters:
hidden_dims (int) – The dimension of the hidden state vectors.
dropout (float) – The keep rate, in the range (0, 1]. For example, dropout=0.9 drops 10% of the input units. Default: 0.9.
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import AdditiveAttention
>>> model = AdditiveAttention(hidden_dims=512, dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = model(q, k, v, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Additive attention network construction.
- Parameters:
query (mindspore.Tensor) – The query vector. [batch_size, query_size, hidden_size]
key (mindspore.Tensor) – The key vector. [batch_size, key_size, hidden_size]
value (mindspore.Tensor) – The value vector. [batch_size, key_size, value_hidden_size]
mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, query_size, key_size]. Default: None.
- Returns:
output (mindspore.Tensor) - The output of additive attention. [batch_size, query_size, value_hidden_size]
attn (mindspore.Tensor) - The attention weights. [batch_size, query_size, key_size]
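To make the scoring rule concrete, here is a minimal pure-Python sketch of additive attention for a single batch element. It is not the mindnlp implementation: it uses plain lists instead of MindSpore tensors, omits dropout, masking and any bias term, and assumes W_v is a vector that maps each tanh feature to a scalar score. The names `additive_attention`, `matvec` and `softmax` are hypothetical helpers.

```python
import math


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(queries, keys, values, w_q, w_k, w_v):
    """Additive attention for one batch element (hypothetical helper).

    queries: query_size x hidden, keys: key_size x hidden,
    values: key_size x value_hidden, w_q and w_k: hidden x hidden,
    w_v: vector of length hidden. Returns (outputs, weights).
    """
    proj_keys = [matvec(w_k, k) for k in keys]
    outputs, weights = [], []
    for q in queries:
        proj_q = matvec(w_q, q)
        # score_j = w_v^T tanh(W_q q + W_k k_j)
        scores = [
            sum(v * math.tanh(a + b) for v, a, b in zip(w_v, proj_q, pk))
            for pk in proj_keys
        ]
        attn = softmax(scores)  # normalize over the key positions
        # The output is the attention-weighted sum of the value rows.
        out = [sum(a * row[d] for a, row in zip(attn, values))
               for d in range(len(values[0]))]
        outputs.append(out)
        weights.append(attn)
    return outputs, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = additive_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
    w_q=identity, w_k=identity, w_v=[1.0, 1.0],
)
print(weights)  # weights[0] sums to 1; the second key scores higher
```

Because the softmax runs over the key axis, each query row of the weights sums to 1, which matches the [batch_size, query_size, key_size] shape of `attn` above.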
- class mindnlp.modules.attentions.BinaryAttention
Bases: Cell
Binary Attention. Given two sequences of vectors x_i and y_j, the BinaryAttention module computes the attention result with the following equations:
\[\begin{aligned}
e_{ij} &= x_i^{\mathrm{T}} y_j \\
\hat{x}_i &= \sum_{j=1}^{\ell_y} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_y} \exp(e_{ik})}\, y_j \\
\hat{y}_j &= \sum_{i=1}^{\ell_x} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_x} \exp(e_{kj})}\, x_i
\end{aligned}\]
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import ops, Tensor
>>> from mindnlp.modules.attentions import BinaryAttention
>>> model = BinaryAttention()
>>> standard_normal = ops.StandardNormal(seed=0)
>>> x = standard_normal((2, 30, 512))
>>> y = standard_normal((2, 20, 512))
>>> x_mask = Tensor(np.zeros(x.shape[:-1]), mindspore.float32)
>>> y_mask = Tensor(np.zeros(y.shape[:-1]), mindspore.float32)
>>> output_x, output_y = model(x, x_mask, y, y_mask)
>>> print(output_x.shape, output_y.shape)
(2, 30, 512) (2, 20, 512)
- construct(x_batch, x_mask, y_batch, y_mask)
Compute the attention result.
- Parameters:
x_batch (mindspore.Tensor) – The first input sequence. [batch_size, x_seq_len, hidden_size]
x_mask (mindspore.Tensor) – The mask of x_batch. [batch_size, x_seq_len]
y_batch (mindspore.Tensor) – The second input sequence. [batch_size, y_seq_len, hidden_size]
y_mask (mindspore.Tensor) – The mask of y_batch. [batch_size, y_seq_len]
- Returns:
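attended_x (mindspore.Tensor) - The attended representation of x_batch (x̂_i in the equations above). [batch_size, x_seq_len, hidden_size]
attended_y (mindspore.Tensor) - The attended representation of y_batch (ŷ_j in the equations above). [batch_size, y_seq_len, hidden_size]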
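The two normalizations are easy to mix up, so here is a minimal pure-Python sketch of the unmasked computation for one batch element. It normalizes x̂_i over the row e_{i·} and ŷ_j over the column e_{·j}. It ignores `x_mask` and `y_mask`, and `binary_attention`, `weighted_sum` and `softmax` are hypothetical helpers, not mindnlp functions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, rows):
    """Sum of rows[k] scaled by weights[k]."""
    return [sum(w * row[d] for w, row in zip(weights, rows))
            for d in range(len(rows[0]))]


def binary_attention(xs, ys):
    """Unmasked binary attention for one batch element (hypothetical helper).

    xs: x_seq_len x hidden, ys: y_seq_len x hidden. Returns (x_hat, y_hat).
    """
    # e[i][j] = x_i^T y_j
    e = [[sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]
    # x_hat_i: normalize row i of e over j, then mix the y_j.
    x_hat = [weighted_sum(softmax(e[i]), ys) for i in range(len(xs))]
    # y_hat_j: normalize column j of e over i, then mix the x_i.
    y_hat = [weighted_sum(softmax([e[i][j] for i in range(len(xs))]), xs)
             for j in range(len(ys))]
    return x_hat, y_hat


x_hat, y_hat = binary_attention(
    xs=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    ys=[[2.0, 0.0], [0.0, 2.0]],
)
print(len(x_hat), len(y_hat))  # 3 2
```

As in the example above, x̂ keeps the length of x and ŷ keeps the length of y, while both share the hidden dimension.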
- class mindnlp.modules.attentions.CosineAttention(dropout=0.9)
Bases: Cell
Cosine Attention, proposed in the “Neural Turing Machines” paper.
\[\mathrm{Sim}(Q, K) = \frac{Q K^{\mathrm{T}}}{\lVert Q \rVert \, \lVert K \rVert}, \qquad \mathrm{Attention}(Q, K, V) = \mathrm{softmax}(\mathrm{Sim}(Q, K))\, V\]
- Parameters:
dropout (float) – The keep rate, in the range (0, 1]. For example, dropout=0.9 drops 10% of the input units. Default: 0.9.
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import CosineAttention
>>> model = CosineAttention(dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = model(q, k, v, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Cosine attention network construction.
- Parameters:
query (mindspore.Tensor) – The query vector. [batch_size, query_size, hidden_size]
key (mindspore.Tensor) – The key vector. [batch_size, key_size, hidden_size]
value (mindspore.Tensor) – The value vector. [batch_size, key_size, value_hidden_size]
mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, query_size, key_size]. Default: None.
- Returns:
output (mindspore.Tensor) - The output of cosine attention. [batch_size, query_size, value_hidden_size]
attn (mindspore.Tensor) - The attention weights. [batch_size, query_size, key_size]
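The following minimal pure-Python sketch evaluates the cosine-similarity formula for one batch element, without dropout or masking. The `eps` guard against zero-norm vectors is an assumption added for the sketch, and `cosine_attention` and `softmax` are hypothetical helpers rather than mindnlp API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_attention(queries, keys, values, eps=1e-8):
    """Unmasked cosine attention for one batch element (hypothetical helper).

    eps guards against division by zero for all-zero vectors (an assumption).
    """
    outputs, weights = [], []
    for q in queries:
        q_norm = math.sqrt(sum(a * a for a in q))
        sims = []
        for k in keys:
            k_norm = math.sqrt(sum(b * b for b in k))
            dot = sum(a * b for a, b in zip(q, k))
            sims.append(dot / max(q_norm * k_norm, eps))  # Sim(q, k) in [-1, 1]
        attn = softmax(sims)
        outputs.append([sum(w * row[d] for w, row in zip(attn, values))
                        for d in range(len(values[0]))])
        weights.append(attn)
    return outputs, weights


_, w = cosine_attention([[1.0, 0.0]], [[2.0, 0.0], [0.0, 3.0]], [[1.0], [0.0]])
print(w)  # similarities are 1 and 0, so w[0] equals softmax([1, 0])
```

Because the similarities are bounded in [-1, 1], rescaling a key leaves the weights unchanged, unlike a raw dot product.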
- class mindnlp.modules.attentions.LinearAttention(query_dim, key_dim, hidden_dim, dropout=0.9)
Bases: Cell
Linear attention computes attention by concatenating the query and key vectors.
- Parameters:
query_dim (int) – The sentence length of the query, usually query.shape[-2].
key_dim (int) – The sentence length of the key, usually key.shape[-2].
hidden_dim (int) – The dimension of the hidden vector.
dropout (float) – The keep rate, in the range (0, 1]. Default: 0.9.
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import ops, Tensor
>>> from mindnlp.modules.attentions import LinearAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((2, 32, 512))
>>> key = standard_normal((2, 20, 512))
>>> value = standard_normal((2, 20, 512))
>>> net = LinearAttention(query_dim=32, key_dim=20, hidden_dim=512)
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> output, attn = net(query, key, value, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Linear attention network construction, in which the query and key are concatenated.
- Parameters:
query (mindspore.Tensor) – The query vector. [batch_size, query_size, hidden_size]
key (mindspore.Tensor) – The key vector. [batch_size, key_size, hidden_size]
value (mindspore.Tensor) – The value vector. [batch_size, key_size, value_hidden_size]
mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, query_size, key_size]. Default: None.
- Returns:
output (mindspore.Tensor) - The output of linear attention. [batch_size, query_size, value_hidden_size]
attn (mindspore.Tensor) - The attention weights. [batch_size, query_size, key_size]
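This page does not spell out the concatenation formula, so the sketch below assumes the common "concat" scoring form score(q, k) = vᵀ tanh(W [q; k]) and applies it to one batch element in pure Python. Whether mindnlp's `LinearAttention` uses exactly this parameterization is not confirmed here; `concat_attention`, `w` and `v` are hypothetical, and dropout and masking are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_attention(queries, keys, values, w, v):
    """Concat-style attention for one batch element (hypothetical helper).

    w: hidden x (query_features + key_features) matrix, v: vector of length hidden.
    score(q, k) = v^T tanh(W [q; k])  (assumed scoring form)
    """
    outputs, weights = [], []
    for q in queries:
        scores = []
        for k in keys:
            qk = list(q) + list(k)  # concatenate query and key features
            hidden = [math.tanh(sum(wi * x for wi, x in zip(row, qk))) for row in w]
            scores.append(sum(vi * h for vi, h in zip(v, hidden)))
        attn = softmax(scores)
        outputs.append([sum(a * row[d] for a, row in zip(attn, values))
                        for d in range(len(values[0]))])
        weights.append(attn)
    return outputs, weights


outputs, weights = concat_attention(
    queries=[[0.0]], keys=[[1.0], [-1.0]], values=[[10.0], [0.0]],
    w=[[1.0, 1.0]], v=[1.0],
)
print(weights[0])  # the first key scores tanh(1), the second tanh(-1)
```

Unlike dot-product scoring, this form lets the query and key have different feature sizes, since W absorbs both.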
- class mindnlp.modules.attentions.LocationAwareAttention(hidden_dim, smoothing=False)
Bases: Cell
Location-Aware Attention, proposed in “Attention-Based Models for Speech Recognition”.
- Parameters:
hidden_dim (int) – The dimension of the hidden states
smoothing (bool) – Whether to apply the attention smoothing described in “Attention-Based Models for Speech Recognition”. Default: False.
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import ops, Tensor
>>> from mindnlp.modules.attentions import LocationAwareAttention
>>> batch_size, seq_len, hidden_dim = 2, 40, 20
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((batch_size, 1, hidden_dim))
>>> value = standard_normal((batch_size, seq_len, hidden_dim))
>>> last_attn = standard_normal((batch_size, seq_len))
>>> net = LocationAwareAttention(hidden_dim=hidden_dim, smoothing=False)
>>> mask_shape = (batch_size, seq_len)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> net.set_mask(mask)
>>> cont, attn = net(query, value, last_attn)
>>> print(cont.shape, attn.shape)
(2, 1, 20) (2, 40)
- construct(query, value, last_attn=None)
Location aware attention network construction.
- Parameters:
query (mindspore.Tensor) – Decoder hidden states, Shape=(batch_size, 1, decoder_dim).
value (mindspore.Tensor) – Encoder outputs, Shape=(batch_size, seq_len, encoder_dim).
last_attn (mindspore.Tensor) – Attention weight of previous step, Shape=(batch_size, seq_len).
- Returns:
context (mindspore.Tensor) - The context vector, Shape=(batch_size, 1, decoder_dim).
attn (mindspore.Tensor) - Attention weight of this step, Shape=(batch_size, seq_len).
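The sketch below illustrates the location-aware scoring rule from the cited paper for one batch element in pure Python: location features f_j come from convolving the previous weights `last_attn`, and the score is e_j = wᵀ tanh(W_s s + W_h h_j + U f_j). Assumptions: a single convolution channel (so U is a vector), no bias, and no masking. With `smoothing=True` it normalizes logistic sigmoids instead of applying softmax, which is how the paper describes smoothing. Whether mindnlp matches these details is not confirmed here, and all names are hypothetical.

```python
import math


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation; the output has len(signal) entries."""
    half = len(kernel) // 2
    out = []
    for j in range(len(signal)):
        acc = 0.0
        for t, k in enumerate(kernel):
            idx = j + t - half
            if 0 <= idx < len(signal):
                acc += k * signal[idx]
        out.append(acc)
    return out


def location_aware_attention(query, values, last_attn, w_s, w_h, u, w,
                             kernel, smoothing=False):
    """One decoding step for one batch element (hypothetical helper).

    query: decoder state (dim d), values: seq_len x d encoder outputs,
    last_attn: previous weights (seq_len), w_s and w_h: d x d, u and w: length d.
    Returns (context, attn).
    """
    loc = conv1d_same(last_attn, kernel)  # f_j: location feature per position
    proj_s = matvec(w_s, query)
    scores = []
    for h, f in zip(values, loc):
        proj_h = matvec(w_h, h)
        # e_j = w^T tanh(W_s s + W_h h_j + U f_j)
        scores.append(sum(wi * math.tanh(a + b + ui * f)
                          for wi, a, b, ui in zip(w, proj_s, proj_h, u)))
    if smoothing:
        # Smoothing: normalize logistic sigmoids instead of exponentials.
        sig = [1.0 / (1.0 + math.exp(-s)) for s in scores]
        total = sum(sig)
        attn = [s / total for s in sig]
    else:
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        attn = [x / total for x in exps]
    context = [sum(a * h[d] for a, h in zip(attn, values))
               for d in range(len(values[0]))]
    return context, attn


identity = [[1.0, 0.0], [0.0, 1.0]]
context, attn = location_aware_attention(
    query=[0.0, 0.0], values=[[0.0, 0.0]] * 3, last_attn=[0.0, 1.0, 0.0],
    w_s=identity, w_h=identity, u=[1.0, 1.0], w=[1.0, 1.0], kernel=[1.0],
)
print(attn)  # the position the previous step attended to scores highest
```

With zero query and encoder states, only the location term contributes, so the new weights follow the previous step's focus. This is the behavior the convolutional features are meant to provide.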
- class mindnlp.modules.attentions.ScaledDotAttention(dropout=0.9)
Bases: Cell
Scaled Dot-Product Attention, proposed in “Attention Is All You Need”.
\[\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V\]
- Parameters:
dropout (float) – The keep rate, in the range (0, 1]. For example, dropout=0.9 drops 10% of the input units. Default: 0.9.
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.modules.attentions import ScaledDotAttention
>>> model = ScaledDotAttention(dropout=0.9)
>>> q = Tensor(np.ones((2, 32, 512)), mindspore.float32)
>>> k = Tensor(np.ones((2, 20, 512)), mindspore.float32)
>>> v = Tensor(np.ones((2, 20, 400)), mindspore.float32)
>>> output, attn = model(q, k, v)
>>> print(output.shape, attn.shape)
(2, 32, 400) (2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Scaled dot-product attention network construction.
- Parameters:
query (mindspore.Tensor) – The query vector. [batch_size, query_size, hidden_size]
key (mindspore.Tensor) – The key vector. [batch_size, key_size, hidden_size]
value (mindspore.Tensor) – The value vector. [batch_size, key_size, value_hidden_size]
mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, query_size, key_size]. Default: None.
- Returns:
output (mindspore.Tensor) - The output of scaled dot-product attention. [batch_size, query_size, value_hidden_size]
attn (mindspore.Tensor) - The attention weights. [batch_size, query_size, key_size]
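Here is a minimal pure-Python sketch of the scaled dot-product formula for one batch element, including how a boolean mask can be applied. It assumes that `True` means "may attend", which is consistent with the all-ones masks in the examples but not stated on this page. Masked scores are set to -inf so that they receive zero weight after the softmax. Dropout is omitted, and `scaled_dot_attention` is a hypothetical helper.

```python
import math


def scaled_dot_attention(queries, keys, values, mask=None):
    """Scaled dot-product attention for one batch element (hypothetical helper).

    mask[i][j] is True where query i may attend to key j (assumed convention);
    each row must keep at least one key.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))  # 1 / sqrt(d_k)
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        if mask is not None:
            # Masked-out positions get -inf, so exp() maps them to weight 0.
            scores = [s if keep else -math.inf for s, keep in zip(scores, mask[i])]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        outputs.append([sum(a * row[d] for a, row in zip(attn, values))
                        for d in range(len(values[0]))])
        weights.append(attn)
    return outputs, weights


q, k, v = [[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]
out, attn = scaled_dot_attention(q, k, v)
masked_out, masked_attn = scaled_dot_attention(q, k, v, mask=[[False, True]])
print(masked_attn)  # [[0.0, 1.0]]
```

The 1/√d_k factor keeps the dot products from growing with the feature size, which would otherwise push the softmax toward one-hot weights.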
- class mindnlp.modules.attentions.SelfAttention(d_model=512, dropout_rate=0.1, bias=False, attention_mode='dot')
Bases: Cell
Self-attention, from the paper “Attention Is All You Need”.
- Parameters:
d_model (int) – The dimension of the query, key and value vectors. Default: 512.
dropout_rate (float) – The dropout rate, i.e. the fraction of input units to drop. Default: 0.1.
bias (bool) – Whether to use a bias vector. Default: False.
attention_mode (str) – The attention mode. Default: “dot”.
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import ops, Tensor
>>> from mindnlp.modules.attentions import SelfAttention
>>> standard_normal = ops.StandardNormal(seed=0)
>>> query = standard_normal((2, 32, 512))
>>> key = standard_normal((2, 20, 512))
>>> value = standard_normal((2, 20, 512))
>>> mask_shape = (2, 32, 20)
>>> mask = Tensor(np.ones(mask_shape), mindspore.bool_)
>>> net = SelfAttention()
>>> output, attn = net(query, key, value, mask)
>>> print(output.shape, attn.shape)
(2, 32, 512) (2, 32, 20)
- construct(query, key, value, mask: Optional[Tensor] = None)
Get self-attention output and attention weights.
- Parameters:
query (mindspore.Tensor) – The query vector. [batch_size, query_size, d_model]
key (mindspore.Tensor) – The key vector. [batch_size, key_size, d_model]
value (mindspore.Tensor) – The value vector. [batch_size, key_size, d_model]
mask (Optional[mindspore.Tensor[bool]]) – The mask vector. [batch_size, query_size, key_size]. Default: None.
- Returns:
output (mindspore.Tensor) - The output of self-attention. [batch_size, query_size, d_model]
attn (mindspore.Tensor) - The attention weights. [batch_size, query_size, key_size]
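The following minimal single-head sketch shows how self-attention combines linear projections with scaled dot-product attention for one batch element. It assumes that the default "dot" mode behaves like scaled dot-product attention over projected inputs, that the projections have no bias (matching bias=False), and that mask entries of `True` mean "may attend". None of these details is confirmed by this page. Dropout is omitted, and `self_attention` and the weight names are hypothetical.

```python
import math


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def self_attention(query, key, value, w_q, w_k, w_v, mask=None):
    """Single-head self-attention for one batch element (hypothetical helper).

    Projects the inputs with w_q, w_k, w_v (d_model x d_model, no bias), then
    applies scaled dot-product attention. mask[i][j] is True where query i may
    attend to key j (assumed convention).
    """
    q = [matvec(w_q, x) for x in query]
    k = [matvec(w_k, x) for x in key]
    v = [matvec(w_v, x) for x in value]
    scale = 1.0 / math.sqrt(len(k[0]))
    outputs, weights = [], []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        if mask is not None:
            scores = [s if keep else -math.inf for s, keep in zip(scores, mask[i])]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        outputs.append([sum(a * row[d] for a, row in zip(attn, v))
                        for d in range(len(v[0]))])
        weights.append(attn)
    return outputs, weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # seq_len = 3, d_model = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
# In self-attention, query, key and value come from the same sequence.
outputs, weights = self_attention(x, x, x, identity, identity, identity)
print(len(outputs), len(weights[0]))  # 3 3
```

Passing a lower-triangular mask such as `[[j <= i for j in range(3)] for i in range(3)]` restricts each position to itself and earlier positions. This is the usual causal setup in decoders.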