Parallel

layers

Tensor Parallel Layers

class mindnlp.parallel.layers.ColumnParallelLinear(in_features: int, out_features: int, bias: bool = True, gather_output: bool = True, init_method: Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype = mindspore.float32, stride: int = 1, keep_master_weight_for_test: bool = False)

Bases: Cell

Linear layer with column parallelism.

The linear layer is defined as Y = XA + b. A is parallelized along its second dimension as A = [A_1, …, A_p], where p is the number of GPUs the layer is split across.

Parameters:

in_features – first dimension of matrix A.
out_features – second dimension of matrix A.
bias – If True, add a bias.
gather_output – If True, call all-gather on the output and make Y available to all GPUs; otherwise, each GPU keeps only its own output Y_i = XA_i.
init_method – method to initialize weights. Note that the bias is always initialized to zero.
dtype – data type of the layer's parameters.
stride – For the strided linear layers.
keep_master_weight_for_test – Added for testing only and should be set to False. It returns the master weights used for initialization.
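The column split can be checked numerically. The NumPy sketch below simulates p ranks inside one process; it is not mindnlp code, `column_parallel_forward` is a hypothetical helper, and splitting the bias together with the output columns is an assumption based on Megatron-LM style implementations. It shows that concatenating the local outputs Y_i (what the all-gather does when gather_output is True) reproduces the unpartitioned Y = XA + b.

```python
import numpy as np


def column_parallel_forward(x, a, b, world_size, gather_output=True):
    """Single-process simulation of a column-parallel linear layer.

    `a` has shape (in_features, out_features), matching Y = XA + b.
    """
    # Split A along its second dimension: A = [A_1, ..., A_p].
    a_parts = np.split(a, world_size, axis=1)
    # Assumption: the bias is split the same way as the output columns.
    b_parts = np.split(b, world_size)
    # Simulated rank i computes only its own slice Y_i = X A_i + b_i.
    local_outputs = [x @ a_i + b_i for a_i, b_i in zip(a_parts, b_parts)]
    if gather_output:
        # All-gather: concatenate the slices along the last dimension.
        return np.concatenate(local_outputs, axis=-1)
    # Without gathering, each rank would keep just its own Y_i.
    return local_outputs


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6))   # batch of 4, in_features = 6
a = rng.standard_normal((6, 8))   # out_features = 8
b = rng.standard_normal(8)

y = column_parallel_forward(x, a, b, world_size=2)
assert np.allclose(y, x @ a + b)  # same result as the unpartitioned layer
```

With gather_output=False the helper returns the list of per-rank slices, each of shape (batch, out_features // p), which is the form a following row-parallel layer can consume directly.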

construct(input_: Tensor) → Tensor

Computes Y = XA + b for the input X given by input_.

Parameters:

input_ (Tensor) – the input X.

Returns:

Tensor, the full output Y if gather_output is True; otherwise, this GPU's partition Y_i = XA_i.

get_master_weight() → Tensor[source]: get master weight of ColumnParallelLinear

class mindnlp.parallel.layers.ParallelEmbedding(vocab_size: int, embedding_size: int, padding_idx: Optional[int] = None, init_method: Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype = mindspore.float32)

Bases: Cell

Embedding parallelized in the embedding dimension.

This is mainly adapted from mindspore.nn.Embedding, and all the default values are kept.

Parameters:

vocab_size – vocabulary size.
embedding_size – size of the hidden state.
init_method – method to initialize weights.
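To make "parallelized in the embedding dimension" concrete, the NumPy sketch below simulates p ranks in one process. It assumes, as in Megatron-LM / fairscale style implementations, that each rank stores every vocabulary row but only embedding_size / p columns, and that the per-rank lookups are all-gathered along the last dimension; `embedding_dim_parallel_lookup` is a hypothetical helper, not part of mindnlp.

```python
import numpy as np


def embedding_dim_parallel_lookup(ids, table, world_size):
    """Single-process simulation of an embedding split along the embedding dimension."""
    # Each simulated rank stores all vocab rows but only
    # embedding_size // world_size columns of the table.
    shards = np.split(table, world_size, axis=1)
    # Every rank looks up all ids in its own column slice.
    partial = [shard[ids] for shard in shards]
    # All-gather along the last dimension restores the full embeddings.
    return np.concatenate(partial, axis=-1)


rng = np.random.default_rng(0)
table = rng.standard_normal((10, 8))   # vocab_size = 10, embedding_size = 8
ids = np.array([[1, 3, 9], [0, 0, 2]])
out = embedding_dim_parallel_lookup(ids, table, world_size=4)
assert out.shape == (2, 3, 8)
assert np.allclose(out, table[ids])
```

Every rank must see the full set of token ids here, which is the trade-off of this layout compared with splitting along the vocabulary.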

construct(input_: Tensor) → Tensor

Looks up the embedding vectors for the token indices in input_.

Parameters:

input_ (Tensor) – token indices.

Returns:

Tensor, the embeddings of input_.

class mindnlp.parallel.layers.RowParallelLinear(in_features: int, out_features: int, bias: bool = True, input_is_parallel: bool = False, init_method: Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype = mindspore.float32, stride: int = 1, keep_master_weight_for_test: bool = False)

Bases: Cell

Linear layer with row parallelism.

The linear layer is defined as Y = XA + b. A is parallelized along its first dimension and X along its second dimension as:

$$
A = \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix}, \qquad X = \begin{bmatrix} X_1 & \cdots & X_p \end{bmatrix}
$$
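With this split, XA = X_1 A_1 + … + X_p A_p, so each GPU can compute one partial product and a sum across GPUs (an all-reduce) recovers the full result. The NumPy sketch below checks that identity on a single process; `row_parallel_forward` is a hypothetical helper, not mindnlp code, and it adds the bias once after the reduction because the bias is not parallelized.

```python
import numpy as np


def row_parallel_forward(x, a, b, world_size):
    """Single-process simulation of a row-parallel linear layer."""
    # Split A along its first dimension and X along its second dimension.
    a_parts = np.split(a, world_size, axis=0)  # A = [A_1; ...; A_p]
    x_parts = np.split(x, world_size, axis=1)  # X = [X_1, ..., X_p]
    # Simulated rank i computes the partial product X_i A_i.
    partial_products = [x_i @ a_i for x_i, a_i in zip(x_parts, a_parts)]
    # All-reduce (sum) the partial products, then add the unsplit bias once.
    return sum(partial_products) + b


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6))   # in_features = 6
a = rng.standard_normal((6, 8))   # out_features = 8
b = rng.standard_normal(8)

assert np.allclose(row_parallel_forward(x, a, b, world_size=3), x @ a + b)
```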

Parameters:

in_features – first dimension of matrix A.
out_features – second dimension of matrix A.
bias – If True, add a bias. Note that the bias is not parallelized.
input_is_parallel – If True, the input is assumed to be already split across the GPUs and is not split again.
init_method – method to initialize weights. Note that the bias is always initialized to zero.
dtype – data type of the layer's parameters.
stride – For the strided linear layers.
keep_master_weight_for_test – Added for testing only and should be set to False. It returns the master weights used for initialization.
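In Megatron-LM style models, input_is_parallel is typically used by pairing a ColumnParallelLinear with gather_output=False and a RowParallelLinear with input_is_parallel=True, so the intermediate activation is never gathered. Whether mindnlp models use exactly this pairing is an assumption. The NumPy sketch below simulates that two-layer block with a ReLU in between on one process; `tensor_parallel_mlp` is a hypothetical helper, and splitting the first layer's bias with its columns is likewise assumed.

```python
import numpy as np


def tensor_parallel_mlp(x, a1, b1, a2, b2, world_size):
    """Simulates ColumnParallelLinear(gather_output=False) -> ReLU ->
    RowParallelLinear(input_is_parallel=True) on a single process."""
    a1_parts = np.split(a1, world_size, axis=1)  # column-split first weight
    b1_parts = np.split(b1, world_size)          # assumed split with the columns
    a2_parts = np.split(a2, world_size, axis=0)  # row-split second weight
    partial_products = []
    for a1_i, b1_i, a2_i in zip(a1_parts, b1_parts, a2_parts):
        # Local Y_i of the first layer; ReLU is elementwise, so it can be
        # applied to each slice without gathering.
        h_i = np.maximum(x @ a1_i + b1_i, 0.0)
        # h_i is already the i-th slice of the second layer's input.
        partial_products.append(h_i @ a2_i)
    # A single all-reduce (sum), then the unsplit bias of the second layer.
    return sum(partial_products) + b2


rng = np.random.default_rng(1)
x = rng.standard_normal((4, 6))
a1, b1 = rng.standard_normal((6, 12)), rng.standard_normal(12)
a2, b2 = rng.standard_normal((12, 6)), rng.standard_normal(6)

expected = np.maximum(x @ a1 + b1, 0.0) @ a2 + b2
assert np.allclose(tensor_parallel_mlp(x, a1, b1, a2, b2, world_size=4), expected)
```

The design point is that only one collective (the final sum) is needed for the whole block, instead of an all-gather after the first layer plus a split before the second.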

construct(input_: Tensor) → Tensor

Computes Y = XA + b for the input X given by input_.

Parameters:

input_ (Tensor) – the input X, or this GPU's partition X_i if input_is_parallel is True.

Returns:

Tensor, the output Y.

get_master_weight() → Tensor[source]: get master weight of RowParallelLinear

class mindnlp.parallel.layers.VocabParallelEmbedding(vocab_size: int, embedding_size: int, padding_idx: Optional[int] = None, init_method: Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype = mindspore.float32)

Bases: Cell

Embedding parallelized in the vocabulary dimension.

This is mainly adapted from mindspore.nn.Embedding, and all the default values are kept.

Parameters:

vocab_size – vocabulary size.
embedding_size – size of the hidden state.
init_method – method to initialize weights.
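Splitting along the vocabulary means each rank owns a contiguous block of rows, so a token id is found on exactly one rank. The NumPy sketch below shows one common way to handle this (used in Megatron-LM style implementations; assumed, not confirmed, for mindnlp): each rank masks out ids it does not own, looks up the rest with shifted local indices, writes zeros for foreign ids, and a sum across ranks (all-reduce) assembles the result. `vocab_parallel_lookup` is a hypothetical helper, and an even split of vocab_size is assumed.

```python
import numpy as np


def vocab_parallel_lookup(ids, table, world_size):
    """Single-process simulation of an embedding split along the vocabulary."""
    vocab_size, embedding_size = table.shape
    per_rank = vocab_size // world_size  # assumes an even split
    out = np.zeros(ids.shape + (embedding_size,), dtype=table.dtype)
    for rank in range(world_size):
        start, end = rank * per_rank, (rank + 1) * per_rank
        shard = table[start:end]                    # rows owned by this rank
        foreign = (ids < start) | (ids >= end)      # ids owned by other ranks
        local_ids = np.where(foreign, 0, ids - start)
        emb = shard[local_ids]                      # advanced indexing copies
        emb[foreign] = 0.0                          # foreign ids contribute zeros
        out += emb                                  # all-reduce (sum) across ranks
    return out


rng = np.random.default_rng(0)
table = rng.standard_normal((12, 4))   # vocab_size = 12, embedding_size = 4
ids = np.array([[0, 5, 11], [7, 3, 3]])
out = vocab_parallel_lookup(ids, table, world_size=3)
assert np.allclose(out, table[ids])
```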

construct(input_: Tensor) → Tensor

Looks up the embedding vectors for the token indices in input_.

Parameters:

input_ (Tensor) – token indices.

Returns:

Tensor, the embeddings of input_.