Parallel

layers

Tensor Parallel Layers

class mindnlp.parallel.layers.ColumnParallelLinear(in_features: int, out_features: int, bias: bool = True, gather_output: bool = True, init_method: typing.Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype: mindspore.common.dtype = mindspore.float32, stride: int = 1, keep_master_weight_for_test: bool = False)

Bases: Cell

Linear layer with column parallelism.

The linear layer is defined as Y = XA + b. A is parallelized along its second dimension as A = [A_1, …, A_p].

Parameters:
  • in_features – first dimension of matrix A.

  • out_features – second dimension of matrix A.

  • bias – If True, add a bias.

  • gather_output – If True, call all-gather on the output so that Y is available on all GPUs; otherwise, each GPU keeps only its own output, Y_i = XA_i.

  • init_method – method to initialize weights. Note that bias is always set to zero.

  • stride – For the strided linear layers.

  • keep_master_weight_for_test – This was added for testing and should be set to False. It returns the master weights used for initialization.
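The column split described above can be illustrated without any framework. The following is a minimal sketch, not the library's implementation: it simulates p workers in a single process with plain Python lists, omits the bias, and uses hypothetical helper names (`matmul`, `split_columns`, `column_parallel_linear`) that are not part of mindnlp.

```python
def matmul(x, a):
    """Multiply two matrices given as lists of rows."""
    return [[sum(xi * aj for xi, aj in zip(row, col)) for col in zip(*a)]
            for row in x]


def split_columns(a, p):
    """Split matrix `a` into `p` equal blocks along its second dimension."""
    width = len(a[0]) // p
    return [[row[i * width:(i + 1) * width] for row in a] for i in range(p)]


def column_parallel_linear(x, a, p, gather_output=True):
    """Simulate Y = XA with A = [A_1, ..., A_p], one block per worker."""
    # Each simulated worker computes its own slice Y_i = X A_i.
    partial = [matmul(x, a_i) for a_i in split_columns(a, p)]
    if not gather_output:
        return partial  # every worker keeps only its Y_i
    # Stand-in for all-gather: concatenate the Y_i along the last dimension.
    return [sum((y_i[r] for y_i in partial), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]               # input X, shape (2, 2)
a = [[1, 0, 2, 1], [0, 1, 1, 3]]   # weight A, in_features=2, out_features=4
y = column_parallel_linear(x, a, p=2)
print(y == matmul(x, a))  # True: the gathered result equals the unsplit product
```

With `gather_output=False` the function returns the list of per-worker slices instead, which mirrors the case where each GPU holds only Y_i.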

construct(input_: Tensor) -> Tensor

Defines the computation to be performed. This method must be overridden by all subclasses.

Note

Inputs that mix tuple and non-tuple types are not currently supported.

Parameters:
  • args (tuple) – Tuple of variable parameters.

  • kwargs (dict) – Dictionary of variable keyword parameters.

Returns:

Tensor, returns the computed result.

get_master_weight() -> Tensor

Returns the master weight of ColumnParallelLinear.

class mindnlp.parallel.layers.ParallelEmbedding(vocab_size: int, embedding_size: int, padding_idx: typing.Optional[int] = None, init_method: typing.Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype: mindspore.common.dtype = mindspore.float32)

Bases: Cell

Embedding parallelized in the embedding dimension.

This is mainly adapted from mindspore.nn.Embedding and all the default values are kept.

Parameters:
  • vocab_size – vocabulary size.

  • embedding_size – size of hidden state.

  • init_method – method to initialize weights.
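To make "parallelized in the embedding dimension" concrete, here is a small single-process sketch. It assumes that each worker stores a block of columns of the embedding table and that the per-worker slices are concatenated afterwards; this is one common way to realize the idea and is not taken from the class's source. The function name `parallel_embedding` is hypothetical.

```python
def parallel_embedding(ids, table, p):
    """Look up `ids` in a table whose columns are split across `p` workers."""
    width = len(table[0]) // p
    # Worker i holds columns [i * width, (i + 1) * width) of every row.
    shards = [[row[i * width:(i + 1) * width] for row in table]
              for i in range(p)]
    # Every worker sees all ids and returns its slice of each vector.
    partial = [[shard[t] for t in ids] for shard in shards]
    # Stand-in for the gather step: concatenate slices along the embedding dim.
    return [sum((part[n] for part in partial), []) for n in range(len(ids))]


table = [[0, 1, 2, 3],       # vocab_size=3, embedding_size=4
         [10, 11, 12, 13],
         [20, 21, 22, 23]]
ids = [2, 0, 2]
out = parallel_embedding(ids, table, p=2)
print(out == [table[t] for t in ids])  # True: same as an unsplit lookup
```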

construct(input_: Tensor) -> Tensor

Defines the computation to be performed. This method must be overridden by all subclasses.

Note

Inputs that mix tuple and non-tuple types are not currently supported.

Parameters:
  • args (tuple) – Tuple of variable parameters.

  • kwargs (dict) – Dictionary of variable keyword parameters.

Returns:

Tensor, returns the computed result.

class mindnlp.parallel.layers.RowParallelLinear(in_features: int, out_features: int, bias: bool = True, input_is_parallel: bool = False, init_method: typing.Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype: mindspore.common.dtype = mindspore.float32, stride: int = 1, keep_master_weight_for_test: bool = False)

Bases: Cell

Linear layer with row parallelism.

The linear layer is defined as Y = XA + b. A is parallelized along its first dimension and X along its second dimension as:

$$A = \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix}, \qquad X = [X_1, \ldots, X_p]$$

Parameters:
  • in_features – first dimension of matrix A.

  • out_features – second dimension of matrix A.

  • bias – If true, add bias. Note that bias is not parallelized.

  • input_is_parallel – If true, we assume that the input is already split across the GPUs and we do not split again.

  • init_method – method to initialize weights. Note that bias is always set to zero.

  • stride – For the strided linear layers.

  • keep_master_weight_for_test – This was added for testing and should be set to False. It returns the master weights used for initialization.
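The row split can be checked numerically in the same spirit. The sketch below simulates p workers in one process with plain lists: worker i multiplies its column block X_i by its row block A_i, and the partial products are summed, which stands in for an all-reduce. The bias is omitted, and the helper names (`matmul`, `row_parallel_linear`) are hypothetical rather than part of mindnlp.

```python
def matmul(x, a):
    """Multiply two matrices given as lists of rows."""
    return [[sum(xi * aj for xi, aj in zip(row, col)) for col in zip(*a)]
            for row in x]


def row_parallel_linear(x, a, p):
    """Simulate Y = XA with A split by rows and X split by columns."""
    k = len(a) // p  # rows of A (and columns of X) held by each worker
    partial = []
    for i in range(p):
        a_i = a[i * k:(i + 1) * k]                    # A_i: block of rows
        x_i = [row[i * k:(i + 1) * k] for row in x]   # X_i: block of columns
        partial.append(matmul(x_i, a_i))              # full-shape partial sum
    # Stand-in for all-reduce: add the partial products elementwise.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partial)]


x = [[1, 2, 3, 4], [5, 6, 7, 8]]          # input X, shape (2, 4)
a = [[1, 0], [0, 1], [2, 1], [1, 3]]      # weight A, in_features=4, out_features=2
y = row_parallel_linear(x, a, p=2)
print(y == matmul(x, a))  # True: Y = X_1 A_1 + ... + X_p A_p
```

Because every partial product already has the full output shape, a bias that is not parallelized would presumably be added once, after the sum, rather than on each worker.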

construct(input_: Tensor) -> Tensor

Defines the computation to be performed. This method must be overridden by all subclasses.

Note

Inputs that mix tuple and non-tuple types are not currently supported.

Parameters:
  • args (tuple) – Tuple of variable parameters.

  • kwargs (dict) – Dictionary of variable keyword parameters.

Returns:

Tensor, returns the computed result.

get_master_weight() -> Tensor

Returns the master weight of RowParallelLinear.

class mindnlp.parallel.layers.VocabParallelEmbedding(vocab_size: int, embedding_size: int, padding_idx: typing.Optional[int] = None, init_method: typing.Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype: mindspore.common.dtype = mindspore.float32)

Bases: Cell

Embedding parallelized in the vocabulary dimension.

This is mainly adapted from mindspore.nn.Embedding and all the default values are kept.

Parameters:
  • vocab_size – vocabulary size.

  • embedding_size – size of hidden state.

  • init_method – method to initialize weights.
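Splitting along the vocabulary dimension works differently from splitting along the embedding dimension, so a short sketch may help. It assumes the scheme commonly used for vocabulary-parallel embeddings: each worker owns a contiguous range of token ids, returns zeros for ids outside its range, and the per-worker results are summed. Whether this class follows exactly that scheme is an assumption, and the function name `vocab_parallel_embedding` is hypothetical.

```python
def vocab_parallel_embedding(ids, table, p):
    """Look up `ids` in a table whose rows are split across `p` workers."""
    rows = len(table) // p   # vocabulary entries per worker
    dim = len(table[0])
    partial = []
    for i in range(p):
        start, end = i * rows, (i + 1) * rows
        local = table[start:end]  # this worker's slice of the vocabulary
        # Ids owned by another worker contribute a zero vector here.
        partial.append([local[t - start] if start <= t < end else [0] * dim
                        for t in ids])
    # Stand-in for all-reduce: exactly one worker is non-zero per id.
    return [[sum(vals) for vals in zip(*vecs)] for vecs in zip(*partial)]


table = [[1, 2], [3, 4], [5, 6], [7, 8]]   # vocab_size=4, embedding_size=2
ids = [3, 0, 1]
out = vocab_parallel_embedding(ids, table, p=2)
print(out == [table[t] for t in ids])  # True: same as an unsplit lookup
```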

construct(input_: Tensor) -> Tensor

Defines the computation to be performed. This method must be overridden by all subclasses.

Note

Inputs that mix tuple and non-tuple types are not currently supported.

Parameters:
  • args (tuple) – Tuple of variable parameters.

  • kwargs (dict) – Dictionary of variable keyword parameters.

Returns:

Tensor, returns the computed result.