Parallel
layers
Tensor Parallel Layers
- class mindnlp.parallel.layers.ColumnParallelLinear(in_features: int, out_features: int, bias: bool = True, gather_output: bool = True, init_method: typing.Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype = mindspore.float32, stride: int = 1, keep_master_weight_for_test: bool = False)
Bases: Cell
Linear layer with column parallelism.
The linear layer is defined as Y = XA + b. A is parallelized along its second dimension as A = [A_1, …, A_p].
- Parameters:
in_features – first dimension of matrix A.
out_features – second dimension of matrix A.
bias – If true, add a bias term.
gather_output – If true, call all-gather on the output so that Y is available on all GPUs; otherwise, each GPU keeps its own partial output Y_i = XA_i.
init_method – method to initialize weights. Note that bias is always set to zero.
stride – For the strided linear layers.
keep_master_weight_for_test – This was added for testing and should be set to False. It returns the master weights used for initialization.
- construct(input_: Tensor) → Tensor
Defines the computation to be performed. This method must be overridden by all subclasses.
Note
Inputs that contain both tuple and non-tuple types at the same time are not currently supported.
- Parameters:
args (tuple) – Tuple of variable parameters.
kwargs (dict) – Dictionary of variable keyword parameters.
- Returns:
Tensor, returns the computed result.
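The column split described for ColumnParallelLinear can be checked on a toy example. The following is a minimal pure-Python sketch, not the mindnlp implementation: the helper names (matmul, split_columns, all_gather_last_dim) are hypothetical, the loop over shards stands in for separate devices, and the bias is assumed to be split the same way as the columns of A.

```python
# Toy illustration of column parallelism: Y = XA + b with A = [A_1, ..., A_p].
# No MindSpore and no real communication; each loop iteration plays one "rank".

def matmul(x, a):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*a)] for row in x]

def split_columns(m, parts):
    """Split matrix m into `parts` equal-width column blocks."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def all_gather_last_dim(blocks):
    """Stand-in for all-gather: concatenate per-rank outputs along the last dimension."""
    return [sum((block[r] for block in blocks), []) for r in range(len(blocks[0]))]

X = [[1, 2], [3, 4]]                     # (batch=2, in_features=2)
A = [[1, 0, 2, 1], [0, 1, 1, 3]]         # (in_features=2, out_features=4)
b = [10, 20, 30, 40]                     # (out_features=4,)
p = 2                                    # number of simulated ranks

A_shards = split_columns(A, p)           # A_1, A_2
b_shards = split_columns([b], p)         # matching slices of the bias (assumption)

outputs = []
for A_i, b_i in zip(A_shards, b_shards):
    # Each rank computes its slice Y_i = X A_i + b_i.
    Y_i = [[y + bias for y, bias in zip(row, b_i[0])] for row in matmul(X, A_i)]
    outputs.append(Y_i)

# With gather_output=True every rank would end up with the full Y.
Y_parallel = all_gather_last_dim(outputs)
Y_reference = [[y + bias for y, bias in zip(row, b)] for row in matmul(X, A)]
```

Without the gather step, each simulated rank would only hold its own entry of outputs, which corresponds to the Y_i = XA_i case described for gather_output.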
- class mindnlp.parallel.layers.ParallelEmbedding(vocab_size: int, embedding_size: int, padding_idx: typing.Optional[int] = None, init_method: typing.Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype = mindspore.float32)
Bases: Cell
Embedding parallelized in the embedding dimension.
This is mainly adapted from mindspore.nn.Embedding and all the default values are kept.
- Parameters:
vocab_size – vocabulary size.
embedding_size – size of hidden state.
init_method – method to initialize weights.
- construct(input_: Tensor) → Tensor
Defines the computation to be performed. This method must be overridden by all subclasses.
Note
Inputs that contain both tuple and non-tuple types at the same time are not currently supported.
- Parameters:
args (tuple) – Tuple of variable parameters.
kwargs (dict) – Dictionary of variable keyword parameters.
- Returns:
Tensor, returns the computed result.
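Splitting an embedding along the embedding dimension, as ParallelEmbedding is described to do, can be pictured as each rank holding a column slice of the table. The sketch below is pure Python under that assumption; the helper names and the concatenation step are illustrative stand-ins, not the mindnlp code path.

```python
# Toy illustration of an embedding table split along the embedding dimension.
# Each simulated rank stores every vocabulary row, but only some of its columns.

def split_columns(m, parts):
    """Split matrix m into `parts` equal-width column blocks."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

table = [
    [0, 1, 2, 3],        # embedding of token 0
    [10, 11, 12, 13],    # embedding of token 1
    [20, 21, 22, 23],    # embedding of token 2
]                        # (vocab_size=3, embedding_size=4)
ids = [2, 0, 2]          # token ids to look up
world_size = 2           # number of simulated ranks

shards = split_columns(table, world_size)

# Every rank looks up all ids but only returns its slice of each vector.
partial = [[shard[i] for i in ids] for shard in shards]

# Stand-in for a gather along the last dimension: join the slices per token.
embedded = [sum((p[t] for p in partial), []) for t in range(len(ids))]
reference = [table[i] for i in ids]
```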
- class mindnlp.parallel.layers.RowParallelLinear(in_features: int, out_features: int, bias: bool = True, input_is_parallel: bool = False, init_method: typing.Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype = mindspore.float32, stride: int = 1, keep_master_weight_for_test: bool = False)
Bases: Cell
Linear layer with row parallelism.
The linear layer is defined as Y = XA + b. A is parallelized along its first dimension and X along its second dimension as:
$$A = \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix}, \qquad X = [X_1, \dots, X_p]$$
- Parameters:
in_features – first dimension of matrix A.
out_features – second dimension of matrix A.
bias – If true, add bias. Note that bias is not parallelized.
input_is_parallel – If true, we assume that the input is already split across the GPUs and we do not split again.
init_method – method to initialize weights. Note that bias is always set to zero.
stride – For the strided linear layers.
keep_master_weight_for_test – This was added for testing and should be set to False. It returns the master weights used for initialization.
- construct(input_: Tensor) → Tensor
Defines the computation to be performed. This method must be overridden by all subclasses.
Note
Inputs that contain both tuple and non-tuple types at the same time are not currently supported.
- Parameters:
args (tuple) – Tuple of variable parameters.
kwargs (dict) – Dictionary of variable keyword parameters.
- Returns:
Tensor, returns the computed result.
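For RowParallelLinear, the row split of A paired with the column split of X means each rank produces a full-shaped partial product, and the partial products must be summed. A minimal pure-Python sketch follows, assuming the sum stands in for an all-reduce and that the unsplit bias is added once after the reduction; the helper names are hypothetical.

```python
# Toy illustration of row parallelism: Y = XA + b = sum_i X_i A_i + b.
# No MindSpore and no real communication; each list entry plays one "rank".

def matmul(x, a):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*a)] for row in x]

def split_rows(m, parts):
    """Split matrix m into `parts` equal-height row blocks."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

def split_columns(m, parts):
    """Split matrix m into `parts` equal-width column blocks."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

X = [[1, 2, 3, 4], [5, 6, 7, 8]]          # (batch=2, in_features=4)
A = [[1, 0], [0, 1], [2, 1], [1, 3]]      # (in_features=4, out_features=2)
b = [10, 20]                              # bias is not parallelized
p = 2                                     # number of simulated ranks

A_shards = split_rows(A, p)               # A_1, A_2 stacked vertically
X_shards = split_columns(X, p)            # X_1, X_2 side by side

# Each rank computes a partial product of full output shape.
partials = [matmul(X_i, A_i) for X_i, A_i in zip(X_shards, A_shards)]

# Stand-in for all-reduce: elementwise sum of the partial products.
reduced = [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

# The bias is added once, after the reduction.
Y_parallel = [[y + bias for y, bias in zip(row, b)] for row in reduced]
Y_reference = [[y + bias for y, bias in zip(row, b)] for row in matmul(X, A)]
```

With input_is_parallel set, the split_columns step on X would be skipped, since each rank is assumed to receive its X_i already.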
- class mindnlp.parallel.layers.VocabParallelEmbedding(vocab_size: int, embedding_size: int, padding_idx: typing.Optional[int] = None, init_method: typing.Union[str, mindspore.common.initializer.Initializer] = 'normal', dtype = mindspore.float32)
Bases: Cell
Embedding parallelized in the vocabulary dimension.
This is mainly adapted from mindspore.nn.Embedding and all the default values are kept.
- Parameters:
vocab_size – vocabulary size.
embedding_size – size of hidden state.
init_method – method to initialize weights.
- construct(input_: Tensor) → Tensor
Defines the computation to be performed. This method must be overridden by all subclasses.
Note
Inputs that contain both tuple and non-tuple types at the same time are not currently supported.
- Parameters:
args (tuple) – Tuple of variable parameters.
kwargs (dict) – Dictionary of variable keyword parameters.
- Returns:
Tensor, returns the computed result.
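One common way to parallelize an embedding along the vocabulary dimension is for each rank to own a contiguous range of token ids, return zeros for ids outside its range, and sum the results across ranks. The pure-Python sketch below illustrates that scheme; it is an assumption about the general technique, not a transcription of VocabParallelEmbedding, and lookup_on_rank is a hypothetical helper.

```python
# Toy illustration of an embedding table split along the vocabulary dimension.
# Each simulated rank stores only the rows for its own range of token ids.

table = [
    [1, 2],   # token 0
    [3, 4],   # token 1
    [5, 6],   # token 2
    [7, 8],   # token 3
]             # (vocab_size=4, embedding_size=2)
ids = [3, 0, 2]
world_size = 2
rows_per_rank = len(table) // world_size

def lookup_on_rank(rank):
    """Look up ids on one rank; ids owned by another rank yield a zero vector."""
    start = rank * rows_per_rank
    end = start + rows_per_rank
    local_rows = table[start:end]              # the only rows this rank holds
    zero = [0] * len(table[0])
    return [local_rows[i - start] if start <= i < end else zero for i in ids]

partials = [lookup_on_rank(rank) for rank in range(world_size)]

# Stand-in for all-reduce: exactly one rank contributes a non-zero row per id.
embedded = [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]
reference = [table[i] for i in ids]
```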