Text Generation

lcsts

LCSTS load function

mindnlp.dataset.text_generation.lcsts.LCSTS(root: str = '/home/docs/checkouts/readthedocs.org/user_builds/mindnlpdoc/checkouts/latest/docs/.mindnlp', split: Union[Tuple[str], str] = ('train', 'dev'), proxies=None)

Load the LCSTS (Large Scale Chinese Short Text Summarization) dataset.

Parameters:
  • root (str) – Directory where the datasets are saved.

  • split (str|Tuple[str]) – Split or splits to return. Default: ('train', 'dev').

  • proxies (dict) – A dict mapping URL schemes to proxy URLs, for example: {"https": "https://127.0.0.1:7890"}. Default: None.
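The proxies value uses the common scheme-to-proxy-URL shape that the `requests` library and Python's standard library also accept. As an illustration of that shape only (how mindnlp forwards the dict to its downloader is not documented here and is an assumption), the standard-library `urllib.request.ProxyHandler` takes the documented example dict unchanged:

```python
import urllib.request

# Same shape as the documented `proxies` example: URL scheme -> proxy URL.
proxies = {"https": "https://127.0.0.1:7890"}

# Illustration only: ProxyHandler accepts a mapping of the same shape and
# registers an opener hook for each scheme key. Nothing is downloaded here,
# and this is not how mindnlp itself is known to use `proxies`.
handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(handler)
```

The same dict would be passed to the loader as LCSTS(root, split, proxies=proxies).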

Returns:

  • datasets_list (list) – A list of the loaded datasets. If only one split is specified, such as 'train', that dataset is returned on its own instead of a list.
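The return rule means a single requested split comes back directly rather than wrapped in a one-element list. A minimal sketch of that rule, using a hypothetical helper name `collapse_single` that is not part of mindnlp:

```python
from typing import List, TypeVar, Union

T = TypeVar("T")


def collapse_single(datasets: List[T]) -> Union[T, List[T]]:
    """Return the lone dataset when exactly one split was requested."""
    # One split requested: hand back the dataset itself, not a 1-element list.
    if len(datasets) == 1:
        return datasets[0]
    # Several splits requested: keep the list so the caller can unpack it.
    return datasets
```

In practice this means a call such as LCSTS(root, 'train') needs no unpacking, while requesting ('train', 'dev') yields a list to unpack, as in the example below.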

Raises:
  • TypeError – If root is not a string.

  • TypeError – If split is not a string or Tuple[str].
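The two TypeError conditions can be expressed as a small argument check. This is a sketch with a hypothetical helper `check_root_and_split`; mindnlp's own validation may be organized differently, and turning a single string into a one-element tuple is a convenience assumed here.

```python
from typing import Tuple


def check_root_and_split(root: object, split: object) -> Tuple[str, ...]:
    """Raise TypeError for the documented bad inputs; return splits as a tuple."""
    if not isinstance(root, str):
        raise TypeError(f"root should be a str, got {type(root).__name__}")
    if isinstance(split, str):
        # A single split name is treated like a one-element tuple.
        return (split,)
    if isinstance(split, tuple) and all(isinstance(s, str) for s in split):
        return split
    # Lists, ints, tuples containing non-str items, etc. are rejected.
    raise TypeError("split should be a str or a tuple of str")
```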

Examples

>>> from mindnlp.dataset.text_generation.lcsts import LCSTS
>>> root = "~/.mindnlp"
>>> split = ('train', 'dev')
>>> dataset_train, dataset_dev = LCSTS(root, split)
>>> train_iter = dataset_train.create_dict_iterator()
>>> print(next(train_iter))
{'source': Tensor(shape=[], dtype=String, value= '一辆小轿车,一名女司机,竟造成9死24伤。日前,深圳市交警局对事故进行通报:从目前证据看,事故系司机超速行驶且操作不当导致。目前24名伤员已有6名治愈出院,其余正接受治疗,预计事故赔偿费或超一千万元。'),
'target': Tensor(shape=[], dtype=String, value= '深圳机场9死24伤续:司机全责赔偿或超千万')}

class mindnlp.dataset.text_generation.lcsts.Lcsts(path)

Bases: object

LCSTS dataset source
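A dataset source of this kind is typically a random-access object that MindSpore's `GeneratorDataset` can index with `__getitem__` and size with `__len__`. The sketch below assumes Lcsts follows that protocol and uses a hypothetical `PairSource` class holding in-memory (source, target) pairs; the real class reads from `path`, whose file format is not documented here. The two fields mirror the `source` and `target` keys in the LCSTS example output.

```python
from typing import List, Tuple


class PairSource:
    """Hypothetical random-access source of (source, target) text pairs."""

    def __init__(self, pairs: List[Tuple[str, str]]):
        # Copy so later changes to the caller's list do not affect the source.
        self._pairs = list(pairs)

    def __getitem__(self, index: int) -> Tuple[str, str]:
        # One row: (article text, summary text).
        return self._pairs[index]

    def __len__(self) -> int:
        return len(self._pairs)
```

With MindSpore installed, an object like this could in principle be wrapped as mindspore.dataset.GeneratorDataset(PairSource(pairs), column_names=["source", "target"]).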

penntreebank

PennTreebank load function

mindnlp.dataset.text_generation.penntreebank.PennTreebank(root: str = '/home/docs/checkouts/readthedocs.org/user_builds/mindnlpdoc/checkouts/latest/docs/.mindnlp', split: Union[Tuple[str], str] = ('train', 'valid', 'test'), proxies=None)

Load the Penn Treebank (PTB) dataset.

Parameters:
  • root (str) – Directory where the datasets are saved.

  • split (str|Tuple[str]) – Split or splits to return. Default: ('train', 'valid', 'test').

  • proxies (dict) – A dict mapping URL schemes to proxy URLs, for example: {"https": "https://127.0.0.1:7890"}. Default: None.
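The examples pass root = "~/.mindnlp". This page does not say whether the loader expands `~`, so a cautious option is to expand it yourself before the call; an absolute path works either way. A minimal sketch with a hypothetical helper `resolve_root`:

```python
import os


def resolve_root(root: str) -> str:
    """Expand a leading "~" and return an absolute directory path."""
    return os.path.abspath(os.path.expanduser(root))
```

Usage would then be PennTreebank(resolve_root("~/.mindnlp"), ('train', 'valid', 'test')).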

Returns:

  • datasets_list (list) – A list of the loaded datasets. If only one split is specified, such as 'train', that dataset is returned on its own instead of a list.

Raises:
  • TypeError – If root is not a string.

  • TypeError – If split is not a string or Tuple[str].

Examples

>>> from mindnlp.dataset.text_generation.penntreebank import PennTreebank
>>> root = "~/.mindnlp"
>>> split = ('train', 'valid', 'test')
>>> dataset_train, dataset_valid, dataset_test = PennTreebank(root, split)
>>> train_iter = dataset_train.create_tuple_iterator()
>>> print(next(train_iter))
[Tensor(shape=[], dtype=String, value= ' aer banknote berlitz calloway centrust cluett fromstein gitano guterman hydro-quebec ipo kia memotec mlx nahb punts rake regatta rubens sim snack-food ssangyong swapo wachter ')]
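The PTB row in the example is already lowercased and tokenized, with tokens separated by single spaces and a space at each end. Once the value is available as a Python str (converting the Tensor is not shown here), `str.split()` recovers the tokens. The `<eos>` marker appended below is a common word-level language-modeling convention chosen for this sketch, not something this page says the loader adds; `ptb_tokens` is a hypothetical helper.

```python
from typing import List


def ptb_tokens(line: str, eos: str = "<eos>") -> List[str]:
    """Split one pre-tokenized PTB row and append an end-of-sentence marker."""
    # str.split() with no argument also discards the leading/trailing spaces.
    tokens = line.split()
    # The end-of-sentence marker is a modeling choice made in this sketch.
    return tokens + [eos]
```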

wikitext103

WikiText103 load function

mindnlp.dataset.text_generation.wikitext103.WikiText103(root: str = '/home/docs/checkouts/readthedocs.org/user_builds/mindnlpdoc/checkouts/latest/docs/.mindnlp', split: Union[Tuple[str], str] = ('train', 'valid', 'test'), proxies=None)

Load the WikiText103 dataset.

Parameters:
  • root (str) – Directory where the datasets are saved.

  • split (str|Tuple[str]) – Split or splits to return. Default: ('train', 'valid', 'test').

  • proxies (dict) – A dict mapping URL schemes to proxy URLs, for example: {"https": "https://127.0.0.1:7890"}. Default: None.

Returns:

  • datasets_list (list) – A list of the loaded datasets. If only one split is specified, such as 'train', that dataset is returned on its own instead of a list.

Raises:
  • TypeError – If root is not a string.

  • TypeError – If split is not a string or Tuple[str].

Examples

>>> from mindnlp.dataset.text_generation.wikitext103 import WikiText103
>>> root = "~/.mindnlp"
>>> split = ('train', 'valid', 'test')
>>> dataset_train, dataset_valid, dataset_test = WikiText103(root, split)
>>> train_iter = dataset_train.create_tuple_iterator()
>>> print(next(train_iter))
[Tensor(shape=[], dtype=String, value= ' ')]
>>> print(next(train_iter))
[Tensor(shape=[], dtype=String, value= ' = Valkyria Chronicles III = ')]
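The second row shows WikiText's heading markup: an article title is wrapped in ` = ... = `, and deeper section headings add more `=` pairs (for example ` = = Section = = `). The sketch below, with a hypothetical `parse_heading` helper, returns the level and title for heading rows and None for anything else, assuming rows have already been converted to Python str.

```python
import re
from typing import Optional, Tuple

# Opening "= " run, a title, then a closing " =" run, with optional outer spaces.
_HEADING = re.compile(r"^\s*((?:= )+)(.*?)((?: =)+)\s*$")


def parse_heading(row: str) -> Optional[Tuple[int, str]]:
    """Return (level, title) for a WikiText heading row, else None."""
    match = _HEADING.match(row)
    if match is None:
        return None
    level = match.group(1).count("=")
    title = match.group(2)
    # Only accept balanced markup with a non-empty title.
    if not title or match.group(3).count("=") != level:
        return None
    return level, title
```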

wikitext2

WikiText2 load function

mindnlp.dataset.text_generation.wikitext2.WikiText2(root: str = '/home/docs/checkouts/readthedocs.org/user_builds/mindnlpdoc/checkouts/latest/docs/.mindnlp', split: Union[Tuple[str], str] = ('train', 'valid', 'test'), proxies=None)

Load the WikiText2 dataset.

Parameters:
  • root (str) – Directory where the datasets are saved.

  • split (str|Tuple[str]) – Split or splits to return. Default: ('train', 'valid', 'test').

  • proxies (dict) – A dict mapping URL schemes to proxy URLs, for example: {"https": "https://127.0.0.1:7890"}. Default: None.

Returns:

  • datasets_list (list) – A list of the loaded datasets. If only one split is specified, such as 'train', that dataset is returned on its own instead of a list.

Raises:
  • TypeError – If root is not a string.

  • TypeError – If split is not a string or Tuple[str].

Examples

>>> from mindnlp.dataset.text_generation.wikitext2 import WikiText2
>>> root = "~/.mindnlp"
>>> split = ('train', 'valid', 'test')
>>> dataset_train, dataset_valid, dataset_test = WikiText2(root, split)
>>> train_iter = dataset_train.create_tuple_iterator()
>>> print(next(train_iter))
[Tensor(shape=[], dtype=String, value= ' ')]
>>> print(next(train_iter))
[Tensor(shape=[], dtype=String, value= ' = Valkyria Chronicles III = ')]
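As the example shows, WikiText rows include whitespace-only lines (the first row is a single space) that separate paragraphs and sections. If such rows are unwanted, they can be skipped lazily while iterating. A minimal sketch with a hypothetical `non_blank` generator, assuming the rows have already been converted to Python str:

```python
from typing import Iterable, Iterator


def non_blank(rows: Iterable[str]) -> Iterator[str]:
    """Yield only rows that contain non-whitespace text."""
    for row in rows:
        # str.strip() returns "" for whitespace-only rows, which is falsy.
        if row.strip():
            yield row
```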