Utils
decompress
Decompress functions
- mindnlp.utils.decompress.ungz(file_path: str, unzip_path: Optional[str] = None)[source]
Decompress a .gz file.
- Parameters:
file_path (str) – The path where the .gz file is located.
unzip_path (str, optional) – The directory into which the file is decompressed. Defaults to None.
- Returns:
The directory where the files were unzipped.
- Return type:
unzip_path (str)
- Raises:
TypeError – If file_path is not a string.
TypeError – If unzip_path is not a string.
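The behaviour described for ungz can be approximated with the standard library alone. The following is a minimal sketch, not the mindnlp implementation: the name `ungz_sketch` is hypothetical, and falling back to the archive's own directory when unzip_path is None is an assumption, since the fallback is not documented above.

```python
import gzip
import os
import shutil
import tempfile


def ungz_sketch(file_path, unzip_path=None):
    """Decompress one .gz file and return the output directory."""
    if not isinstance(file_path, str):
        raise TypeError("file_path must be a string")
    if unzip_path is None:
        # Assumption: default to the directory that holds the archive.
        unzip_path = os.path.dirname(os.path.abspath(file_path))
    elif not isinstance(unzip_path, str):
        raise TypeError("unzip_path must be a string")
    os.makedirs(unzip_path, exist_ok=True)

    # Name the output after the archive, minus the trailing ".gz".
    out_name = os.path.basename(file_path)
    if out_name.endswith(".gz"):
        out_name = out_name[:-3]

    # Stream the decompressed bytes so large files are not held in memory.
    with gzip.open(file_path, "rb") as src:
        with open(os.path.join(unzip_path, out_name), "wb") as dst:
            shutil.copyfileobj(src, dst)
    return unzip_path


# Usage: compress a small payload, then decompress it into a fresh directory.
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, "notes.txt.gz")
with gzip.open(archive, "wb") as fh:
    fh.write(b"hello mindnlp")
out_dir = ungz_sketch(archive, os.path.join(work_dir, "out"))
print(os.listdir(out_dir))
```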
- mindnlp.utils.decompress.untar(file_path: str, untar_path: str)[source]
Extract a tar.gz file.
- Parameters:
file_path (str) – The path where the tgz file is located.
untar_path (str) – The directory into which the files are extracted.
- Returns:
names (list) – All filenames in the tar.gz file.
- Raises:
TypeError – If file_path is not a string.
TypeError – If untar_path is not a string.
Examples
>>> file_path = "./mindnlp/datasets/IWSLT2016/2016-01.tgz"
>>> untar_path = "./mindnlp/datasets/IWSLT2016"
>>> output = untar(file_path, untar_path)
>>> print(output[0])
'2016-01'
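For readers without the IWSLT2016 archive at hand, the same extract-and-list behaviour can be tried on a throwaway archive. This is a sketch using the standard `tarfile` module under stated assumptions: `untar_sketch` is a hypothetical stand-in, not the mindnlp function, and the archive contents are invented for the demonstration.

```python
import os
import tarfile
import tempfile


def untar_sketch(file_path, untar_path):
    """Extract a tar archive and return the member names it contains."""
    if not isinstance(file_path, str):
        raise TypeError("file_path must be a string")
    if not isinstance(untar_path, str):
        raise TypeError("untar_path must be a string")
    os.makedirs(untar_path, exist_ok=True)

    # Use the safer "data" extraction filter where this Python provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(file_path, "r:*") as tar:
        names = tar.getnames()
        tar.extractall(untar_path, **extra)
    return names


# Usage: build a tiny .tgz that holds one directory with one file.
work_dir = tempfile.mkdtemp()
src_dir = os.path.join(work_dir, "2016-01")
os.makedirs(src_dir)
with open(os.path.join(src_dir, "readme.txt"), "w") as fh:
    fh.write("demo")
archive = os.path.join(work_dir, "2016-01.tgz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src_dir, arcname="2016-01")

names = untar_sketch(archive, os.path.join(work_dir, "out"))
print(names[0])
```

As with any archive extraction, this should only be run on archives from a trusted source.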
- mindnlp.utils.decompress.unzip(file_path: str, unzip_path: str)[source]
Extract a .zip file.
- Parameters:
file_path (str) – The path where the .zip file is located.
unzip_path (str) – The directory where the files were unzipped.
- Returns:
names (list) – All filenames in the .zip file.
- Raises:
TypeError – If file_path is not a string.
TypeError – If unzip_path is not a string.
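No example is given for unzip, so here is a small sketch of the documented contract (extract into unzip_path, return the member names) built on the standard `zipfile` module. `unzip_sketch` is a hypothetical name and the archive contents are made up; the real function may differ in details.

```python
import os
import tempfile
import zipfile


def unzip_sketch(file_path, unzip_path):
    """Extract a .zip archive and return the member names it contains."""
    if not isinstance(file_path, str):
        raise TypeError("file_path must be a string")
    if not isinstance(unzip_path, str):
        raise TypeError("unzip_path must be a string")
    os.makedirs(unzip_path, exist_ok=True)
    with zipfile.ZipFile(file_path) as archive:
        names = archive.namelist()
        archive.extractall(unzip_path)
    return names


# Usage: write a two-member archive, then extract it.
work_dir = tempfile.mkdtemp()
zip_file = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(zip_file, "w") as archive:
    archive.writestr("train.txt", "first split")
    archive.writestr("test.txt", "second split")

names = unzip_sketch(zip_file, os.path.join(work_dir, "out"))
print(names)
```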
download
Download functions
- mindnlp.utils.download.cache_file(filename: str, cache_dir: Optional[str] = None, url: Optional[str] = None, md5sum=None, download_file_name=None, proxies=None)[source]
If the file exists in cache_dir, return its path; otherwise, download it from the url.
- Parameters:
filename (str) – The name of the required dataset file.
cache_dir (str) – The directory in which the file is saved.
url (str) – The url of the required dataset file.
md5sum (str) – The expected MD5 checksum of the downloaded file.
download_file_name (str) – The name of the downloaded file. (This parameter is required if the link does not end with the name of the downloaded file.)
proxies (dict) – A dict that specifies the proxies to use, for example: {"https": "https://127.0.0.1:7890"}.
- Returns:
str, if path is a folder containing a single file, return {path}{filename}; if path is a folder containing multiple files, or is itself a single file, return path.
- Raises:
TypeError – If filename is not a string.
TypeError – If cache_dir is not a string.
TypeError – If url is not a string.
RuntimeError – If filename is None.
Examples
>>> filename = 'aclImdb_v1'
>>> path, filename = cache_file(filename)
>>> print(path, filename)
'{home}\.text' 'aclImdb_v1.tar.gz'
- mindnlp.utils.download.cached_path(filename_or_url: str, cache_dir: Optional[str] = None, md5sum=None, download_file_name=None, proxies=None)[source]
If the file exists in cache_dir, return its path; otherwise, download it from the url.
- Parameters:
filename_or_url (str) – The name or url of the required file.
cache_dir (str) – The directory in which the file is saved.
folder_name (str) – The additional folder in which the dataset is cached (under cache_dir).
md5sum (str) – The expected MD5 checksum of the downloaded file.
download_file_name (str) – The name of the downloaded file. (This parameter is required if the link does not end with the name of the downloaded file.)
proxies (dict) – A dict that specifies the proxies to use, for example: {"https": "https://127.0.0.1:7890"}.
- Returns:
str, if path is a folder containing a single file, return {path}{filename}; if path is a folder containing multiple files, or is itself a single file, return path.
- Raises:
TypeError – If path is not a string.
RuntimeError – If path is None.
Examples
>>> path = "https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/aclImdb_v1.tar.gz"
>>> path, filename = cached_path(path)
>>> print(path, filename)
'{home}\.text\aclImdb_v1.tar.gz' 'aclImdb_v1.tar.gz'
- mindnlp.utils.download.check_md5(filename: str, md5sum=None)[source]
Check the MD5 checksum of a downloaded file.
- Parameters:
filename (str) – The full name of the downloaded file.
md5sum (str) – The expected MD5 checksum of the downloaded file.
- Returns:
bool, the md5 check result.
- Raises:
TypeError – If filename is not a string.
RuntimeError – If filename is None.
Examples
>>> filename = 'test'
>>> check_md5_result = check_md5(filename)
True
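An MD5 check of this kind usually hashes the file in chunks and compares the hex digest with the expected value. The sketch below shows that pattern with `hashlib`; `md5_matches` is a hypothetical name, and treating a missing md5sum as a pass is an assumption inferred from the example above, which returns True without supplying a checksum.

```python
import hashlib
import os
import tempfile


def md5_matches(filename, md5sum=None, chunk_size=1 << 20):
    """Return True when the file's MD5 hex digest equals md5sum."""
    if filename is None:
        raise RuntimeError("filename must not be None")
    if not isinstance(filename, str):
        raise TypeError("filename must be a string")
    if md5sum is None:
        # Assumption: with nothing to compare against, the check passes.
        return True

    digest = hashlib.md5()
    with open(filename, "rb") as fh:
        # Read fixed-size chunks so large downloads are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == md5sum.lower()


# Usage: hash a small file and compare against a known-good digest.
work_dir = tempfile.mkdtemp()
sample = os.path.join(work_dir, "sample.bin")
with open(sample, "wb") as fh:
    fh.write(b"hello")
expected = hashlib.md5(b"hello").hexdigest()

print(md5_matches(sample, expected))
print(md5_matches(sample, "0" * 32))
```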
- mindnlp.utils.download.get_cache_path()[source]
Get the storage path of the default cache. If the environment variable ‘cache_path’ is set, its value is used instead.
- Parameters:
None –
- Returns:
str, the default cache path, or the value of the environment variable ‘cache_path’ if it is set.
Examples
>>> default_cache_path = get_cache_path()
>>> print(default_cache_path)
'{home}\.mindnlp'
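The lookup order described here (environment variable first, default second) can be sketched as follows. `cache_path_sketch` and its `env` parameter are hypothetical and exist only to make the behaviour easy to try; the default of `.mindnlp` under the home directory is taken from the example output above.

```python
import os


def cache_path_sketch(env=None):
    """Return the 'cache_path' environment value, else ~/.mindnlp."""
    if env is None:
        env = os.environ
    default = os.path.join(os.path.expanduser("~"), ".mindnlp")
    # The environment variable, when present, overrides the default.
    return env.get("cache_path", default)


# Usage: pass a plain dict in place of os.environ to see both branches.
print(cache_path_sketch({}))
print(cache_path_sketch({"cache_path": "/data/cache"}))
```

Passing the environment as an argument is a design choice for this sketch only, so that both branches can be exercised without modifying the real process environment.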
- mindnlp.utils.download.get_checkpoint_shard_files(index_filename, cache_dir=None, url=None, force_download=False, proxies=None)[source]
For a given model:
Download and cache all the shards of a sharded checkpoint if pretrained_model_name_or_path is a model ID on the Hub.
Return the list of paths to all the shards, as well as some metadata.
For the description of each arg, see [PreTrainedModel.from_pretrained]. index_filename is the full path to the index (downloaded and cached if pretrained_model_name_or_path is a model ID on the Hub).
- mindnlp.utils.download.get_filepath(path: str)[source]
Get the filepath of file.
- Parameters:
path (str) – The path of the required file.
- Returns:
str, if path is a folder containing a single file, return {path}{filename}; if path is a folder containing multiple files, or is itself a single file, return path.
- Raises:
TypeError – If path is not a string.
RuntimeError – If path is None.
Examples
>>> path = '{home}\.text'
>>> get_filepath_result = get_filepath(path)
>>> print(get_filepath_result)
'{home}\.text'
- mindnlp.utils.download.get_from_cache(url: str, cache_dir: Optional[str] = None, md5sum=None, download_file_name=None, proxies=None)[source]
If the file exists in cache_dir, return its path; otherwise, download it from the url.
- Parameters:
url (str) – The url from which to download the file.
cache_dir (str) – The directory in which the file is saved.
md5sum (str) – The expected MD5 checksum of the downloaded file.
download_file_name (str) – The name of the downloaded file. (This parameter is required if the link does not end with the name of the downloaded file.)
proxies (dict) – A dict that specifies the proxies to use, for example: {"https": "https://127.0.0.1:7890"}.
- Returns:
str, the path where the downloaded file is saved.
str, the name of the downloaded file.
- Raises:
TypeError – If url is not a string.
TypeError – If cache_dir is not a Path.
RuntimeError – If url is None.
Examples
>>> path = "https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/aclImdb_v1.tar.gz"
>>> path, filename = cached_path(path)
>>> print(path, filename)
'{home}\.text' 'aclImdb_v1.tar.gz'
- mindnlp.utils.download.http_get(url, path=None, md5sum=None, download_file_name=None, proxies=None)[source]
Download from given url, save to path.
- Parameters:
url (str) – The url to download from.
path (str) – The path to download to (default value: ‘{home}.text’).
md5sum (str) – The expected MD5 checksum of the downloaded file.
download_file_name (str) – The name of the downloaded file. (This parameter is required if the link does not end with the name of the downloaded file.)
proxies (dict) – A dict that specifies the proxies to use, for example: {"https": "https://127.0.0.1:7890"}.
- Returns:
str, the default path, or the path given by the environment variable ‘cache_path’.
- Raises:
TypeError – If url is not a string.
RuntimeError – If url is None.
Examples
>>> url = 'https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/aclImdb_v1.tar.gz'
>>> cache_path = http_get(url)
>>> print(cache_path)
('{home}\.text', '{home}\aclImdb_v1.tar.gz')
- mindnlp.utils.download.match_file(filename: str, cache_dir: str) str[source]
If the file exists in cache_dir, return the path; otherwise, return an empty string or raise an error.
- Parameters:
filename (str) – The name of the required file.
cache_dir (str) – The directory in which the file is saved.
- Returns:
str, if the file exists in cache_dir, return filename; if there is no such file, return the empty string ‘’; if there are two or more matching files, raise an error.
- Raises:
TypeError – If filename is not a string.
TypeError – If cache_dir is not a string.
RuntimeError – If filename is None.
RuntimeError – If cache_dir is None.
Examples
>>> name = 'aclImdb_v1.tar.gz'
>>> path = get_cache_path()
>>> match_file_result = match_file(name, path)
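The three outcomes listed under Returns (one match, no match, several matches) can be illustrated with a small directory scan. This is a sketch only: `match_file_sketch` is a hypothetical name, the matching rule (exact name, or the name followed by a dot and an extension) is an assumption, and raising RuntimeError for multiple matches is also an assumption, since the text above does not name the error type.

```python
import os
import tempfile


def match_file_sketch(filename, cache_dir):
    """Return the single matching file name in cache_dir, or ''."""
    if filename is None or cache_dir is None:
        raise RuntimeError("filename and cache_dir must not be None")
    if not isinstance(filename, str) or not isinstance(cache_dir, str):
        raise TypeError("filename and cache_dir must be strings")
    if not os.path.isdir(cache_dir):
        return ""

    # Assumed rule: exact name, or name plus an extension such as ".tar.gz".
    matches = [
        name
        for name in sorted(os.listdir(cache_dir))
        if name == filename or name.startswith(filename + ".")
    ]
    if len(matches) > 1:
        raise RuntimeError("more than one cached file matches " + filename)
    return matches[0] if matches else ""


# Usage: one cached archive, looked up by full name and by stem.
cache_dir = tempfile.mkdtemp()
open(os.path.join(cache_dir, "aclImdb_v1.tar.gz"), "w").close()

print(match_file_sketch("aclImdb_v1.tar.gz", cache_dir))
print(match_file_sketch("aclImdb_v1", cache_dir))
print(repr(match_file_sketch("missing", cache_dir)))
```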
- mindnlp.utils.download.try_to_load_from_cache(filename: str, cache_dir: Optional[Union[str, Path]] = None) Optional[str][source]
Explores the cache to return the latest cached file for a given revision if found.
This function will not raise any exception if the file is not cached.
- Parameters:
cache_dir (str or os.PathLike) – The folder where the cached files lie.
filename (str) – The filename to look for inside repo_id.
- Returns:
Will return None if the file was not cached. Otherwise, returns either the exact path to the cached file if it’s found in the cache, or the special value _CACHED_NO_EXIST if the file does not exist at the given commit hash and this fact was cached.
- Return type:
Optional[str] or _CACHED_NO_EXIST