rics.utility.misc
Miscellaneous utility methods for Python applications.
Functions
get_by_full_name | Combine import_module() and getattr() to retrieve items by name.
get_local_or_remote | Retrieve the path of a local file, downloading it if needed.
read_env_or_literal | Read an environment variable if arg is prefixed by env_marker, otherwise return arg as-is.
| Check if obj is serializable using Pickle.
tname | Get name of method or class.
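As a rough illustration of the pickle check listed above, the sketch below attempts pickle.dumps and reports whether it succeeded. The name is_pickleable is hypothetical and this is not the library's implementation; it assumes that "serializable" simply means pickle.dumps does not raise.

```python
import pickle
from typing import Any


def is_pickleable(obj: Any) -> bool:
    """Hypothetical helper (not the rics source): True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except Exception:
        # PicklingError, TypeError and AttributeError are the usual failures, but
        # a custom __reduce__ may raise anything, so catch broadly.
        return False
    return True


print(is_pickleable({"a": [1, 2, 3]}))     # True
print(is_pickleable(lambda x: x))          # False: lambdas cannot be pickled by reference
print(is_pickleable(i for i in range(3)))  # False: generators are not pickleable
```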
- get_by_full_name(name: str, default_module: Optional[Union[str, module]] = None) → Any
Combine import_module() and getattr() to retrieve items by name.
- Parameters
name – A name or fully qualified name.
default_module – A namespace to search for name if name is not fully qualified (i.e. it contains no '.' characters).
- Returns
An object with the fully qualified name name.
- Raises
ValueError – If name does not contain any dots and default_module=None.
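A minimal sketch of how import_module() and getattr() can be combined to resolve a name, assuming the last dot separates the module path from the attribute. get_by_full_name_sketch is a hypothetical stand-in, not the rics source.

```python
import importlib
from types import ModuleType
from typing import Any, Optional, Union


def get_by_full_name_sketch(
    name: str, default_module: Optional[Union[str, ModuleType]] = None
) -> Any:
    """Hypothetical re-implementation: resolve name via import_module() + getattr()."""
    if "." in name:
        # Fully qualified: split "package.module.attr" into module path and attribute.
        module_name, _, attr = name.rpartition(".")
        module = importlib.import_module(module_name)
    else:
        if default_module is None:
            raise ValueError(f"Name {name!r} is not fully qualified and default_module=None.")
        attr = name
        # default_module may be given either as a module name or as a module object.
        module = (
            importlib.import_module(default_module)
            if isinstance(default_module, str)
            else default_module
        )
    return getattr(module, attr)


print(get_by_full_name_sketch("os.path.join").__name__)             # join
print(get_by_full_name_sketch("sqrt", default_module="math")(16.0))  # 4.0
```

Splitting on the last dot keeps the sketch short, but it cannot resolve nested attributes such as a method on a class (e.g. collections.OrderedDict.fromkeys), because collections.OrderedDict is not an importable module.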
- tname(arg: Optional[Union[Type[Any], Any, Callable]]) → str
Get name of method or class.
- Parameters
arg – Something to get a name for.
- Returns
A type name.
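One plausible way to derive a name for a class, function, method, or plain instance is to prefer __name__ on callables and fall back to the name of the object's type. tname_sketch is hypothetical; the real tname may treat edge cases (such as None or functools.partial objects) differently.

```python
from typing import Any


def tname_sketch(arg: Any) -> str:
    """Hypothetical sketch of a type/callable name lookup (not the rics source)."""
    # Classes, functions, builtins and bound methods all expose a string __name__.
    name = getattr(arg, "__name__", None) if callable(arg) else None
    if isinstance(name, str):
        return name
    # Anything else (instances, None, callable objects without __name__)
    # falls back to the name of its class.
    return type(arg).__name__


print(tname_sketch(dict))         # dict
print(tname_sketch(len))          # len
print(tname_sketch("abc".upper))  # upper
print(tname_sketch(3.5))          # float
```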
- get_local_or_remote(file: Union[str, bytes, PathLike], remote_root: Union[str, bytes, PathLike], local_root: Union[str, bytes, PathLike] = '.', force: bool = False, postprocessor: Optional[Callable[[str], Any]] = None, show_progress: bool = False) → Path
Retrieve the path of a local file, downloading it if needed.
If file is not available at the local root path, it will be downloaded using requests.get. A postprocessor may be given, in which case the name of the final file will be local_root/<name-of-postprocessor>/file. Removing a raw local file (i.e. local_root/file) will invalidate postprocessed files as well.
- Parameters
file – A file to retrieve or download.
remote_root – Remote URL where the data may be retrieved using requests.get.
local_root – Local directory where the file may be cached.
force – If True, always download and apply processing (if applicable). Existing files will be overwritten.
postprocessor – A function which takes a single argument input_path and returns a pickleable type.
show_progress – If True, show a progress bar. Requires the tqdm package.
- Returns
An absolute path to the data.
- Raises
ValueError – If the local root path does not exist or is not a directory.
ValueError – If the local file does not exist and remote_root=None.
ModuleNotFoundError – If the tqdm package is not installed but show_progress=True.
Examples
Fetch the Name Basics table (a gzipped TSV file) of the IMDb dataset.
>>> from rics.utility.misc import get_local_or_remote
>>> import pandas as pd
>>>
>>> file = "name.basics.tsv.gz"
>>> local_root = "my-data"  # default = "."
>>> remote_root = "https://datasets.imdbws.com"
>>> path = get_local_or_remote(file, remote_root, local_root, show_progress=True)
>>> pd.read_csv(path, sep="\t").shape
https://datasets.imdbws.com/name.basics.tsv.gz: 100%|██████████| 214M/214M [00:05<00:00, 39.3MiB/s]
(11453719, 6)
We had to download name.basics.tsv.gz the first time, but get_local_or_remote returns immediately the second time it is called. Fetching can be forced using force=True.
>>> path = get_local_or_remote(file, remote_root, local_root, show_progress=True)
>>> pd.read_csv(path, sep="\t").shape
(11453719, 6)
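The caching behavior described above (download into local_root/file on a miss, store postprocessed output in local_root/<name-of-postprocessor>/file, redo postprocessing when the raw file is fetched again) can be sketched offline as follows. Assumptions: the hypothetical download callable stands in for requests.get, the postprocessor result is stored with pickle, and remote_root, show_progress and the missing-remote error are left out. cached_fetch_sketch is not the library's implementation.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def cached_fetch_sketch(
    file: str,
    local_root: str,
    download: Callable[[str, Path], None],  # hypothetical stand-in for requests.get
    postprocessor: Optional[Callable[[str], Any]] = None,
    force: bool = False,
) -> Path:
    """Hypothetical sketch of the caching layout described for get_local_or_remote."""
    root = Path(local_root)
    if not root.is_dir():
        raise ValueError(f"Local root {root} does not exist or is not a directory.")

    raw_path = (root / file).absolute()
    downloaded = False
    if force or not raw_path.exists():
        # Cache miss (or forced): fetch the raw file into local_root/file.
        raw_path.parent.mkdir(parents=True, exist_ok=True)
        download(file, raw_path)
        downloaded = True

    if postprocessor is None:
        return raw_path

    # Processed output lives in local_root/<name-of-postprocessor>/file.
    processed_path = (root / postprocessor.__name__ / file).absolute()
    # A fresh raw download (e.g. after the raw file was deleted) invalidates it.
    if downloaded or not processed_path.exists():
        processed_path.parent.mkdir(parents=True, exist_ok=True)
        with processed_path.open("wb") as f:
            pickle.dump(postprocessor(str(raw_path)), f)  # assumption: pickle storage
    return processed_path


# --- Usage with a fake downloader (no network access) ---
downloads = []


def fake_download(file: str, dest: Path) -> None:
    # Writes a tiny two-line TSV instead of fetching anything remote.
    downloads.append(file)
    dest.write_text("a\tb\n1\t2\n")


def count_lines(input_path: str) -> int:
    with open(input_path) as f:
        return sum(1 for _ in f)


with tempfile.TemporaryDirectory() as tmp:
    cached_fetch_sketch("data.tsv", tmp, fake_download)
    cached_fetch_sketch("data.tsv", tmp, fake_download)  # cache hit: no second download
    processed = cached_fetch_sketch("data.tsv", tmp, fake_download, postprocessor=count_lines)
    processed_rel = processed.relative_to(tmp).as_posix()
    processed_value = pickle.loads(processed.read_bytes())
    (Path(tmp) / "data.tsv").unlink()  # removing the raw file forces a re-download
    cached_fetch_sketch("data.tsv", tmp, fake_download, postprocessor=count_lines)

print(len(downloads), processed_rel, processed_value)  # 2 count_lines/data.tsv 2
```

Injecting the downloader keeps the sketch testable without network access; the real function downloads from remote_root itself.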
- read_env_or_literal(arg: str, default: Union[Literal[<_NoDefault.NO_DEFAULT: '<no-default>'>], str] = _NoDefault.NO_DEFAULT, env_marker: str = '@') → str
Read an environment variable if arg is prefixed by env_marker, otherwise return arg as-is.
- Parameters
arg – A literal value or environment variable to read.
env_marker – A prefix which indicates that arg should be interpreted as environment variable name.
default – Default value to use if the variable denoted by arg doesn’t exist.
- Returns
A processed version of arg: the value of the environment variable if arg is prefixed by env_marker, otherwise arg itself.
- Raises
ValueError – If arg does not start with env_marker and enforce_env_var is True.
Notes
The constructor of desired_return_type may raise errors not listed here.
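A minimal sketch of the env_marker convention, assuming that a missing variable with no default is an error. read_env_or_literal_sketch is hypothetical: it uses None instead of the _NoDefault.NO_DEFAULT sentinel and raises KeyError, which may not match what read_env_or_literal actually raises.

```python
import os
from typing import Optional


def read_env_or_literal_sketch(
    arg: str, default: Optional[str] = None, env_marker: str = "@"
) -> str:
    """Hypothetical sketch: '<env_marker>NAME' reads $NAME, anything else is a literal."""
    if not arg.startswith(env_marker):
        return arg  # literal value, returned unchanged
    var_name = arg[len(env_marker):]
    value = os.environ.get(var_name, default)
    if value is None:
        # Assumption: a missing variable without a default is treated as an error.
        raise KeyError(f"Environment variable {var_name!r} is not set and no default was given.")
    return value


os.environ["DB_HOST"] = "db.example.com"
os.environ.pop("RICS_SKETCH_MISSING", None)  # make sure this one is unset

print(read_env_or_literal_sketch("@DB_HOST"))                                  # db.example.com
print(read_env_or_literal_sketch("localhost"))                                 # localhost
print(read_env_or_literal_sketch("@RICS_SKETCH_MISSING", default="fallback"))  # fallback
print(read_env_or_literal_sketch("$DB_HOST", env_marker="$"))                  # db.example.com
```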