rics.misc
Miscellaneous utility methods for Python applications.
Functions
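| get_by_full_name | Combine import_module() and getattr() to retrieve items by name. |
| get_local_or_remote | Retrieve the path of a local file, downloading it if needed. |
| interpolate_environment_variables | Interpolate environment variables in a string s. |
| | Check if obj is serializable using Pickle. |
| tname | Get name of method or class. |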
- interpolate_environment_variables(s: str, *, allow_nested: bool = True, allow_blank: bool = False) -> str
Interpolate environment variables in a string s.
This function replaces references to environment variables with the actual value of the variable, or a default if specified. The syntax is similar to Bash string interpolation; use ${<var>} for mandatory variables, and ${<var>:default} for optional variables.
- Parameters:
s – A string in which to interpolate.
allow_blank – If True, allow variables to be set but empty.
allow_nested – If True, allow using another environment variable as the default value. This option will not verify whether the actual values are interpolation-strings.
- Returns:
A copy of s, after environment variable interpolation.
- Raises:
ValueError – If nested variables are discovered (only when allow_nested=False).
UnsetVariableError – If any required environment variables are unset or blank (only when allow_blank=False).
See also
The rics.envinterp module, which this function wraps.
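The interpolation syntax described above can be illustrated with a minimal sketch built on the standard library. This is not the rics implementation: the name interpolate_sketch is hypothetical, nested defaults are not handled, ValueError stands in for UnsetVariableError, and falling back to the default when a variable is set but blank is an assumption.

```python
import os
import re

# Hypothetical stand-in for the behaviour described above; not the rics implementation.
# Matches ${NAME} and ${NAME:default}; nested references are not supported here.
_PATTERN = re.compile(r"\$\{(?P<name>\w+)(?::(?P<default>[^}]*))?\}")


def interpolate_sketch(s: str, *, allow_blank: bool = False) -> str:
    """Replace ${VAR} and ${VAR:default} references in s using os.environ."""

    def substitute(match: re.Match) -> str:
        name, default = match.group("name"), match.group("default")
        value = os.environ.get(name)
        # Use the variable if it is set and either non-empty or blanks are allowed.
        if value is not None and (value or allow_blank):
            return value
        # Assumption: an unset (or disallowed blank) variable falls back to its default.
        if default is not None:
            return default
        raise ValueError(f"Required environment variable {name!r} is unset or blank.")

    return _PATTERN.sub(substitute, s)


os.environ["SKETCH_USER"] = "alice"
os.environ.pop("SKETCH_HOST", None)
result = interpolate_sketch("${SKETCH_USER}@${SKETCH_HOST:localhost}")
print(result)  # alice@localhost
```

Here SKETCH_USER is mandatory and set, while SKETCH_HOST is optional and unset, so its default is used.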
- get_by_full_name(name: str, default_module: Optional[Union[str, module]] = None) -> Any
Combine import_module() and getattr() to retrieve items by name.
- Parameters:
name – A name or fully qualified name.
default_module – A namespace to search if name is not fully qualified (contains no '.'-characters).
- Returns:
An object with the fully qualified name name.
- Raises:
ValueError – If name does not contain any dots and default_module=None.
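As a rough illustration of how import_module() and getattr() can be combined, consider the sketch below. The function get_by_full_name_sketch is hypothetical and simplified: it accepts only a string for default_module, whereas the documented signature also accepts a module object.

```python
from importlib import import_module
from typing import Any, Optional


def get_by_full_name_sketch(name: str, default_module: Optional[str] = None) -> Any:
    """Resolve 'package.module.attr' (or 'attr' within default_module) to an object."""
    if "." in name:
        # Everything before the last dot is treated as the module path.
        module_name, _, attr_name = name.rpartition(".")
    elif default_module is not None:
        module_name, attr_name = default_module, name
    else:
        raise ValueError(f"Name {name!r} is not fully qualified and no default module was given.")
    return getattr(import_module(module_name), attr_name)


join = get_by_full_name_sketch("os.path.join")
sqrt = get_by_full_name_sketch("sqrt", default_module="math")
print(sqrt(16.0))  # 4.0
```

Splitting on the last dot means this sketch resolves only module-level attributes; nested attributes such as class methods would need additional getattr() steps.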
- tname(arg: Optional[Union[Type[Any], Any]], prefix_classname: bool = False) -> str
Get name of method or class.
- Parameters:
arg – Something to get a name for.
prefix_classname – If True, prepend the class name if arg belongs to a class.
- Returns:
A name for arg.
- Raises:
ValueError – If no name could be derived for arg.
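One plausible way to derive such names is through the __name__ and __qualname__ attributes. The sketch below uses the hypothetical name tname_sketch and assumes that plain instances fall back to the name of their type; the actual function may behave differently, and the ValueError case is not modelled here.

```python
from typing import Any


def tname_sketch(arg: Any, prefix_classname: bool = False) -> str:
    """Return a readable name for a class, function, method, or instance."""
    if hasattr(arg, "__qualname__") and hasattr(arg, "__name__"):
        # Classes, functions, and methods carry their own names; the
        # qualified name includes the owning class, e.g. "str.upper".
        return arg.__qualname__ if prefix_classname else arg.__name__
    # Assumption: for plain instances, fall back to the name of their type.
    return type(arg).__name__


print(tname_sketch(str.upper))                         # upper
print(tname_sketch(str.upper, prefix_classname=True))  # str.upper
print(tname_sketch(42))                                # int
```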
- get_local_or_remote(file: Union[str, bytes, PathLike], remote_root: Union[str, bytes, PathLike], local_root: Union[str, bytes, PathLike] = '.', force: bool = False, postprocessor: Optional[Callable[[str], Any]] = None, show_progress: bool = False) -> Path
Retrieve the path of a local file, downloading it if needed.
If file is not available at the local root path, it will be downloaded using requests.get. A postprocessor may be given, in which case the name of the final file will be local_root/<name-of-postprocessor>/file. Removing a raw local file (i.e. local_root/file) will invalidate postprocessed files as well.
- Parameters:
file – A file to retrieve or download.
remote_root – Remote URL where the data may be retrieved using requests.get.
local_root – Local directory where the file may be cached.
force – If True, always download and apply processing (if applicable). Existing files will be overwritten.
postprocessor – A function which takes a single argument input_path and returns a pickleable type.
show_progress – If True, show a progress bar. Requires the tqdm package.
- Returns:
An absolute path to the data.
- Raises:
ValueError – If local root path does not exist or is not a directory.
ValueError – If the local file does not exist and remote=None.
ModuleNotFoundError – If the tqdm package is not installed but show_progress=True.
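The cache-or-download behaviour described above can be sketched without any network access. In this simplified illustration, cached_path_sketch and fake_download are hypothetical names, the download step is injected as a callable instead of using requests.get, and postprocessing is left out entirely.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_path_sketch(
    file: str,
    local_root: Path,
    download: Callable[[Path], None],
    force: bool = False,
) -> Path:
    """Return local_root/file, calling download(target) only if needed."""
    if not local_root.is_dir():
        raise ValueError(f"Local root {local_root} does not exist or is not a directory.")
    target = local_root / file
    # Download when the file is missing, or unconditionally when forced.
    if force or not target.exists():
        download(target)
    return target.resolve()


calls: List[str] = []


def fake_download(target: Path) -> None:
    # Stand-in for an HTTP download; records each call and writes a dummy payload.
    calls.append(target.name)
    target.write_text("payload")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = cached_path_sketch("data.txt", root, fake_download)
    second = cached_path_sketch("data.txt", root, fake_download)  # served from cache
    third = cached_path_sketch("data.txt", root, fake_download, force=True)

print(len(calls))  # 2
```

The second call finds the file already present and skips the download, while force=True triggers it again.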
Examples
Fetch the Name Basics table (a gzipped TSV file) of the IMDb dataset.
>>> from rics.misc import get_local_or_remote
>>> import pandas as pd
>>>
>>> file = "name.basics.tsv.gz"
>>> local_root = "my-data"  # default = "."
>>> remote_root = "https://datasets.imdbws.com"
>>> path = get_local_or_remote(file, remote_root, local_root, show_progress=True)
>>> pd.read_csv(path, sep="\t").shape
https://datasets.imdbws.com/name.basics.tsv.gz: 100%|██████████| 214M/214M [00:05<00:00, 39.3MiB/s]
(11453719, 6)
We had to download name.basics.tsv.gz the first time, but get_local_or_remote returns immediately the second time it is called. Fetching can be forced using force=True.
>>> path = get_local_or_remote(file, remote_root, local_root, show_progress=True)
>>> pd.read_csv(path, sep="\t").shape
(11453719, 6)