rics.utility.misc

Miscellaneous utility methods for Python applications.

Functions

get_by_full_name(name[, default_module])

Combine import_module() and getattr() to retrieve items by name.

get_local_or_remote(file, remote_root[, ...])

Retrieve the path of a local file, downloading it if needed.

read_env_or_literal(arg[, default, env_marker])

Read an environment variable if arg is prefixed by env_marker, otherwise return arg as-is.

serializable(obj)

Check if obj is serializable using Pickle.

tname(arg)

Get name of method or class.

get_by_full_name(name: str, default_module: Optional[Union[str, module]] = None) → Any

Combine import_module() and getattr() to retrieve items by name.

Parameters
  • name – A name or fully qualified name.

  • default_module – A namespace to search if name is not fully qualified (contains no '.'-characters).

Returns

An object with the fully qualified name name.

Raises

ValueError – If name does not contain any dots and default_module=None.
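Example

The lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: get_by_full_name_sketch is a hypothetical name, and splitting at the last dot is an assumption about how a fully qualified name is resolved.

```python
from importlib import import_module
from types import ModuleType
from typing import Any, Optional, Union


def get_by_full_name_sketch(
    name: str, default_module: Optional[Union[str, ModuleType]] = None
) -> Any:
    """Hypothetical stand-in for get_by_full_name (assumed behaviour)."""
    if "." in name:
        # Fully qualified: split "package.module.attr" at the last dot.
        module_name, _, attr = name.rpartition(".")
        module = import_module(module_name)
    elif default_module is None:
        raise ValueError(f"{name!r} is not fully qualified and default_module=None")
    else:
        attr = name
        # Accept either a module object or the name of a module to import.
        module = (
            import_module(default_module)
            if isinstance(default_module, str)
            else default_module
        )
    return getattr(module, attr)


print(get_by_full_name_sketch("collections.OrderedDict"))
print(get_by_full_name_sketch("sqrt", default_module="math"))
```

The second call shows the default_module fallback: "sqrt" contains no dots, so it is looked up in the math module.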

tname(arg: Optional[Union[Type[Any], Any, Callable]]) → str

Get name of method or class.

Parameters

arg – Something to get a name for.

Returns

A type name.
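Example

The exact rules tname applies are not stated here. One plausible reading, sketched below under that assumption, is to use the object's own __name__ when it has one (classes, functions, methods) and to fall back to the name of its type otherwise. tname_sketch is a hypothetical name, not part of the library.

```python
from typing import Any


def tname_sketch(arg: Any) -> str:
    """Hypothetical stand-in for tname (assumed behaviour)."""
    # Classes, functions and methods carry their own __name__.
    name = getattr(arg, "__name__", None)
    if isinstance(name, str):
        return name
    # Otherwise treat arg as an instance and use the name of its type.
    return type(arg).__name__


print(tname_sketch(dict))         # dict
print(tname_sketch({"a": 1}))     # dict
print(tname_sketch("abc".upper))  # upper
```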

get_local_or_remote(file: Union[str, bytes, PathLike], remote_root: Union[str, bytes, PathLike], local_root: Union[str, bytes, PathLike] = '.', force: bool = False, postprocessor: Optional[Callable[[str], Any]] = None, show_progress: bool = False) → Path

Retrieve the path of a local file, downloading it if needed.

If file is not available at the local root path, it will be downloaded using requests.get. A postprocessor may be given in which case the name of the final file will be local_root/<name-of-postprocessor>/file. Removing a raw local file (i.e. local_root/file) will invalidate postprocessed files as well.

Parameters
  • file – A file to retrieve or download.

  • remote_root – Remote URL where the data may be retrieved using requests.get.

  • local_root – Local directory where the file may be cached.

  • force – If True, always download and apply processing (if applicable). Existing files will be overwritten.

  • postprocessor – A function which takes a single argument input_path and returns a pickleable type.

  • show_progress – If True, show a progress bar. Requires the tqdm package.

Returns

An absolute path to the data.

Raises
  • ValueError – If local root path does not exist or is not a directory.

  • ValueError – If the local file does not exist and remote_root=None.

  • ModuleNotFoundError – If the tqdm package is not installed but show_progress=True.

Examples

Fetch the Name Basics table (a gzipped TSV file) of the IMDb dataset.

>>> from rics.utility.misc import get_local_or_remote
>>> import pandas as pd
>>>
>>> file = "name.basics.tsv.gz"
>>> local_root = "my-data"  # default = "."
>>> remote_root = "https://datasets.imdbws.com"
>>> path = get_local_or_remote(file, remote_root, local_root, show_progress=True)
https://datasets.imdbws.com/name.basics.tsv.gz: 100%|██████████| 214M/214M [00:05<00:00, 39.3MiB/s]
>>> pd.read_csv(path, sep="\t").shape
(11453719, 6)

We had to download name.basics.tsv.gz the first time, but get_local_or_remote returns immediately the second time it is called. Fetching can be forced using force=True.

>>> path = get_local_or_remote(file, remote_root, local_root, show_progress=True) 
>>> pd.read_csv(path, sep="\t").shape 
(11453719, 6)
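
The caching behaviour shown above (download once, reuse afterwards, re-download when forced) can be illustrated without network access. The sketch below is not the library's implementation: fetch_cached_sketch and fake_download are hypothetical names, the download callable stands in for the requests.get call, and postprocessing and progress reporting are left out.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def fetch_cached_sketch(
    file: str,
    local_root: Path,
    download: Callable[[str], bytes],
    force: bool = False,
) -> Path:
    """Hypothetical cache-or-download logic (assumed behaviour)."""
    if not local_root.is_dir():
        raise ValueError(f"{local_root} does not exist or is not a directory")
    path = local_root / file
    if force or not path.exists():
        # Stand-in for the HTTP download performed by the real function.
        path.write_bytes(download(file))
    return path.resolve()


calls: List[str] = []


def fake_download(name: str) -> bytes:
    # Record each call so the number of downloads can be counted.
    calls.append(name)
    return b"payload"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    fetch_cached_sketch("data.bin", root, fake_download)              # downloads
    fetch_cached_sketch("data.bin", root, fake_download)              # cache hit
    fetch_cached_sketch("data.bin", root, fake_download, force=True)  # downloads again

print(len(calls))  # 2
```

Only the first and the forced call reach fake_download; the second call finds the cached file and returns its path directly.
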

read_env_or_literal(arg: str, default: Union[Literal[_NoDefault.NO_DEFAULT], str] = _NoDefault.NO_DEFAULT, env_marker: str = '@') → str

Read an environment variable if arg is prefixed by env_marker, otherwise return arg as-is.

Parameters
  • arg – A literal value or environment variable to read.

  • env_marker – A prefix which indicates that arg should be interpreted as environment variable name.

  • default – Default value to use if the variable denoted by arg doesn’t exist.

Returns

A processed version of arg: the value of the named environment variable if arg is prefixed by env_marker, otherwise arg itself.

Raises

ValueError – If arg does not start with env_marker and enforce_env_var is True.

Notes

The constructor of desired_return_type may raise errors not listed here.
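Example

The marker convention can be sketched as follows. This is an illustration under assumptions, not the library's implementation: read_env_or_literal_sketch and SKETCH_DEMO_TOKEN are hypothetical names, None is used in place of the library's no-default sentinel, and raising KeyError for a missing variable without a default is a placeholder choice.

```python
import os
from typing import Optional


def read_env_or_literal_sketch(
    arg: str, default: Optional[str] = None, env_marker: str = "@"
) -> str:
    """Hypothetical stand-in for read_env_or_literal (assumed behaviour)."""
    if not arg.startswith(env_marker):
        return arg  # No marker: arg is a literal value.
    # Strip the marker to get the environment variable name.
    var = arg[len(env_marker):]
    value = os.environ.get(var, default)
    if value is None:
        # Placeholder: the real function's behaviour here is not documented above.
        raise KeyError(var)
    return value


os.environ["SKETCH_DEMO_TOKEN"] = "secret"
print(read_env_or_literal_sketch("@SKETCH_DEMO_TOKEN"))  # secret
print(read_env_or_literal_sketch("plain-value"))         # plain-value
```

This pattern lets a single configuration field hold either a literal or a reference to an environment variable, which keeps secrets out of configuration files.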

serializable(obj: Any) → bool

Check if obj is serializable using Pickle.

Serializes to memory for speed.

Parameters

obj – An object to attempt to serialize.

Returns

True if obj was pickled without issues.
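
Example

A check of this kind can be sketched with pickle.dumps, which serializes to an in-memory bytes object and never touches disk. serializable_sketch is a hypothetical name, and catching every Exception is an assumption about how failures are detected.

```python
import pickle
import threading
from typing import Any


def serializable_sketch(obj: Any) -> bool:
    """Hypothetical stand-in for serializable (assumed behaviour)."""
    try:
        pickle.dumps(obj)  # In-memory serialization; the result is discarded.
    except Exception:
        # Pickle failures surface as PicklingError, TypeError or AttributeError.
        return False
    return True


print(serializable_sketch({"a": [1, 2, 3]}))  # True
print(serializable_sketch(threading.Lock()))  # False
```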