rics.misc

Miscellaneous utility methods for Python applications.

Module Attributes

GBFNReturnType

Output type for get_by_full_name() when using one of instance_of and subclass_of.

Functions

format_kwargs(kwargs, *[, max_value_length])

Format keyword arguments.

get_by_full_name()

Combine import_module() and getattr() to retrieve items by name.

get_local_or_remote(file, *, remote_root[, ...])

Retrieve the path of a local file, downloading it if needed.

get_public_module(obj[, resolve_reexport, ...])

Get the public module of obj.

interpolate_environment_variables(s, *[, ...])

Interpolate environment variables in a string s.

serializable(obj)

Check if obj is serializable using Pickle.

tname(arg[, prefix_classname, attrs])

Get name of method or class.

interpolate_environment_variables(s: str, *, allow_nested: bool = True, allow_blank: bool = False) -> str

Interpolate environment variables in a string s.

This function replaces references to environment variables with the actual value of the variable, or a default if specified. The syntax is similar to Bash string interpolation; use ${<var>} for mandatory variables, and ${<var>:default} for optional variables.

Parameters:
  • s – A string in which to interpolate.

  • allow_blank – If True, allow variables to be set but empty.

  • allow_nested – If True, allow using another environment variable as the default value. This option will not verify whether the actual values are interpolation-strings.

Returns:

A copy of s, after environment variable interpolation.

Raises:
  • ValueError – If nested variables are discovered (only when allow_nested=False).

  • UnsetVariableError – If any required environment variables are unset or blank (only when allow_blank=False).

See also

The rics.envinterp module, which this function wraps.
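
The substitution rules described above can be illustrated with a small standard-library sketch. This is not the rics implementation: the name interpolate_sketch, the regular expression, and the use of KeyError (where the documented function raises UnsetVariableError) are assumptions made for illustration. Nested defaults are not handled, and blank values are treated as unset, roughly like allow_blank=False.

```python
import os
import re

# Matches ${NAME} and ${NAME:default}; the pattern is an assumption.
_PATTERN = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def interpolate_sketch(s, env=None):
    """Replace ${NAME} and ${NAME:default} references in s (hypothetical helper)."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if value:  # set and non-blank
            return value
        if default is not None:  # optional variable, fall back to the default
            return default
        raise KeyError(f"Required variable {name!r} is unset or blank.")

    return _PATTERN.sub(replace, s)


env = {"DATA_DIR": "/srv/data"}
print(interpolate_sketch("${DATA_DIR}/${SUBDIR:raw}", env))
```

Passing a plain dict as env keeps the sketch independent of the real process environment.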

class GBFNReturnType

Output type for get_by_full_name() when using one of instance_of and subclass_of.

alias of TypeVar(‘GBFNReturnType’)

get_by_full_name(name: str, default_module: str | module = None, *, instance_of: Literal[None] = None, subclass_of: Literal[None] = None) -> Any
get_by_full_name(name: str, default_module: str | module = None, *, instance_of: Type[GBFNReturnType], subclass_of: Literal[None] = None) -> GBFNReturnType
get_by_full_name(name: str, default_module: str | module = None, *, instance_of: Literal[None] = None, subclass_of: Type[GBFNReturnType]) -> Type[GBFNReturnType]

Combine import_module() and getattr() to retrieve items by name.

Parameters:
  • name – A name or fully qualified name.

  • default_module – A namespace to search if name is not fully qualified (contains no '.'-characters).

  • instance_of – If given, perform an isinstance() check on the retrieved object.

  • subclass_of – If given, perform an issubclass() check on the retrieved object.

Returns:

An object with the fully qualified name name.

Raises:
  • ValueError – If name does not contain any dots and default_module=None.

  • ValueError – If both instance_of and subclass_of are given.

  • TypeError – If an isinstance or issubclass check fails.

Examples

Retrieving a numpy function by name.

>>> get_by_full_name("numpy.isnan")
<ufunc 'isnan'>

Validating the return type. In the example below, we ensure that logging.INFO really is an int, and that the logging.Logger class inherits from logging.Filterer.

>>> import logging
>>> get_by_full_name("logging.INFO", instance_of=int)
20
>>> get_by_full_name("logging.Logger", subclass_of=logging.Filterer)
<class 'logging.Logger'>

Falling back to builtins.

>>> get_by_full_name("int", default_module="builtins")
<class 'int'>
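
The examples above rely on combining import_module() and getattr(). A minimal sketch of that combination follows. The name get_by_full_name_sketch is hypothetical, the instance_of and subclass_of checks are omitted, and the real function may resolve names differently.

```python
from importlib import import_module


def get_by_full_name_sketch(name, default_module=None):
    """Import the module part of name, then fetch the attribute (hypothetical helper)."""
    if "." in name:
        # Everything before the last dot is treated as the module path.
        module_name, _, attribute = name.rpartition(".")
    elif default_module is not None:
        module_name, attribute = default_module, name
    else:
        raise ValueError(f"{name!r} is not fully qualified and no default module was given.")
    return getattr(import_module(module_name), attribute)


print(get_by_full_name_sketch("logging.INFO"))
print(get_by_full_name_sketch("int", default_module="builtins"))
```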

get_public_module(obj: Any, resolve_reexport: bool = False, include_name: bool = False) -> str

Get the public module of obj.

Parameters:
  • obj – An object to resolve a public module for.

  • resolve_reexport – If True, traverse the module hierarchy and look for the earliest where obj is reexported. This may be expensive.

  • include_name – If True, include the name of obj reexported from a parent module. The first instance found will be used if obj is reexported multiple times.

Returns:

Public module of obj.

Examples

Public module of pandas.DataFrame.

>>> from pandas import DataFrame as obj
>>> get_public_module(obj)
'pandas.core.frame'
>>> get_public_module(obj, resolve_reexport=True)
'pandas'
>>> get_public_module(obj, resolve_reexport=True, include_name=True)
'pandas.DataFrame'

Raises:

ValueError – If include_name is given without resolve_reexport.

See also

The analogous get_by_full_name() function.
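
One way to picture the reexport search is to walk from the top-level package downwards and stop at the first module that exposes the same object. The sketch below assumes this strategy; the name public_module_sketch is hypothetical, and the real function may search in a different order or handle more edge cases.

```python
from importlib import import_module
from json import JSONDecoder


def public_module_sketch(obj, resolve_reexport=False):
    """Return the module of obj, optionally the shallowest one that reexports it."""
    module = obj.__module__
    if not resolve_reexport:
        return module

    parts = module.split(".")
    for depth in range(1, len(parts) + 1):
        candidate = ".".join(parts[:depth])
        # Accept the first (shallowest) module exposing the very same object.
        if getattr(import_module(candidate), obj.__name__, None) is obj:
            return candidate
    return module


print(public_module_sketch(JSONDecoder))
print(public_module_sketch(JSONDecoder, resolve_reexport=True))
```

Importing every ancestor module is what makes this kind of search potentially expensive.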

tname(arg: Type[Any] | Any | None, prefix_classname: bool = False, attrs: str | Iterable[str] | None = 'func') -> str

Get name of method or class.

Parameters:
  • arg – Something to get a name for.

  • prefix_classname – If True, prepend the class name if arg belongs to a class.

  • attrs – Attribute names to search for wrapped functions. The default, ‘func’, is the name used by the built-in functools.partial() wrapper. May cause infinite recursion.

Returns:

A name for arg.

Raises:

ValueError – If no name could be derived for arg.
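
The role of attrs can be sketched as follows: wrappers such as functools.partial store the wrapped callable in an attribute, so the name is looked up on that inner object instead. This is an illustration under assumptions, not the rics implementation; tname_sketch is a hypothetical name, and falling back to the type name for instances is a guess.

```python
from functools import partial


def tname_sketch(arg, prefix_classname=False, attrs=("func",)):
    """Derive a name for arg, unwrapping attributes listed in attrs first."""
    for attr in attrs:
        inner = getattr(arg, attr, None)
        if inner is not None:
            # Recurse into the wrapped object, e.g. partial(...).func.
            return tname_sketch(inner, prefix_classname, attrs)
    if prefix_classname and hasattr(arg, "__qualname__"):
        return arg.__qualname__
    if hasattr(arg, "__name__"):
        return arg.__name__
    return type(arg).__name__  # assumed fallback for plain instances


wrapped = partial(str.upper)
print(tname_sketch(wrapped))
print(tname_sketch(wrapped, prefix_classname=True))
```

In a scheme like this, an object whose func attribute refers back to itself would never terminate, which is one way the recursion warning above could arise.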

format_kwargs(kwargs: Mapping[str, Any], *, max_value_length: int = 80) -> str

Format keyword arguments.

Parameters:
  • kwargs – Arguments to format.

  • max_value_length – If given, replace repr(value) with tname(value) if repr is longer than max_value_length characters.

Returns:

A string of the form ‘key0=repr(value0), key1=repr(value1)’.

Raises:

ValueError – For keys in kwargs that are not valid Python argument names.

Examples

>>> format_kwargs({"an_int": 1, "a_string": "Hello!"})
"an_int=1, a_string='Hello!'"
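
The effect of max_value_length is easier to see in a small sketch. The helper below is hypothetical: it uses the plain type name where the documented function uses tname(value), and it validates keys with str.isidentifier(), which is an assumption about how argument names are checked.

```python
def format_kwargs_sketch(kwargs, max_value_length=80):
    """Join key=repr(value) pairs, shortening reprs that are too long."""
    parts = []
    for key, value in kwargs.items():
        if not key.isidentifier():
            raise ValueError(f"{key!r} is not a valid Python argument name.")
        text = repr(value)
        if len(text) > max_value_length:
            text = type(value).__name__  # stand-in for tname(value)
        parts.append(f"{key}={text}")
    return ", ".join(parts)


print(format_kwargs_sketch({"an_int": 1, "big": list(range(100))}, max_value_length=20))
```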

get_local_or_remote(file: str | PathLike, *, remote_root: str | PathLike, local_root: str | PathLike = '.', force: bool = False, postprocessor: Callable[[str], Any] | None = None, show_progress: bool = False) -> Path

Retrieve the path of a local file, downloading it if needed.

If file is not available at the local root path, it will be downloaded using requests.get. A postprocessor may be given in which case the name of the final file will be local_root/<name-of-postprocessor>/file. Removing a raw local file (i.e. local_root/file) will invalidate postprocessed files as well.

Parameters:
  • file – A file to retrieve or download.

  • remote_root – Remote URL where the data may be retrieved using requests.get.

  • local_root – Local directory where the file may be cached.

  • force – If True, always download and apply processing (if applicable). Existing files will be overwritten.

  • postprocessor – A function which takes a single argument input_path and returns a pickleable type.

  • show_progress – If True, show a progress bar. Requires the tqdm package.

Returns:

An absolute path to the data.

Raises:
  • ValueError – If local root path does not exist or is not a directory.

  • ValueError – If the local file does not exist and remote_root=None.

  • ModuleNotFoundError – If the tqdm package is not installed but show_progress=True.

Examples

Fetch the Name Basics table (a gzipped TSV file) of the IMDb dataset.

>>> from rics.misc import get_local_or_remote
>>> import pandas as pd
>>>
>>> file = "name.basics.tsv.gz"
>>> local_root = "my-data"  # default = "."
>>> remote_root = "https://datasets.imdbws.com"
>>> path = get_local_or_remote(
...     file, remote_root=remote_root, local_root=local_root, show_progress=True
... )
https://datasets.imdbws.com/name.basics.tsv.gz: 100%|██████████| 214M/214M [00:05<00:00, 39.3MiB/s]
>>> pd.read_csv(path, sep="\t").shape
(11453719, 6)

We had to download name.basics.tsv.gz the first time, but get_local_or_remote returns immediately the second time it is called. Fetching can be forced using force=True.

>>> path = get_local_or_remote(
...     file, remote_root=remote_root, local_root=local_root, show_progress=True
... )
>>> pd.read_csv(path, sep="\t").shape
(11453719, 6)
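
The caching behaviour shown above (download once, reuse afterwards, refetch when forced) can be sketched without any network access. Everything here is an assumption for illustration: local_or_fetch_sketch and its fetch callback are hypothetical, the callback stands in for the requests.get download, and postprocessing and progress bars are left out.

```python
import tempfile
from pathlib import Path


def local_or_fetch_sketch(file, local_root, fetch, force=False):
    """Return the cached path of file, calling fetch(file) only when needed."""
    root = Path(local_root)
    if not root.is_dir():
        raise ValueError(f"Local root {str(root)!r} does not exist or is not a directory.")
    path = root / file
    if force or not path.exists():
        # fetch is a stand-in for the real download; it must return bytes.
        path.write_bytes(fetch(file))
    return path.absolute()


with tempfile.TemporaryDirectory() as root:
    first = local_or_fetch_sketch("demo.txt", root, lambda name: b"payload")
    second = local_or_fetch_sketch("demo.txt", root, lambda name: b"ignored")
    print(second.read_bytes())  # the cached content from the first call
```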

serializable(obj: object) -> bool

Check if obj is serializable using Pickle.

Parameters:

obj – Object to test.

Returns:

True if obj was pickled without issues.
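
A check of this kind typically amounts to attempting pickle.dumps and reporting whether it succeeded. The sketch below assumes that approach; serializable_sketch is a hypothetical name, and the real function may catch a narrower set of exceptions.

```python
import pickle


def serializable_sketch(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:  # pickling failures surface as several exception types
        return False
    return True


print(serializable_sketch({"a": [1, 2, 3]}))
print(serializable_sketch(lambda x: x))
```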