rics.misc
Miscellaneous utility methods for Python applications.
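Module Attributes
GBFNReturnType – Output type for get_by_full_name().
Functions
format_kwargs – Format keyword arguments.
get_by_full_name – Combine import_module() and getattr() to retrieve items by name.
get_local_or_remote – Retrieve the path of a local file, downloading it if needed.
get_public_module – Get the public module of obj.
interpolate_environment_variables – Interpolate environment variables in a string s.
Check if obj is serializable using Pickle.
tname – Get name of method or class.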
- interpolate_environment_variables(s: str, *, allow_nested: bool = True, allow_blank: bool = False) -> str
Interpolate environment variables in a string s.
This function replaces references to environment variables with the actual value of the variable, or a default if specified. The syntax is similar to Bash string interpolation; use ${<var>} for mandatory variables, and ${<var>:default} for optional variables.
- Parameters:
s – A string in which to interpolate.
allow_blank – If True, allow variables to be set but empty.
allow_nested – If True, allow using another environment variable as the default value. This option will not verify whether the actual values are interpolation-strings.
- Returns:
A copy of s, after environment variable interpolation.
- Raises:
ValueError – If nested variables are discovered (only when allow_nested=False).
UnsetVariableError – If any required environment variables are unset or blank (only when allow_blank=False).
See also
The rics.envinterp module, which this function wraps.
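The ${<var>} and ${<var>:default} syntax described above can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not the rics implementation: the name interpolate_sketch is hypothetical, nested defaults are not handled, a blank value falls back to the default when one is given (an assumption), and a plain ValueError stands in for UnsetVariableError.

```python
import os
import re

# Matches ${VAR} and ${VAR:default}. The default may not contain braces,
# so nested references such as ${A:${B}} are NOT handled by this sketch.
_PATTERN = re.compile(r"\$\{(?P<name>\w+)(?::(?P<default>[^{}]*))?\}")


def interpolate_sketch(s: str, *, allow_blank: bool = False) -> str:
    """Hypothetical stand-in; illustrates the syntax only."""

    def replace(match: re.Match) -> str:
        name = match.group("name")
        default = match.group("default")  # None for mandatory variables
        value = os.environ.get(name)
        # Use the environment value if it is set and non-blank (or blanks are allowed).
        if value is not None and (value.strip() or allow_blank):
            return value
        # Assumption: unset or blank variables fall back to the default, if any.
        if default is not None:
            return default
        # The real function raises UnsetVariableError; ValueError is a stand-in.
        raise ValueError(f"Required environment variable {name!r} is unset or blank.")

    return _PATTERN.sub(replace, s)
```

With SKETCH_USER=alice in the environment, interpolate_sketch("hi ${SKETCH_USER}") gives "hi alice", while "${SKETCH_MISSING:fallback}" resolves to "fallback" when SKETCH_MISSING is unset.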
- class GBFNReturnType
Output type for get_by_full_name() when using one of instance_of and subclass_of.
alias of TypeVar('GBFNReturnType')
- get_by_full_name(name: str, default_module: str | module = None, *, instance_of: Literal[None] = None, subclass_of: Literal[None] = None) -> Any
- get_by_full_name(name: str, default_module: str | module = None, *, instance_of: Type[GBFNReturnType], subclass_of: Literal[None] = None) -> GBFNReturnType
- get_by_full_name(name: str, default_module: str | module = None, *, instance_of: Literal[None] = None, subclass_of: Type[GBFNReturnType]) -> Type[GBFNReturnType]
Combine import_module() and getattr() to retrieve items by name.
- Parameters:
name – A name or fully qualified name.
default_module – A namespace to search if name is not fully qualified (contains no '.'-characters).
instance_of – If given, perform an isinstance() check on the object retrieved for name.
subclass_of – If given, perform an issubclass() check on the object retrieved for name.
- Returns:
An object with the fully qualified name name.
- Raises:
ValueError – If name does not contain any dots and default_module=None.
ValueError – If both instance_of and subclass_of are given.
TypeError – If an isinstance or issubclass check fails.
Examples
Retrieving a numpy function by name.
>>> get_by_full_name("numpy.isnan")
<ufunc 'isnan'>
Validating the return type. In the example below, we ensure that logging.INFO really is an int, and that the logging.Logger class inherits from logging.Filterer.
>>> import logging
>>> get_by_full_name("logging.INFO", instance_of=int)
20
>>> get_by_full_name("logging.Logger", subclass_of=logging.Filterer)
<class 'logging.Logger'>
Falling back to builtins.
>>> get_by_full_name("int", default_module="builtins")
<class 'int'>
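The combination of import_module() and getattr() mentioned in the description can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: get_by_name_sketch is a hypothetical name, default_module is accepted only as a string, and the instance_of and subclass_of checks are left out.

```python
import importlib
from typing import Any, Optional


def get_by_name_sketch(name: str, default_module: Optional[str] = None) -> Any:
    """Hypothetical stand-in illustrating import_module() + getattr()."""
    # Split "package.module.attr" into ("package.module", "attr").
    module_name, _, attribute = name.rpartition(".")
    if not module_name:
        # No dots in the name: fall back to the default namespace.
        if default_module is None:
            raise ValueError(
                f"{name!r} is not fully qualified and no default module was given."
            )
        module_name = default_module
    module = importlib.import_module(module_name)
    return getattr(module, attribute)
```

Under these assumptions, get_by_name_sketch("logging.INFO") returns 20 and get_by_name_sketch("int", default_module="builtins") returns the built-in int class.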
- get_public_module(obj: Any, resolve_reexport: bool = False, include_name: bool = False) -> str
Get the public module of obj.
- Parameters:
obj – An object to resolve a public module for.
resolve_reexport – If True, traverse the module hierarchy and look for the earliest module where obj is reexported. This may be expensive.
include_name – If True, include the name of obj as reexported from a parent module. The first instance found will be used if obj is reexported multiple times.
- Returns:
Public module of obj.
Examples
Public module of pandas.DataFrame.
>>> from pandas import DataFrame as obj
>>> get_public_module(obj)
'pandas.core.frame'
>>> get_public_module(obj, resolve_reexport=True)
'pandas'
>>> get_public_module(obj, resolve_reexport=True, include_name=True)
'pandas.DataFrame'
- Raises:
ValueError – If include_name is given without resolve_reexport.
See also
The analogous get_by_full_name() function.
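One way to picture the resolve_reexport traversal is to walk the dotted module path of obj from the top-level package downwards and stop at the first module that exposes the same object. The sketch below assumes obj has __module__ and __name__ attributes and looks it up under its own name only; public_module_sketch is a hypothetical name, and the actual search strategy used by rics may differ.

```python
import importlib
import json
from typing import Any


def public_module_sketch(obj: Any, resolve_reexport: bool = False) -> str:
    """Hypothetical stand-in; assumes obj has __module__ and __name__."""
    module_name = obj.__module__
    if not resolve_reexport:
        return module_name
    parts = module_name.split(".")
    # Try "a", then "a.b", then "a.b.c", ... and return the first hit.
    for end in range(1, len(parts) + 1):
        candidate = ".".join(parts[:end])
        module = importlib.import_module(candidate)
        if getattr(module, obj.__name__, None) is obj:
            return candidate
    return module_name


# json.JSONDecoder is defined in json.decoder and reexported by the json package.
print(public_module_sketch(json.JSONDecoder))                         # json.decoder
print(public_module_sketch(json.JSONDecoder, resolve_reexport=True))  # json
```

Importing every parent module is what makes this kind of lookup potentially expensive for large packages.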
- tname(arg: Type[Any] | Any | None, prefix_classname: bool = False, attrs: str | Iterable[str] | None = 'func') -> str
Get name of method or class.
- Parameters:
arg – Something to get a name for.
prefix_classname – If True, prepend the class name if arg belongs to a class.
attrs – Attribute names to search for wrapped functions. The default, 'func', is the name used by the built-in functools.partial() wrapper. May cause infinite recursion.
- Returns:
A name for arg.
- Raises:
ValueError – If no name could be derived for arg.
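The role of attrs can be shown with a small unwrapping loop: follow the named attributes (such as func on a functools.partial object) until nothing is left to unwrap, then read __name__. This is only a sketch under assumptions: tname_sketch is a hypothetical name, prefix_classname is omitted, instances fall back to their class name instead of raising, and a set of visited ids guards against the self-referencing wrappers that could otherwise loop forever.

```python
import functools
from typing import Any, Iterable


def tname_sketch(arg: Any, attrs: Iterable[str] = ("func",)) -> str:
    """Hypothetical stand-in: unwrap wrappers, then return a name."""
    attrs = tuple(attrs)
    seen = set()  # ids of visited objects, to stop on cyclic wrappers
    while id(arg) not in seen:
        seen.add(id(arg))
        for attr in attrs:
            wrapped = getattr(arg, attr, None)
            if wrapped is not None:
                arg = wrapped  # step into the wrapped object
                break
        else:
            break  # nothing left to unwrap
    name = getattr(arg, "__name__", None)
    if name is None:
        # Assumption: for plain instances, use the name of their class.
        name = type(arg).__name__
    return name


print(tname_sketch(functools.partial(sorted, reverse=True)))  # sorted
```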
- format_kwargs(kwargs: Mapping[str, Any], *, max_value_length: int = 80) -> str
Format keyword arguments.
- Parameters:
kwargs – Arguments to format.
max_value_length – If given, replace repr(value) with tname(value) if repr is longer than max_value_length characters.
- Returns:
A string of the form 'key0=repr(value0), key1=repr(value1)'.
- Raises:
ValueError – For keys in kwargs that are not valid Python argument names.
Examples
>>> format_kwargs({"an_int": 1, "a_string": "Hello!"})
"an_int=1, a_string='Hello!'"
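The formatting rule, including the max_value_length fallback, could look roughly like the sketch below. The name format_kwargs_sketch is hypothetical, and type(value).__name__ is used as a simple stand-in for tname(value); the library's own handling may differ in detail.

```python
from typing import Any, Mapping


def format_kwargs_sketch(kwargs: Mapping[str, Any], *, max_value_length: int = 80) -> str:
    """Hypothetical stand-in for the 'key=repr(value)' formatting rule."""
    parts = []
    for key, value in kwargs.items():
        if not key.isidentifier():
            raise ValueError(f"Not a valid Python argument name: {key!r}")
        text = repr(value)
        if len(text) > max_value_length:
            # Stand-in for tname(value): use the type name for long reprs.
            text = type(value).__name__
        parts.append(f"{key}={text}")
    return ", ".join(parts)


print(format_kwargs_sketch({"an_int": 1, "a_string": "Hello!"}))
# an_int=1, a_string='Hello!'
```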
- get_local_or_remote(file: str | PathLike, *, remote_root: str | PathLike, local_root: str | PathLike = '.', force: bool = False, postprocessor: Callable[[str], Any] | None = None, show_progress: bool = False) -> Path
Retrieve the path of a local file, downloading it if needed.
If file is not available at the local root path, it will be downloaded using requests.get. A postprocessor may be given, in which case the name of the final file will be local_root/<name-of-postprocessor>/file. Removing a raw local file (i.e. local_root/file) will invalidate postprocessed files as well.
- Parameters:
file – A file to retrieve or download.
remote_root – Remote URL where the data may be retrieved using requests.get.
local_root – Local directory where the file may be cached.
force – If True, always download and apply processing (if applicable). Existing files will be overwritten.
postprocessor – A function which takes a single argument input_path and returns a pickleable type.
show_progress – If True, show a progress bar. Requires the tqdm package.
- Returns:
An absolute path to the data.
- Raises:
ValueError – If local root path does not exist or is not a directory.
ValueError – If the local file does not exist and remote_root=None.
ModuleNotFoundError – If the tqdm package is not installed but show_progress=True.
Examples
Fetch the Name Basics table (a gzipped TSV file) of the IMDb dataset.
>>> from rics.misc import get_local_or_remote
>>> import pandas as pd
>>>
>>> file = "name.basics.tsv.gz"
>>> local_root = "my-data"  # default = "."
>>> remote_root = "https://datasets.imdbws.com"
>>> path = get_local_or_remote(
...     file, remote_root=remote_root, local_root=local_root, show_progress=True
... )
https://datasets.imdbws.com/name.basics.tsv.gz: 100%|██████████| 214M/214M [00:05<00:00, 39.3MiB/s]
>>> pd.read_csv(path, sep="\t").shape
(11453719, 6)
We had to download name.basics.tsv.gz the first time, but get_local_or_remote returns immediately the second time it is called. Fetching can be forced using force=True.
>>> path = get_local_or_remote(
...     file, remote_root=remote_root, local_root=local_root, show_progress=True
... )
>>> pd.read_csv(path, sep="\t").shape
(11453719, 6)
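The download-once behaviour shown in the two calls above boils down to a cache check before fetching. The sketch below illustrates that idea only: local_or_fetch_sketch and its fetch parameter are hypothetical (an injected callable takes the place of requests.get so the example stays within the standard library), and postprocessing and progress reporting are left out.

```python
from pathlib import Path
from typing import Callable


def local_or_fetch_sketch(
    file: str,
    *,
    remote_root: str,
    local_root: str = ".",
    force: bool = False,
    fetch: Callable[[str], bytes],  # hypothetical: returns the body for a URL
) -> Path:
    """Hypothetical stand-in for the cache-or-download logic."""
    root = Path(local_root)
    if not root.is_dir():
        raise ValueError(f"Local root {str(root)!r} does not exist or is not a directory.")
    path = root / file
    # Download only if the file is missing locally, or if forced.
    if force or not path.exists():
        url = f"{remote_root.rstrip('/')}/{file}"
        path.write_bytes(fetch(url))
    return path.resolve()
```

Passing something like lambda url: requests.get(url).content as fetch would approximate the documented behaviour, while a fake callable makes the caching logic easy to test without network access.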