rics.misc
Miscellaneous utility methods for Python applications.
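Module Attributes
GBFNReturnType – Output type for get_by_full_name().
Functions
format_kwargs() – Format keyword arguments.
get_by_full_name() – Combine importlib.import_module() and getattr() to retrieve items by name.
get_local_or_remote() – Retrieve the path of a local file, downloading it if needed.
get_public_module() – Get the public module of obj.
Check if obj is serializable using Pickle.
tname() – Get name of method or class.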
- interpolate_environment_variables(s: str, *, allow_nested: bool = True, allow_blank: bool = False) → str
- class GBFNReturnType
Output type for get_by_full_name() when using one of instance_of and subclass_of.
alias of TypeVar('GBFNReturnType')
- get_by_full_name(name: str, default_module: str | ModuleType | object = None, *, instance_of: Literal[None] = None, subclass_of: Literal[None] = None) → Any
- get_by_full_name(name: str, default_module: str | ModuleType | object = None, *, instance_of: type[GBFNReturnType], subclass_of: Literal[None] = None) → GBFNReturnType
- get_by_full_name(name: str, default_module: str | ModuleType | object = None, *, instance_of: Literal[None] = None, subclass_of: type[GBFNReturnType]) → type[GBFNReturnType]
Combine importlib.import_module() and getattr() to retrieve items by name.
You may retrieve top-level module members such as classes by adding the class name after the module path (e.g. logging.Logger). The member may also be specified using the ':'-syntax, e.g. logging:Logger. The ':'-syntax also supports retrieving object members, e.g. logging:Logger.info. This is not possible using the dot-based syntax, since logging.Logger cannot be imported by the importlib.import_module() function.
- Parameters:
name – A name or fully qualified name.
default_module – A namespace to search if name does not contain any '.' or ':' characters.
instance_of – If given, perform an isinstance() check on the object specified by name.
subclass_of – If given, perform an issubclass() check on the object specified by name.
- Returns:
The object specified by name.
- Raises:
ValueError – If name does not contain any dots and default_module=None.
ValueError – If both instance_of and subclass_of are given.
TypeError – If an isinstance or issubclass check fails.
Examples
Retrieving a numpy function by name.
>>> get_by_full_name("numpy.isnan")
<ufunc 'isnan'>
Validating the return type. In the example below, we ensure that logging.INFO really is an int, and that the logging.Logger class inherits from logging.Filterer.
>>> import logging
>>> get_by_full_name("logging.INFO", instance_of=int)
20
>>> get_by_full_name("logging.Logger", subclass_of=logging.Filterer)
<class 'logging.Logger'>
Default namespaces may be specified using the default_module keyword argument.
>>> get_by_full_name("Logger", default_module="logging")
<class 'logging.Logger'>
The default namespace doesn't have to be a module.
>>> get_by_full_name("info", default_module="logging.Logger").__qualname__
'Logger.info'
To retrieve an attribute of anything other than a module (e.g. a class member), you must use the ':'-syntax to separate the module path from the attribute path.
>>> get_by_full_name("logging:Logger.info").__qualname__
'Logger.info'
When using this syntax, the path to the left of the separator must be a valid module path.
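The difference between the two syntaxes can be illustrated with a minimal sketch that uses only the standard library. The helper name resolve_name is hypothetical and is not part of rics.misc; it only shows how importlib.import_module() and getattr() might be combined, and it omits default_module and the type checks.

```python
import importlib
from typing import Any


def resolve_name(name: str) -> Any:
    """Hypothetical helper: resolve 'pkg.mod.attr' or 'pkg.mod:attr.path'."""
    if ":" in name:
        # Colon syntax: everything left of ':' must be an importable module.
        module_path, _, attribute_path = name.partition(":")
    else:
        # Dot syntax: only the last component is treated as an attribute.
        module_path, _, attribute_path = name.rpartition(".")
        if not module_path:
            raise ValueError(f"{name!r} does not contain a module path")

    obj = importlib.import_module(module_path)
    # Walk the attribute path one getattr() call at a time.
    for attribute in attribute_path.split("."):
        obj = getattr(obj, attribute)
    return obj
```

In this sketch, resolve_name("logging:Logger.info") walks two attributes starting from the logging module, whereas resolve_name("logging.Logger.info") fails because it tries to import logging.Logger as a module.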
- get_public_module(obj: Any, resolve_reexport: bool = False, include_name: bool = False) → str
Get the public module of obj.
- Parameters:
obj – An object to resolve a public module for.
resolve_reexport – If True, traverse the module hierarchy and look for the earliest module where obj is reexported. This may be expensive.
include_name – If True, include the name of obj as reexported from a parent module. The first instance found will be used if obj is reexported multiple times.
- Returns:
Public module of obj.
Examples
Public module of pandas.DataFrame.
>>> from pandas import DataFrame as obj
>>> get_public_module(obj)
'pandas.core.frame'
>>> get_public_module(obj, resolve_reexport=True)
'pandas'
>>> get_public_module(obj, resolve_reexport=True, include_name=True)
'pandas.DataFrame'
- Raises:
ValueError – If include_name is given without resolve_reexport.
See also
The analogous get_by_full_name() function.
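One way the reexport lookup could work is sketched below using a standard-library class instead of pandas. The helper earliest_reexport is hypothetical and may differ from the actual implementation; it assumes obj has both __module__ and __name__ attributes and that parent modules are cheap to import.

```python
import importlib
import json


def earliest_reexport(obj) -> str:
    """Hypothetical helper: shortest parent module path that exposes obj."""
    parts = obj.__module__.split(".")
    # Walk from the top-level package down towards the defining module.
    for depth in range(1, len(parts) + 1):
        module = importlib.import_module(".".join(parts[:depth]))
        if getattr(module, obj.__name__, None) is obj:
            return module.__name__
    # obj is not reachable by name anywhere in the hierarchy; fall back.
    return obj.__module__


print(json.JSONDecoder.__module__)  # json.decoder
print(earliest_reexport(json.JSONDecoder))  # json
```

Importing every parent module is what makes this kind of traversal potentially expensive for deeply nested packages.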
- tname(arg: type[Any] | Any | None, prefix_classname: bool = False, attrs: str | Iterable[str] | None = 'func', include_module: bool = False) → str
Get name of method or class.
- Parameters:
arg – Something to get a name for.
prefix_classname – If True, prepend the class name if arg belongs to a class.
attrs – Attribute names to search for wrapped functions. The default, 'func', is the name used by the built-in functools.partial() wrapper. May cause infinite recursion.
include_module – If True, prepend the public module (see get_public_module()).
- Returns:
A name for arg.
- Raises:
ValueError – If no name could be derived for arg.
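The role of attrs can be sketched as follows. The helper unwrapped_name is hypothetical and not part of rics.misc. It assumes that wrappers are unwrapped by repeatedly following the listed attributes, and it adds a guard against self-referencing wrappers, which is one way the infinite recursion mentioned above could arise. The fallback to the class name for instances is also an assumption.

```python
from functools import partial
from typing import Any, Iterable


def unwrapped_name(arg: Any, attrs: Iterable[str] = ("func",)) -> str:
    """Hypothetical helper: follow wrapper attributes, then return a name."""
    attrs = tuple(attrs)
    seen = set()
    while id(arg) not in seen:
        seen.add(id(arg))  # guard against wrappers that refer to themselves
        for attr in attrs:
            inner = getattr(arg, attr, None)
            if inner is not None:
                arg = inner  # unwrap one level and start over
                break
        else:
            break  # no wrapper attribute found; arg is the innermost object
    name = getattr(arg, "__name__", None)
    # Assumption: instances have no __name__, so use the name of their class.
    return name if name is not None else type(arg).__name__


print(unwrapped_name(partial(print, "hello")))  # print
```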
- format_kwargs(kwargs: Mapping[str, Any], *, max_value_length: int = 80) → str
Format keyword arguments.
- Parameters:
kwargs – Arguments to format.
max_value_length – Replace a value with its class name if the value exceeds this length. Set to 0 for no limit.
- Returns:
A string of the form 'key0=repr(value0), key1=repr(value1)'.
- Raises:
ValueError – For keys in kwargs that are not valid Python argument names.
Examples
Basic usage.
>>> format_kwargs({"an_int": 1, "a_string": "Hello!"})
"an_int=1, a_string='Hello!'"
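The documented behaviour can be approximated with a short sketch. The function format_kwargs_sketch is hypothetical. It assumes that the limit applies to the length of repr(value) and that keys are validated with str.isidentifier(), which also accepts Python keywords such as class; the real function may differ on both points.

```python
from typing import Any, Mapping


def format_kwargs_sketch(kwargs: Mapping[str, Any], *, max_value_length: int = 80) -> str:
    """Hypothetical re-implementation, for illustration only."""
    parts = []
    for key, value in kwargs.items():
        if not key.isidentifier():
            raise ValueError(f"{key!r} is not a valid Python argument name")
        text = repr(value)
        # Assumption: the limit applies to the length of repr(value).
        if 0 < max_value_length < len(text):
            text = type(value).__name__
        parts.append(f"{key}={text}")
    return ", ".join(parts)


print(format_kwargs_sketch({"data": list(range(100))}, max_value_length=10))  # data=list
```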
- get_local_or_remote(file: str | PathLike[str] | Path, *, remote_root: str | PathLike[str] | Path, local_root: str | PathLike[str] | Path = '.', force: bool = False, postprocessor: Callable[[str], Any] | None = None, show_progress: bool | None = None) → Path
Retrieve the path of a local file, downloading it if needed.
If file is not available at the local root path, it will be downloaded using requests. A postprocessor may be given, in which case the name of the final file will be local_root/<name-of-postprocessor>/file. Removing a raw local file (i.e. local_root/file) will invalidate postprocessed files as well.
- Parameters:
file – A file to retrieve or download.
remote_root – Remote URL where the data may be retrieved using requests.get.
local_root – Local directory where the file may be cached.
force – If True, always download and apply processing (if applicable). Existing files will be overwritten.
postprocessor – A function which takes a single argument input_path and returns a pickleable type.
show_progress – If True, show a progress bar. Requires the tqdm package. If None, show a progress bar only if tqdm is installed.
- Returns:
An absolute path to the data.
- Raises:
ValueError – If local root path does not exist or is not a directory.
ValueError – If the local file does not exist and remote_root=None.
ModuleNotFoundError – If the tqdm package is not installed but show_progress=True.
Examples
Fetch the Name Basics table (a gzipped TSV file) of the IMDb dataset.
>>> from rics.misc import get_local_or_remote
>>> import pandas as pd
>>>
>>> file = "name.basics.tsv.gz"
>>> local_root = "my-data"  # default = "."
>>> remote_root = "https://datasets.imdbws.com"
>>> path = get_local_or_remote(
...     file, remote_root=remote_root, local_root=local_root, show_progress=True
... )
>>> pd.read_csv(path, sep="\t").shape
https://datasets.imdbws.com/name.basics.tsv.gz: 100%|██████████| 214M/214M [00:05<00:00, 39.3MiB/s]
(11453719, 6)
We had to download name.basics.tsv.gz the first time, but get_local_or_remote returns immediately the second time it is called. Fetching can be forced using force=True.
>>> path = get_local_or_remote(
...     file, remote_root=remote_root, local_root=local_root, show_progress=True
... )
>>> pd.read_csv(path, sep="\t").shape
(11453719, 6)
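The caching behaviour described here can be sketched without network access. The function cached_fetch and its download parameter are hypothetical and not part of rics.misc: the download step is injected as a callable so that the sketch stays self-contained, whereas the real function is documented to use requests. Postprocessing and progress reporting are left out.

```python
from pathlib import Path
from typing import Callable


def cached_fetch(
    file: str,
    *,
    remote_root: str,
    local_root: str = ".",
    force: bool = False,
    download: Callable[[str, Path], None],
) -> Path:
    """Hypothetical helper: return the local path, downloading only if needed."""
    root = Path(local_root)
    if not root.is_dir():
        raise ValueError(f"{local_root!r} does not exist or is not a directory")
    path = (root / file).resolve()
    if force or not path.exists():
        url = f"{remote_root.rstrip('/')}/{file}"
        download(url, path)  # expected to write the payload to `path`
    return path
```

With this structure, a second call for the same file skips download entirely unless force=True is passed.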