rics.translation.fetching package
Submodules
rics.translation.fetching.exceptions module
Translation-specific exceptions.
- exception rics.translation.fetching.exceptions.FetcherError
Bases: RuntimeError
Base class for fetcher exceptions.
- exception rics.translation.fetching.exceptions.ForbiddenOperationError(operation: str, reason: str = 'not supported by this fetcher.')
Bases: FetcherError
Exception indicating that the fetcher does not support an operation.
- Parameters
operation – The operation which was not supported.
- exception rics.translation.fetching.exceptions.ImplementationError
Bases: FetcherError
An underlying implementation did something wrong.
- exception rics.translation.fetching.exceptions.UnknownPlaceholderError
Bases: FetcherError
Caller requested unknown placeholder name(s).
- exception rics.translation.fetching.exceptions.UnknownIdError
Bases: FetcherError
Caller requested unknown id(s).
- exception rics.translation.fetching.exceptions.UnknownSourceError
Bases: FetcherError
Caller requested unknown source(s).
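Since every exception in this module derives from FetcherError, a caller can handle specific failures first and fall back to the base class. The sketch below uses stand-in classes that mirror the documented hierarchy so that it runs without rics installed. The helper `describe_failure`, the stored attributes, and the message layout of ForbiddenOperationError are assumptions made for illustration, not the library's implementation.

```python
class FetcherError(RuntimeError):
    """Stand-in for rics.translation.fetching.exceptions.FetcherError."""


class ForbiddenOperationError(FetcherError):
    """Stand-in; attribute names and message layout are assumptions."""

    def __init__(self, operation: str, reason: str = "not supported by this fetcher.") -> None:
        super().__init__(f"Operation '{operation}' {reason}")
        self.operation = operation
        self.reason = reason


class UnknownSourceError(FetcherError):
    """Stand-in for the documented UnknownSourceError."""


def describe_failure(error: Exception) -> str:
    """Hypothetical helper: most specific handler first, base class last."""
    try:
        raise error
    except ForbiddenOperationError as e:
        return f"forbidden: {e.operation}"
    except FetcherError as e:
        return f"fetcher error: {e}"
    except Exception:
        return "unrelated error"
```

Ordering the handlers from most to least specific matters: placing the FetcherError handler first would also swallow ForbiddenOperationError.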
Module contents
Translation using external sources.
- class rics.translation.fetching.Fetcher(allow_fetch_all: bool = True, placeholder_overrides: Optional[Union[PlaceholderOverrides, Dict[str, Union[Dict[str, str], Dict[str, Dict[str, str]]]]]] = None)
Bases: ABC, Generic[NameType, IdType, SourceType]
Base class for fetching translations from an external source.
Users who wish to define their own fetching logic should inherit from this class, but there are implementations for common use cases. See PandasFetcher for a versatile base fetcher or SqlFetcher for a more specialized solution.
- Parameters
allow_fetch_all – If False, an error will be raised when fetch_all() is called.
placeholder_overrides – Placeholder name overrides. Used to adapt placeholder names in sources to wanted names.
See also
Related classes:
rics.translation.offline.Format, the format specification.
rics.translation.offline.TranslationMap, application of formats.
rics.translation.Translator, the main user interface for translation.
- assert_online() → None
Raise an error if offline.
- Raises
OfflineError – If not online.
- abstract property sources: List[SourceType]
Source names known to the fetcher, such as cities or languages.
- property placeholder_overrides: Optional[PlaceholderOverrides]
Return the placeholder overrides, if any.
- property allow_fetch_all: bool
Flag indicating whether the fetch_all() operation is permitted.
- fetch(ids_to_fetch: Iterable[IdsToFetch], placeholders: Iterable[str] = (), required: Iterable[str] = ()) → Dict[SourceType, PlaceholderTranslations]
Fetch translations.
- Parameters
ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
UnknownPlaceholderError – For placeholder(s) that are unknown to the fetcher.
UnknownSourceError – For source(s) that are unknown to the fetcher.
ForbiddenOperationError – If trying to fetch all IDs when not possible or permitted.
ImplementationError – For errors made by the inheriting implementation.
Notes
Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See the rics.translation.offline.Format documentation for details.
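The control flow documented for fetch() can be sketched roughly as follows: reject unknown sources, reject fetch-all requests when they are not permitted, and otherwise collect one result per source. This is not the rics implementation. The exception classes are stand-ins, and `fetch_sketch`, `lookup`, and `echo` are hypothetical names, with `lookup` playing the role of fetch_placeholders().

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class UnknownSourceError(RuntimeError):
    """Stand-in for the documented exception."""


class ForbiddenOperationError(RuntimeError):
    """Stand-in for the documented exception."""


# (source, ids); ids=None means "as many IDs as possible".
IdsToFetch = Tuple[str, Optional[List[int]]]


def fetch_sketch(
    ids_to_fetch: Iterable[IdsToFetch],
    known_sources: List[str],
    allow_fetch_all: bool,
    lookup: Callable[[str, Optional[List[int]]], dict],
) -> Dict[str, dict]:
    """Hypothetical dispatch loop over (source, ids) tuples."""
    ids_to_fetch = list(ids_to_fetch)

    # Fail early if any requested source is unknown.
    unknown = [source for source, _ in ids_to_fetch if source not in known_sources]
    if unknown:
        raise UnknownSourceError(f"Unknown source(s): {unknown}")

    result = {}
    for source, ids in ids_to_fetch:
        if ids is None and not allow_fetch_all:
            raise ForbiddenOperationError(f"Fetching all IDs of '{source}' is not permitted.")
        result[source] = lookup(source, ids)
    return result


def echo(source: str, ids: Optional[List[int]]) -> dict:
    """Hypothetical lookup that just reports what was requested."""
    return {"source": source, "ids": ids}


translations = fetch_sketch(
    [("cities", [1, 2]), ("languages", None)],
    known_sources=["cities", "languages"],
    allow_fetch_all=True,
    lookup=echo,
)
```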
- fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) → Dict[SourceType, PlaceholderTranslations]
Fetch as much data as possible.
- Parameters
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
ForbiddenOperationError – If fetching all IDs is not possible or permitted.
UnknownPlaceholderError – For placeholder(s) that are unknown to the fetcher.
ImplementationError – For errors made by the inheriting implementation.
- abstract fetch_placeholders(instruction: FetchInstruction) → PlaceholderTranslations
Fetch translations.
- Parameters
instruction – A single instruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.
- Returns
Placeholder translation elements.
- Raises
UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
- classmethod make_and_verify(instr: FetchInstruction, known_placeholders: Collection[str], records: Sequence[Sequence[Any]]) → PlaceholderTranslations
Make a PlaceholderTranslations instance from records.
Convenience method meant for use by implementations.
- Parameters
instr – A fetch instruction.
known_placeholders – Known placeholders for the instr.source.
records – Records produced from the instruction.
- Returns
Placeholder translation elements.
- Raises
UnknownPlaceholderError – If required placeholders are missing.
ImplementationError – If the underlying fetcher does not return enough IDs.
- classmethod verify_placeholders(instr: FetchInstruction, known_placeholders: Collection[str]) → None
Verify required placeholders for a source.
Convenience method meant for use by implementations.
- Parameters
instr – A fetch instruction.
known_placeholders – Known placeholders for the instr.source.
- Raises
UnknownPlaceholderError – If required placeholders are missing.
- classmethod select_placeholders(instr: FetchInstruction, known_placeholders: Collection[str]) → List[str]
Select as many known, requested placeholders as possible.
- Parameters
instr – A fetch instruction.
known_placeholders – Known placeholders for the instr.source.
- Returns
Known placeholders in the desired order.
- Raises
UnknownPlaceholderError – If required placeholders are missing.
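The behaviour described for verify_placeholders() and select_placeholders() can be illustrated with a small standalone function. This is a sketch under assumptions: it takes the `placeholders` and `required` collections directly rather than a FetchInstruction (whose attributes are not documented here), and it uses a stand-in exception class.

```python
from typing import Collection, Iterable, List


class UnknownPlaceholderError(RuntimeError):
    """Stand-in for the documented exception."""


def select_placeholders_sketch(
    placeholders: Iterable[str],
    required: Iterable[str],
    known_placeholders: Collection[str],
) -> List[str]:
    """Keep known placeholders in the requested order; fail if a required one is missing."""
    missing = [name for name in required if name not in known_placeholders]
    if missing:
        raise UnknownPlaceholderError(
            f"Required placeholder(s) {missing} not in {sorted(known_placeholders)}."
        )
    # Unknown optional placeholders are dropped silently; order is preserved.
    return [name for name in placeholders if name in known_placeholders]
```

With known placeholders `{"id", "name"}`, requesting `["name", "code", "id"]` with `required=["id"]` yields `["name", "id"]`; the optional `code` is skipped rather than treated as an error.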
- class rics.translation.fetching.MemoryFetcher(data: Union[Dict[SourceType, PlaceholderTranslations], Dict[SourceType, Union[PlaceholderTranslations, DataFrame, Dict[str, Sequence[Any]]]]], **kwargs: Any)
Bases: Fetcher[NameType, IdType, SourceType]
Fetch from memory.
- Parameters
data – A dict {source: PlaceholderTranslations} to fetch from.
**kwargs – Forwarded to the base fetcher.
- fetch_placeholders(instr: FetchInstruction) → PlaceholderTranslations
Fetch columns from memory.
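One of the accepted `data` shapes per the signature is a plain dict of columns for each source (`Dict[str, Sequence[Any]]`). The sketch below shows what fetching columns from such a structure might look like. It does not use MemoryFetcher itself; `fetch_from_memory_sketch` is a hypothetical function and the ID column name `"id"` is an assumption.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# {source: {placeholder: values}}; sample data for illustration only.
data: Dict[str, Dict[str, Sequence[Any]]] = {
    "cities": {"id": [1, 2, 3], "name": ["Stockholm", "Oslo", "Helsinki"]},
}


def fetch_from_memory_sketch(
    source: str,
    placeholders: List[str],
    ids: Optional[List[Any]] = None,
    id_column: str = "id",  # assumed name of the ID placeholder
) -> List[Tuple[Any, ...]]:
    """Return one record per matching ID, with values in placeholder order."""
    columns = data[source]
    rows = zip(*(columns[name] for name in placeholders))
    # ids=None means "return everything".
    return [
        row
        for row, id_value in zip(rows, columns[id_column])
        if ids is None or id_value in ids
    ]
```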
- class rics.translation.fetching.PandasFetcher(read_function: Union[Callable[[Union[str, bytes, PathLike], Any, Any], DataFrame], str] = <function read_pickle>, read_path_format: Optional[Union[str, Callable[[Union[str, bytes, PathLike]], str]]] = 'data/{}.pkl', read_function_args: Optional[Iterable[Any]] = None, read_function_kwargs: Optional[Mapping[str, Any]] = None, **kwargs: Any)
Bases: Fetcher[NameType, IdType, str]
Fetcher using pandas DataFrames as the data format.
Fetch data from serialized DataFrames. How this is done is determined by the read_function. This is typically a Pandas function such as pandas.read_csv() or pandas.read_pickle(), but any function that accepts a string source as the first argument and returns a DataFrame can be used.
- Parameters
read_function – A Pandas read-function.
read_path_format – A formatting string or a callable to apply to a source before passing it to read_function. Must contain a source as its only placeholder. Example: data/{source}.pkl. None=leave as-is.
read_function_args – Additional positional arguments for read_function.
read_function_kwargs – Additional keyword arguments for read_function.
See also
The official Pandas IO documentation
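The three forms of read_path_format (format string, callable, or None) could be applied to a source name along these lines. This is a sketch, not the PandasFetcher implementation: `format_source_path` is a hypothetical helper, and accepting both the positional `{}` form (the default) and the named `{source}` form (the documented example) is an assumption.

```python
from typing import Callable, Optional, Union

PathFormat = Optional[Union[str, Callable[[str], str]]]


def format_source_path(source: str, read_path_format: PathFormat = "data/{}.pkl") -> str:
    """Turn a source name into the path handed to the read function."""
    if read_path_format is None:
        return source  # None = leave the source as-is
    if callable(read_path_format):
        return read_path_format(source)
    # str.format ignores unused arguments, so '{}' and '{source}' both work.
    return read_path_format.format(source, source=source)
```

A callable is useful when the path cannot be expressed as a single format string, for example when sources live in different directories.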
- read(source_path: Union[str, bytes, PathLike]) → DataFrame
Read a DataFrame from a source path.
- Parameters
source_path – Path to serialized DataFrame.
- Returns
A deserialized DataFrame.
- find_sources() → Dict[str, Path]
Search for source paths to pass to read_function using read_path_format.
- Returns
A dict {source: path}.
- Raises
IOError – If files cannot be read.
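One plausible way to discover sources from a `'{}'`-style path format is to glob for matching files and recover the source name from each match. The sketch below does that with the standard library only. `find_sources_sketch` is hypothetical, handles only string formats with a single positional placeholder, and omits the error handling mentioned above; the actual discovery logic may differ.

```python
import glob
import tempfile
from pathlib import Path
from typing import Dict


def find_sources_sketch(read_path_format: str) -> Dict[str, Path]:
    """Glob for files matching the format and map source name -> path."""
    prefix, suffix = read_path_format.split("{}")
    sources = {}
    for match in sorted(glob.glob(read_path_format.format("*"))):
        # Strip the fixed prefix and suffix to recover the source name.
        source = match[len(prefix) : len(match) - len(suffix)]
        sources[source] = Path(match)
    return sources


# Demonstration with two empty files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cities", "languages"):
        Path(tmp, f"{name}.pkl").touch()
    found = find_sources_sketch(str(Path(tmp, "{}.pkl")))
```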
- fetch_placeholders(instr: FetchInstruction) → PlaceholderTranslations
Read data from disk.
- class rics.translation.fetching.SqlFetcher(connection_string: str, password: Optional[str] = None, whitelist_tables: Optional[Iterable[str]] = None, blacklist_tables: Optional[Iterable[str]] = None, include_views: bool = True, fetch_in_below: int = 1200, fetch_between_over: int = 10000, fetch_between_max_overfetch_factor: float = 2.5, fetch_all_limit: Optional[int] = 100000, **kwargs: Any)
Bases: Fetcher[str, IdType, str]
Fetch data from a SQL source. Requires SQLAlchemy.
- Parameters
connection_string – A SQLAlchemy connection string. Read from an environment variable if connection_string starts with ‘@’ followed by the variable name. Example: @TRANSLATION_DB_CONNECTION_STRING reads from the TRANSLATION_DB_CONNECTION_STRING environment variable.
password – Password to insert into the connection string. Will be escaped to allow for special characters. If given, the connection string must contain a password key, e.g. dialect://user:{password}@host:port. Can be an environment variable just like connection_string.
whitelist_tables – The only tables the fetcher may access.
blacklist_tables – Tables the fetcher may not access.
include_views – If True, discover views as well.
fetch_in_below – Always use an IN-clause when fetching fewer than fetch_in_below IDs.
fetch_between_over – Always use a BETWEEN-clause when fetching more than fetch_between_over IDs.
fetch_between_max_overfetch_factor – If the number of IDs to fetch is between fetch_in_below and fetch_between_over, use this factor to choose between an IN and a BETWEEN clause.
fetch_all_limit – Maximum size of table to allow a fetch all-operation. None=No limit, 0=never allow.
- Raises
ValueError – If both whitelist_tables and blacklist_tables are given.
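The interplay of fetch_in_below, fetch_between_over and fetch_between_max_overfetch_factor can be sketched as a small decision function. The two hard thresholds follow the parameter descriptions above. How the factor is applied in the middle range is not specified here, so the sketch assumes it bounds the ratio between the rows a BETWEEN clause would span and the number of IDs actually wanted. `choose_where_clause` is a hypothetical name.

```python
from typing import Sequence


def choose_where_clause(
    ids: Sequence[int],
    fetch_in_below: int = 1200,
    fetch_between_over: int = 10_000,
    fetch_between_max_overfetch_factor: float = 2.5,
) -> str:
    """Return "IN" or "BETWEEN" for a non-empty sequence of integer IDs."""
    num_ids = len(ids)
    if num_ids < fetch_in_below:
        return "IN"
    if num_ids > fetch_between_over:
        return "BETWEEN"
    # Assumption: overfetch = size of the [min, max] ID range / wanted IDs.
    overfetch_factor = (max(ids) - min(ids) + 1) / num_ids
    return "BETWEEN" if overfetch_factor <= fetch_between_max_overfetch_factor else "IN"
```

Under this assumption, 5000 contiguous IDs have an overfetch factor of 1.0 and would use BETWEEN, while 5000 IDs spread over a range of roughly 50000 have a factor near 10 and would use IN.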
- fetch_placeholders(instr: FetchInstruction) → PlaceholderTranslations
Fetch columns from a SQL database.
- property allow_fetch_all: bool
Flag indicating whether the fetch_all() operation is permitted.
- classmethod parse_connection_string(connection_string: str, password: Optional[str]) → str
Parse a connection string. Read from environment if connection_string starts with ‘@’.
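The documented behaviour (environment lookup for values starting with ‘@’, and an escaped password inserted at the `{password}` key) can be approximated as below. This is a sketch, not the library's code: the function names are hypothetical, raising ValueError for a missing `{password}` key is an assumption, and `urllib.parse.quote_plus` is one common way to escape passwords for URLs that may differ from what rics uses.

```python
import os
from typing import Optional
from urllib.parse import quote_plus


def _from_env_if_marked(value: str) -> str:
    """Read '@NAME' from the environment variable NAME; return other values unchanged."""
    return os.environ[value[1:]] if value.startswith("@") else value


def parse_connection_string_sketch(connection_string: str, password: Optional[str] = None) -> str:
    connection_string = _from_env_if_marked(connection_string)
    if password is None:
        return connection_string
    if "{password}" not in connection_string:
        # Assumed error type; the source does not say what is raised here.
        raise ValueError("Connection string must contain a '{password}' key.")
    escaped = quote_plus(_from_env_if_marked(password))
    return connection_string.replace("{password}", escaped)


# Demonstration values; the URL and password are made up.
os.environ["TRANSLATION_DB_CONNECTION_STRING"] = "postgresql://user:{password}@localhost:5432/db"
parsed = parse_connection_string_sketch("@TRANSLATION_DB_CONNECTION_STRING", "p@ss/word")
```

Escaping matters because characters such as `@` and `/` would otherwise be read as URL delimiters.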
- get_approximate_table_size(table: Table) → int
Return the approximate size of a table.
Called only by the make_table_summary() method during discovery. The default implementation performs a count on the ID column, which may be expensive.
- Parameters
table – A table object.
- Returns
An approximate size for table.