rics.translation.fetching package

Submodules

rics.translation.fetching.exceptions module

Translation-specific exceptions.

exception rics.translation.fetching.exceptions.FetcherError

Bases: RuntimeError

Base class for fetcher exceptions.

exception rics.translation.fetching.exceptions.ForbiddenOperationError(operation: str, reason: str = 'not supported by this fetcher.')

Bases: FetcherError

Exception indicating that the fetcher does not support an operation.

Parameters
  • operation – The operation which was not supported.

  • reason – Why the operation is not allowed. Defaults to 'not supported by this fetcher.'

exception rics.translation.fetching.exceptions.ImplementationError

Bases: FetcherError

An underlying implementation did something wrong.

exception rics.translation.fetching.exceptions.UnknownPlaceholderError

Bases: FetcherError

Caller requested unknown placeholder name(s).

exception rics.translation.fetching.exceptions.UnknownIdError

Bases: FetcherError

Caller requested unknown id(s).

exception rics.translation.fetching.exceptions.UnknownSourceError

Bases: FetcherError

Caller requested unknown source(s).
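Because every exception above derives from FetcherError (itself a RuntimeError), callers can catch the base class to handle any fetcher failure in one place. The sketch below mirrors the documented hierarchy with stand-in classes so it runs without rics installed; the message format given to ForbiddenOperationError is an assumption, not the library's actual text.

```python
class FetcherError(RuntimeError):
    """Stand-in for rics.translation.fetching.exceptions.FetcherError."""


class ForbiddenOperationError(FetcherError):
    """Stand-in; the message format below is assumed for illustration."""

    def __init__(self, operation: str, reason: str = "not supported by this fetcher.") -> None:
        super().__init__(f"Operation '{operation}' {reason}")
        self.operation = operation


class UnknownPlaceholderError(FetcherError):
    """Stand-in for the documented exception of the same name."""


def describe(exc: Exception) -> str:
    # Catching the base class handles every fetcher-specific error.
    try:
        raise exc
    except FetcherError as e:
        return f"fetcher failed: {e}"


print(describe(ForbiddenOperationError("fetch_all")))
# fetcher failed: Operation 'fetch_all' not supported by this fetcher.
```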

Module contents

Translation using external sources.

class rics.translation.fetching.Fetcher(allow_fetch_all: bool = True, placeholder_overrides: Optional[Union[PlaceholderOverrides, Dict[str, Union[Dict[str, str], Dict[str, Dict[str, str]]]]]] = None)

Bases: ABC, Generic[NameType, IdType, SourceType]

Base class for fetching translations from an external source.

Users who wish to define their own fetching logic should inherit this class, but there are implementations for common use cases. See PandasFetcher for a versatile base fetcher or SqlFetcher for a more specialized solution.

Parameters
  • allow_fetch_all – If False, an error will be raised when fetch_all() is called.

  • placeholder_overrides – Placeholder name overrides. Used to map placeholder names in the sources to the names you want to use.
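A custom fetcher implements the abstract sources property and fetch_placeholders() method. The sketch below reproduces that contract with a hypothetical, simplified MiniFetcher base (not the real rics class) to show how allow_fetch_all and the online check could fit together. Assumptions: fetch_placeholders() takes (source, ids) instead of a FetchInstruction, a plain dict stands in for PlaceholderTranslations, and PermissionError and RuntimeError stand in for ForbiddenOperationError and OfflineError.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence


class MiniFetcher(ABC):
    """Hypothetical, simplified stand-in for rics' Fetcher base class."""

    def __init__(self, allow_fetch_all: bool = True) -> None:
        self._allow_fetch_all = allow_fetch_all

    @property
    def online(self) -> bool:
        return True  # Assumed always online in this sketch.

    def assert_online(self) -> None:
        if not self.online:
            raise RuntimeError("offline")  # Stand-in for OfflineError.

    @property
    @abstractmethod
    def sources(self) -> List[str]:
        """Source names known to the fetcher."""

    @abstractmethod
    def fetch_placeholders(self, source: str, ids: Optional[Sequence[int]]) -> Dict[int, str]:
        """Return translations for ids, or for all IDs if ids is None."""

    def fetch_all(self) -> Dict[str, Dict[int, str]]:
        if not self._allow_fetch_all:
            raise PermissionError("fetch_all is not allowed")  # Stand-in for ForbiddenOperationError.
        self.assert_online()
        return {source: self.fetch_placeholders(source, None) for source in self.sources}


class CityFetcher(MiniFetcher):
    """Example subclass backed by a hard-coded dict."""

    _data = {"cities": {1: "Stockholm", 2: "Oslo"}}

    @property
    def sources(self) -> List[str]:
        return list(self._data)

    def fetch_placeholders(self, source: str, ids: Optional[Sequence[int]]) -> Dict[int, str]:
        table = self._data[source]
        return dict(table) if ids is None else {i: table[i] for i in ids}


print(CityFetcher().fetch_all())  # {'cities': {1: 'Stockholm', 2: 'Oslo'}}
```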

See also

Related classes:

property online: bool

Return connectivity status. If False, no new translations may be fetched.

assert_online() → None

Raise an error if offline.

Raises

OfflineError – If not online.

abstract property sources: List[SourceType]

Source names known to the fetcher, such as cities or languages.

property placeholder_overrides: Optional[PlaceholderOverrides]

Return the placeholder overrides, if any were given.

property allow_fetch_all: bool

Flag indicating whether the fetch_all() operation is permitted.

fetch(ids_to_fetch: Iterable[IdsToFetch], placeholders: Iterable[str] = (), required: Iterable[str] = ()) → Dict[SourceType, PlaceholderTranslations]

Fetch translations.

Parameters
  • ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.

  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

Raises

Notes

Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See rics.translation.offline.Format documentation for details.
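To illustrate the notes above: each placeholder is typically a column, and a format string combines the fetched placeholder values into one translation per ID. This sketch uses plain str.format as a stand-in for rics.translation.offline.Format (whose own syntax is documented separately); the fetched data and the translate() helper are hypothetical, and the (source, ids) tuples mirror the ids_to_fetch argument.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Hypothetical fetched data for one source: placeholder (column) -> values.
fetched: Dict[str, List[Any]] = {
    "id": [1, 2],
    "name": ["Stockholm", "Oslo"],
    "country": ["SE", "NO"],
}

# Mirrors ids_to_fetch: (source, ids) tuples; ids=None would mean "as many as possible".
ids_to_fetch: List[Tuple[str, Optional[Sequence[int]]]] = [("cities", [1, 2])]


def translate(data: Dict[str, List[Any]], fmt: str, ids: Optional[Sequence[int]] = None) -> Dict[Any, str]:
    """Combine placeholder values row by row into one translated string per ID."""
    records = [dict(zip(data, row)) for row in zip(*data.values())]
    return {r["id"]: fmt.format(**r) for r in records if ids is None or r["id"] in ids}


for source, ids in ids_to_fetch:
    print(source, translate(fetched, "{id}:{name} ({country})", ids))
# cities {1: '1:Stockholm (SE)', 2: '2:Oslo (NO)'}
```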

fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) → Dict[SourceType, PlaceholderTranslations]

Fetch as much data as possible.

Parameters
  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

Raises

abstract fetch_placeholders(instruction: FetchInstruction) → PlaceholderTranslations

Fetch placeholder translations for a single instruction.

Parameters

instruction – A single instruction for IDs to fetch. If its IDs are None, the fetcher should retrieve data for as many IDs as possible.

Returns

Placeholder translation elements.

Raises

UnknownPlaceholderError – If the placeholder is unknown to the fetcher.

close() → None

Close the fetcher. Does nothing by default.

get_id_placeholder(source: SourceType) → str

Get the ID placeholder name for source.

classmethod make_and_verify(instr: FetchInstruction, known_placeholders: Collection[str], records: Sequence[Sequence[Any]]) → PlaceholderTranslations

Make a PlaceholderTranslations instance from records.

Convenience method meant for use by implementations.

Parameters
  • instr – A fetch instruction.

  • known_placeholders – Known placeholders for the instr.source.

  • records – Records produced from the instruction.

Returns

Placeholder translation elements.

Raises

classmethod verify_placeholders(instr: FetchInstruction, known_placeholders: Collection[str]) → None

Verify required placeholders for a source.

Convenience method meant for use by implementations.

Parameters
  • instr – A fetch instruction.

  • known_placeholders – Known placeholders for the instr.source.

Raises

UnknownPlaceholderError – If required placeholders are missing.

classmethod select_placeholders(instr: FetchInstruction, known_placeholders: Collection[str]) → List[str]

Select as many known, requested placeholders as possible.

Parameters
  • instr – A fetch instruction.

  • known_placeholders – Known placeholders for the instr.source.

Returns

Known placeholders in the desired order.

Raises

UnknownPlaceholderError – If required placeholders are missing.
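The two verification helpers above can be sketched as follows. Instruction is a hypothetical stand-in for FetchInstruction, assumed to carry the requested placeholders (in preferred order) and the required ones; KeyError stands in for UnknownPlaceholderError. Appending required placeholders that were not explicitly requested is an assumption, not confirmed behaviour.

```python
from dataclasses import dataclass
from typing import Collection, List, Tuple


@dataclass(frozen=True)
class Instruction:
    """Hypothetical stand-in for FetchInstruction; the real fields may differ."""

    source: str
    placeholders: Tuple[str, ...] = ()  # Desired placeholders, in preferred order.
    required: Tuple[str, ...] = ()  # Placeholders that must be in the response.


def verify_placeholders(instr: Instruction, known_placeholders: Collection[str]) -> None:
    missing = [p for p in instr.required if p not in known_placeholders]
    if missing:
        # The real method raises UnknownPlaceholderError; KeyError is a stand-in.
        raise KeyError(f"Unknown required placeholders {missing} for source {instr.source!r}")


def select_placeholders(instr: Instruction, known_placeholders: Collection[str]) -> List[str]:
    verify_placeholders(instr, known_placeholders)
    # Keep known placeholders in the requested order; unknown optional ones are dropped.
    selected = [p for p in instr.placeholders if p in known_placeholders]
    # Assumption: required placeholders missing from `placeholders` are appended.
    selected += [p for p in instr.required if p not in selected]
    return selected


instr = Instruction("cities", placeholders=("name", "population", "id"), required=("id",))
print(select_placeholders(instr, {"id", "name"}))  # ['name', 'id']
```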

class rics.translation.fetching.MemoryFetcher(data: Union[Dict[SourceType, PlaceholderTranslations], Dict[SourceType, Union[PlaceholderTranslations, DataFrame, Dict[str, Sequence[Any]]]]], **kwargs: Any)

Bases: Fetcher[NameType, IdType, SourceType]

Fetch from memory.

Parameters
  • data – A dict {source: translations} to fetch from. Each value may be a PlaceholderTranslations instance, a DataFrame, or a dict {placeholder: values}.

  • **kwargs – Forwarded to the base fetcher.
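Fetching from memory amounts to a lookup in the data dict. The sketch below is a simplified, hypothetical version that supports only the {placeholder: values} column layout (one of the accepted value types), assumes the ID placeholder is named "id", and returns plain tuples instead of a PlaceholderTranslations instance.

```python
from typing import Any, Dict, List, Optional, Sequence

Columns = Dict[str, Sequence[Any]]


def fetch_from_memory(
    data: Dict[str, Columns],
    source: str,
    placeholders: Sequence[str],
    ids: Optional[Sequence[Any]] = None,
    id_placeholder: str = "id",  # Assumed name; the real fetcher asks get_id_placeholder().
) -> List[tuple]:
    """Return one tuple per matching ID, with values in the requested placeholder order."""
    columns = data[source]  # An unknown source raises KeyError here.
    records = []
    for row_index, id_value in enumerate(columns[id_placeholder]):
        if ids is None or id_value in ids:
            records.append(tuple(columns[p][row_index] for p in placeholders))
    return records


data = {"cities": {"id": [1, 2], "name": ["Stockholm", "Oslo"]}}
print(fetch_from_memory(data, "cities", ["id", "name"], ids=[2]))  # [(2, 'Oslo')]
print(list(data))  # Like the `sources` property: ['cities']
```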

property sources: List[SourceType]

Get keys in data as a list.

fetch_placeholders(instr: FetchInstruction) → PlaceholderTranslations

Fetch columns from memory.

class rics.translation.fetching.PandasFetcher(read_function: Union[Callable[[Union[str, bytes, PathLike], Any, Any], DataFrame], str] = <function read_pickle>, read_path_format: Optional[Union[str, Callable[[Union[str, bytes, PathLike]], str]]] = 'data/{}.pkl', read_function_args: Optional[Iterable[Any]] = None, read_function_kwargs: Optional[Mapping[str, Any]] = None, **kwargs: Any)

Bases: Fetcher[NameType, IdType, str]

Fetcher using pandas DataFrames as the data format.

Fetch data from serialized DataFrames. The read_function determines how the data is read. This is typically a Pandas function such as pandas.read_csv() or pandas.read_pickle(), but any function that accepts a string source as its first argument and returns a DataFrame can be used.

Parameters
  • read_function – A Pandas read-function.

  • read_path_format – A format string or callable applied to each source before it is passed to read_function. A format string must contain the source as its only placeholder, e.g. data/{source}.pkl. None = leave the source as-is.

  • read_function_args – Additional positional arguments for read_function.

  • read_function_kwargs – Additional keyword arguments for read_function.
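How read_path_format turns a source name into a path can be sketched as below. resolve_read_path is a hypothetical helper, not part of the PandasFetcher API. It assumes a format string is filled with str.format, passing the source both positionally and as source= so that the default 'data/{}.pkl' and the data/{source}.pkl style both work, and that a callable receives the source name directly.

```python
from typing import Callable, Optional, Union

PathFormat = Optional[Union[str, Callable[[str], str]]]


def resolve_read_path(source: str, read_path_format: PathFormat = "data/{}.pkl") -> str:
    """Hypothetical helper: build the argument that would be passed to read_function."""
    if read_path_format is None:
        return source  # None: pass the source through unchanged.
    if callable(read_path_format):
        return read_path_format(source)
    # Supports both positional "{}" and named "{source}" fields.
    return read_path_format.format(source, source=source)


print(resolve_read_path("cities"))  # data/cities.pkl
print(resolve_read_path("cities", "data/{source}.csv"))  # data/cities.csv
print(resolve_read_path("cities", lambda s: f"/mnt/{s.upper()}.pkl"))  # /mnt/CITIES.pkl
```

The resolved path would then be handed to a read function such as pandas.read_csv(), together with read_function_args and read_function_kwargs.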

See also

The official Pandas IO documentation

read(source_path: Union[str, bytes, PathLike]) → DataFrame

Read a DataFrame from a source path.

Parameters

source_path – Path to serialized DataFrame.

Returns

A deserialized DataFrame.

find_sources() → Dict[str, Path]

Search for source paths to pass to read_function using read_path_format.

Returns

A dict {source: path}.

Raises

IOError – If files cannot be read.
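One way to discover sources from a format string is to turn its placeholder into a glob wildcard and take the variable part of each matching file name as the source name. The sketch below does that with pathlib; it assumes a single "{}" placeholder in the file-name part of the pattern and illustrates the idea only, not PandasFetcher's actual discovery logic.

```python
from pathlib import Path
from typing import Dict


def find_sources(read_path_format: str = "data/{}.pkl") -> Dict[str, Path]:
    """Return {source: path} for every file matching read_path_format (hypothetical helper)."""
    pattern = Path(read_path_format)
    prefix, suffix = pattern.name.split("{}")  # Assumes exactly one "{}" in the file name.
    sources = {}
    for path in sorted(pattern.parent.glob(f"{prefix}*{suffix}")):
        # Strip the fixed prefix and suffix to recover the source name.
        source = path.name[len(prefix): len(path.name) - len(suffix)]
        sources[source] = path
    return sources
```

For example, with cities.pkl and languages.pkl in data/, find_sources() would return {'cities': ..., 'languages': ...} mapped to their paths.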

property sources: List[str]

Source names known to the fetcher, such as cities or languages.

fetch_placeholders(instr: FetchInstruction) → PlaceholderTranslations

Read data from disk.

class rics.translation.fetching.SqlFetcher(connection_string: str, password: Optional[str] = None, whitelist_tables: Optional[Iterable[str]] = None, blacklist_tables: Optional[Iterable[str]] = None, include_views: bool = True, fetch_in_below: int = 1200, fetch_between_over: int = 10000, fetch_between_max_overfetch_factor: float = 2.5, fetch_all_limit: Optional[int] = 100000, **kwargs: Any)

Bases: Fetcher[str, IdType, str]

Fetch data from a SQL source. Requires SQLAlchemy.

Parameters
  • connection_string – A SQLAlchemy connection string. If connection_string starts with ‘@’, the rest is read as the name of an environment variable holding the actual connection string. Example: @TRANSLATION_DB_CONNECTION_STRING reads from the TRANSLATION_DB_CONNECTION_STRING environment variable.

  • password – Password to insert into the connection string. It is escaped to allow for special characters. If given, the connection string must contain a {password} placeholder, e.g. dialect://user:{password}@host:port. Can be read from an environment variable in the same way as connection_string.

  • whitelist_tables – The only tables the fetcher may access.

  • blacklist_tables – Tables the fetcher may not access.

  • include_views – If True, discover views as well.

  • fetch_in_below – Always use an IN clause when fetching fewer than fetch_in_below IDs.

  • fetch_between_over – Always use a BETWEEN clause when fetching more than fetch_between_over IDs.

  • fetch_between_max_overfetch_factor – If the number of IDs to fetch is between fetch_in_below and fetch_between_over, use this factor to choose between an IN and a BETWEEN clause.

  • fetch_all_limit – Maximum table size for which a fetch-all operation is allowed. None = no limit, 0 = never allow.

Raises

ValueError – If both whitelist_tables and blacklist_tables are given.
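The three fetch_* thresholds describe a query-planning heuristic. One plausible reading, sketched below, is that BETWEEN min AND max also returns every unrequested ID in that range, so the over-fetch factor is the size of the range divided by the number of requested IDs. The exact formula SqlFetcher uses is an assumption here, and choose_clause is a hypothetical helper.

```python
from typing import Sequence


def choose_clause(
    ids: Sequence[int],
    fetch_in_below: int = 1200,
    fetch_between_over: int = 10000,
    fetch_between_max_overfetch_factor: float = 2.5,
) -> str:
    """Return "IN" or "BETWEEN" for a batch of integer IDs (assumed heuristic)."""
    n = len(ids)
    if n < fetch_in_below:
        return "IN"
    if n > fetch_between_over:
        return "BETWEEN"
    # BETWEEN min AND max returns every ID in the range, including unrequested ones.
    range_size = max(ids) - min(ids) + 1
    overfetch_factor = range_size / n
    return "BETWEEN" if overfetch_factor <= fetch_between_max_overfetch_factor else "IN"


print(choose_clause(list(range(2000))))  # BETWEEN (dense IDs, factor 1.0)
print(choose_clause(list(range(0, 20000, 10))))  # IN (sparse IDs, factor ~10)
```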

sanitize_id(arg: IdType) → IdType

Sanitize an input.

static sanitize_table(table: str) → str

Sanitize a table name.

fetch_placeholders(instr: FetchInstruction) → PlaceholderTranslations

Fetch columns from a SQL database.

property online: bool

Return connectivity status. If False, no new translations may be fetched.

property sources: List[str]

Source names known to the fetcher, such as cities or languages.

property allow_fetch_all: bool

Flag indicating whether the fetch_all() operation is permitted.

close() → None

Close database connection.

classmethod parse_connection_string(connection_string: str, password: Optional[str]) → str

Parse a connection string. Read from environment if connection_string starts with ‘@’.
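The ‘@’ convention and password handling can be sketched with the standard library. This is a hypothetical reimplementation, not the library's code: it assumes the password is URL-escaped with urllib.parse.quote_plus (the escaping SQLAlchemy's documentation suggests for special characters in passwords) and inserted into the {password} field with str.format.

```python
import os
from typing import Optional
from urllib.parse import quote_plus


def read_env(value: str) -> str:
    """Resolve '@NAME' to the value of environment variable NAME; return other values as-is."""
    return os.environ[value[1:]] if value.startswith("@") else value


def parse_connection_string(connection_string: str, password: Optional[str]) -> str:
    connection_string = read_env(connection_string)
    if password is None:
        return connection_string
    # Escape characters such as '@' or '/' that would otherwise break the URL.
    return connection_string.format(password=quote_plus(read_env(password)))


os.environ["TRANSLATION_DB_CONNECTION_STRING"] = "postgresql://user:{password}@localhost:5432/db"
print(parse_connection_string("@TRANSLATION_DB_CONNECTION_STRING", "p@ss/word"))
# postgresql://user:p%40ss%2Fword@localhost:5432/db
```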

make_table_summary(table: Table) → TableSummary

Create a table summary.

get_approximate_table_size(table: Table) → int

Return the approximate size of a table.

Called only by the make_table_summary() method during discovery. The default implementation performs a count on the ID column, which may be expensive.

Parameters

table – A table object.

Returns

An approximate size for table.
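For reference, a count on the ID column amounts to a query like the one below, shown with the standard-library sqlite3 module instead of SQLAlchemy to stay self-contained; the cities table and approximate_table_size helper are hypothetical. An override might instead read an estimate from the database's own statistics, which is usually cheaper on large tables but engine-specific.

```python
import sqlite3


def approximate_table_size(conn: sqlite3.Connection, table: str, id_column: str = "id") -> int:
    # An exact COUNT scans the ID column (or an index on it) and can be slow on large tables.
    # Identifiers are interpolated directly, so only pass trusted table and column names.
    (count,) = conn.execute(f'SELECT COUNT("{id_column}") FROM "{table}"').fetchone()
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO cities (name) VALUES (?)", [("Stockholm",), ("Oslo",)])
print(approximate_table_size(conn, "cities"))  # 2
```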

get_metadata() → MetaData

Create a populated metadata object.