rics.translation.fetching#

Translation using external sources.

Classes

Fetcher(*args, **kwds)

Interface for fetching translations from an external source.

AbstractFetcher([mapper, allow_fetch_all])

Base class for retrieving translations from an external source.

MemoryFetcher(data, **kwargs)

Fetch from memory.

MultiFetcher(*fetchers[, max_workers, ...])

Fetcher which combines the results of other fetchers.

PandasFetcher([read_function, ...])

Fetcher implementation using pandas DataFrames as the data format.

SqlFetcher(connection_string[, password, ...])

Fetch data from a SQL source.

class Fetcher(*args, **kwds)[source]#

Bases: ABC, Generic[SourceType, IdType]

Interface for fetching translations from an external source.

abstract property allow_fetch_all: bool#

Flag indicating whether the fetch_all() operation is permitted.

close() None[source]#

Close the Fetcher. Does nothing by default.

abstract property online: bool#

Return connectivity status. If False, no new translations may be fetched.

abstract property sources: List[SourceType]#

Source names known to the Fetcher, such as cities or languages.

abstract property placeholders: Dict[SourceType, List[str]]#

Placeholders for sources managed by the Fetcher.

Returns

A dict {source: [placeholders..]}.

Notes

Placeholders (and sources) are returned as they are known to the fetcher (without mapping).

abstract fetch(ids_to_fetch: Iterable[IdsToFetch], placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#

Retrieve placeholder translations from the source.

Parameters
  • ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.

  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

Raises

Notes

Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See rics.translation.offline.Format documentation for details.
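
The interplay between placeholders and required can be sketched as below. This is an illustration only, not the library's implementation: select_placeholders and the local UnknownPlaceholderError stand-in are hypothetical names, and the sketch assumes that optional placeholders missing from the source are silently skipped.

```python
from typing import Iterable, List


class UnknownPlaceholderError(KeyError):
    """Local stand-in for the library's error type (assumption)."""


def select_placeholders(
    available: Iterable[str],
    placeholders: Iterable[str] = (),
    required: Iterable[str] = (),
) -> List[str]:
    """Return the placeholders to fetch, in preferred order."""
    known = set(available)
    placeholders = list(placeholders)
    required = list(required)

    # Required placeholders must exist in the source.
    missing = [name for name in required if name not in known]
    if missing:
        raise UnknownPlaceholderError(f"Required placeholders not found: {missing}")

    # Optional placeholders are kept only if the source has them.
    selected: List[str] = []
    for name in [*placeholders, *required]:
        if name in known and name not in selected:
            selected.append(name)
    return selected


columns = ["id", "name", "population"]
print(select_placeholders(columns, placeholders=["id", "name", "nickname"], required=["id"]))
```

Here nickname is dropped because the source does not have it, whereas a missing required placeholder raises an error.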

abstract fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#

Fetch as much data as possible.

Parameters
  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

Raises
class AbstractFetcher(mapper: Optional[Mapper[str, str, SourceType]] = None, allow_fetch_all: bool = True)[source]#

Bases: Fetcher[SourceType, IdType]

Base class for retrieving translations from an external source.

Users who wish to define their own fetching logic should inherit from this class, but there are implementations for common use cases. See PandasFetcher for a versatile base fetcher, or SqlFetcher for a more specialized solution.

Parameters
  • mapper – A Mapper instance used to adapt placeholder names in sources to wanted names, i.e. the names of the placeholders in the translation Format being used.

  • allow_fetch_all – If False, an error will be raised when fetch_all() is called.

map_placeholders(source: SourceType, placeholders: Iterable[str], candidates: Optional[Iterable[str]] = None, clear_cache: bool = False) Dict[str, Optional[str]][source]#

Map placeholder names to the actual names used in source.

Parameters
  • source – The source to map placeholders for.

  • placeholders – Desired placeholders.

  • candidates – A subset of candidates (placeholder names) in source to map with placeholders. If None, retrieve candidates using get_placeholders().

  • clear_cache – If True, force a full remap.

Returns

A dict {wanted_placeholder_name: actual_placeholder_name_in_source}, where actual_placeholder_name_in_source will be None if the wanted placeholder could not be mapped to any of the candidates available for the source.
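
The shape of the returned dict can be illustrated with a minimal sketch. The function map_placeholders_sketch is hypothetical, and case-insensitive name matching is an assumption made for brevity; the actual Mapper may score candidates differently.

```python
from typing import Dict, Iterable, Optional


def map_placeholders_sketch(
    placeholders: Iterable[str], candidates: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Map wanted placeholder names to names in the source, or None if unmapped."""
    # Assumption: a candidate matches if the names are equal, ignoring case.
    by_lowercase = {candidate.lower(): candidate for candidate in candidates}
    return {wanted: by_lowercase.get(wanted.lower()) for wanted in placeholders}


mapping = map_placeholders_sketch(["id", "name", "code"], ["ID", "Name", "population"])
print(mapping)  # {'id': 'ID', 'name': 'Name', 'code': None}
```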

id_column(source: SourceType, candidates: Optional[Iterable[str]] = None) Optional[str][source]#

Return the ID column for source.

property mapper: Mapper[str, str, SourceType]#

Return the Mapper instance used for placeholder name mapping.

property online: bool#

Return connectivity status. If False, no new translations may be fetched.

assert_online() None[source]#

Raise an error if offline.

Raises

ConnectionStatusError – If not online.

abstract property sources: List[SourceType]#

Source names known to the Fetcher, such as cities or languages.

abstract property placeholders: Dict[SourceType, List[str]]#

Placeholders for sources managed by the Fetcher.

Returns

A dict {source: [placeholders..]}.

Notes

Placeholders (and sources) are returned as they are known to the fetcher (without mapping).

get_placeholders(source: SourceType) List[str][source]#

Get placeholders for source.

property allow_fetch_all: bool#

Flag indicating whether the fetch_all() operation is permitted.

fetch(ids_to_fetch: Iterable[IdsToFetch[SourceType, IdType]], placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#

Retrieve placeholder translations from the source.

Parameters
  • ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.

  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

Raises

Notes

Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See rics.translation.offline.Format documentation for details.

fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#

Fetch as much data as possible.

Parameters
  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

Raises
abstract fetch_translations(instruction: FetchInstruction[SourceType, IdType]) PlaceholderTranslations[SourceType][source]#

Retrieve placeholder translations from the source.

Parameters

instruction – A single instruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.

Returns

Placeholder translation elements.

Raises

UnknownPlaceholderError – If the placeholder is unknown to the fetcher.

close() None[source]#

Close the Fetcher. Does nothing by default.

classmethod default_mapper_kwargs() Dict[str, Any][source]#

Create a default Mapper for AbstractFetcher implementations.

classmethod default_score_function(value: str, candidates: Iterable[str], context: str) Iterable[float][source]#

Compute score for candidates.

class MemoryFetcher(data: Union[Dict[SourceType, PlaceholderTranslations], Dict[SourceType, Union[PlaceholderTranslations, DataFrame, Dict[str, Sequence[Any]]]]], **kwargs: Any)[source]#

Bases: AbstractFetcher[SourceType, IdType]

Fetch from memory.

Parameters

data – A dict {source: PlaceholderTranslations} to fetch from.

property sources: List[SourceType]#

Source names known to the Fetcher, such as cities or languages.

property placeholders: Dict[SourceType, List[str]]#

Placeholders for sources managed by the Fetcher.

Returns

A dict {source: [placeholders..]}.

Notes

Placeholders (and sources) are returned as they are known to the fetcher (without mapping).

fetch_translations(instr: FetchInstruction) PlaceholderTranslations[source]#

Retrieve placeholder translations from the source.

Parameters

instr – A single instruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.

Returns

Placeholder translation elements.

Raises

UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
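
Fetching from memory can be pictured roughly as follows. This sketch does not use the library's types: fetch_from_memory is a hypothetical helper, the data is assumed to be in the {source: {placeholder: values}} form, and the ID column is assumed to be named id.

```python
from typing import Any, Dict, List, Optional, Sequence, Set

Table = Dict[str, Sequence[Any]]


def fetch_from_memory(
    data: Dict[str, Table],
    source: str,
    ids: Optional[Set[Any]] = None,
    id_column: str = "id",
) -> List[tuple]:
    """Return rows of data[source]; all rows if ids is None."""
    table = data[source]
    columns = list(table)
    id_index = columns.index(id_column)
    # Turn the column-oriented table into row tuples.
    rows = zip(*(table[column] for column in columns))
    return [row for row in rows if ids is None or row[id_index] in ids]


data = {"cities": {"id": [1, 2, 3], "name": ["Stockholm", "Oslo", "Helsinki"]}}
print(fetch_from_memory(data, "cities", ids={1, 3}))  # [(1, 'Stockholm'), (3, 'Helsinki')]
print(fetch_from_memory(data, "cities"))  # all three rows
```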

class MultiFetcher(*fetchers: Fetcher, max_workers: int = 2, duplicate_translation_action: ActionLevel = ActionLevel.WARN, duplicate_source_discovered_action: ActionLevel = ActionLevel.IGNORE)[source]#

Bases: Fetcher[SourceType, IdType]

Fetcher which combines the results of other fetchers.

property allow_fetch_all: bool#

Flag indicating whether the fetch_all() operation is permitted.

online() bool[source]#

Return connectivity status. If False, no new translations may be fetched.

property placeholders: Dict[SourceType, List[str]]#

Placeholders for sources managed by the Fetcher.

Returns

A dict {source: [placeholders..]}.

Notes

Placeholders (and sources) are returned as they are known to the fetcher (without mapping).

property fetchers: List[Fetcher[SourceType, IdType]]#

Return child fetchers.

property sources: List[SourceType]#

Source names known to the Fetcher, such as cities or languages.

fetch(ids_to_fetch: Iterable[IdsToFetch[SourceType, IdType]], placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#

Retrieve placeholder translations from the source.

Parameters
  • ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.

  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

Raises

Notes

Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See rics.translation.offline.Format documentation for details.

fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#

Fetch as much data as possible.

Parameters
  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

Raises
property duplicate_translation_action: ActionLevel#

Return action to take when multiple fetchers return translations for the same source.

property duplicate_source_discovered_action: ActionLevel#

Return action to take when multiple fetchers claim the same source.
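
One way to picture how results from several child fetchers might be combined is sketched below. It is not the MultiFetcher implementation: combine_fetch_results is a hypothetical helper, child fetchers are modelled as plain callables, and keeping the result of the earliest fetcher while issuing a warning is an assumption about what a WARN action could mean.

```python
import warnings
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Translations = Dict[str, List[tuple]]


def combine_fetch_results(
    fetch_functions: Sequence[Callable[[], Translations]], max_workers: int = 2
) -> Translations:
    """Run all fetch functions in a thread pool and merge their results."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map() yields results in the order of fetch_functions.
        results = list(executor.map(lambda fetch: fetch(), fetch_functions))

    combined: Translations = {}
    for rank, result in enumerate(results):
        for source, translations in result.items():
            if source in combined:
                # Assumption: the earliest fetcher wins; later duplicates are dropped.
                warnings.warn(f"Discarding duplicate translations for {source!r} from fetcher {rank}.")
                continue
            combined[source] = translations
    return combined


def first() -> Translations:
    return {"cities": [(1, "Stockholm")]}


def second() -> Translations:
    return {"cities": [(1, "Sthlm")], "languages": [(1, "Swedish")]}


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(combine_fetch_results([first, second]))
```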

class PandasFetcher(read_function: Union[Callable[[Union[str, bytes, PathLike], Any, Any], DataFrame], str] = <function read_pickle>, read_path_format: Optional[Union[str, Callable[[Union[str, bytes, PathLike]], str]]] = 'data/{}.pkl', read_function_args: Optional[Iterable[Any]] = None, read_function_kwargs: Optional[Mapping[str, Any]] = None, **kwargs: Any)[source]#

Bases: AbstractFetcher[str, IdType]

Fetcher implementation using pandas DataFrames as the data format.

Fetch data from serialized DataFrames. How this is done is determined by the read_function. This is typically a Pandas function such as pandas.read_csv() or pandas.read_pickle(), but any function that accepts a string source as the first argument and returns a data frame can be used.

Parameters
  • read_function – A Pandas read-function.

  • read_path_format – A formatting string or a callable to apply to each source before passing it to read_function. A formatting string must contain the source as its only placeholder. Example: data/{source}.pkl. If None, sources are passed to read_function as-is.

  • read_function_args – Additional positional arguments for read_function.

  • read_function_kwargs – Additional keyword arguments for read_function.

See also

The official Pandas IO documentation
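
How read_path_format may be applied to a source is sketched below. The helper resolve_source_path is hypothetical, and accepting both the positional form (data/{}.pkl) and the named form (data/{source}.pkl) is an assumption based on the default value and the example given above.

```python
from typing import Callable, Optional, Union

PathFormat = Optional[Union[str, Callable[[str], str]]]


def resolve_source_path(source: str, read_path_format: PathFormat = "data/{}.pkl") -> str:
    """Turn a source name into the path handed to the read function."""
    if read_path_format is None:
        return source  # Pass the source as-is.
    if callable(read_path_format):
        return read_path_format(source)
    # str.format ignores unused arguments, so both '{}' and '{source}' work.
    return read_path_format.format(source, source=source)


print(resolve_source_path("cities"))  # data/cities.pkl
print(resolve_source_path("cities", "data/{source}.csv"))  # data/cities.csv
print(resolve_source_path("cities", lambda s: f"/mnt/tables/{s}.parquet"))
```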

read(source_path: Union[str, bytes, PathLike]) DataFrame[source]#

Read a DataFrame from a source path.

Parameters

source_path – Path to serialized DataFrame.

Returns

A deserialized DataFrame.

find_sources() Dict[str, Path][source]#

Search for source paths to pass to read_function using read_path_format.

Returns

A dict {source: path}.

Raises

IOError – If files cannot be read.

property sources: List[str]#

Source names known to the Fetcher, such as cities or languages.

property placeholders: Dict[str, List[str]]#

Placeholders for sources managed by the Fetcher.

Returns

A dict {source: [placeholders..]}.

Notes

Placeholders (and sources) are returned as they are known to the fetcher (without mapping).

fetch_translations(instr: FetchInstruction) PlaceholderTranslations[source]#

Retrieve placeholder translations from the source.

Parameters

instr – A single instruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.

Returns

Placeholder translation elements.

Raises

UnknownPlaceholderError – If the placeholder is unknown to the fetcher.

class SqlFetcher(connection_string: str, password: Optional[str] = None, whitelist_tables: Optional[Iterable[str]] = None, blacklist_tables: Optional[Iterable[str]] = None, include_views: bool = True, fetch_all_limit: Optional[int] = 100000, **kwargs: Any)[source]#

Bases: AbstractFetcher[str, IdType]

Fetch data from a SQL source. Requires SQLAlchemy.

Parameters
  • connection_string – A SQLAlchemy connection string. Read from environment variable if connection_string starts with ‘@’ followed by the name. Example: @TRANSLATION_DB_CONNECTION_STRING reads from the TRANSLATION_DB_CONNECTION_STRING environment variable.

  • password – Password to insert into the connection string. Will be escaped to allow for special characters. If given, the connection string must contain a password key, e.g. dialect://user:{password}@host:port. Can be read from an environment variable, just like connection_string.

  • whitelist_tables – The only tables the SqlFetcher may access. Mutually exclusive with blacklist_tables.

  • blacklist_tables – Tables the SqlFetcher may not access. Mutually exclusive with whitelist_tables.

  • include_views – If True, discover views as well.

  • fetch_all_limit – Maximum table size for which a fetch-all operation is allowed. 0 = never allow. If None, no limit is applied.

  • **kwargs – Primarily passed to super().__init__, then used as selection_filter_type() kwargs.

Raises

ValueError – If both whitelist_tables and blacklist_tables are given.
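
The mutual exclusion of whitelist_tables and blacklist_tables can be sketched as below. The helper filter_tables is hypothetical, and treating an argument as given when it is not None is an assumption.

```python
from typing import Iterable, List, Optional


def filter_tables(
    all_tables: Iterable[str],
    whitelist_tables: Optional[Iterable[str]] = None,
    blacklist_tables: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the tables that may be accessed."""
    if whitelist_tables is not None and blacklist_tables is not None:
        raise ValueError("At most one of whitelist_tables and blacklist_tables may be given.")
    if whitelist_tables is not None:
        allowed = set(whitelist_tables)
        return [table for table in all_tables if table in allowed]
    denied = set(blacklist_tables or ())
    return [table for table in all_tables if table not in denied]


tables = ["cities", "languages", "audit_log"]
print(filter_tables(tables, whitelist_tables=["cities"]))  # ['cities']
print(filter_tables(tables, blacklist_tables=["audit_log"]))  # ['cities', 'languages']
```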

Notes

Inheriting classes may override one or more of the following methods to further customize operation:

fetch_translations(instr: FetchInstruction) PlaceholderTranslations[source]#

Fetch columns from a SQL database.

property online: bool#

Return connectivity status. If False, no new translations may be fetched.

property sources: List[str]#

Source names known to the Fetcher, such as cities or languages.

property placeholders: Dict[str, List[str]]#

Placeholders for sources managed by the Fetcher.

Returns

A dict {source: [placeholders..]}.

Notes

Placeholders (and sources) are returned as they are known to the fetcher (without mapping).

property allow_fetch_all: bool#

Flag indicating whether the fetch_all() operation is permitted.

close() None[source]#

Close the Fetcher. Does nothing by default.

classmethod parse_connection_string(connection_string: str, password: Optional[str]) str[source]#

Parse a connection string. Read from environment if connection_string starts with ‘@’.
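
The environment variable lookup and password escaping described for connection_string and password can be sketched as follows. This is a standalone approximation, not the classmethod itself: parse_connection_string_sketch and _read_env_if_marked are hypothetical names, escaping with urllib.parse.quote_plus is an assumption, and so is raising ValueError when the {password} key is missing.

```python
import os
from typing import Optional
from urllib.parse import quote_plus


def _read_env_if_marked(value: str) -> str:
    """Read from the environment if value starts with '@', else return it unchanged."""
    return os.environ[value[1:]] if value.startswith("@") else value


def parse_connection_string_sketch(connection_string: str, password: Optional[str] = None) -> str:
    connection_string = _read_env_if_marked(connection_string)
    if password is None:
        return connection_string
    if "{password}" not in connection_string:
        raise ValueError("The connection string must contain a {password} key.")
    # Escape special characters such as '@' and '/' before inserting the password.
    escaped = quote_plus(_read_env_if_marked(password))
    return connection_string.replace("{password}", escaped)


os.environ["TRANSLATION_DB_CONNECTION_STRING"] = "dialect://user:{password}@host:port"
print(parse_connection_string_sketch("@TRANSLATION_DB_CONNECTION_STRING", "p@ss/word"))
# dialect://user:p%40ss%2Fword@host:port
```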

make_table_summary(table: Table, id_column: Column) TableSummary[source]#

Create a table summary.

get_approximate_table_size(table: Table, id_column: Column) int[source]#

Return the approximate size of a table.

Called only by the make_table_summary() method during discovery. The default implementation performs a count on the ID column, which may be expensive.

Parameters
  • table – A table object.

  • id_column – The ID column in table.

Returns

An approximate size for table.

get_metadata() MetaData[source]#

Create a populated metadata object.

classmethod selection_filter_type(ids: Set[IdType], table_summary: TableSummary, fetch_all_below: int = 25, fetch_all_above_ratio: float = 0.9, fetch_in_below: int = 1200, fetch_between_over: int = 10000, fetch_between_max_overfetch_factor: float = 2.5) Literal['in', 'between', None][source]#

Determine the type of filter (WHERE-query) to use, if any.

In the descriptions below, len(table) refers to the TableSummary.size-attribute of table_summary. Bare select implies fetching the entire table.

Parameters
  • ids – IDs to fetch.

  • table_summary – A summary of the table that’s about to be queried.

  • fetch_all_below – Use bare select if len(table) <= fetch_all_below.

  • fetch_all_above_ratio – Use bare select if len(ids) > len(table) * fetch_all_above_ratio.

  • fetch_in_below – Always use IN-clause when fetching fewer than fetch_in_below IDs.

  • fetch_between_over – Always use BETWEEN-clause when fetching more than fetch_between_over IDs.

  • fetch_between_max_overfetch_factor – If number of IDs to fetch is between fetch_in_below and fetch_between_over, use this factor to choose between IN and BETWEEN clause.

Returns

One of ('in', 'between', None), where None means bare select (fetch the whole table).

Notes

Override this function to redefine SELECT filtering logic.
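
A decision procedure consistent with the parameter descriptions above might look like the sketch below. It is not the library's code: choose_filter is a hypothetical function, IDs are assumed to be integers, fetch_all_below is assumed to be compared against the table size, and the overfetch measure (rows spanned by BETWEEN divided by the number of IDs wanted) is an assumption.

```python
from typing import Optional, Set


def choose_filter(
    ids: Set[int],
    table_size: int,
    fetch_all_below: int = 25,
    fetch_all_above_ratio: float = 0.9,
    fetch_in_below: int = 1200,
    fetch_between_over: int = 10_000,
    fetch_between_max_overfetch_factor: float = 2.5,
) -> Optional[str]:
    """Return 'in', 'between', or None (bare select)."""
    num_ids = len(ids)
    # Small tables, or requests covering most of the table: fetch everything.
    if table_size <= fetch_all_below or num_ids > table_size * fetch_all_above_ratio:
        return None
    if num_ids < fetch_in_below:
        return "in"
    if num_ids > fetch_between_over:
        return "between"
    # Assumption: estimate how many rows BETWEEN min(ids) AND max(ids) would
    # return relative to the number of IDs actually wanted.
    overfetch = (max(ids) - min(ids) + 1) / num_ids
    return "between" if overfetch <= fetch_between_max_overfetch_factor else "in"


print(choose_filter({1, 2, 3}, table_size=20))  # None
print(choose_filter({1, 2, 3}, table_size=1_000_000))  # in
print(choose_filter(set(range(5000)), table_size=1_000_000))  # between
```

With sparse IDs, such as every tenth ID in a wide range, the sketch falls back to an IN-clause because a BETWEEN-clause would fetch far more rows than needed.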

class TableSummary(name: str, size: int, columns: ColumnCollection, fetch_all_permitted: bool, id_column: Column)[source]#

Bases: object

Brief description of a known table.

name: str#

Name of the table.

size: int#

Approximate size of the table.

columns: ColumnCollection#

Columns of the table.

fetch_all_permitted: bool#

A flag indicating that the FETCH_ALL-operation is permitted for this table.

id_column: Column#

The ID column of the table.

select_columns(instr: FetchInstruction) List[str][source]#

Return required and optional columns of the table.

Modules

rics.translation.fetching.exceptions

Errors and warnings related to fetching.

rics.translation.fetching.support

Supporting functions for implementations.

rics.translation.fetching.types

Types related to translation fetching.