rics.translation.fetching#
Translation using external sources.
Classes
Fetcher | Interface for fetching translations from an external source.
AbstractFetcher | Base class for retrieving translations from an external source.
MemoryFetcher | Fetch from memory.
MultiFetcher | Fetcher which combines the results of other fetchers.
PandasFetcher | Fetcher implementation using pandas DataFrames as the data format.
SqlFetcher | Fetch data from a SQL source.
- class Fetcher(*args, **kwds)[source]#
Bases: ABC, Generic[SourceType, IdType]
Interface for fetching translations from an external source.
- abstract property allow_fetch_all: bool#
Flag indicating whether the fetch_all() operation is permitted.
- abstract property online: bool#
Return connectivity status. If False, no new translations may be fetched.
- abstract property sources: List[SourceType]#
Source names known to the Fetcher, such as cities or languages.
- abstract property placeholders: Dict[SourceType, List[str]]#
Placeholders for sources managed by the Fetcher.
- Returns
A dict {source: [placeholders..]}.
Notes
Placeholders (and sources) are returned as they are known to the fetcher (without mapping).
- abstract fetch(ids_to_fetch: Iterable[IdsToFetch], placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#
Retrieve placeholder translations from the source.
- Parameters
ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
UnknownSourceError – For source(s) that are unknown to the Fetcher.
ForbiddenOperationError – If trying to fetch all IDs when not possible or permitted.
ImplementationError – For errors made by the inheriting implementation.
Notes
Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See the rics.translation.offline.Format documentation for details.
- abstract fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#
Fetch as much data as possible.
- Parameters
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
ForbiddenOperationError – If fetching all IDs is not possible or permitted.
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
ImplementationError – For errors made by the inheriting implementation.
- class AbstractFetcher(mapper: Optional[Mapper[str, str, SourceType]] = None, allow_fetch_all: bool = True)[source]#
Bases: Fetcher[SourceType, IdType]
Base class for retrieving translations from an external source.
Users who wish to define their own fetching logic should inherit from this class, but there are implementations for common use cases. See PandasFetcher for a versatile base fetcher, or SqlFetcher for a more specialized solution.
- Parameters
mapper – A Mapper instance used to adapt placeholder names in sources to wanted names, i.e. the names of the placeholders that are in the translation Format being used.
allow_fetch_all – If False, an error will be raised when fetch_all() is called.
- map_placeholders(source: SourceType, placeholders: Iterable[str], candidates: Optional[Iterable[str]] = None, clear_cache: bool = False) Dict[str, Optional[str]][source]#
Map placeholder names to the actual names used in source.
- Parameters
source – The source to map placeholders for.
placeholders – Desired placeholders.
candidates – A subset of candidates (placeholder names) in source to map with placeholders. None=retrieve using get_placeholders().
clear_cache – If True, force a full remap.
- Returns
A dict {wanted_placeholder_name: actual_placeholder_name_in_source}, where actual_placeholder_name_in_source will be None if the wanted placeholder could not be mapped to any of the candidates available for the source.
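The shape of the returned dict can be illustrated with a small standalone sketch. It does not use the library's Mapper. The function name map_placeholders_sketch, the overrides argument and the case-insensitive matching rule are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Optional


def map_placeholders_sketch(
    placeholders: Iterable[str],
    candidates: Iterable[str],
    overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Map wanted placeholder names to names that exist in a source.

    Hypothetical stand-in for a real mapper: explicit overrides win,
    otherwise names are matched case-insensitively (an assumption).
    """
    overrides = overrides or {}
    candidates = list(candidates)
    by_lower = {candidate.lower(): candidate for candidate in candidates}

    result: Dict[str, Optional[str]] = {}
    for wanted in placeholders:
        override = overrides.get(wanted)
        if override in candidates:
            result[wanted] = override
        else:
            # None signals that no candidate could be mapped.
            result[wanted] = by_lower.get(wanted.lower())
    return result


mapping = map_placeholders_sketch(
    placeholders=["id", "name", "code"],
    candidates=["ID", "city_name"],
    overrides={"name": "city_name"},
)
print(mapping)
```

Wanted placeholders that cannot be matched are kept in the result with a None value, which mirrors the return contract described above.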
- id_column(source: SourceType, candidates: Optional[Iterable[str]] = None) Optional[str][source]#
Return the ID column for source.
- property mapper: Mapper[str, str, SourceType]#
Return the Mapper instance used for placeholder name mapping.
- assert_online() None[source]#
Raise an error if offline.
- Raises
ConnectionStatusError – If not online.
- abstract property sources: List[SourceType]#
Source names known to the Fetcher, such as cities or languages.
- abstract property placeholders: Dict[SourceType, List[str]]#
Placeholders for sources managed by the Fetcher.
- Returns
A dict {source: [placeholders..]}.
Notes
Placeholders (and sources) are returned as they are known to the fetcher (without mapping).
- get_placeholders(source: SourceType) List[str][source]#
Get placeholders for source.
- property allow_fetch_all: bool#
Flag indicating whether the fetch_all() operation is permitted.
- fetch(ids_to_fetch: Iterable[IdsToFetch[SourceType, IdType]], placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#
Retrieve placeholder translations from the source.
- Parameters
ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
UnknownSourceError – For source(s) that are unknown to the Fetcher.
ForbiddenOperationError – If trying to fetch all IDs when not possible or permitted.
ImplementationError – For errors made by the inheriting implementation.
Notes
Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See the rics.translation.offline.Format documentation for details.
- fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#
Fetch as much data as possible.
- Parameters
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
ForbiddenOperationError – If fetching all IDs is not possible or permitted.
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
ImplementationError – For errors made by the inheriting implementation.
- abstract fetch_translations(instruction: FetchInstruction[SourceType, IdType]) PlaceholderTranslations[SourceType][source]#
Retrieve placeholder translations from the source.
- Parameters
instruction – A single instruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.
- Returns
Placeholder translation elements.
- Raises
UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
- class MemoryFetcher(data: Union[Dict[SourceType, PlaceholderTranslations], Dict[SourceType, Union[PlaceholderTranslations, DataFrame, Dict[str, Sequence[Any]]]]], **kwargs: Any)[source]#
Bases: AbstractFetcher[SourceType, IdType]
Fetch from memory.
- Parameters
data – A dict {source: PlaceholderTranslations} to fetch from.
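As a rough illustration of what fetching from memory involves, the sketch below filters an in-memory {source: {placeholder: values}} dict by ID. It does not use MemoryFetcher or PlaceholderTranslations. The function fetch_from_memory, the DATA contents and the id_placeholder argument are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical in-memory data on the form {source: {placeholder: values}}.
DATA: Dict[str, Dict[str, Sequence[Any]]] = {
    "cities": {"id": [1, 2, 3], "name": ["Stockholm", "Oslo", "Helsinki"]},
}


def fetch_from_memory(
    data: Dict[str, Dict[str, Sequence[Any]]],
    source: str,
    ids: Optional[Iterable[Any]] = None,
    id_placeholder: str = "id",
) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Return (placeholders, records) for a source, optionally filtered by ID."""
    columns = data[source]  # A KeyError here corresponds to an unknown source.
    placeholders = list(columns)
    records = list(zip(*columns.values()))

    if ids is not None:  # ids=None means "as many IDs as possible".
        wanted = set(ids)
        id_position = placeholders.index(id_placeholder)
        records = [record for record in records if record[id_position] in wanted]

    return placeholders, records


placeholders, records = fetch_from_memory(DATA, "cities", ids=[1, 3])
print(placeholders, records)
```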
- property sources: List[SourceType]#
Source names known to the Fetcher, such as cities or languages.
- property placeholders: Dict[SourceType, List[str]]#
Placeholders for sources managed by the Fetcher.
- Returns
A dict {source: [placeholders..]}.
Notes
Placeholders (and sources) are returned as they are known to the fetcher (without mapping).
- fetch_translations(instr: FetchInstruction) PlaceholderTranslations[source]#
Retrieve placeholder translations from the source.
- Parameters
instr – A single instruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.
- Returns
Placeholder translation elements.
- Raises
UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
- class MultiFetcher(*fetchers: Fetcher, max_workers: int = 2, duplicate_translation_action: Union[Literal['ignore', 'warn', 'raise', 'IGNORE', 'WARN', 'RAISE'], ActionLevel] = 'warn', duplicate_source_discovered_action: Union[Literal['ignore', 'warn', 'raise', 'IGNORE', 'WARN', 'RAISE'], ActionLevel] = 'ignore')[source]#
Bases: Fetcher[SourceType, IdType]
Fetcher which combines the results of other fetchers.
- property allow_fetch_all: bool#
Flag indicating whether the fetch_all() operation is permitted.
- property placeholders: Dict[SourceType, List[str]]#
Placeholders for sources managed by the Fetcher.
- Returns
A dict {source: [placeholders..]}.
Notes
Placeholders (and sources) are returned as they are known to the fetcher (without mapping).
- property fetchers: List[Fetcher[SourceType, IdType]]#
Return child fetchers.
- property sources: List[SourceType]#
Source names known to the Fetcher, such as cities or languages.
- fetch(ids_to_fetch: Iterable[IdsToFetch[SourceType, IdType]], placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#
Retrieve placeholder translations from the source.
- Parameters
ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
UnknownSourceError – For source(s) that are unknown to the Fetcher.
ForbiddenOperationError – If trying to fetch all IDs when not possible or permitted.
ImplementationError – For errors made by the inheriting implementation.
Notes
Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See the rics.translation.offline.Format documentation for details.
- fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]#
Fetch as much data as possible.
- Parameters
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
ForbiddenOperationError – If fetching all IDs is not possible or permitted.
UnknownPlaceholderError – For placeholder(s) that are unknown to the Fetcher.
ImplementationError – For errors made by the inheriting implementation.
- property duplicate_translation_action: ActionLevel#
Return action to take when multiple fetchers return translations for the same source.
- property duplicate_source_discovered_action: ActionLevel#
Return action to take when multiple fetchers claim the same source.
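One way to picture how duplicate_translation_action could affect the combined result is the sketch below. It is not the MultiFetcher implementation. The function combine_results is hypothetical, and the rule that the earliest fetcher wins when duplicates are tolerated is an assumption.

```python
import warnings
from typing import Any, Dict, List


def combine_results(
    results: List[Dict[str, Any]],
    duplicate_translation_action: str = "warn",
) -> Dict[str, Any]:
    """Merge per-fetcher results {source: translations} into one dict.

    Assumption: results are ordered by fetcher rank and the first
    fetcher to return a source keeps it.
    """
    action = duplicate_translation_action.lower()
    if action not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Unknown action: {duplicate_translation_action!r}")

    combined: Dict[str, Any] = {}
    for rank, result in enumerate(results):
        for source, translations in result.items():
            if source not in combined:
                combined[source] = translations
                continue

            message = f"Duplicate translations for source {source!r} from fetcher at rank {rank}."
            if action == "raise":
                raise ValueError(message)
            if action == "warn":
                warnings.warn(message)
            # For "ignore" and "warn", the duplicate is silently dropped.
    return combined


merged = combine_results([{"cities": "first"}, {"cities": "second", "languages": "only"}], "ignore")
print(merged)
```

Both lower-case and upper-case action names are accepted in the sketch, matching the literal values listed in the class signature.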
- class PandasFetcher(read_function: Union[Callable[[Union[str, bytes, PathLike], Any, Any], DataFrame], str] = pandas.read_pickle, read_path_format: Optional[Union[str, Callable[[Union[str, bytes, PathLike]], str]]] = 'data/{}.pkl', read_function_args: Optional[Iterable[Any]] = None, read_function_kwargs: Optional[Mapping[str, Any]] = None, **kwargs: Any)[source]#
Bases: AbstractFetcher[str, IdType]
Fetcher implementation using pandas DataFrames as the data format.
Fetch data from serialized DataFrames. How this is done is determined by the read_function. This is typically a Pandas function such as pandas.read_csv() or pandas.read_pickle(), but any function that accepts a string source as the first argument and returns a DataFrame can be used.
- Parameters
read_function – A Pandas read-function.
read_path_format – A format string or a callable to apply to a source before passing it to read_function. A format string must contain the source as its only placeholder. Example: data/{source}.pkl. If None, the source is passed as-is.
read_function_args – Additional positional arguments for read_function.
read_function_kwargs – Additional keyword arguments for read_function.
See also
The official Pandas IO documentation
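The interplay between read_path_format, read_function and the extra arguments might look roughly like the sketch below. It is not the PandasFetcher code. The helpers resolve_path and read_source are hypothetical, the positional '{}' form of the format string follows the default in the signature, and a stand-in read function is used so the example runs without pandas.

```python
from typing import Any, Callable, Iterable, Mapping, Optional, Union

PathFormat = Optional[Union[str, Callable[[str], str]]]


def resolve_path(source: str, read_path_format: PathFormat) -> str:
    """Turn a source name into the path given to the read function."""
    if read_path_format is None:
        return source  # Pass the source as-is.
    if callable(read_path_format):
        return read_path_format(source)
    return read_path_format.format(source)


def read_source(
    source: str,
    read_function: Callable[..., Any],
    read_path_format: PathFormat = "data/{}.pkl",
    read_function_args: Optional[Iterable[Any]] = None,
    read_function_kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Call read_function(path, *args, **kwargs) for a single source."""
    path = resolve_path(source, read_path_format)
    args = tuple(read_function_args or ())
    kwargs = dict(read_function_kwargs or {})
    return read_function(path, *args, **kwargs)


def fake_read(path: str, *args: Any, **kwargs: Any) -> Any:
    """Stand-in for a Pandas read-function; echoes what it was called with."""
    return path, args, kwargs


outcome = read_source("cities", fake_read, read_function_kwargs={"sep": ";"})
print(outcome)
```

With pandas installed, a function such as pandas.read_csv could be passed in place of fake_read.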
- read(source_path: Union[str, bytes, PathLike]) DataFrame[source]#
Read a DataFrame from a source path.
- Parameters
source_path – Path to serialized DataFrame.
- Returns
A deserialized DataFrame.
- find_sources() Dict[str, Path][source]#
Search for source paths to pass to read_function using read_path_format.
- Returns
A dict {source: path}.
- Raises
IOError – If files cannot be read.
- property placeholders: Dict[str, List[str]]#
Placeholders for sources managed by the Fetcher.
- Returns
A dict {source: [placeholders..]}.
Notes
Placeholders (and sources) are returned as they are known to the fetcher (without mapping).
- fetch_translations(instr: FetchInstruction) PlaceholderTranslations[source]#
Retrieve placeholder translations from the source.
- Parameters
instr – A single instruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.
- Returns
Placeholder translation elements.
- Raises
UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
- class SqlFetcher(connection_string: str, password: Optional[str] = None, whitelist_tables: Optional[Iterable[str]] = None, blacklist_tables: Optional[Iterable[str]] = None, include_views: bool = True, fetch_in_below: int = 1200, fetch_between_over: int = 10000, fetch_between_max_overfetch_factor: float = 2.5, fetch_all_limit: Optional[int] = 100000, **kwargs: Any)[source]#
Bases: AbstractFetcher[str, IdType]
Fetch data from a SQL source. Requires SQLAlchemy.
- Parameters
connection_string – A SQLAlchemy connection string. Read from an environment variable if connection_string starts with ‘@’ followed by the variable name. Example: @TRANSLATION_DB_CONNECTION_STRING reads from the TRANSLATION_DB_CONNECTION_STRING environment variable.
password – Password to insert into the connection string. Will be escaped to allow for special characters. If given, the connection string must contain a password key, e.g. dialect://user:{password}@host:port. Can be an environment variable just like connection_string.
whitelist_tables – The only tables the SqlFetcher may access.
blacklist_tables – Tables the SqlFetcher may not access.
include_views – If True, discover views as well.
fetch_in_below – Always use an IN-clause when fetching fewer than fetch_in_below IDs.
fetch_between_over – Always use a BETWEEN-clause when fetching more than fetch_between_over IDs.
fetch_between_max_overfetch_factor – If the number of IDs to fetch is between fetch_in_below and fetch_between_over, use this factor to choose between an IN and a BETWEEN clause.
fetch_all_limit – Maximum size of a table for which a fetch-all operation is allowed. 0=never allow. Ignored if None.
- Raises
ValueError – If both whitelist_tables and blacklist_tables are given.
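The three fetch_* parameters suggest a decision rule along the lines of the sketch below. The exact rule used by SqlFetcher is not stated here, so this is an assumption: the overfetch is taken to be the size of the ID span divided by the number of distinct IDs, and integer IDs are assumed. The function choose_clause is hypothetical.

```python
from typing import Iterable


def choose_clause(
    ids: Iterable[int],
    fetch_in_below: int = 1200,
    fetch_between_over: int = 10_000,
    fetch_between_max_overfetch_factor: float = 2.5,
) -> str:
    """Pick 'IN' or 'BETWEEN' for a set of integer IDs."""
    unique_ids = sorted(set(ids))
    num_ids = len(unique_ids)

    if num_ids < fetch_in_below:
        return "IN"
    if num_ids > fetch_between_over:
        return "BETWEEN"

    # BETWEEN min AND max may return up to `span` rows for `num_ids` wanted IDs.
    span = unique_ids[-1] - unique_ids[0] + 1
    overfetch_factor = span / num_ids
    return "BETWEEN" if overfetch_factor <= fetch_between_max_overfetch_factor else "IN"


print(choose_clause(range(10)))
print(choose_clause(range(5_000)))
print(choose_clause(range(0, 500_000, 100)))
```

Under this assumed rule, dense ID ranges favor BETWEEN, while sparse ranges fall back to IN to avoid retrieving many unwanted rows.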
- fetch_translations(instr: FetchInstruction) PlaceholderTranslations[source]#
Fetch columns from a SQL database.
- property placeholders: Dict[str, List[str]]#
Placeholders for sources managed by the Fetcher.
- Returns
A dict {source: [placeholders..]}.
Notes
Placeholders (and sources) are returned as they are known to the fetcher (without mapping).
- classmethod parse_connection_string(connection_string: str, password: Optional[str]) str[source]#
Parse a connection string. Read from environment if connection_string starts with ‘@’.
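The behavior described for connection_string and password can be sketched with the standard library alone. This is an illustration, not the classmethod's source. The function parse_connection_string_sketch, the helper read_value, the environment variable SKETCH_DB_CONNECTION_STRING and the use of urllib.parse.quote_plus for escaping are all assumptions.

```python
import os
from typing import Optional
from urllib.parse import quote_plus


def read_value(value: str) -> str:
    """Return value, or the environment variable it names if it starts with '@'."""
    if value.startswith("@"):
        return os.environ[value[1:]]
    return value


def parse_connection_string_sketch(connection_string: str, password: Optional[str] = None) -> str:
    """Resolve environment references and insert an escaped password."""
    connection_string = read_value(connection_string)
    if password is None:
        return connection_string

    if "{password}" not in connection_string:
        raise ValueError("Connection string must contain a '{password}' key.")

    # Escape special characters such as '@' and '/' before insertion.
    escaped = quote_plus(read_value(password))
    return connection_string.replace("{password}", escaped)


# Hypothetical environment variable, set here only to make the example runnable.
os.environ["SKETCH_DB_CONNECTION_STRING"] = "postgresql://user:{password}@host:5432/db"
parsed = parse_connection_string_sketch("@SKETCH_DB_CONNECTION_STRING", password="p@ss/word")
print(parsed)
```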
- make_table_summary(table: Table, id_column: Column) TableSummary[source]#
Create a table summary.
- get_approximate_table_size(table: Table, id_column: Column) int[source]#
Return the approximate size of a table.
Called only by the make_table_summary() method during discovery. The default implementation performs a count on the ID column, which may be expensive.
- Parameters
table – A table object.
id_column – The ID column in table.
- Returns
An approximate size for table.
- class TableSummary(name: str, size: int, columns: ColumnCollection, fetch_all_permitted: bool, id_column: Column)[source]#
Bases: object
Brief description of a known table.
- columns: ColumnCollection#
Columns of the table.
- fetch_all_permitted: bool#
A flag indicating that the FETCH_ALL-operation is permitted for this table.
- select_columns(instr: FetchInstruction) List[str][source]#
Return required and optional columns of the table.
Modules
Errors and warnings related to fetching. |
|
Supporting functions for implementations. |
|
Types related to translation fetching. |