The Translator

The Translator is the main entry point for all translation tasks. For a simple usage example, see the Translating IDs in 30 seconds section.

class rics.translation.Translator(fetcher: Union[Fetcher, TranslationMap, Dict[SourceType, PlaceholderTranslations], Dict[SourceType, Union[PlaceholderTranslations, DataFrame, Dict[str, Sequence[Any]]]]], fmt: Union[str, Format] = '{id}:{name}', mapper: Optional[Mapper] = None, default_fmt: Optional[Union[str, Format]] = None, default_translations: Optional[Union[DefaultTranslations, Dict[str, Union[Dict[str, Any], Dict[str, Dict[str, Any]]]]]] = None)

Translate IDs to human-readable labels.

Untranslatable IDs will be None by default if neither default_fmt nor default_translations is given.

Parameters
  • fetcher – A Fetcher or ready-to-use translations.

  • fmt – String Format specification for translations.

  • mapper – A Mapper instance for binding names to sources.

  • default_fmt – Alternative format specification to use instead of fmt for fallback translation of unknown IDs.

  • default_translations – Shared and/or source-specific default placeholder values for unknown IDs.
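
As a rough illustration of the default behaviour, the sketch below applies the default fmt with plain str.format and returns None for IDs without a translation. The apply_fmt helper and the sample data are hypothetical and not part of rics; the real Translator does considerably more.

```python
# Hypothetical sample data: ID -> placeholder values. Not part of the rics API.
translations = {
    1: {"id": 1, "name": "Morocco"},
    2: {"id": 2, "name": "Sweden"},
}


def apply_fmt(fmt, ids, known):
    """Format each known ID with fmt; unknown IDs become None."""
    return [fmt.format(**known[i]) if i in known else None for i in ids]


# ID 3 has no translation, so it maps to None.
result = apply_fmt("{id}:{name}", [1, 2, 3], translations)
print(result)
```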

classmethod from_config(path: Union[str, bytes, PathLike, Dict[str, Any]]) -> Translator

Create a translator from a YAML file.

Parameters

path – Path to a YAML file, or a pre-parsed dict.

Returns

A Translator object.

Raises

ConfigurationError – If the config is invalid.

translate(translatable: DefaultTranslatable, names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, inplace: bool = False) -> Optional[DefaultTranslatable]

Translate IDs to human-readable strings.

Parameters
  • translatable – A data structure to translate.

  • names – Explicit names to translate. Will try to derive from translatable if not given. May also be a predicate which indicates (returns True for) derived names to keep.

  • ignore_names – Names not to translate. Always takes precedence over names, both explicit and derived. May also be a predicate which indicates (returns True for) names to ignore.

  • inplace – If True, translation is performed in-place and this function returns None.

Returns

A copy of translatable with IDs replaced by translations if inplace=False, otherwise None.

Raises
  • UntranslatableTypeError – If translatable is not translatable using any standard IOs.

  • AttributeError – If names are not given and cannot be derived from translatable.

  • MappingError – If required (explicitly given) names fail to map to a source.
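
The interplay between names and ignore_names can be sketched as below. This is only a reading of the parameter descriptions above, not the library's implementation; select_names and the column names are made up for the example.

```python
def select_names(derived, names=None, ignore_names=None):
    """Pick names to translate; ignore_names always wins over names."""
    if names is None:
        selected = list(derived)  # Nothing given: use all derived names.
    elif callable(names):
        selected = [n for n in derived if names(n)]  # Predicate: keep matches.
    elif isinstance(names, str):
        selected = [names]  # A single explicit name.
    else:
        selected = list(names)  # Several explicit names.

    if ignore_names is None:
        return selected
    if callable(ignore_names):
        return [n for n in selected if not ignore_names(n)]
    ignored = {ignore_names} if isinstance(ignore_names, str) else set(ignore_names)
    return [n for n in selected if n not in ignored]


columns = ["actor_id", "title_id", "year"]  # Hypothetical derived names.
only_ids = select_names(columns, names=lambda n: n.endswith("_id"))
without_title = select_names(columns, names=lambda n: n.endswith("_id"), ignore_names="title_id")
```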

map_to_sources(translatable: DefaultTranslatable, names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None) -> Optional[DirectionalMapping]

Map names to translation sources.

Parameters
  • translatable – A data structure to map names for.

  • names – Explicit names to translate. Will try to derive from translatable if not given. May also be a predicate which indicates (returns True for) derived names to keep.

  • ignore_names – Names not to translate. Always takes precedence over names, both explicit and derived. May also be a predicate which indicates (returns True for) names to ignore.

Returns

A mapping of names to translation sources. Returns None if mapping failed but success was not required.

Raises
  • AttributeError – If names are not given and cannot be derived from translatable.

  • MappingError – If required (explicitly given) names fail to map to a source.

fetch(translatable: DefaultTranslatable, name_to_source: DirectionalMapping[NameType, SourceType], data_structure_io: Optional[Type[DataStructureIO]] = None) -> TranslationMap

Fetch translations.

Parameters
  • translatable – A data structure to translate.

  • name_to_source – Mappings of names in translatable to translation sources known to the fetcher.

  • data_structure_io – Static Data Structure IO class used to extract IDs from translatable. None=derive.

Returns

A TranslationMap.

Raises

OfflineError – If disconnected from the fetcher, i.e. not online.

property online: bool

Return connectivity status. If False, no new translations may be fetched.

store(translatable: Optional[DefaultTranslatable] = None, names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, delete_fetcher: bool = True) -> Translator

Retrieve and store translations in a local cache.

Parameters
  • translatable – Data from which IDs to fetch will be extracted. None=fetch all IDs.

  • names – Explicit names to translate. Will try to derive from translatable if not given. May also be a predicate which indicates (returns True for) derived names to keep.

  • ignore_names – Names not to translate. Always takes precedence over names, both explicit and derived. May also be a predicate which indicates (returns True for) names to ignore.

  • delete_fetcher – If True, go offline after retrieving data. The translator will still function, but some methods may raise exceptions and new data cannot be retrieved. Deleting allows the fetcher to close files and connections. If the fetcher has a close() method, it will be called before deletion.

Returns

Self, for chained assignment.

Raises
  • ForbiddenOperationError – If the fetcher does not permit the FETCH_ALL operation (only when translatable is None).

  • MappingError – If a translatable is given, but no names to translate could be extracted.

To actually get the translations, a Fetcher implementation is needed.

Handling unknown IDs

Untranslatable IDs will be None by default. Both an alternative translation format and default values may be specified to handle IDs which weren’t returned by the underlying fetcher. Alternative formats work just like regular formats, but if any placeholders other than id are specified, these must be included in the default translations. As an example, by copying the default_fmt and default-translations sections from config.yaml, we see that an unknown title with ID “tt0043440” is translated the way we specified.

Actor                                 Debut Title
nm0038172:Peter Aryans *1918†2001     tt0063897:Floris (original: Floris) *1969†1969
nm0040962:Ugo Attanasio *1887†1969    tt0043440:Title unknown (original: Original title unknown) *?†?

Hint

A simple default_fmt such as "{id} not translated" or just "unknown" may be enough, and will only fail if the fetcher is configured to fail for unknown IDs. Using one of these we could’ve skipped the default-translations section entirely in the example above.
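
The fallback logic described in this section can be sketched roughly as follows. The translate_id helper, the placeholder names and the sample data are assumptions made for illustration; they are not taken from config.yaml or from the rics implementation.

```python
def translate_id(id_, known, fmt, default_fmt=None, default_translations=None):
    """Translate one ID, falling back to default_fmt for unknown IDs."""
    if id_ in known:
        return fmt.format(id=id_, **known[id_])
    if default_fmt is None:
        return None  # No fallback configured: untranslatable IDs become None.
    # Placeholders other than id must come from the default translations.
    return default_fmt.format(id=id_, **(default_translations or {}))


known_titles = {"tt0063897": {"name": "Floris"}}  # Hypothetical fetched data.
fmt = "{id}:{name}"

hit = translate_id("tt0063897", known_titles, fmt)
no_fallback = translate_id("tt0043440", known_titles, fmt)
with_defaults = translate_id(
    "tt0043440", known_titles, fmt,
    default_fmt="{id}:{name}", default_translations={"name": "Title unknown"},
)
# A default_fmt that only uses id needs no default translations at all.
simple = translate_id("tt0043440", known_titles, fmt, default_fmt="{id} not translated")
```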

Fetching: SQL database

Implementation based on SQLAlchemy. Any supported dialect should work out of the box, though drivers for your particular dialect may need to be installed separately.

class rics.translation.fetching.SqlFetcher(connection_string: str, password: Optional[str] = None, whitelist_tables: Optional[Iterable[str]] = None, blacklist_tables: Optional[Iterable[str]] = None, include_views: bool = True, fetch_in_below: int = 1200, fetch_between_over: int = 10000, fetch_between_max_overfetch_factor: float = 2.5, fetch_all_limit: Optional[int] = 100000, **kwargs: Any)

Fetch data from a SQL source. Requires SQLAlchemy.

Parameters
  • connection_string – A SQLAlchemy connection string. Read from environment variable if connection_string starts with ‘@’ followed by the name. Example: @TRANSLATION_DB_CONNECTION_STRING reads from the TRANSLATION_DB_CONNECTION_STRING environment variable.

  • password – Password to insert into the connection string. Will be escaped to allow for special characters. If given, the connection string must contain a password key, e.g. dialect://user:{password}@host:port. Can be an environment variable just like connection_string.

  • whitelist_tables – The only tables the fetcher may access.

  • blacklist_tables – Tables the fetcher may not access.

  • include_views – If True, discover views as well.

  • fetch_in_below – Always use IN-clause when fetching fewer than fetch_in_below IDs.

  • fetch_between_over – Always use BETWEEN-clause when fetching more than fetch_between_over IDs.

  • fetch_between_max_overfetch_factor – If number of IDs to fetch is between fetch_in_below and fetch_between_over, use this factor to choose between IN and BETWEEN clause.

  • fetch_all_limit – Maximum size of table to allow a fetch all-operation. None=No limit, 0=never allow.

Raises

ValueError – If both whitelist_tables and blacklist_tables are given.
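
One way to picture how the three fetch_* thresholds might interact is sketched below. The overfetch computation (span of the ID range divided by the number of requested IDs) is an assumption, as is the restriction to integer IDs; choose_clause is a hypothetical helper and may differ from what SqlFetcher actually does.

```python
def choose_clause(ids, fetch_in_below=1200, fetch_between_over=10_000, max_overfetch_factor=2.5):
    """Return 'IN' or 'BETWEEN' for a collection of integer IDs."""
    unique = set(ids)
    n = len(unique)
    if n < fetch_in_below:
        return "IN"  # Few IDs: always list them explicitly.
    if n > fetch_between_over:
        return "BETWEEN"  # Many IDs: always fetch the whole range.
    # In between: assume BETWEEN retrieves every ID in [min, max] and compare
    # how many rows that is relative to the number of IDs actually wanted.
    span = max(unique) - min(unique) + 1
    return "BETWEEN" if span / n <= max_overfetch_factor else "IN"


dense = choose_clause(range(5000))  # span / n == 1.0
sparse = choose_clause(range(0, 50_000, 10))  # span / n is roughly 10
```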

sanitize_id(arg: IdType) -> IdType

Sanitize an input.

static sanitize_table(table: str) -> str

Sanitize a table name.

fetch_placeholders(instr: FetchInstruction) -> PlaceholderTranslations

Fetch columns from a SQL database.

property online: bool

Return connectivity status. If False, no new translations may be fetched.

property sources: List[str]

Source names known to the fetcher, such as cities or languages.

property allow_fetch_all: bool

Flag indicating whether the fetch_all() operation is permitted.

close() -> None

Close database connection.

classmethod parse_connection_string(connection_string: str, password: Optional[str]) -> str

Parse a connection string. Read from environment if connection_string starts with ‘@’.
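
The environment variable lookup and password escaping described for connection_string and password could look roughly like this. It is a sketch only: the helper names are hypothetical, and using urllib.parse.quote_plus for escaping is an assumption rather than a statement about what SqlFetcher does internally.

```python
import os
from urllib.parse import quote_plus


def resolve(value):
    """Read value from the environment if it starts with '@'."""
    if value.startswith("@"):
        return os.environ[value[1:]]
    return value


def build_connection_string(connection_string, password=None):
    """Resolve the connection string and insert an escaped password."""
    connection_string = resolve(connection_string)
    if password is None:
        return connection_string
    if "{password}" not in connection_string:
        raise ValueError("connection string must contain a {password} key")
    # Escape special characters such as '@' and '/' in the password.
    return connection_string.replace("{password}", quote_plus(resolve(password)))


# Hypothetical environment variable, set here only for the demonstration.
os.environ["DEMO_CONNECTION_STRING"] = "postgresql://user:{password}@localhost:5432/db"
url = build_connection_string("@DEMO_CONNECTION_STRING", "p@ss/word")
```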

make_table_summary(table: Table) -> TableSummary

Create a table summary.

get_approximate_table_size(table: Table) -> int

Return the approximate size of a table.

Called only by the make_table_summary() method during discovery. The default implementation performs a count on the ID column, which may be expensive.

Parameters

table – A table object.

Returns

An approximate size for table.

get_metadata() -> MetaData

Create a populated metadata object.

Fetching: Local files

Implementation wrapping a pandas Read-function where file names are interpreted as source names. Most readers in pandas.io should work, though additional dependencies may be required for some of them. Many of these functions do not actually require the file to be present on the local file system, allowing translation data to be shared if stored centrally.

class rics.translation.fetching.PandasFetcher(read_function: Union[Callable[[Union[str, bytes, PathLike], Any, Any], DataFrame], str] = <function read_pickle>, read_path_format: Optional[Union[str, Callable[[Union[str, bytes, PathLike]], str]]] = 'data/{}.pkl', read_function_args: Optional[Iterable[Any]] = None, read_function_kwargs: Optional[Mapping[str, Any]] = None, **kwargs: Any)

Fetcher using pandas DataFrames as the data format.

Fetch data from serialized DataFrames. How this is done is determined by the read_function. This is typically a Pandas function such as pandas.read_csv() or pandas.read_pickle(), but any function that accepts a string source as the first argument and returns a data frame can be used.

Parameters
  • read_function – A Pandas read-function.

  • read_path_format – A formatting string or a callable to apply to a source before passing it to read_function. Must contain a source as its only placeholder. Example: data/{source}.pkl. None=leave as-is.

  • read_function_args – Additional positional arguments for read_function.

  • read_function_kwargs – Additional keyword arguments for read_function.

See also

The official Pandas IO documentation
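
The three accepted forms of read_path_format (string, callable, None) can be sketched as below. The resolve_path helper is hypothetical, and the sketch uses the positional '{}' placeholder from the default value in the signature; how named placeholders are handled is not covered here.

```python
def resolve_path(source, read_path_format="data/{}.pkl"):
    """Turn a source name into the path handed to the read function."""
    if read_path_format is None:
        return source  # Leave the source as-is.
    if callable(read_path_format):
        return read_path_format(source)
    return read_path_format.format(source)


default_path = resolve_path("cities")
untouched = resolve_path("cities", None)
custom = resolve_path("cities", lambda s: f"/tmp/{s}.csv")
```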

read(source_path: Union[str, bytes, PathLike]) -> DataFrame

Read a DataFrame from a source path.

Parameters

source_path – Path to serialized DataFrame.

Returns

A deserialized DataFrame.

find_sources() -> Dict[str, Path]

Search for source paths to pass to read_function using read_path_format.

Returns

A dict {source: path}.

Raises

IOError – If files cannot be read.

property sources: List[str]

Source names known to the fetcher, such as cities or languages.

fetch_placeholders(instr: FetchInstruction) -> PlaceholderTranslations

Read data from disk.

Fetching: User implementations

The base class may be inherited by users to customize all aspects of the fetching process. You will find the API reference for this class below.

class rics.translation.fetching.Fetcher(allow_fetch_all: bool = True, placeholder_overrides: Optional[Union[PlaceholderOverrides, Dict[str, Union[Dict[str, str], Dict[str, Dict[str, str]]]]]] = None)

Base class for fetching translations from an external source.

Users who wish to define their own fetching logic should inherit this class, but there are implementations for common use cases. See PandasFetcher for a versatile base fetcher or SqlFetcher for a more specialized solution.

Parameters
  • allow_fetch_all – If False, an error will be raised when fetch_all() is called.

  • placeholder_overrides – Placeholder name overrides. Used to adapt placeholder names in sources to wanted names.

property online: bool

Return connectivity status. If False, no new translations may be fetched.

assert_online() -> None

Raise an error if offline.

Raises

OfflineError – If not online.

abstract property sources: List[SourceType]

Source names known to the fetcher, such as cities or languages.

property placeholder_overrides: Optional[PlaceholderOverrides]

Return the override.

property allow_fetch_all: bool

Flag indicating whether the fetch_all() operation is permitted.

fetch(ids_to_fetch: Iterable[IdsToFetch], placeholders: Iterable[str] = (), required: Iterable[str] = ()) -> Dict[SourceType, PlaceholderTranslations]

Fetch translations.

Parameters
  • ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.

  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

Notes

Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See rics.translation.offline.Format documentation for details.

fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) -> Dict[SourceType, PlaceholderTranslations]

Fetch as much data as possible.

Parameters
  • placeholders – All desired placeholders in preferred order.

  • required – Placeholders that must be included in the response.

Returns

A mapping {source: PlaceholderTranslations} for translation.

abstract fetch_placeholders(instruction: FetchInstruction) -> PlaceholderTranslations

Fetch translations.

Parameters

instruction – A single instruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.

Returns

Placeholder translation elements.

Raises

UnknownPlaceholderError – If the placeholder is unknown to the fetcher.

close() -> None

Close the fetcher. Does nothing by default.

get_id_placeholder(source: SourceType) -> str

Get the ID placeholder name for source.

classmethod make_and_verify(instr: FetchInstruction, known_placeholders: Collection[str], records: Sequence[Sequence[Any]]) -> PlaceholderTranslations

Make a PlaceholderTranslations instance from records.

Convenience method meant for use by implementations.

Parameters
  • instr – A fetch instruction.

  • known_placeholders – Known placeholders for the instr.source.

  • records – Records produced from the instruction.

Returns

Placeholder translation elements.

classmethod verify_placeholders(instr: FetchInstruction, known_placeholders: Collection[str]) -> None

Verify required placeholders for a source.

Convenience method meant for use by implementations.

Parameters
  • instr – A fetch instruction.

  • known_placeholders – Known placeholders for the instr.source.

Raises

UnknownPlaceholderError – If required placeholders are missing.

classmethod select_placeholders(instr: FetchInstruction, known_placeholders: Collection[str]) -> List[str]

Select as many known, requested placeholders as possible.

Parameters
  • instr – A fetch instruction.

  • known_placeholders – Known placeholders for the instr.source.

Returns

Known placeholders in the desired order.

Raises

UnknownPlaceholderError – If required placeholders are missing.
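
For implementers, the selection rule described above might be approximated as in the sketch below. It takes plain iterables instead of a FetchInstruction, and the exception class is a local stand-in defined only for this example, so treat it as an illustration of the idea rather than the library's code.

```python
class UnknownPlaceholderError(KeyError):
    """Local stand-in for the library exception of the same name."""


def select_placeholders(placeholders, required, known_placeholders):
    """Return known, requested placeholders in the requested order."""
    known = set(known_placeholders)
    missing = [p for p in required if p not in known]
    if missing:
        raise UnknownPlaceholderError(f"required placeholders missing: {missing}")
    # Optional placeholders that the source does not know are silently dropped.
    return [p for p in placeholders if p in known]


selected = select_placeholders(
    placeholders=["id", "name", "nickname"],
    required=["id"],
    known_placeholders=["name", "id", "birth_year"],
)
```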

Offline translation

If you do not want to keep the fetcher connected to a database or the file system, you can use the translator's store() method to fetch as much data as possible, after which the fetcher will be disconnected and discarded. Alternatively, you may supply a TranslationMap as the fetcher instance when initializing the translator. Storing translations this way may cause high memory consumption.
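
The fetch-once-then-disconnect pattern can be pictured with the toy class below. OfflineCapable, its constructor arguments and the sample data are all hypothetical; the class only mimics the behaviour described for store() and the online property and is not how the Translator is implemented.

```python
class OfflineCapable:
    """Toy translator that caches everything and then drops its fetcher."""

    def __init__(self, fetch_all, close=None):
        self._fetch_all = fetch_all  # Callable returning {id: name}.
        self._close = close  # Optional cleanup callable.
        self._cache = None

    @property
    def online(self):
        return self._fetch_all is not None

    def store(self, delete_fetcher=True):
        if not self.online:
            raise RuntimeError("cannot fetch while offline")
        self._cache = dict(self._fetch_all())  # Everything is held in memory.
        if delete_fetcher:
            if self._close is not None:
                self._close()  # Let the fetcher release files and connections.
            self._fetch_all = None
            self._close = None
        return self  # Allows chained assignment.

    def translate(self, id_):
        if self._cache is None:
            raise RuntimeError("call store() first")
        name = self._cache.get(id_)
        return None if name is None else f"{id_}:{name}"


closed = []
translator = OfflineCapable(lambda: {1: "Morocco"}, close=lambda: closed.append(True)).store()
```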