rics.translation

Translation of IDs with flexible formatting and name matching.

For an introduction to translation, see Translation primer and Mapping primer.

Classes

Translator([fetcher, fmt, mapper, ...])

Translate IDs to human-readable labels.

TranslatorFactory(file, extra_fetchers[, clazz])

Create a Translator from TOML inputs.

ConfigMetadata(rics_version, created, path, ...)

Metadata pertaining to how a Translator instance was initialized from TOML configuration.

class Translator(fetcher: Optional[Union[TranslationMap[NameType, SourceType, IdType], Fetcher[SourceType, IdType], Dict[SourceType, PlaceholderTranslations[SourceType]], Dict[SourceType, Union[PlaceholderTranslations, DataFrame, Dict[str, Sequence[Any]]]]]] = None, fmt: Union[str, Format] = '{id}:{name}', mapper: Optional[Mapper[NameType, SourceType, None]] = None, default_fmt: Optional[Union[str, Format]] = None, default_fmt_placeholders: Optional[Union[InheritedKeysDict[SourceType, str, Any], _MakeDict]] = None, allow_name_inheritance: bool = True)[source]#

Bases: Generic[NameType, SourceType, IdType]

Translate IDs to human-readable labels.

For an introduction to translation, see the Translation primer page.

The recommended way of initializing Translator instances is the from_config() method. For configuration file details, please refer to the Translator Configuration Files page.

The Translator is the main entry point for all translation tasks. Simplified translation process steps:

  1. The map method performs name-to-source mapping (see DirectionalMapping).

  2. The fetch method extracts IDs to translate and retrieves data (see TranslationMap).

  3. Finally, the translate method applies the translations and returns to the caller.
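The three steps above can be pictured with a small, self-contained sketch. It does not use rics: the functions map_names, fetch_ids and apply_translations, the SOURCES dict, and the exact-match mapping rule are hypothetical stand-ins chosen for illustration, not the library's implementation.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory sources; in rics this role is played by a Fetcher.
SOURCES: Dict[str, Dict[int, str]] = {
    "animals": {0: "Tarzan", 1: "Morris", 2: "Simba"},
    "people": {1991: "Richard", 1999: "Sofia", 1904: "Fred"},
}


def map_names(names: List[str]) -> Dict[str, str]:
    """Step 1: bind each name to a source (assumption: exact match only)."""
    return {name: name for name in names if name in SOURCES}


def fetch_ids(data: Dict[str, List[int]], name_to_source: Dict[str, str]) -> Dict[str, Dict[int, str]]:
    """Step 2: collect the IDs that are actually used and look up their labels."""
    fetched = {}
    for name, source in name_to_source.items():
        wanted = set(data[name])
        fetched[source] = {i: label for i, label in SOURCES[source].items() if i in wanted}
    return fetched


def apply_translations(
    data: Dict[str, List[int]],
    name_to_source: Dict[str, str],
    fetched: Dict[str, Dict[int, str]],
    fmt: str = "{id}:{name}",
) -> Dict[str, List[Optional[str]]]:
    """Step 3: replace each ID with its formatted label (None if unknown)."""
    result = {}
    for name, ids in data.items():
        labels = fetched[name_to_source[name]]
        result[name] = [fmt.format(id=i, name=labels[i]) if i in labels else None for i in ids]
    return result


data = {"animals": [0, 2], "people": [1991, 7]}
name_to_source = map_names(list(data))
translated = apply_translations(data, name_to_source, fetch_ids(data, name_to_source))
print(translated)
# {'animals': ['0:Tarzan', '2:Simba'], 'people': ['1991:Richard', None]}
```

The unknown ID 7 comes back as None, mirroring the default behaviour described in the Notes for untranslatable IDs.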

Parameters:
  • fetcher – A Fetcher or ready-to-use translations.

  • fmt – String Format specification for translations.

  • mapper – A Mapper instance for binding names to sources.

  • default_fmt – Alternative Format to use instead of fmt for fallback translation of unknown IDs.

  • default_fmt_placeholders – Shared and/or source-specific default placeholder values for unknown IDs. See InheritedKeysDict.make() for details.

  • allow_name_inheritance – If True, enable name resolution fallback to the parent translatable when translating with the attribute-option. Allows nameless pandas.Index instances to inherit the name of a pandas.Series.

Notes

Untranslatable IDs will be None by default if neither default_fmt nor default_fmt_placeholders is given. Passing the maximal_untranslated_fraction option to translate() will raise an exception if too many IDs are left untranslated. Note, however, that this verification step may be expensive.
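As a rough illustration of why the check can be costly, the sketch below computes an untranslated fraction by scanning every translated value. The function name check_untranslated_fraction, the use of ValueError, and the strict comparison are assumptions made for this example; the exception type and exact rule used by rics may differ.

```python
from typing import List, Optional


def check_untranslated_fraction(
    translated: List[Optional[str]],
    maximal_untranslated_fraction: float = 1.0,
) -> float:
    """Return the share of None values, raising if it exceeds the limit."""
    if not translated:
        return 0.0
    # Every element must be inspected, so the cost grows with the data size.
    fraction = sum(label is None for label in translated) / len(translated)
    # Assumed strict comparison: with the default of 1.0 this never triggers,
    # which matches the documented "1=disabled" behaviour.
    if fraction > maximal_untranslated_fraction:
        raise ValueError(
            f"{fraction:.0%} of IDs are untranslated; limit is {maximal_untranslated_fraction:.0%}"
        )
    return fraction


fraction = check_untranslated_fraction(["1991:Richard", None, "1999:Sofia", "1904:Fred"])
print(fraction)  # 0.25
```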

Examples

A minimal example. For a more complete use case, see the DVD Rental Database example. Assume that we have data for people and animals as in the tables below:

people:                       animals:
     id | name    | gender       id | name   | is_nice
  ------+---------+--------     ----+--------+---------
   1991 | Richard | Male          0 | Tarzan | false
   1999 | Sofia   | Female        1 | Morris | true
   1904 | Fred    | Male          2 | Simba  | true

In most real cases we’d fetch this table from somewhere. In this case, however, there’s so little data that we can simply enumerate the components needed for translation ourselves to create a MemoryFetcher.

>>> from rics.translation import Translator
>>> translation_data = {
...     'animals': {'id': [0, 1, 2], 'name': ['Tarzan', 'Morris', 'Simba'], 'is_nice': [False, True, True]},
...     'people': {'id': [1999, 1991, 1904], 'name': ['Sofia', 'Richard', 'Fred']},
... }
>>> translator = Translator(translation_data, fmt='{id}:{name}[, nice={is_nice}]')
>>> data = {'animals': [0, 2], 'people': [1991, 1999]}
>>> for key, translated_table in translator.translate(data).items():
...     print(f'Translations for {repr(key)}:')
...     for translated_id in translated_table:
...         print(f'    {repr(translated_id)}')
Translations for 'animals':
    '0:Tarzan, nice=False'
    '2:Simba, nice=True'
Translations for 'people':
    '1991:Richard'
    '1999:Sofia'

Handling unknown IDs.

>>> default_fmt_placeholders = dict(
...     default={'is_nice': 'Maybe?', 'name': "Bob"},
...     specific={'animals': {'name': 'Fido'}},
... )
>>> useless_database = {
...     'animals': {'id': [], 'name': []},
...     'people': {'id': [], 'name': []}
... }
>>> translator = Translator(useless_database, default_fmt_placeholders=default_fmt_placeholders,
...                         fmt='{id}:{name}[, nice={is_nice}]')
>>> data = {'animals': [0], 'people': [0]}
>>> for key, translated_table in translator.translate(data).items():
...     print(f'Translations for {repr(key)}:')
...     for translated_id in translated_table:
...         print(f'    {repr(translated_id)}')
Translations for 'animals':
    '0:Fido, nice=Maybe?'
Translations for 'people':
    '0:Bob, nice=Maybe?'

Since we didn’t give an explicit default_fmt, the regular fmt is used instead. Formats can be plain strings, in which case translation will never explicitly fail unless the name itself fails to map and Mapper.unmapped_values_action is set to ActionLevel.RAISE.
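The behaviour seen in the two examples (an optional [...] block that is dropped when its placeholder is missing, and source-specific defaults taking precedence over shared ones) can be mimicked with the standalone sketch below. The helpers render and placeholders_for are hypothetical and only imitate the outputs shown above; they are not the rics Format or InheritedKeysDict implementations.

```python
import re
from typing import Any, Dict

OPTIONAL_BLOCK = re.compile(r"\[([^\[\]]*)\]")  # text inside [...]
PLACEHOLDER = re.compile(r"\{(\w+)\}")  # names inside {...}


def render(fmt: str, values: Dict[str, Any]) -> str:
    """Fill in fmt, skipping optional blocks whose placeholders are missing."""
    # With one capturing group, split() alternates: required text, optional body, ...
    parts = OPTIONAL_BLOCK.split(fmt)
    out = []
    for index, part in enumerate(parts):
        is_optional = index % 2 == 1
        if is_optional and not all(key in values for key in PLACEHOLDER.findall(part)):
            continue  # drop the whole optional block
        out.append(part.format(**values))  # a missing required key raises KeyError
    return "".join(out)


def placeholders_for(source: str, default_fmt_placeholders: Dict[str, Any]) -> Dict[str, Any]:
    """Merge shared defaults with source-specific ones (specific wins)."""
    merged = dict(default_fmt_placeholders.get("default", {}))
    merged.update(default_fmt_placeholders.get("specific", {}).get(source, {}))
    return merged


fmt = "{id}:{name}[, nice={is_nice}]"
defaults = {
    "default": {"is_nice": "Maybe?", "name": "Bob"},
    "specific": {"animals": {"name": "Fido"}},
}

print(render(fmt, {"id": 1991, "name": "Richard"}))  # 1991:Richard
print(render(fmt, {"id": 0, **placeholders_for("animals", defaults)}))  # 0:Fido, nice=Maybe?
print(render(fmt, {"id": 0, **placeholders_for("people", defaults)}))  # 0:Bob, nice=Maybe?
```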

classmethod from_config(path: Union[str, bytes, PathLike], extra_fetchers: Iterable[Union[str, bytes, PathLike]] = (), clazz: Optional[Union[str, Type[Translator[NameType, SourceType, IdType]]]] = None) Translator[NameType, SourceType, IdType][source]#

Create a Translator from TOML inputs.

Parameters:
  • path – Path to a TOML file.

  • extra_fetchers – Paths to TOML files defining additional fetchers. Useful for fetching from multiple sources or kinds of sources, for example locally stored files in conjunction with one or more databases. The fetchers are ranked by input order, with the fetcher defined in path (if any) being given the highest priority (rank 0).

  • clazz – Translator implementation to create. If a string is passed, the class is resolved using get_by_full_name(). Use cls if None.

Returns:

A new Translator instance with a config_metadata attribute.
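The ranking of fetchers described for extra_fetchers can be sketched as follows. This assumes that rank decides which fetcher is used when several provide the same source; that reading, the pick_fetcher helper, and the dict-based fetchers are all hypothetical and only meant to illustrate the ordering.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a fetcher: source -> {id: name}.
FakeFetcher = Dict[str, Dict[int, str]]


def pick_fetcher(source: str, ranked_fetchers: List[FakeFetcher]) -> Optional[int]:
    """Return the rank of the first fetcher that provides source, or None."""
    for rank, fetcher in enumerate(ranked_fetchers):
        if source in fetcher:
            return rank
    return None


main: FakeFetcher = {"people": {1991: "Richard"}}  # defined in path: rank 0
extra: List[FakeFetcher] = [  # extra_fetchers: rank 1, 2, ... in input order
    {"people": {1991: "R."}, "animals": {0: "Tarzan"}},
]
ranked = [main, *extra]

print(pick_fetcher("people", ranked))  # 0
print(pick_fetcher("animals", ranked))  # 1
print(pick_fetcher("places", ranked))  # None
```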

property config_metadata: ConfigMetadata#

Return from_config() initialization metadata.

copy(share_fetcher: bool = True, **overrides: Any) Translator[NameType, SourceType, IdType][source]#

Make a copy of this Translator.

Parameters:
  • share_fetcher – If True, the returned instance uses the same Fetcher.

  • overrides – Keyword arguments to use when instantiating the copy. Options that aren’t given will be taken from the current instance. See the Translator class documentation for possible choices.

Returns:

A copy of this Translator with overrides applied.

Raises:

NotImplementedError – If share_fetcher=False.

translate(translatable: Translatable, names: Optional[Union[NameType, Iterable[NameType]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, inplace: bool = False, override_function: Optional[Callable[[NameType, Set[SourceType], List[IdType]], Optional[Union[SourceType, Dict[SourceType, List[IdType]]]]]] = None, maximal_untranslated_fraction: float = 1.0, reverse: bool = False, attribute: Optional[str] = None) Optional[Translatable][source]#

Translate IDs to human-readable strings.

For an introduction to translation, see the Translation primer page.

Parameters:
  • translatable – A data structure to translate.

  • names – Explicit names to translate. Derive from translatable if None.

  • ignore_names – Names not to translate, or a predicate (str) -> bool.

  • inplace – If True, translate in-place and return None.

  • override_function

    A callable (name, fetcher.sources, ids) -> ... returning one of

    • None (use regular mapping logic)

    • a source to use, or

    • a split mapping {source: [ids_for_source..]}. This forces IDs to be fetched from different sources in spite of being labelled with the same name.

  • maximal_untranslated_fraction – The maximum fraction of IDs for which translation may fail before an error is raised. 1=disabled. Ignored in reverse mode.

  • reverse – If True, perform translations back to IDs. Offline mode only.

  • attribute – If given, translate translatable.attribute instead. If inplace=False, the translated attribute will be assigned to translatable using setattr(translatable, attribute, <translated-attribute>).

Returns:

A translated copy of translatable if inplace=False, otherwise None.

Raises:
map(translatable: Translatable, names: Optional[Union[NameType, Iterable[NameType]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, override_function: Optional[Callable[[NameType, Set[SourceType], List[IdType]], Optional[Union[SourceType, Dict[SourceType, List[IdType]]]]]] = None) Optional[DirectionalMapping[NameType, SourceType]][source]#

Map names to translation sources.

Parameters:
  • translatable – A data structure to map names for.

  • names – Explicit names to translate. Derive from translatable if None.

  • ignore_names – Names not to translate, or a predicate (str) -> bool.

  • override_function

    A callable (name, fetcher.sources, ids) -> ... returning one of

    • None (use regular mapping logic)

    • a source to use, or

    • a split mapping {source: [ids_for_source..]}. This forces IDs to be fetched from different sources in spite of being labelled with the same name.

Returns:

A mapping of names to translation sources. Returns None if mapping failed.

Raises:
  • AttributeError – If names are not given and cannot be derived from translatable.

  • MappingError – If any required (explicitly given) names fail to map to a source.

  • MappingError – If name-to-source mapping is ambiguous.

  • UserMappingError – If override_function returns a source which is not known, and self.mapper.unknown_user_override_action != 'ignore'.
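A minimal sketch of an override_function with the documented (name, fetcher.sources, ids) signature, showing all three kinds of return value. The names owner_id and resident_id, the sources people and animals, and the rule that IDs below 1000 belong to animals are invented for this example.

```python
from typing import Dict, List, Optional, Set, Union


def override_function(
    name: str, sources: Set[str], ids: List[int]
) -> Optional[Union[str, Dict[str, List[int]]]]:
    if name == "owner_id" and "people" in sources:
        # Force a single source for this name.
        return "people"
    if name == "resident_id":
        # Split mapping: the same name is fetched from two different sources.
        return {
            "animals": [i for i in ids if i < 1000],
            "people": [i for i in ids if i >= 1000],
        }
    # Fall back to the regular mapping logic.
    return None


sources = {"animals", "people"}
print(override_function("owner_id", sources, [1991]))  # people
print(override_function("resident_id", sources, [0, 1991, 2]))
# {'animals': [0, 2], 'people': [1991]}
print(override_function("other", sources, [1]))  # None
```

Such a function would be passed as the override_function argument of map() or translate().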

map_scores(translatable: Translatable, names: Optional[Union[NameType, Iterable[NameType]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, override_function: Optional[Callable[[NameType, Set[SourceType], List[IdType]], Optional[Union[SourceType, Dict[SourceType, List[IdType]]]]]] = None) DataFrame[source]#

Returns raw match scores for name-to-source mapping. See map() for details.

property sources: List[SourceType]#

Return translation sources.

fetch(translatable: Translatable, name_to_source: DirectionalMapping[NameType, SourceType], data_structure_io: Optional[Type[DataStructureIO]] = None) TranslationMap[NameType, SourceType, IdType][source]#

Fetch translations.

Parameters:
  • translatable – A data structure to translate.

  • name_to_source – Mappings of names in translatable to sources as they are known to the fetcher.

  • data_structure_io – Static namespace used to extract IDs from translatable.

Returns:

A TranslationMap.

Raises:

ConnectionStatusError – If disconnected from the fetcher, i.e. not online.

property online: bool#

Return connectivity status. If False, no new translations may be fetched.

property fetcher: Fetcher[SourceType, IdType]#

Return the Fetcher instance used to retrieve translations.

property mapper: Mapper[NameType, SourceType, None]#

Return the Mapper instance used for name-to-source binding.

property cache: TranslationMap[NameType, SourceType, IdType]#

Return a TranslationMap of cached translations.

classmethod load_persistent_instance(config_path: Union[str, bytes, PathLike], extra_fetchers: Iterable[Union[str, bytes, PathLike]] = (), cache_dir: Optional[Union[str, bytes, PathLike]] = None, max_age: Union[str, Timedelta, timedelta] = '7d', clazz: Optional[Union[str, Type[Translator[NameType, SourceType, IdType]]]] = None) Translator[NameType, SourceType, IdType][source]#

Load or create a persistent fetch_all-instance.

Warning

Experimental method; may change or disappear without warning.

Instances are created, stored and loaded as determined by a metadata file located in the given cache_dir. A new Translator will be created if:

  • There is no ‘metadata’ file, or

  • the original Translator is too old (see max_age), or

  • the current configuration – as defined by (config_path, extra_fetchers, clazz) – has changed in such a way that it is no longer equivalent to the configuration used to create the original Translator.
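The three conditions above amount to a simple decision function, sketched here with plain dicts and datetimes. The must_recreate helper, the layout of the stored metadata, and the use of >= for the age check are assumptions for illustration; the actual file format and comparison used by rics are not shown on this page.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def must_recreate(
    stored: Optional[Dict[str, Any]],
    current_config: Dict[str, Any],
    max_age: timedelta,
    now: datetime,
) -> bool:
    """Decide whether a new instance is needed instead of the cached one."""
    if stored is None:
        return True  # there is no metadata file
    if now - stored["created"] >= max_age:
        return True  # too old; a max_age of zero always ends up here
    # The configuration is no longer equivalent to the one originally used.
    return stored["config"] != current_config


now = datetime(2024, 1, 10)
stored = {"created": datetime(2024, 1, 8), "config": {"fmt": "{id}:{name}"}}

print(must_recreate(None, {"fmt": "{id}:{name}"}, timedelta(days=7), now))  # True
print(must_recreate(stored, {"fmt": "{id}:{name}"}, timedelta(days=7), now))  # False
print(must_recreate(stored, {"fmt": "{name}"}, timedelta(days=7), now))  # True
print(must_recreate(stored, {"fmt": "{id}:{name}"}, timedelta(0), now))  # True
```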

Note

This method is not thread safe.

Parameters:
  • config_path – Path to a TOML file. See from_config() for details.

  • extra_fetchers – Paths to TOML files defining additional fetchers. See from_config() for details.

  • cache_dir – Root directory where the cached translator and associated metadata are stored. Derive based on config_path if None.

  • max_age – The maximum age of the cached Translator before it must be recreated. Pass max_age=0 to force recreation.

  • clazz – Translator implementation to create. If a string is passed, the class is resolved using get_by_full_name(). Use cls if None.

Returns:

A new or cached Translator instance with a config_metadata attribute.

classmethod restore(path: Union[str, bytes, PathLike]) Translator[NameType, SourceType, IdType][source]#

Restore a serialized Translator.

Parameters:

path – Path to a serialized Translator.

Returns:

A Translator.

Raises:

TypeError – If the object at path is not a Translator or a subtype thereof.

See also

The Translator.store() method.

store(translatable: Optional[Translatable] = None, names: Optional[Union[NameType, Iterable[NameType]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, delete_fetcher: bool = True, path: Optional[Union[str, bytes, PathLike]] = None) Translator[NameType, SourceType, IdType][source]#

Retrieve and store translations in memory.

Parameters:
  • translatable – Data from which IDs to fetch will be extracted. Fetch all IDs if None.

  • names – Explicit names to translate. Derive from translatable if None.

  • ignore_names – Names not to translate, or a predicate (str) -> bool.

  • delete_fetcher – If True, invoke Fetcher.close() and delete the fetcher after retrieving data. The Translator will still function, but new data cannot be retrieved.

  • path – If given, serialize the Translator to disk after retrieving data.

Returns:

Self, for chained assignment.

Raises:

Notes

The Translator is guaranteed to be serializable() once offline. Fetchers often aren’t, as they require things like database connections to function.

See also

The Translator.restore() method.

class TranslatorFactory(file: Union[str, bytes, PathLike], extra_fetchers: Iterable[Union[str, bytes, PathLike]], clazz: Union[str, Type[Translator[NameType, SourceType, IdType]]] = None)[source]#

Bases: Generic[NameType, SourceType, IdType]

Create a Translator from TOML inputs.

FETCHER_FACTORY(config: Dict[str, Any]) AbstractFetcher[SourceType, IdType]#

A callable (name, kwargs) -> AbstractFetcher. Overwrite attribute to customize.

MAPPER_FACTORY(for_fetcher: bool) Optional[Mapper[Any, Any, Any]]#

A callable (kwargs) -> Mapper. Overwrite attribute to customize.

create() Translator[NameType, SourceType, IdType][source]#

Create a Translator from a TOML file.

classmethod resolve_class(clazz: Union[str, Type[Translator[NameType, SourceType, IdType]]] = None) Type[Translator[NameType, SourceType, IdType]][source]#

Resolve desired Translator type.

class ConfigMetadata(rics_version: str, created: Timestamp, path: Path, extra_fetchers: Tuple[Path, ...], clazz: str)[source]#

Bases: object

Metadata pertaining to how a Translator instance was initialized from TOML configuration.

rics_version: str#

The rics version under which this instance was created.

created: Timestamp#

The time at which the Translator was originally initialized. Second precision.

path: Path#

Absolute path of the main translation configuration.

extra_fetchers: Tuple[Path, ...]#

Absolute paths of configuration files for auxiliary fetchers.

clazz: str#

String representation of the class type.

is_equivalent(other: ConfigMetadata) bool[source]#

Check if this ConfigMetadata is equivalent to other.

Configs are equivalent if:

  • They have the same rics version, and

  • Use the same fully qualified class name, and

  • The main configuration files are equal after parsing, and

  • They have the same number of auxiliary (“extra”) fetcher configurations, and

  • All auxiliary fetcher configurations are equal after parsing.

Parameters:

other – Another ConfigMetadata instance.

Returns:

Equivalence status.
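The equivalence rules listed above can be written out as a chain of comparisons. The ParsedConfig class below is a hypothetical stand-in that holds already-parsed configuration dicts, and comparing the auxiliary configurations pairwise in order is an assumption; it is not the real ConfigMetadata.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class ParsedConfig:
    """Hypothetical container for a version, class name and parsed TOML content."""

    rics_version: str
    clazz: str
    main: Dict[str, Any]
    extras: Tuple[Dict[str, Any], ...]


def is_equivalent(a: ParsedConfig, b: ParsedConfig) -> bool:
    return (
        a.rics_version == b.rics_version  # same rics version
        and a.clazz == b.clazz  # same fully qualified class name
        and a.main == b.main  # main configurations equal after parsing
        and len(a.extras) == len(b.extras)  # same number of extra fetchers
        and all(x == y for x, y in zip(a.extras, b.extras))  # each one equal
    )


base = ParsedConfig("1.0.0", "rics.translation.Translator", {"fmt": "{id}:{name}"}, ({"kind": "file"},))
same = ParsedConfig("1.0.0", "rics.translation.Translator", {"fmt": "{id}:{name}"}, ({"kind": "file"},))
upgraded = ParsedConfig("1.0.1", "rics.translation.Translator", {"fmt": "{id}:{name}"}, ({"kind": "file"},))

print(is_equivalent(base, same))  # True
print(is_equivalent(base, upgraded))  # False
```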

to_json() str[source]#

Get a JSON representation of this ConfigMetadata.

classmethod from_json(s: str) ConfigMetadata[source]#

Create ConfigMetadata from a JSON string s.
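A JSON round trip for metadata of this shape might look like the sketch below. The Meta dataclass, its field encoding (ISO timestamps with second precision, paths as strings), and the example values are assumptions made for illustration; the JSON layout actually produced by ConfigMetadata.to_json() is not documented here.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Tuple


@dataclass(frozen=True)
class Meta:
    """Hypothetical stand-in with the same fields as ConfigMetadata."""

    rics_version: str
    created: datetime
    path: Path
    extra_fetchers: Tuple[Path, ...]
    clazz: str

    def to_json(self) -> str:
        return json.dumps(
            {
                "rics_version": self.rics_version,
                "created": self.created.isoformat(timespec="seconds"),
                "path": str(self.path),
                "extra_fetchers": [str(p) for p in self.extra_fetchers],
                "clazz": self.clazz,
            }
        )

    @classmethod
    def from_json(cls, s: str) -> "Meta":
        raw = json.loads(s)
        return cls(
            rics_version=raw["rics_version"],
            created=datetime.fromisoformat(raw["created"]),
            path=Path(raw["path"]),
            extra_fetchers=tuple(Path(p) for p in raw["extra_fetchers"]),
            clazz=raw["clazz"],
        )


meta = Meta(
    rics_version="1.0.0",
    created=datetime(2022, 1, 1, 12, 0, 0),
    path=Path("config.toml"),
    extra_fetchers=(Path("extra.toml"),),
    clazz="rics.translation.Translator",
)
restored = Meta.from_json(meta.to_json())
print(restored == meta)  # True
```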

Modules

rics.translation.dio

Integration for insertion and extraction of IDs and translations to and from various data structures.

rics.translation.exceptions

General errors for the translation suite.

rics.translation.factory

Factory functions for translation classes.

rics.translation.fetching

Translation using external sources.

rics.translation.offline

Offline (in-memory) translation classes.

rics.translation.testing

Test implementations.

rics.translation.types

Types used for translation.