Translator Configuration Files#

The recommended way of creating and configuring translators is the Translator.from_config() method. For an example, see the DVD Rental Database page.

Hint

For Fetcher classes and functions used by Mapper, rics-package implementations are used by default. To specify an external class or function, use 'fully.qualified.names' in quotation marks. Names are resolved by get_by_full_name(), using an appropriate default_module argument.

For an introduction to the translation process itself, see the Translation primer.

Multiple fetchers#

Complex applications may require multiple fetchers. These may be specified in auxiliary config files, one fetcher per file. Only the fetching key will be considered in these files. If multiple fetchers are defined, a MultiFetcher is created. Fetchers defined this way are hierarchical. The input order determines rank, affecting Name-to-source mapping. For example, for a Translator created by running:

>>> from rics.translation import Translator
>>> extra_fetchers=["fetcher.toml", "backup-fetcher.toml"]
>>> Translator.from_config("translation.toml", extra_fetchers=extra_fetchers)

the Translator.map-function will first consider the sources of the fetcher defined in translation.toml (if there is one), then fetcher.toml and finally backup-fetcher.toml.
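
The ranking can be pictured with a minimal sketch. It assumes that the highest-ranked fetcher which knows a source is the one used for it. The `pick_fetcher` helper and the source names are hypothetical and not part of rics.

```python
# Hypothetical sources per config file, listed in rank order:
# the main config first, then extra_fetchers in input order.
ranked_fetchers = [
    ("translation.toml", {"customer", "film"}),
    ("fetcher.toml", {"film", "actor"}),
    ("backup-fetcher.toml", {"actor", "store"}),
]


def pick_fetcher(source):
    """Return the config file of the highest-ranked fetcher that has `source`."""
    for config_file, sources in ranked_fetchers:
        if source in sources:
            return config_file
    raise KeyError(source)


chosen = {source: pick_fetcher(source) for source in ["film", "actor", "store"]}
print(chosen)
```

Under this assumption, a source that appears in several files is always served by the file listed first.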

Sections#

The only valid top-level keys are translator, unknown_ids, and fetching. Only the fetching section is required, though it may be left out of the main configuration file if fetching is configured separately. Other top-level keys will raise a ConfigurationError if present.

Section: Translator#

Section keys: [translator]#

| Key | Type | Description | Comments |
|-----|------|-------------|----------|
| fmt | Format | Specify how translated IDs are displayed. | |

Section: Unknown IDs#

Section keys: [unknown_ids]#

| Key | Type | Description | Comments |
|-----|------|-------------|----------|
| fmt | Format | Specify a format for untranslated IDs. | Can be a plain string fmt='Unknown', or fmt='{id}' to leave as-is. |
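
The two variants mentioned in the comment can be illustrated with plain `str.format`. This is only an approximation: rics uses its own Format class, which is assumed here to behave like `str.format` for simple templates such as these.

```python
unknown_id = 1999  # hypothetical ID without a translation

# fmt = 'Unknown': every untranslated ID is shown as the same fixed string.
fixed = "Unknown".format(id=unknown_id)

# fmt = '{id}': the untranslated ID is left as-is.
as_is = "{id}".format(id=unknown_id)

print(fixed, as_is)
```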

Section: Fetching#

The type of the fetcher is determined by the second-level key (other than mapping, which is reserved). For example, a MemoryFetcher would be created by adding a [fetching.MemoryFetcher]-section.

Section keys: [fetching]#

| Key | Type | Description | Comments |
|-----|------|-------------|----------|
| allow_fetch_all | bool | Control access to fetch_all(). | Some fetcher types redefine or ignore this key. |

Hint

Custom classes may be initialized by using sections with fully qualified type names in single quotation marks. For example, a [fetching.'my.library.SuperFetcher'] would import and initialize a SuperFetcher from the my.library module.

Subsection: Mapping#

For more information about the mapping procedure, please refer to the Mapping primer page.

Section keys: [*.mapping]#

| Key | Type | Description | Comments |
|-----|------|-------------|----------|
| score_function | ScoreFunction | Compute value/candidate-likeness. | See: rics.mapping.score_functions |
| unmapped_values_action | raise / warn / ignore | Handle unmatched values. | See: rics.utility.action_level.ActionLevel |
| cardinality | OneToOne / ManyToOne | Determine how many candidates to map a single value to. | See: rics.mapping.Cardinality |

  • Score functions which take additional keyword arguments should be specified in a child section, e.g. [*.mapping.<score-function-name>]. See: rics.mapping.score_functions for options.

  • External functions may be used by putting fully qualified names in single quotation marks. Names which do not contain any dot characters ('.') are assumed to refer to functions in the appropriate rics.mapping submodule.
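
The name resolution rule in the last bullet can be sketched as follows. The `resolve` helper is hypothetical and only mimics the described behaviour; it is not get_by_full_name(). Standard-library modules stand in for the rics.mapping submodules so that the sketch runs without rics installed.

```python
from importlib import import_module


def resolve(name, default_module):
    """Resolve `name`, using `default_module` for names without dots."""
    if "." in name:
        # Fully qualified: split into module path and attribute name.
        module_name, _, attribute = name.rpartition(".")
    else:
        module_name, attribute = default_module, name
    return getattr(import_module(module_name), attribute)


# No dot: looked up in the default module.
short = resolve("sqrt", default_module="math")
# Dotted: the default module is ignored.
qualified = resolve("os.path.join", default_module="math")

print(short, qualified)
```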

Hint

For difficult matches, consider using overrides instead.

Filter functions#

Filters are given in [[*.mapping.filter_functions]] list-subsections. These may be used to remove undesirable matches, for example SQL tables which should not be used or a DataFrame column that should not be translated.

Section keys: [[*.mapping.filter_functions]]#

| Key | Type | Description | Comments |
|-----|------|-------------|----------|
| function | str | Function name. | See: rics.mapping.filter_functions |

Note

Additional keys depend on the chosen function implementation.

As an example, the next snippet ensures that only names ending with an _id-suffix will be translated by using a require_regex_match() filter.

[[translator.mapping.filter_functions]]
function = "require_regex_match"
regex = ".*_id$"
where = "name"
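
The effect of the pattern can be checked with the standard `re` module. This sketch only demonstrates the regular expression. How require_regex_match() applies it internally is not shown in this document, and the names are hypothetical.

```python
import re

regex = re.compile(".*_id$")  # same pattern as in the TOML snippet
names = ["customer_id", "film_id", "first_name", "identity"]

# Keep only names that match the pattern; the others would not be translated.
kept = [name for name in names if regex.match(name)]
print(kept)
```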

Score function#

Some ScoreFunctions take additional keyword arguments. These must be declared in a [*.mapping.<score-function-name>]-subsection. See: rics.mapping.score_functions for options.

Score function heuristics#

Heuristics may be used to help an underlying score_function make more difficult matches. There are two types of heuristic functions: AliasFunctions and short-circuiting functions (which are really just differently interpreted FilterFunctions).

Heuristics are given in [[*.mapping.score_function_heuristics]] list-subsections (note the double brackets). The HeuristicScore wrapper class applies them in the order in which they are given.

Section keys: [[*.mapping.score_function_heuristics]]#

| Key | Type | Description | Comments |
|-----|------|-------------|----------|
| function | str | Function name. | See: rics.mapping.heuristic_functions |
| mutate | bool | Keep changes made by function. | Disabled by default. |

Note

Additional keys depend on the chosen function implementation.

As an example, the next snippet lets us match table columns such as animal_id to the id placeholder by using a value_fstring_alias() heuristic.

[[fetching.mapping.score_function_heuristics]]
function = "value_fstring_alias"
fstring = "{context}_{value}"
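
What the template produces can be shown with plain `str.format`. The roles are assumptions based on the description above: the value is the id placeholder and the context is the source table (here a hypothetical animal table). The real heuristic works inside the scoring procedure rather than on a simple list lookup.

```python
fstring = "{context}_{value}"  # same template as in the TOML snippet

# Assumed roles: the value is the placeholder, the context is the source table.
alias = fstring.format(context="animal", value="id")

# Hypothetical columns of the animal table; the alias points at the match.
columns = ["animal_id", "name", "species"]
match = alias if alias in columns else None
print(alias, match)
```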

Hint

For difficult matches, consider using overrides instead.

Subsection: Overrides#

Overrides are shared or context-specific key-value pairs, implemented by the InheritedKeysDict class. In config files, they appear as [*.overrides]-sections. Shared (top-level) override items are given in the [*.overrides]-section itself, while context-specific items are given in a subsection, e.g. [*.overrides.<context-name>].

Note

The type of context is determined by the class that owns the overrides.

This next snippet is from another example. For unknown IDs, the name is set to ‘Name unknown’ for the ‘name_basics’ source and to ‘Title unknown’ for the ‘title_basics’ source. Both inherit the from and to keys, which are set to ‘?’.

[unknown_ids.overrides]
from = "?"
to = "?"

[unknown_ids.overrides.name_basics]
name = "Name unknown"
[unknown_ids.overrides.title_basics]
name = "Title unknown"
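
The inheritance in this example can be sketched with two plain dictionaries. This is not the InheritedKeysDict implementation: `resolve_overrides` is a hypothetical helper, and it assumes that context-specific items take precedence over shared ones.

```python
# Shared items, from [unknown_ids.overrides].
shared = {"from": "?", "to": "?"}

# Context-specific items, from [unknown_ids.overrides.<context-name>].
specific = {
    "name_basics": {"name": "Name unknown"},
    "title_basics": {"name": "Title unknown"},
}


def resolve_overrides(context):
    """Merge the shared items with the items of `context`, if it has any."""
    return {**shared, **specific.get(context, {})}


print(resolve_overrides("name_basics"))
print(resolve_overrides("some_other_source"))
```

A context without its own subsection simply receives the shared items.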

Warning

Overrides have no fixed keys. No validation is performed and errors may be silent. The mapping process provides detailed information in debug mode, which may be used to discover issues.

Hint

Overrides may also be used to prevent mapping certain values.

For example, assume that a SQL source table called title_basics has two columns, title and name, with identical contents. We would like to use the format '[{title}. ]{name}' to output translations such as ‘Mr. Astaire’. To avoid output such as ‘Top Hat. Top Hat’ for movies, we may add

[fetching.mapping.overrides.title_basics]
title = "_"

to force the fetcher to inform the Translator that the title placeholder (column) does not exist for the title_basics source (we used ‘_’ since TOML does not have a null-type).
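
How the optional block might behave can be sketched in a few lines. The `render` helper is hypothetical and only approximates the rics Format class. It assumes that a bracketed block is dropped when any of its placeholders is missing, which is the effect the override is meant to achieve.

```python
import re
from string import Formatter


def render(fmt, values):
    """Render `fmt`, dropping [optional] blocks with missing placeholders."""
    # re.split with one group alternates: required, optional, required, ...
    parts = re.split(r"\[(.*?)\]", fmt)
    out = []
    for index, part in enumerate(parts):
        is_optional = index % 2 == 1
        names = [field for _, field, _, _ in Formatter().parse(part) if field]
        if is_optional and not all(name in values for name in names):
            continue  # skip the whole optional block
        out.append(part.format(**values))
    return "".join(out)


fmt = "[{title}. ]{name}"
person = render(fmt, {"title": "Mr", "name": "Astaire"})
movie = render(fmt, {"name": "Top Hat"})  # title reported as missing
print(person, "|", movie)
```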