The recommended way of creating and configuring translators is the Translator.from_config() method. For an example, see the DVD Rental Database page.
Hint
For Fetcher classes and functions used by Mapper, rics-package implementations are used by default. To
specify an external class or function, use 'fully.qualified.names' in quotation marks. Names are resolved by
get_by_full_name(), using an appropriate default_module argument.
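The resolution rule in the hint above can be sketched in plain Python. The helper below, resolve_name, is hypothetical and is not the actual get_by_full_name() implementation. It only assumes that dotted names are split into a module path and an attribute, while names without a dot are looked up in the default module.

```python
import importlib
from typing import Any


def resolve_name(name: str, default_module: str) -> Any:
    """Hypothetical stand-in for name resolution; not the rics implementation."""
    if "." in name:
        # Fully qualified name: split into module path and attribute name.
        module_name, _, attribute = name.rpartition(".")
    else:
        # Bare name: assume it lives in the default module.
        module_name, attribute = default_module, name
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Standard-library names are used here only so that the sketch runs anywhere.
ordered_dict_cls = resolve_name("collections.OrderedDict", default_module="math")
sqrt = resolve_name("sqrt", default_module="math")
```

In this sketch the dotted name ignores default_module entirely, whereas the bare name sqrt is found in math.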
Complex applications may require multiple fetchers. These may be specified in auxiliary config files, one fetcher per
file. Only the fetching key will be considered in these files. If multiple fetchers are defined, a
MultiFetcher is created. Fetchers defined this way are hierarchical. The input
order determines rank, affecting Name-to-source mapping. For
example, for a Translator created by running:
>>> from rics.translation import Translator
>>> extra_fetchers = ["fetcher.toml", "backup-fetcher.toml"]
>>> Translator.from_config("translation.toml", extra_fetchers=extra_fetchers)
the Translator.map_to_sources-function will first consider the
sources of the fetcher defined in translation.toml (if there is one), then fetcher.toml and finally backup-fetcher.toml.
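One way to picture the ranking is a lookup that walks the fetchers in input order and stops at the first one offering a given source. The sketch below is only an illustration: the source names (customers, films, actors) are made up, and the assumption that a higher-ranked fetcher wins when several fetchers offer the same source is ours, not something this page states.

```python
from typing import Dict, List, Optional

# Hypothetical source listings per config file, in rank order: the main file
# first, then the extra fetchers in the order they were given.
ranked_fetchers: Dict[str, List[str]] = {
    "translation.toml": ["customers"],
    "fetcher.toml": ["customers", "films"],
    "backup-fetcher.toml": ["films", "actors"],
}


def first_fetcher_with_source(source: str) -> Optional[str]:
    """Return the highest-ranked config file whose fetcher offers ``source``."""
    # Dicts preserve insertion order, so iteration follows the rank.
    for config_file, sources in ranked_fetchers.items():
        if source in sources:
            return config_file
    return None


films_owner = first_fetcher_with_source("films")
```

Here films is offered by both fetcher.toml and backup-fetcher.toml, and the sketch picks fetcher.toml because it was given first.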
The only valid top-level keys are translator, unknown_ids, and fetching. Only the fetching section is
required, though it may be left out of the main configuration file if fetching is configured separately. Other top-level
keys will raise a ConfigurationError if present.
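The top-level key rule can be illustrated with a small check on an already-parsed configuration dict. The function check_top_level_keys is hypothetical, and it raises a plain ValueError as a stand-in for the ConfigurationError mentioned above; the real validation may differ.

```python
from typing import Any, Dict

VALID_TOP_LEVEL_KEYS = {"translator", "unknown_ids", "fetching"}


def check_top_level_keys(config: Dict[str, Any]) -> None:
    """Hypothetical check; ValueError stands in for ConfigurationError."""
    unexpected = sorted(set(config) - VALID_TOP_LEVEL_KEYS)
    if unexpected:
        raise ValueError(f"Invalid top-level keys: {unexpected}")


# Only valid keys: passes silently.
check_top_level_keys({"translator": {}, "fetching": {}})

# The made-up "logging" key is rejected.
try:
    check_top_level_keys({"fetching": {}, "logging": {}})
except ValueError as error:
    message = str(error)
```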
| Key | Type | Description | Comments |
|---|---|---|---|
| fmt | | Specify how translated IDs are displayed | |
Parameters for Name-to-source
mapping are specified in a [translator.mapping]-subsection. See: Subsection: Mapping for details (context =
source).
| Key | Type | Description | Comments |
|---|---|---|---|
| fmt | | Specify a format for untranslated IDs. | Can be a plain string |
Alternative placeholder-values for unknown IDs can be declared
in a [unknown_ids.overrides]-subsection. See: Subsection: Overrides for details (context =
source).
The type of the fetcher is determined by the second-level key (other than mapping, which is reserved). For example,
a MemoryFetcher would be created by adding a [fetching.MemoryFetcher]-section.
| Key | Type | Description | Comments |
|---|---|---|---|
| allow_fetch_all | | Control access to | Some fetcher types redefine or ignore this key. |
The AbstractFetcher class uses a Mapper to bind actual placeholder names in sources to desired
placeholder names requested by the calling Translator instance.
See: Subsection: Mapping for details (context = source).
Hint
Custom classes may be initialized by using sections with fully qualified type names in single quotation marks. For
example, a [fetching.'my.library.SuperFetcher'] would import and initialize a SuperFetcher from the
my.library module.
| Key | Type | Description | Comments |
|---|---|---|---|
| score_function | | Compute value/candidate-likeness | |
| unmapped_values_action | raise \| warn \| ignore | Handle unmatched values. | |
| cardinality | OneToOne \| ManyToOne | Determine how many candidates to map a single value to. | |
Score functions which take additional keyword arguments should be specified in a child section, eg
[*.mapping.<score-function-name>]. See: rics.mapping.score_functions for options.
External functions may be used by putting fully qualified names in single quotation marks. Names which do not contain
any dot characters ('.') are assumed to refer to functions in the appropriate rics.mapping submodule.
Hint
For difficult matches, consider using overrides instead.
Filters are given in [[*.mapping.filter_functions]] list-subsections. These may be used to remove undesirable
matches, for example SQL tables which should not be used or a DataFrame column that should not be translated.
| Key | Type | Description | Comments |
|---|---|---|---|
| function | | Function name. | |
Note
Additional keys depend on the chosen function implementation.
As an example, the next snippet uses a require_regex_match() filter to ensure that only names ending with an
_id suffix are translated.
[[translator.mapping.filter_functions]]
function = "require_regex_match"
regex = ".*_id$"
where = "name"
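The effect of this regex on a handful of names can be checked directly with Python's re module. The helper keep_matching_names and the example names are hypothetical, and this is not the require_regex_match() implementation; it only shows which names the pattern accepts.

```python
import re
from typing import List

pattern = re.compile(".*_id$")  # same regex as in the snippet above


def keep_matching_names(names: List[str]) -> List[str]:
    """Hypothetical illustration; not the require_regex_match() implementation."""
    return [name for name in names if pattern.match(name)]


kept = keep_matching_names(["animal_id", "owner_id", "name", "id_number"])
```

Only animal_id and owner_id survive; id_number is dropped because the pattern requires the name to end with _id.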
Some ScoreFunctions take additional keyword arguments. These must
be declared in a [*.mapping.<score-function-name>]-subsection. See: rics.mapping.score_functions for options.
Heuristics may be used to help an underlying score_function make more difficult matches. There are two types of
heuristic functions: AliasFunctions and short-circuiting functions (which are
really just differently interpreted FilterFunctions).
Heuristics are given in [[*.mapping.score_function_heuristics]] list-subsections (note the double brackets) and
are applied in the order in which they are given by the HeuristicScore wrapper
class.
| Key | Type | Description | Comments |
|---|---|---|---|
| function | | Function name. | |
| mutate | | Keep changes made by function. | Disabled by default. |
Note
Additional keys depend on the chosen function implementation.
As an example, the next snippet uses a value_fstring_alias() heuristic to match table columns such as animal_id
to the id placeholder.
[[fetching.mapping.score_function_heuristics]]
function = "value_fstring_alias"
fstring = "{context}_{value}"
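The alias produced by this template can be reproduced with ordinary string formatting. The helper make_alias is hypothetical and is not the value_fstring_alias() implementation. The sketch assumes that value is the requested placeholder (id) and that context is the source name, here a made-up table called animal.

```python
fstring = "{context}_{value}"  # same template as in the snippet above


def make_alias(value: str, context: str) -> str:
    """Hypothetical illustration; not the value_fstring_alias() implementation."""
    return fstring.format(value=value, context=context)


# Assumed inputs: the "id" placeholder, requested for a source named "animal".
alias = make_alias(value="id", context="animal")
```

Under these assumptions the alias is animal_id, which is the column name the score function would then be compared against.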
Hint
For difficult matches, consider using overrides instead.
Overrides are shared or context-specific key-value pairs, implemented by the InheritedKeysDict
class. When used in config files, these appear as [*.overrides]-sections. Top-level override items are given in the
[*.overrides]-section, while context-specific items are specified using a subsection, eg
[*.overrides.<context-name>].
Note
The type of context is determined by the class that owns the overrides.
This next snippet is from another example. For unknown IDs, the name is set to ‘Name unknown’ for the ‘name_basics’ source and ‘Title unknown’ for the ‘title_basics’ source. Both inherit the from and to keys, which are set to ‘?’.
[unknown_ids.overrides]
from = "?"
to = "?"
[unknown_ids.overrides.name_basics]
name = "Name unknown"
[unknown_ids.overrides.title_basics]
name = "Title unknown"
Warning
Overrides have no fixed keys. No validation is performed and errors may be silent. The
mapping process provides detailed information in debug mode, which may be used to
discover issues.
Hint
Overrides may also be used to prevent mapping certain values.
For example, let’s assume that a SQL source table called title_basics has two columns, title and name, with
identical contents. We would like to use the format '[{title}. ]{name}' to output translations such as
‘Mr. Astaire’. To avoid output such as ‘Top Hat. Top Hat’ for movies, we may add
[fetching.mapping.overrides.title_basics]
title = "_"
to force the fetcher to inform the Translator that the title placeholder (column) does not exist for the
title_basics source (we used ‘_’ since TOML does not have a
null-type).