There are two principal steps involved in the mapping procedure: The Scoring procedure (see
Mapper.compute_scores) and the subsequent Matching procedure (see
Mapper.to_directional_mapping). The two are automatically combined
when using the Mapper.apply-function, though they may be invoked separately by users.
[Figure: Colours mapped by spectral distance (RGB).]
The steps are presented here in hierarchical order. This does not necessarily reflect the order in which they are actually computed.
1. Runtime overrides (type: UserOverrideFunction); set score = ∞ for the desired match, and score = -∞ for others [1].
2. Static overrides (type: dict or InheritedKeysDict); set score = ∞ for the desired match, and score = -∞ for others [1].
3. Filtering (type: FilterFunction); set score = -∞ for undesirable matches only.
4. If there are any Heuristics (type: HeuristicScore), apply:
   - Short-circuiting (type: FilterFunction); reinterpret a FilterFunction such that the returned candidates (if any) are treated as overrides [1].
   - Aliasing (type: AliasFunction); try to improve ScoreFunction accuracy by applying heuristics to the (value, candidates)-argument pairs.
   - Finally, select the best score at each stage (from no to all heuristics) for each pair.
5. Regular score (type: ScoreFunction); higher is better.
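The hierarchy above can be sketched in plain Python. This is only an illustration under stated assumptions: `score_row`, `exact`, `strip_id` and `no_dates` are hypothetical names, the signatures are simplified stand-ins rather than those of rics.mapping, runtime and static overrides are merged into a single dict, and short-circuiting is left out.

```python
import math
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical stand-in signatures for this sketch; not those of rics.mapping.
ScoreFn = Callable[[str, str], float]
FilterFn = Callable[[str, Set[str]], Set[str]]  # returns the allowed candidates
AliasFn = Callable[[str], str]  # rewrites the value before scoring


def score_row(
    value: str,
    candidates: List[str],
    score_function: ScoreFn,
    overrides: Optional[Dict[str, str]] = None,
    filters: Iterable[FilterFn] = (),
    aliases: Iterable[AliasFn] = (),
) -> Dict[str, float]:
    """Compute one row of the score matrix for a single value."""
    # Overrides: the desired match gets +inf, everything else -inf.
    forced = (overrides or {}).get(value)
    if forced is not None:
        return {c: math.inf if c == forced else -math.inf for c in candidates}

    # Filtering: only undesirable matches are pushed to -inf.
    allowed = set(candidates)
    for keep in filters:
        allowed = keep(value, allowed)

    alias_chain = list(aliases)
    row: Dict[str, float] = {}
    for candidate in candidates:
        if candidate not in allowed:
            row[candidate] = -math.inf
            continue
        # Regular score first, then keep the best score seen while the
        # alias heuristics are applied one after another.
        current = value
        best = score_function(current, candidate)
        for alias in alias_chain:
            current = alias(current)
            best = max(best, score_function(current, candidate))
        row[candidate] = best
    return row


def exact(value: str, candidate: str) -> float:
    """Toy score function: 1.0 on equality, otherwise 0.0."""
    return 1.0 if value == candidate else 0.0


def strip_id(value: str) -> str:
    """Toy alias heuristic: drop a trailing '_id'."""
    return value[:-3] if value.endswith("_id") else value


def no_dates(value: str, candidates: Set[str]) -> Set[str]:
    """Toy filter: refuse every candidate for date-like values."""
    return set() if value.endswith("_date") else candidates


tables = ["store", "staff"]
print(score_row("staff_id", tables, exact, aliases=[strip_id]))
print(score_row("rental_date", tables, exact, filters=[no_dates]))
print(score_row("staff_id", tables, exact, overrides={"staff_id": "store"}))
```

Returning early on an override mirrors the hierarchy: once a match is forced, filters, heuristics and the regular score no longer matter for that value.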
The final output of the scoring procedure is a score matrix (a pandas DataFrame), where columns are candidates and
values make up the index.
| | store | category | customer | staff | film |
|---|---|---|---|---|---|
| film_id | 0.100 | 0.040 | 0.040 | 0.100 | 1.000 ★ |
| category_id | 0.125 | 1.000 ★ | 0.222 | 0.042 | 0.040 |
| store_id | 1.000 ★ | 0.125 | 0.042 | 0.500 | 0.100 |
| rental_date | -∞ | -∞ | -∞ | -∞ | -∞ |
The full score matrix has over 100 values (rows). The table above contains a subset of 20. The 'rental_date' value
has only negative-infinity matching scores. This is intentional; the database has no suitable table for
translating dates. Mapping would most likely have failed regardless, but explicitly stating that 'rental_date' should
not be translated (by using a filter) is more efficient. More importantly, it is also clearer.
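To make the table concrete, the sketch below stores the same scores in plain dictionaries (an assumption made to keep the example dependency-free; the library itself returns a pandas DataFrame) and picks the best candidate per value. The helper name `best_candidates` is hypothetical, and real match selection also depends on the cardinality and score limit.

```python
import math

NEG_INF = -math.inf
CANDIDATES = ["store", "category", "customer", "staff", "film"]

# Scores copied from the table above: one row per value, one column per candidate.
scores = {
    "film_id": dict(zip(CANDIDATES, [0.100, 0.040, 0.040, 0.100, 1.000])),
    "category_id": dict(zip(CANDIDATES, [0.125, 1.000, 0.222, 0.042, 0.040])),
    "store_id": dict(zip(CANDIDATES, [1.000, 0.125, 0.042, 0.500, 0.100])),
    # The filter set every score for 'rental_date' to negative infinity.
    "rental_date": dict.fromkeys(CANDIDATES, NEG_INF),
}


def best_candidates(score_matrix):
    """Return the highest-scoring candidate per value, or None if all are -inf."""
    best = {}
    for value, row in score_matrix.items():
        candidate = max(row, key=row.get)
        best[value] = None if row[candidate] == NEG_INF else candidate
    return best


for value, candidate in best_candidates(scores).items():
    print(f"{value!r} -> {candidate!r}")
```

A row that is entirely negative infinity can never produce a match, which is how the filter on 'rental_date' makes the intent explicit.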
Given precomputed match scores (see the section above), make as many matches as possible under a given Cardinality
restriction. The available restrictions may be summarized as:
OneToOne = ‘1:1’: Each value and candidate may be used at most once.
OneToMany = ‘1:N’: Values have exclusive ownership of matched candidate(s).
ManyToOne = ‘N:1’: Ensure that as many values as possible are unambiguously
mapped (i.e. to a single candidate). This is the default option for new Mapper instances.
ManyToMany = ‘M:N’: All matches above the score limit are kept.
In theory, OneToMany and ManyToOne are equally restrictive. During mapping, however, the goal is usually to
find matches for the values, not the candidates. With that in mind, the ordering above may be considered strictly
decreasing in preciseness.
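The four restrictions can be illustrated with a small greedy selection over a score matrix. This is a sketch only, not the algorithm used by the Mapper: `select_matches` and its `min_score` parameter are hypothetical, and the cardinalities are identified by the short strings listed above.

```python
from typing import Dict, List, Set, Tuple


def select_matches(
    scores: Dict[str, Dict[str, float]],
    cardinality: str = "N:1",
    min_score: float = 0.1,
) -> Dict[str, List[str]]:
    """Greedily accept the best-scoring (value, candidate) pairs first."""
    pairs: List[Tuple[float, str, str]] = sorted(
        (
            (score, value, candidate)
            for value, row in scores.items()
            for candidate, score in row.items()
        ),
        key=lambda pair: pair[0],
        reverse=True,
    )
    one_candidate_per_value = cardinality in ("1:1", "N:1")
    one_value_per_candidate = cardinality in ("1:1", "1:N")

    matches: Dict[str, List[str]] = {}
    used_candidates: Set[str] = set()
    for score, value, candidate in pairs:
        if score < min_score:
            break  # pairs are sorted, so all remaining scores are too low
        if one_candidate_per_value and value in matches:
            continue  # this value already has its single candidate
        if one_value_per_candidate and candidate in used_candidates:
            continue  # this candidate is already owned by another value
        matches.setdefault(value, []).append(candidate)
        used_candidates.add(candidate)
    return matches


example = {
    "a": {"a": 1.0, "ab": 0.5, "b": 0.0},
    "b": {"a": 0.0, "ab": 0.5, "b": 1.0},
}
for cardinality in ("1:1", "1:N", "N:1", "M:N"):
    print(cardinality, select_matches(example, cardinality))
```

With this input, '1:1' and 'N:1' keep only the two exact matches, '1:N' additionally gives 'ab' to whichever value claims it first, and 'M:N' keeps every pair above the limit.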
Unmapped values are allowed by default. If mapping failure is not an acceptable outcome for your application, initialize
the Mapper with unmapped_values_action='raise' to ensure that an error is raised for unmapped values.
.details-messages
The 'rics.mapping.Mapper.accept.details' and 'rics.mapping.Mapper.unmapped.details' loggers emit per-combination
mapping scores when matches are made (accept.details), or when values are left without a match (unmapped.details).
Records from these loggers are always emitted on the debug-level.
The 'rics.mapping.Mapper.accept.details'-logger lists matches that were rejected in favour of the current match.
rics.mapping.Mapper.accept: Accepted: 'b' -> 'b'; score=1.000 >= 0.1.
rics.mapping.Mapper.accept.details: This match supersedes 4 other matches:
'b' -> 'ab'; score=0.500 (superseded on value='b').
'b' -> 'a'; score=0.000 < 0.1 (below threshold).
'b' -> 'fixed'; score=0.000 < 0.1 (below threshold).
'a' -> 'b'; score=-inf (superseded by short-circuit or override).
rics.mapping.Mapper: Match selection with cardinality='OneToOne' completed in 0.00369605 sec.
The 'rics.mapping.Mapper.unmapped.details'-logger explains why values were left unmapped.
rics.mapping.Mapper.unmapped.details: Could not map value='is_nice':
'is_nice' -> 'name'; score=0.125 < 1.0 (below threshold).
'is_nice' -> 'gender'; score=0.083 < 1.0 (below threshold).
'is_nice' -> 'id'; score=0.000 < 1.0 (below threshold).
rics.mapping.Mapper.unmapped: Could not map {'is_nice'} in context='humans' to any of candidates={'name', 'gender', 'id'}.
Unlike the unmapped.details-logger, the level of the records emitted by its parent (the unmapped-logger) is
determined by the Mapper.unmapped_values_action-attribute ('ignore' emits on the debug-level).
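Since the .details records are emitted on the debug-level, they are only visible if those loggers are enabled for DEBUG. A minimal sketch using the standard library `logging` module and the logger names given above; the INFO level for the root logger is just an example choice.

```python
import logging

# Example root configuration; other loggers stay at INFO.
logging.basicConfig(level=logging.INFO)

DETAILS_LOGGERS = (
    "rics.mapping.Mapper.accept.details",
    "rics.mapping.Mapper.unmapped.details",
)

# Lower the threshold for the two .details loggers only.
for name in DETAILS_LOGGERS:
    logging.getLogger(name).setLevel(logging.DEBUG)
```

Setting the level on the named loggers, rather than on the root logger, keeps debug output from unrelated modules out of the log.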
If .details-logging is not enough, the last resort (before opening a debugger) is to enable verbose logging. The
recommended way of doing this is by using the support.enable_verbose_debug_messages() function, which
acts as a context manager.
from rics.mapping import Mapper, support

# Verbose messages are only emitted for calls made inside the with-block.
with support.enable_verbose_debug_messages():
    Mapper(<config>).apply(<values>, <candidates>)
Verbose mode enables debug-level log messages from individual functions involved in the decision making and mapping
procedure, describing the internal operation of the Mapper in great detail.
rics.mapping.Mapper.accept: Accepted: 'a' -> 'ab'; score=inf (short-circuit or override).
rics.mapping.filter_functions.require_regex_match: Refuse matching for name='a': Matches pattern=re.compile('.*a.*', re.IGNORECASE).
rics.mapping.HeuristicScore: Heuristics scores for value='staff_id': ['store': 0.00 -> 0.50 (+0.50), 'payment': 0.07 -> 0.07 (+0.00), 'inventory': 0.00 -> 0.07 (+0.07), 'language': 0.00 -> 0.08 (+0.08), 'category': 0.00 -> 0.04 (+0.04), 'film': 0.05 -> 0.10 (+0.05), 'address': 0.00 -> 0.08 (+0.08), 'rental': 0.00 -> 0.08 (+0.08), 'customer_list': 0.00 -> 0.02 (+0.02), 'staff': 0.00 -> 1.00 (+1.00), 'staff_list': 0.00 -> 0.03 (+0.03), 'city': 0.00 -> 0.10 (+0.10), 'country': 0.00 -> 0.06 (+0.06), 'customer': 0.00 -> 0.04 (+0.04), 'actor': 0.00 -> 0.17 (+0.17)]
rics.mapping.filter_functions.require_regex_match: Refuse matching for name='return_date': Does not match pattern=re.compile('.*_id$', re.IGNORECASE).
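The advantage of the context-manager form is that verbose output is limited to one block and is switched off again even if mapping raises. The sketch below shows that general pattern with `contextlib`; it is not the library's implementation, and `_state`, `is_verbose` and `verbose_messages` are hypothetical names.

```python
from contextlib import contextmanager

# Hypothetical module-level switch consulted by logging helpers.
_state = {"verbose": False}


def is_verbose() -> bool:
    return _state["verbose"]


@contextmanager
def verbose_messages():
    """Temporarily enable verbose messages, restoring the old setting on exit."""
    previous = _state["verbose"]
    _state["verbose"] = True
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        _state["verbose"] = previous


with verbose_messages():
    print("inside:", is_verbose())
print("outside:", is_verbose())
```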
To permanently enable verbose logging, initialize the Mapper with enable_verbose_logging=True.
Warning
Verbose mode may emit a large number of records and should be avoided except when required. For that reason, using
enable_verbose_logging is not recommended.
Footnotes