The mapping implemented by the Mapper class involves two major steps: the Scoring procedure
(see Mapper.compute_scores) and the subsequent
Matching procedure (see Mapper.to_directional_mapping). The
two are combined automatically when using the Mapper.apply function, though users may
also invoke them separately.
Figure: Colours mapped by spectral distance (RGB).
The steps are presented here in hierarchical order, which does not necessarily reflect the order in which they are computed.
1. Runtime overrides (type: UserOverrideFunction); set score = ∞ for the desired match, and score = -∞ for others [1].
2. Static overrides (type: dict or InheritedKeysDict); set score = ∞ for the desired match, and score = -∞ for others [1].
3. Filtering (type: FilterFunction); set score = -∞ for undesirable matches only.
4. If there are any Heuristics (type: HeuristicScore), apply:
   - Short-circuiting (type: FilterFunction); reinterpret a FilterFunction such that the returned candidates (if any) are treated as overrides [1].
   - Aliasing (type: AliasFunction); try to improve ScoreFunction accuracy by applying heuristics to the (value, candidates) arguments.

   Finally, select the best score at each stage (from no to all heuristics) for each pair.
5. Regular score (type: ScoreFunction); higher is better.
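The precedence above can be illustrated with a minimal, self-contained sketch. It does not use the library: `compute_scores`, `similarity`, `overrides` and `keep` are hypothetical names chosen for illustration, the score function is a plain string-similarity ratio from the standard library, and the heuristics stage is left out for brevity.

```python
import math
from difflib import SequenceMatcher


def similarity(value, candidate):
    """Toy score function: string similarity in [0, 1]; higher is better."""
    return SequenceMatcher(None, value, candidate).ratio()


def compute_scores(values, candidates, overrides=None, keep=None, score=similarity):
    """Return {value: {candidate: score}} (hypothetical helper, not the library API).

    overrides: dict {value: candidate} of forced matches.
    keep: predicate (value, candidate) -> bool; False means "filter this match out".
    """
    overrides = overrides or {}
    matrix = {}
    for value in values:
        row = {}
        for candidate in candidates:
            if value in overrides:
                # Overrides win outright: +inf for the chosen match, -inf for the rest.
                row[candidate] = math.inf if overrides[value] == candidate else -math.inf
            elif keep is not None and not keep(value, candidate):
                # Filters only ever rule matches out; they never promote one.
                row[candidate] = -math.inf
            else:
                # Nothing else applies, so fall back to the regular score.
                row[candidate] = score(value, candidate)
        matrix[value] = row
    return matrix


scores = compute_scores(
    values=["store_id", "rental_date", "cat"],
    candidates=["store", "category"],
    overrides={"cat": "category"},
    keep=lambda value, candidate: not value.endswith("_date"),
)
```

In this sketch `'cat'` is forced onto `'category'`, every score for `'rental_date'` becomes negative infinity, and `'store_id'` is scored normally.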
The final output of the scoring procedure is the score matrix, represented as a pandas DataFrame. Candidates are
set as the columns and values are assigned to the index.
| | store | category | customer | staff | film |
|---|---|---|---|---|---|
| film_id | 0.100 | 0.040 | 0.040 | 0.100 | 1.000 ★ |
| category_id | 0.125 | 1.000 ★ | 0.222 | 0.042 | 0.040 |
| store_id | 1.000 ★ | 0.125 | 0.042 | 0.500 | 0.100 |
| rental_date | -∞ | -∞ | -∞ | -∞ | -∞ |
The full mapping matrix has over 100 values; the table above shows a selection of 20. The 'rental_date' value
has only negative-infinity matching scores. This is intentional; the database has no suitable table for
translating dates. Mapping would most likely have failed regardless, but explicitly stating that 'rental_date' should
not be translated (by using a filter) is more efficient.
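As a rough illustration of how such a matrix can be read, the sketch below stores the 20 scores from the table as nested dictionaries (the library itself uses a pandas DataFrame) and picks the best candidate for each value. The helper name `best_candidate` is hypothetical.

```python
import math

NEG_INF = -math.inf

# Scores copied from the table above: one row per value, one key per candidate.
score_matrix = {
    "film_id": {"store": 0.100, "category": 0.040, "customer": 0.040, "staff": 0.100, "film": 1.000},
    "category_id": {"store": 0.125, "category": 1.000, "customer": 0.222, "staff": 0.042, "film": 0.040},
    "store_id": {"store": 1.000, "category": 0.125, "customer": 0.042, "staff": 0.500, "film": 0.100},
    "rental_date": {"store": NEG_INF, "category": NEG_INF, "customer": NEG_INF, "staff": NEG_INF, "film": NEG_INF},
}


def best_candidate(row):
    """Return (candidate, score) for the highest score, or None if all scores are -inf."""
    candidate = max(row, key=row.get)
    return None if row[candidate] == NEG_INF else (candidate, row[candidate])


best = {value: best_candidate(row) for value, row in score_matrix.items()}
```

Here `best["rental_date"]` is `None` because the filter has ruled out every candidate, while the other three values resolve to the starred cells in the table.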
Given precomputed match scores (see the section above), the matching procedure makes as many matches as possible
under a Cardinality restriction. The restrictions may be summarized as:
- OneToOne = '1:1': Each value and candidate may be used at most once.
- OneToMany = '1:N': Values have exclusive ownership of matched candidate(s).
- ManyToOne = 'N:1': Ensure that as many values as possible are unambiguously
  mapped (i.e. to a single candidate). This is the default option for new Mapper instances.
- ManyToMany = 'M:N': All matches above the score limit are kept.
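One way to picture the four restrictions is a simple greedy selection over the score matrix, best scores first. This is only a sketch under stated assumptions: `select_matches` and `min_score` are hypothetical names, and the greedy strategy is an illustration rather than the algorithm the Mapper actually uses.

```python
def select_matches(scores, cardinality, min_score=0.1):
    """Greedily pick (value, candidate) pairs from {value: {candidate: score}}.

    cardinality is one of '1:1', '1:N', 'N:1' or 'M:N'.
    """
    if cardinality not in {"1:1", "1:N", "N:1", "M:N"}:
        raise ValueError(f"unknown cardinality: {cardinality!r}")

    # Keep only pairs at or above the score limit.
    pairs = [
        (score, value, candidate)
        for value, row in scores.items()
        for candidate, score in row.items()
        if score >= min_score
    ]
    # Best scores first; ties are broken alphabetically to keep the result stable.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))

    used_values, used_candidates, matches = set(), set(), []
    for _, value, candidate in pairs:
        # A value may be matched only once under '1:1' and 'N:1'.
        if cardinality in {"1:1", "N:1"} and value in used_values:
            continue
        # A candidate may be matched only once under '1:1' and '1:N'.
        if cardinality in {"1:1", "1:N"} and candidate in used_candidates:
            continue
        matches.append((value, candidate))
        used_values.add(value)
        used_candidates.add(candidate)
    return matches


example = {
    "a": {"a": 1.0, "ab": 0.5, "b": 0.0},
    "b": {"a": 0.0, "ab": 0.5, "b": 1.0},
    "c": {"a": 0.9, "ab": 0.2, "b": 0.0},
}
one_to_one = select_matches(example, "1:1")
one_to_many = select_matches(example, "1:N")
many_to_one = select_matches(example, "N:1")
many_to_many = select_matches(example, "M:N")
```

With this input, `'1:1'` pushes `'c'` onto its second choice `'ab'`, `'1:N'` leaves `'c'` unmatched because both of its viable candidates are already owned, `'N:1'` lets `'c'` share `'a'`, and `'M:N'` keeps all six pairs above the limit.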
In theory, OneToMany and ManyToOne are equally restrictive. During mapping, however, the goal is usually to
find matches for the values, not the candidates. With that in mind, the ordering above may be considered strictly decreasing
in preciseness, even though (somewhat counter-intuitively) the two compare as both less than and greater than each other:
>>> from rics.mapping import Cardinality
>>> Cardinality.ManyToOne < Cardinality.OneToMany
True
>>> Cardinality.ManyToOne > Cardinality.OneToMany
True
The log messages emitted during operation are the best way to diagnose mapping issues. Logging is controlled both by the
log level used, and by the enable_verbose_logging option, which is False by default.
The 'rics.mapping.Mapper.accept.details'-logger emits details about matches that were rejected in favour of the current match. Depending on the chosen cardinality, this may affect both values and candidates. Example output:

    DEBUG rics.mapping.Mapper.accept: Accepted: 'b' -> 'b'; score=1.000 >= 0.1.
    DEBUG rics.mapping.Mapper.accept.details: This match supersedes 4 other matches:
        'b' -> 'ab'; score=0.500 (superseded on value='b').
        'b' -> 'a'; score=0.000 < 0.1 (below threshold).
        'b' -> 'fixed'; score=0.000 < 0.1 (below threshold).
        'a' -> 'b'; score=-inf (superseded by short-circuit or override).
    DEBUG rics.mapping.Mapper: Match selection with cardinality='OneToOne' completed in 0.00369605 sec.

The 'rics.mapping.Mapper.unmapped.details'-logger emits details about values that weren’t mapped to any candidates. The final action taken depends on unmapped_values_action. Example output:

    DEBUG rics.mapping.Mapper.unmapped.details: Could not map value='is_nice':
        'is_nice' -> 'name'; score=0.125 < 1.0 (below threshold).
        'is_nice' -> 'gender'; score=0.083 < 1.0 (below threshold).
        'is_nice' -> 'id'; score=0.000 < 1.0 (below threshold).
    DEBUG rics.mapping.Mapper.unmapped: Could not map {'is_nice'} in context='humans' to any of candidates={'name', 'gender', 'id'}.

Verbose mode may emit a large number of records. Example output:

    DEBUG rics.mapping.Mapper.accept: Accepted: 'a' -> 'ab'; score=inf (short-circuit or override).
    DEBUG rics.mapping.filter_functions.require_regex_match: Refuse matching for name='a': Matches pattern=re.compile('.*a.*', re.IGNORECASE).
    DEBUG rics.mapping.HeuristicScore: Heuristics scores for value='staff_id': ['store': 0.00 -> 0.50 (+0.50), 'payment': 0.07 -> 0.07 (+0.00), 'inventory': 0.00 -> 0.07 (+0.07), 'language': 0.00 -> 0.08 (+0.08), 'category': 0.00 -> 0.04 (+0.04), 'film': 0.05 -> 0.10 (+0.05), 'address': 0.00 -> 0.08 (+0.08), 'rental': 0.00 -> 0.08 (+0.08), 'customer_list': 0.00 -> 0.02 (+0.02), 'staff': 0.00 -> 1.00 (+1.00), 'staff_list': 0.00 -> 0.03 (+0.03), 'city': 0.00 -> 0.10 (+0.10), 'country': 0.00 -> 0.06 (+0.06), 'customer': 0.00 -> 0.04 (+0.04), 'actor': 0.00 -> 0.17 (+0.17)]
    DEBUG rics.mapping.filter_functions.require_regex_match: Refuse matching for name='return_date': Does not match pattern=re.compile('.*_id$', re.IGNORECASE).
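The log level side of this can be set up with the standard library alone. The sketch below assumes only the logger names that appear in the example output above; it keeps the rest of the package at INFO and raises the two detail loggers to DEBUG. The enable_verbose_logging option is a separate setting and is not shown here.

```python
import logging

# Send records to stderr in a format similar to the examples above.
logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")

# Keep the package as a whole at INFO...
logging.getLogger("rics.mapping").setLevel(logging.INFO)

# ...but enable DEBUG for the two detail loggers named in the example output.
detail_loggers = (
    "rics.mapping.Mapper.accept.details",
    "rics.mapping.Mapper.unmapped.details",
)
for name in detail_loggers:
    logging.getLogger(name).setLevel(logging.DEBUG)
```

Because logger levels are inherited through the dotted names, 'rics.mapping.Mapper.accept' itself stays at INFO in this setup; lower its level as well if the "Accepted" records should be shown alongside the details.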
Footnotes