rics.mapping#

Mapping implementations for matching groups of elements.

For an introduction to mapping, see the Mapping primer.

Classes

Cardinality(value)

Enumeration type for cardinality relationships.

HeuristicScore(score_function, heuristics)

Callable wrapper for computing heuristic scores.

DirectionalMapping([cardinality, ...])

A two-way mapping between hashable elements.

Mapper([score_function, ...])

Optimal value-candidate matching.

class Cardinality(value)[source]#

Bases: Enum

Enumeration type for cardinality relationships.

Cardinalities are comparable using numerical operators, and can be thought of as comparing “preciseness”. The less ambiguity there is for a given cardinality, the smaller it is in comparison to the others. The hierarchy is given by 1:1 < 1:N = N:1 < M:N. Note that 1:N and N:1 are considered equally precise.

Examples

Comparing cardinalities

>>> from rics.mapping import Cardinality
>>> Cardinality.ManyToOne
<Cardinality.ManyToOne: 'N:1'>
>>> Cardinality.OneToOne
<Cardinality.OneToOne: '1:1'>
>>> Cardinality.ManyToOne < Cardinality.OneToOne
False
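
The ordering described above can be reproduced with a small standalone enum. This is a minimal sketch, not the rics implementation: the class name SketchCardinality and the _RANK table are hypothetical, and equality is left as plain enum identity, so only the ordering operators treat 1:N and N:1 as equally precise.

```python
from enum import Enum

# Hypothetical rank table: a lower rank means less ambiguity.
_RANK = {"1:1": 0, "1:N": 1, "N:1": 1, "M:N": 2}


class SketchCardinality(Enum):
    """Standalone stand-in for Cardinality (not the rics class)."""

    OneToOne = "1:1"
    OneToMany = "1:N"
    ManyToOne = "N:1"
    ManyToMany = "M:N"

    @property
    def rank(self) -> int:
        return _RANK[self.value]

    # All four operators are spelled out, since 1:N and N:1 share a rank
    # without being the same member.
    def __lt__(self, other: "SketchCardinality") -> bool:
        return self.rank < other.rank

    def __le__(self, other: "SketchCardinality") -> bool:
        return self.rank <= other.rank

    def __gt__(self, other: "SketchCardinality") -> bool:
        return self.rank > other.rank

    def __ge__(self, other: "SketchCardinality") -> bool:
        return self.rank >= other.rank


# Sorting from most to least precise; ties keep their input order.
ordered = sorted(SketchCardinality, key=lambda c: c.rank)
```
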
ParseType#

Types that may be interpreted as a Cardinality.

alias of Union[str, Cardinality]

OneToOne = '1:1'#

One-to-one relationship.

OneToMany = '1:N'#

One-to-many relationship.

ManyToOne = 'N:1'#

Many-to-one relationship.

ManyToMany = 'M:N'#

Many-to-many relationship.

property many_left: bool#

Many-relationship on the left, True for N:1 and M:N.

property many_right: bool#

Many-relationship on the right, True for 1:N and M:N.

property one_left: bool#

One-relationship on the left, True for 1:1 and 1:N.

property one_right: bool#

One-relationship on the right, True for 1:1 and N:1.

property inverse: Cardinality#

Inverse cardinality. For symmetric cardinalities, self.inverse == self.

Returns:

Inverse cardinality.

See also

symmetric

property symmetric: bool#

Symmetry flag. For symmetric cardinalities, self.inverse == self.

Returns:

Symmetry flag.

See also

inverse
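
The side and symmetry properties above can be illustrated on the plain string values. This is a sketch under the assumption that the properties follow directly from the two halves of the 'left:right' string; the function names are hypothetical helpers and not part of rics.

```python
def many_left(cardinality: str) -> bool:
    """True for 'N:1' and 'M:N': the left half is not '1'."""
    left, _ = cardinality.split(":")
    return left != "1"


def many_right(cardinality: str) -> bool:
    """True for '1:N' and 'M:N': the right half is not '1'."""
    _, right = cardinality.split(":")
    return right != "1"


def inverse(cardinality: str) -> str:
    """Swap the sides; '1:1' and 'M:N' map to themselves."""
    table = {"1:1": "1:1", "1:N": "N:1", "N:1": "1:N", "M:N": "M:N"}
    return table[cardinality]


def symmetric(cardinality: str) -> bool:
    """A cardinality is symmetric when it equals its own inverse."""
    return inverse(cardinality) == cardinality
```
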

classmethod from_counts(left_count: int, right_count: int) Cardinality[source]#

Derive a Cardinality from counts.

Parameters:
  • left_count – Number of elements on the left-hand side.

  • right_count – Number of elements on the right-hand side.

Returns:

A Cardinality.

Raises:

ValueError – For counts < 1.
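
One plausible reading of from_counts is sketched below. It assumes that a count of exactly 1 means a "one" side and anything larger means a "many" side; the helper name cardinality_from_counts is hypothetical and it returns the string value rather than a Cardinality.

```python
def cardinality_from_counts(left_count: int, right_count: int) -> str:
    """Derive a cardinality string from element counts (illustrative only)."""
    if left_count < 1 or right_count < 1:
        raise ValueError(f"Counts must be >= 1, got {left_count=}, {right_count=}.")

    one_left = left_count == 1
    one_right = right_count == 1

    if one_left and one_right:
        return "1:1"
    if one_left:
        return "1:N"  # one element on the left, several on the right
    if one_right:
        return "N:1"  # several elements on the left, one on the right
    return "M:N"
```
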

classmethod parse(arg: str | Cardinality, strict: bool = False) Cardinality[source]#

Convert to cardinality.

Parameters:
  • arg – Argument to parse.

  • strict – If True, arg must match exactly when it is given as a string.

Returns:

A Cardinality.

Raises:

ValueError – If the argument could not be converted.

class HeuristicScore(score_function: str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Iterable[float]], heuristics: Iterable[str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Tuple[ValueType, Iterable[CandidateType]]] | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Set[CandidateType]] | Tuple[str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Tuple[ValueType, Iterable[CandidateType]]] | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Set[CandidateType]], Dict[str, Any]]])[source]#

Bases: Generic[ValueType, CandidateType, ContextType]

Callable wrapper for computing heuristic scores.

Instances are callable. Signature is given by ScoreFunction.

Short-circuiting:

A mechanism for forced matching. Score is set to +∞ for short-circuited candidates, and -∞ for the rest. No further matching will be performed after this point, so ensure that all desired candidates are returned by chosen filters.

Procedure:
  1. Trigger short-circuiting if there is an exact value-candidate match.

  2. All heuristics are applied and scores are computed.

  3. If no short-circuiting is triggered in step 2, yield max score for each candidate.

Parameters:
  • score_function – A ScoreFunction to wrap.

  • heuristics – Iterable of heuristics or tuples (heuristic, kwargs) to apply to the (value, candidates) inputs of score_function.

Heuristic types:
  • An AliasFunction, which accepts and returns a tuple (value, candidates) to be evaluated.

  • A FilterFunction, which accepts a tuple (value, candidates) and returns a subset of candidates. If any candidates are returned, short-circuiting is triggered.

Notes

  • Heuristic function input order = application order.

  • You may add mutate=True to a heuristic's kwargs to forward the modifications made by that function.
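
The procedure and heuristic types above can be sketched as a plain function. This is an illustration under several assumptions, not the HeuristicScore implementation: the context argument and per-heuristic kwargs are omitted, heuristics are told apart by their return type (a set for filters, a tuple for aliases), every heuristic sees the original inputs (no mutate), and scores for the unmodified inputs take part in the maximum. All names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def equality(value: str, candidates: Sequence[str]) -> List[float]:
    """Score 1.0 for equal elements, 0.0 otherwise."""
    return [float(value == c) for c in candidates]


def short_circuit(candidates: Sequence[str], forced: Iterable[str]) -> List[float]:
    """+inf for forced candidates, -inf for all others."""
    forced = set(forced)
    return [math.inf if c in forced else -math.inf for c in candidates]


def heuristic_score(value, candidates, score_function=equality, heuristics=()):
    candidates = list(candidates)

    # Step 1: an exact value-candidate match triggers short-circuiting.
    if value in candidates:
        return short_circuit(candidates, [value])

    # Assumption: the unmodified inputs are scored as well.
    best = list(score_function(value, candidates))

    # Step 2: apply heuristics in input order.
    for heuristic in heuristics:
        result = heuristic(value, candidates)
        if isinstance(result, (set, frozenset)):
            # Filter-style heuristic: any returned candidate forces a match.
            if result:
                return short_circuit(candidates, result)
        else:
            # Alias-style heuristic: score the transformed inputs. The
            # transformed candidates must keep the original order.
            alias_value, alias_candidates = result
            scores = score_function(alias_value, list(alias_candidates))
            best = [max(b, s) for b, s in zip(best, scores)]

    # Step 3: nothing short-circuited, so keep the best score per candidate.
    return best


def lowercase(value, candidates):
    """Alias-style heuristic: compare in lower case."""
    return value.lower(), [c.lower() for c in candidates]


def id_suffix_filter(value, candidates):
    """Filter-style heuristic: force 'id' for values ending in '_id'."""
    return {c for c in candidates if value.endswith("_id") and c == "id"}
```

In this sketch, heuristic_score("Name", ["name", "id"], heuristics=[lowercase]) scores the first candidate 1.0 through the alias, while id_suffix_filter forces "id" for the value "user_id".
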

property score_function: Callable[[ValueType, Iterable[CandidateType], ContextType | None], Iterable[float]]#

Return the underlying likeness score function.

add_heuristic(heuristic: str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Tuple[ValueType, Iterable[CandidateType]]] | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Set[CandidateType]], kwargs: Dict[str, Any] | None = None) None[source]#

Add a new heuristic.

class DirectionalMapping(cardinality: str | Cardinality | None = None, left_to_right: Mapping[HL, Iterable[HR]] | None = None, right_to_left: Mapping[HR, Iterable[HL]] | None = None, _verify: bool = True)[source]#

Bases: Generic[HL, HR]

A two-way mapping between hashable elements.

Parameters:
  • cardinality – Explicit cardinality. Derive if None.

  • left_to_right – A left-to-right mapping of elements.

  • right_to_left – A right-to-left mapping of elements.

  • _verify – If False, input checks are disabled. Intended for internal use.

Raises:
property cardinality: Cardinality#

Cardinality with which this mapping was created.

Returns:

Cardinality with which this mapping was created.

property left: Tuple[HL, ...]#

Left-side elements in the mapping.

property right: Tuple[HR, ...]#

Right-side elements in the mapping.

property left_to_right: Dict[HL, Tuple[HR, ...]]#

Left-to-right element mappings.

property right_to_left: Dict[HR, Tuple[HL, ...]]#

Right-to-left element mappings.

property reverse: DirectionalMapping[HR, HL]#

Reverse the mapping by swapping the sides.

Returns:

A copy with data identical to the calling instance, but with the left and right sides swapped.

flatten() Dict[HL, HR][source]#

Return a flattened version of self as a dict.

Returns:

A dict {left: right}.

Raises:

CardinalityError – If cardinality is not OneToOne or ManyToOne.
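
The flattening rule can be sketched on a plain dict of tuples, mirroring the shape of left_to_right. This is a minimal illustration: it checks tuple lengths instead of the cardinality property, and SketchCardinalityError is a hypothetical stand-in for CardinalityError.

```python
from typing import Dict, Hashable, Tuple


class SketchCardinalityError(ValueError):
    """Hypothetical stand-in for the CardinalityError named above."""


def flatten(left_to_right: Dict[Hashable, Tuple[Hashable, ...]]) -> Dict[Hashable, Hashable]:
    """Turn {left: (right,)} into {left: right}."""
    flat = {}
    for left, rights in left_to_right.items():
        # Flattening only makes sense when every left element has exactly
        # one right element, i.e. for 1:1 and N:1 mappings.
        if len(rights) != 1:
            raise SketchCardinalityError(f"{left!r} maps to {len(rights)} elements.")
        flat[left] = rights[0]
    return flat
```
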

select_left(elements: Iterable[HL], exclude: bool = False) DirectionalMapping[HL, HR][source]#

Perform a selection on left-side elements.

Parameters:
  • elements – Elements to select.

  • exclude – If True, return everything except the given elements.

Returns:

A new DirectionalMapping for the selection.

Raises:

KeyError – If any of the chosen elements do not exist and exclude=False.

select_right(elements: Iterable[HR], exclude: bool = False) DirectionalMapping[HL, HR][source]#

Perform a selection on right-side elements.

Parameters:
  • elements – Elements to select.

  • exclude – If True, return everything except the given elements.

Returns:

A new instance for the selection.

Raises:

KeyError – If any of the chosen elements do not exist and exclude=False.
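
The selection semantics, including the exclude flag and the KeyError condition, can be illustrated on a plain dict. This sketch covers the left side only and returns a dict rather than a DirectionalMapping; the helper is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

Mapping_ = Dict[Hashable, Tuple[Hashable, ...]]


def select_left(left_to_right: Mapping_, elements: Iterable[Hashable], exclude: bool = False) -> Mapping_:
    """Keep (or drop, if exclude=True) the given left-side elements."""
    wanted = set(elements)

    if exclude:
        # Unknown elements are silently accepted when excluding.
        return {k: v for k, v in left_to_right.items() if k not in wanted}

    missing = wanted.difference(left_to_right)
    if missing:
        raise KeyError(f"Elements not in mapping: {sorted(map(repr, missing))}")

    # Iterate over the original mapping to preserve its order.
    return {k: v for k, v in left_to_right.items() if k in wanted}
```
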

class Mapper(score_function: str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Iterable[float]] = 'equality', score_function_kwargs: Dict[str, Any] | None = None, filter_functions: Iterable[Tuple[str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Set[CandidateType]], Dict[str, Any]]] = (), min_score: float = 0.9, overrides: InheritedKeysDict[ContextType, ValueType, CandidateType] | Dict[ValueType, CandidateType] | None = None, unmapped_values_action: Literal['ignore', 'warn', 'raise', 'IGNORE', 'WARN', 'RAISE'] | ActionLevel = ActionLevel.IGNORE, unknown_user_override_action: Literal['ignore', 'warn', 'raise', 'IGNORE', 'WARN', 'RAISE'] | ActionLevel = ActionLevel.RAISE, cardinality: str | Cardinality | None = Cardinality.ManyToOne, verbose_logging: bool = False)[source]#

Bases: Generic[ValueType, CandidateType, ContextType]

Optimal value-candidate matching.

For an introduction to mapping, see the Mapping primer page.

Parameters:
  • score_function – A callable which accepts a value k and an ordered collection of candidates c, returning a score s_i for each candidate c_i in c. Default: s_i = float(k == c_i). Higher scores indicate better matches.

  • score_function_kwargs – Keyword arguments for score_function.

  • filter_functions – Function-kwargs pairs of filters to apply before scoring.

  • min_score – Minimum score s_i, as given by score(k, c_i), to consider k a match for c_i.

  • overrides – If a dict, assumed to be 1:1 mappings (value to candidate) which override the scoring logic. If InheritedKeysDict, the context passed to apply() is used to retrieve specific overrides.

  • unmapped_values_action – Action to take if mapping fails for any values.

  • unknown_user_override_action – Action to take if a UserOverrideFunction returns an unknown candidate. Unknown candidates, i.e. candidates not in the input candidates collection, will not be used unless ‘ignore’ is chosen. As such, ‘ignore’ should rather be interpreted as ‘allow’.

  • cardinality – Desired cardinality for mapped values. Derive for each matching if None.

  • verbose_logging – If True, enable verbose logging for the apply() function. Has no effect when the log level is above logging.DEBUG.
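
The interplay between score_function and min_score can be sketched without the Mapper itself. The equality function below follows the default formula s_i = float(k == c_i) given above; the matches helper is hypothetical and assumes that a score equal to min_score counts as a match.

```python
from typing import Callable, Iterable, List


def equality(value: str, candidates: Iterable[str]) -> List[float]:
    """s_i = float(k == c_i) for each candidate c_i."""
    return [float(value == c) for c in candidates]


def matches(
    value: str,
    candidates: Iterable[str],
    score_function: Callable[[str, Iterable[str]], List[float]] = equality,
    min_score: float = 0.9,
) -> List[str]:
    """Return the candidates whose score reaches min_score."""
    candidates = list(candidates)
    scores = score_function(value, candidates)
    return [c for c, s in zip(candidates, scores) if s >= min_score]
```
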

apply(values: Iterable[ValueType], candidates: Iterable[CandidateType], context: ContextType | None = None, override_function: Callable[[ValueType, Set[CandidateType], ContextType | None], CandidateType | None] | None = None, **kwargs: Any) DirectionalMapping[ValueType, CandidateType][source]#

Map values to candidates.

Parameters:
  • values – Iterable of elements to match to candidates.

  • candidates – Iterable of candidates to match with values. Duplicate elements will be discarded.

  • context – Context in which mapping is being done. Required when using context-sensitive overrides.

  • override_function – A callable that takes inputs (value, candidates, context) and returns either None (let the regular mapping logic decide) or one of the candidates. How returned non-candidates are handled is determined by the unknown_user_override_action property.

  • **kwargs – Runtime keyword arguments for score and filter functions. May be used to add information which is not known when the Mapper is initialized.

Returns:

A DirectionalMapping of the form {value: [matched_candidates, ...]}. May be turned into a plain dict {value: candidate} by using the DirectionalMapping.flatten() function (only if DirectionalMapping.cardinality.one_right is True).

Raises:
  • MappingError – If any values failed to match and unmapped_values_action='raise'.

  • BadFilterError – If a filter returns candidates that are not a subset of the original candidates.

  • UserMappingError – If override_function returns an unknown candidate and unknown_user_override_action != 'ignore'.

  • ValueError – If passing context=None (the default) when context_sensitive_overrides is True.
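
How an override_function result might be resolved is sketched below. This is an assumption-laden illustration, not the Mapper logic: the action is passed as a plain lowercase string, SketchUserMappingError stands in for UserMappingError, and 'warn' is taken to mean that the unknown candidate is reported and then discarded.

```python
import warnings
from typing import Any, Callable, Iterable, Optional


class SketchUserMappingError(ValueError):
    """Hypothetical stand-in for the UserMappingError named above."""


def resolve_override(
    value: Any,
    candidates: Iterable[Any],
    context: Any,
    override_function: Callable[[Any, set, Any], Optional[Any]],
    unknown_action: str = "raise",
) -> Optional[Any]:
    """Return the overriding candidate, or None to use regular mapping."""
    candidates = set(candidates)
    chosen = override_function(value, candidates, context)

    # None defers to the regular logic; known candidates are used as-is.
    if chosen is None or chosen in candidates:
        return chosen

    message = f"Override returned unknown candidate {chosen!r} for {value!r}."
    if unknown_action == "raise":
        raise SketchUserMappingError(message)
    if unknown_action == "warn":
        warnings.warn(message)
        return None  # the unknown candidate is not used
    return chosen  # 'ignore' effectively allows unknown candidates
```
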

compute_scores(values: Iterable[ValueType], candidates: Iterable[CandidateType], context: ContextType | None = None, override_function: Callable[[ValueType, Set[CandidateType], ContextType | None], CandidateType | None] | None = None, **kwargs: Any) DataFrame[source]#

Compute likeness scores.

Parameters:
  • values – Iterable of elements to match to candidates.

  • candidates – Iterable of candidates to match with values. Duplicate elements will be discarded.

  • context – Context in which mapping is being done.

  • override_function – A callable that takes inputs (value, candidates, context) and returns either None (let the regular mapping logic decide) or one of the candidates. How returned non-candidates are handled is determined by the unknown_user_override_action property.

  • **kwargs – Runtime keyword arguments for score and filter functions. May be used to add information which is not known when the Mapper is initialized.

Returns:

A DataFrame of value-candidate match scores, with DataFrame.index=values and DataFrame.columns=candidates.

Raises:
  • BadFilterError – If a filter returns candidates that are not a subset of the original candidates.

  • UserMappingError – If override_function returns an unknown candidate and unknown_user_override_action != 'ignore'.

to_directional_mapping(scores: DataFrame) DirectionalMapping[ValueType, CandidateType][source]#

Create a DirectionalMapping from match scores.

Parameters:

scores – A score matrix, where scores.index are values and scores.columns are treated as the candidates.

Returns:

A DirectionalMapping.
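
The step from a score matrix to a mapping can be sketched for the many-to-one case. A nested dict {value: {candidate: score}} stands in for the DataFrame, and the selection rule (keep the single best candidate per value if it reaches min_score) is an assumption made for illustration, not the algorithm used by the Mapper.

```python
from typing import Dict, Hashable, Tuple

Scores = Dict[Hashable, Dict[Hashable, float]]


def to_many_to_one(scores: Scores, min_score: float = 0.9) -> Dict[Hashable, Tuple[Hashable, ...]]:
    """Pick at most one candidate per value; candidates may be reused."""
    mapping = {}
    for value, row in scores.items():
        if not row:
            continue  # no candidates to choose from
        best = max(row, key=row.get)  # first candidate wins on ties
        if row[best] >= min_score:
            mapping[value] = (best,)
    return mapping


scores = {
    "a": {"x": 1.0, "y": 0.2},
    "b": {"x": 0.95, "y": 0.1},
    "c": {"x": 0.3, "y": 0.4},  # nothing reaches min_score
}
mapping = to_many_to_one(scores)
```

Here "a" and "b" both map to "x", while "c" is left unmapped because none of its scores reaches the threshold.
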

property cardinality: Cardinality | None#

Return upper cardinality bound during mapping.

property unmapped_values_action: ActionLevel#

Return the action to take if mapping fails for any values.

property unknown_user_override_action: ActionLevel#

Return the action to take if an override function returns an unknown candidate.

Unknown candidates, i.e. candidates not in the input candidates collection, will not be used unless ‘ignore’ is chosen. As such, ‘ignore’ should rather be interpreted as ‘allow’.

Returns:

Action to take if a user-defined override function returns an unknown candidate.

property context_sensitive_overrides: bool#

Return True if the overrides are context-sensitive.

property verbose_logging: bool#

Return True if verbose debug-level messages are enabled.

copy(**overrides: Any) Mapper[ValueType, CandidateType, ContextType][source]#

Make a copy of this Mapper.

Parameters:

overrides – Keyword arguments to use when instantiating the copy. Options that aren’t given will be taken from the current instance. See the Mapper class documentation for possible choices.

Returns:

A copy of this Mapper with overrides applied.

Modules

rics.mapping.exceptions

Mapping errors.

rics.mapping.filter_functions

Functions that remove candidates.

rics.mapping.heuristic_functions

Functions which perform heuristics for score functions.

rics.mapping.score_functions

Functions which return a likeness score.

rics.mapping.support

Functions and classes used by the Mapper for handling score matrices.

rics.mapping.types

Types used for mapping.