rics.mapping#
Mapping implementations for matching groups of elements.
For an introduction to mapping, see Mapping primer.
Classes
Cardinality | Enumeration type for cardinality relationships. |
DirectionalMapping | A two-way mapping between hashable elements. |
HeuristicScore | Callable wrapper for computing heuristic scores. |
Mapper | Optimal value-candidate matching. |
- class Cardinality(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases: Enum
Enumeration type for cardinality relationships.
Cardinalities are comparable using numerical operators, and can be thought of as comparing “preciseness”. The less ambiguity there is for a given cardinality, the smaller it is in comparison to the others. The hierarchy is given by
1:1 < 1:N = N:1 < M:N. Note that 1:N and N:1 are considered equally precise.
Examples
Comparing cardinalities
>>> from rics.mapping import Cardinality
>>> Cardinality.ManyToOne
<Cardinality.ManyToOne: 'N:1'>
>>> Cardinality.OneToOne
<Cardinality.OneToOne: '1:1'>
>>> Cardinality.ManyToOne < Cardinality.OneToOne
False
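The ordering described above can be sketched with a plain Enum. This is a hypothetical stand-in (SketchCardinality and the _RANK table are made-up names), not the rics implementation; it only illustrates how 1:N and N:1 can share a rank while remaining distinct members.

```python
from enum import Enum

# Hypothetical precision ranks: lower means less ambiguity.
_RANK = {"1:1": 0, "1:N": 1, "N:1": 1, "M:N": 2}


class SketchCardinality(Enum):
    """Illustrative stand-in, not rics.mapping.Cardinality."""

    OneToOne = "1:1"
    OneToMany = "1:N"
    ManyToOne = "N:1"
    ManyToMany = "M:N"

    @property
    def rank(self) -> int:
        return _RANK[self.value]

    # Comparisons use the rank only; equality stays the default Enum identity.
    def __lt__(self, other):
        if not isinstance(other, SketchCardinality):
            return NotImplemented
        return self.rank < other.rank

    def __le__(self, other):
        if not isinstance(other, SketchCardinality):
            return NotImplemented
        return self.rank <= other.rank

    def __gt__(self, other):
        if not isinstance(other, SketchCardinality):
            return NotImplemented
        return self.rank > other.rank

    def __ge__(self, other):
        if not isinstance(other, SketchCardinality):
            return NotImplemented
        return self.rank >= other.rank


one_to_one = SketchCardinality.OneToOne
one_to_many = SketchCardinality.OneToMany
many_to_one = SketchCardinality.ManyToOne
many_to_many = SketchCardinality.ManyToMany
```

In this sketch, "equally precise" means that neither member compares as smaller than the other.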
- ParseType#
Types that may be interpreted as a Cardinality.
alias of Union[str, Cardinality]
- OneToOne = '1:1'#
One-to-one relationship.
- OneToMany = '1:N'#
One-to-many relationship.
- ManyToOne = 'N:1'#
Many-to-one relationship.
- ManyToMany = 'M:N'#
Many-to-many relationship.
- property inverse: Cardinality#
Inverse cardinality. For symmetric cardinalities, self.inverse == self.
- Returns:
Inverse cardinality.
See also
- property symmetric: bool#
Symmetry flag. For symmetric cardinalities, self.inverse == self.
- Returns:
Symmetry flag.
See also
- classmethod from_counts(left_count: int, right_count: int) Cardinality[source]#
Derive a Cardinality from counts.
- Parameters:
left_count – Number of elements on the left-hand side.
right_count – Number of elements on the right-hand side.
- Returns:
A Cardinality.
- Raises:
ValueError – For counts < 1.
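As a rough illustration of deriving a cardinality from counts, the sketch below returns the string values used by the enumeration. The function name is hypothetical, and the rule that a count of exactly 1 gives a "1" side while anything larger gives a "many" side is an assumption, not confirmed by this page.

```python
def cardinality_from_counts(left_count: int, right_count: int) -> str:
    """Hypothetical helper: return '1:1', '1:N', 'N:1' or 'M:N'."""
    if left_count < 1 or right_count < 1:
        raise ValueError(
            "Counts must be >= 1, got left_count={} and right_count={}.".format(
                left_count, right_count
            )
        )

    # ASSUMPTION: a side with exactly one element is the "1" side.
    one_left = left_count == 1
    one_right = right_count == 1

    if one_left and one_right:
        return "1:1"
    if one_left:
        return "1:N"
    if one_right:
        return "N:1"
    return "M:N"
```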
- classmethod parse(arg: str | Cardinality, strict: bool = False) Cardinality[source]#
Convert to cardinality.
- Parameters:
arg – Argument to parse.
strict – If True, arg must match exactly when it is given as a string.
- Returns:
A Cardinality.
- Raises:
ValueError – If the argument could not be converted.
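A minimal sketch of strict versus lenient parsing follows. The enum and function are hypothetical stand-ins. This page does not say what non-strict parsing accepts, so the sketch assumes that it ignores case and surrounding whitespace.

```python
from enum import Enum
from typing import Union


class SketchCardinality(Enum):
    """Illustrative stand-in, not rics.mapping.Cardinality."""

    OneToOne = "1:1"
    OneToMany = "1:N"
    ManyToOne = "N:1"
    ManyToMany = "M:N"


def parse_cardinality(
    arg: Union[str, SketchCardinality], strict: bool = False
) -> SketchCardinality:
    """Hypothetical parser mirroring the documented signature."""
    if isinstance(arg, SketchCardinality):
        return arg  # Already a cardinality; nothing to convert.

    # ASSUMPTION: lenient mode normalizes case and whitespace only.
    key = arg if strict else arg.strip().upper()
    try:
        return SketchCardinality(key)
    except ValueError as e:
        raise ValueError("Could not convert {!r} to a cardinality.".format(arg)) from e
```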
- class HeuristicScore(score_function: str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Iterable[float]], heuristics: Iterable[str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Tuple[ValueType, Iterable[CandidateType]]] | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Set[CandidateType]] | Tuple[str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Tuple[ValueType, Iterable[CandidateType]]] | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Set[CandidateType]], Dict[str, Any]]])[source]#
Bases: Generic[ValueType, CandidateType, ContextType]
Callable wrapper for computing heuristic scores.
Instances are callable. Signature is given by ScoreFunction.
- Short-circuiting:
A mechanism for forced matching. Score is set to +∞ for short-circuited candidates, and -∞ for the rest. No further matching will be performed after this point, so ensure that all desired candidates are returned by the chosen filters.
- Procedure:
1. Trigger short-circuiting if there is an exact value-candidate match.
2. All heuristics are applied and scores are computed.
3. If no short-circuiting is triggered in step 2, yield max score for each candidate.
- Parameters:
score_function – A ScoreFunction to wrap.
heuristics – Iterable of heuristics or tuples (heuristic, kwargs) to apply to the (value, candidates) inputs of score_function.
- Heuristic types:
An AliasFunction, which accepts and returns a tuple (value, candidates) to be evaluated.
A FilterFunction, which accepts a tuple (value, candidates) and returns a subset of candidates. If any candidates are returned, short-circuiting is triggered.
Notes
Heuristic function input order = application order.
You may add mutate=True to the heuristics kwargs to forward the modifications made by that function.
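The procedure above can be sketched roughly as follows. This is a simplified, hypothetical reimplementation and not the HeuristicScore code: the context argument and the mutate option are left out, and filters and aliases are passed as separate lists instead of one ordered list of heuristics.

```python
import math
from typing import Callable, Iterable, List, Sequence, Set, Tuple

ScoreFn = Callable[[str, Sequence[str]], Iterable[float]]
AliasFn = Callable[[str, Sequence[str]], Tuple[str, Sequence[str]]]
FilterFn = Callable[[str, Sequence[str]], Set[str]]


def short_circuit(candidates: Sequence[str], forced: Set[str]) -> List[float]:
    # +inf for forced candidates, -inf for everything else.
    return [math.inf if c in forced else -math.inf for c in candidates]


def heuristic_scores(
    value: str,
    candidates: Sequence[str],
    score_fn: ScoreFn,
    aliases: Sequence[AliasFn] = (),
    filters: Sequence[FilterFn] = (),
) -> List[float]:
    candidates = list(candidates)

    # Step 1: an exact value-candidate match short-circuits immediately.
    if value in candidates:
        return short_circuit(candidates, {value})

    # Step 2: apply heuristics. A filter that returns anything short-circuits.
    for filter_fn in filters:
        kept = filter_fn(value, candidates)
        if kept:
            return short_circuit(candidates, kept)

    best = list(score_fn(value, candidates))
    for alias_fn in aliases:
        alias_value, alias_candidates = alias_fn(value, candidates)
        alias_result = score_fn(alias_value, alias_candidates)
        # Step 3: keep the maximum score seen for each candidate.
        best = [max(b, s) for b, s in zip(best, alias_result)]
    return best


def equality_scores(value: str, candidates: Sequence[str]) -> List[float]:
    return [float(value == c) for c in candidates]


def lowercase_alias(value: str, candidates: Sequence[str]):
    return value.lower(), [c.lower() for c in candidates]


def prefix_filter(value: str, candidates: Sequence[str]) -> Set[str]:
    return {c for c in candidates if c.startswith(value)}
```

With these helpers, the alias heuristic lets "Name" match the candidate "name", while the filter forces a match for any candidate starting with the value.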
- property score_function: Callable[[ValueType, Iterable[CandidateType], ContextType | None], Iterable[float]]#
Return the underlying likeness score function.
- add_heuristic(heuristic: str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Tuple[ValueType, Iterable[CandidateType]]] | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Set[CandidateType]], kwargs: Dict[str, Any] = None) None[source]#
Add a new heuristic.
- class DirectionalMapping(cardinality: str | Cardinality = None, left_to_right: Mapping[HL, Iterable[HR]] = None, right_to_left: Mapping[HR, Iterable[HL]] = None, _verify: bool = True)[source]#
-
A two-way mapping between hashable elements.
- Parameters:
cardinality – Explicit cardinality. Derive if None.
left_to_right – A left-to-right mapping of elements.
right_to_left – A right-to-left mapping of elements.
_verify – If False, input checks are disabled. Intended for internal use.
- Raises:
ValueError – If both of left_to_right and right_to_left are None.
ValueError – If verification of two-sided input fails, and _verify=True.
CardinalityError – If explicit cardinality < cardinality, and _verify=True.
- property cardinality: Cardinality#
Cardinality with which this mapping was created.
- Returns:
Cardinality with which this mapping was created.
- property reverse: DirectionalMapping[HR, HL]#
Reverse the mapping by swapping the sides.
- Returns:
A copy with the same data as the calling instance, but with the left and right sides swapped.
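Swapping sides amounts to inverting the left-to-right dictionary. The helper below is a hypothetical sketch on plain dicts, not the DirectionalMapping internals.

```python
from typing import Dict, Hashable, Iterable, List, Mapping, Tuple


def invert(
    left_to_right: Mapping[Hashable, Iterable[Hashable]]
) -> Dict[Hashable, Tuple[Hashable, ...]]:
    """Hypothetical helper: build the right-to-left view of a mapping."""
    right_to_left: Dict[Hashable, List[Hashable]] = {}
    for left, rights in left_to_right.items():
        for right in rights:
            # Every right element collects all left elements pointing at it.
            right_to_left.setdefault(right, []).append(left)
    return {right: tuple(lefts) for right, lefts in right_to_left.items()}


inverted = invert({"a": ["x", "y"], "b": ["x"]})
```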
- flatten() Dict[HL, HR][source]#
Return a flattened version of self as a dict.
- Returns:
A dict {left: right}.
- Raises:
CardinalityError – If cardinality is not OneToOne or ManyToOne.
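Flattening only makes sense when every left element has a single right element, which is why the cardinality is restricted. A minimal sketch on plain dicts, with a hypothetical exception class standing in for CardinalityError:

```python
from typing import Dict, Hashable, Iterable, Mapping


class SketchCardinalityError(ValueError):
    """Hypothetical stand-in for the library's CardinalityError."""


def flatten(
    left_to_right: Mapping[Hashable, Iterable[Hashable]]
) -> Dict[Hashable, Hashable]:
    flat: Dict[Hashable, Hashable] = {}
    for left, rights in left_to_right.items():
        rights = tuple(rights)
        # ASSUMPTION: anything other than exactly one right element is an error.
        if len(rights) != 1:
            raise SketchCardinalityError(
                "Cannot flatten: {!r} maps to {} elements.".format(left, len(rights))
            )
        flat[left] = rights[0]
    return flat
```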
- select_left(elements: Iterable[HL], exclude: bool = False) DirectionalMapping[HL, HR][source]#
Perform a selection on left-side elements.
- Parameters:
elements – Elements to select.
exclude – If True, return everything except the given elements.
- Returns:
A new instance for the selection.
- Raises:
KeyError – If any of the chosen elements do not exist and exclude=False.
- select_right(elements: Iterable[HR], exclude: bool = False) DirectionalMapping[HL, HR][source]#
Perform a selection on right-side elements.
- Parameters:
elements – Elements to select.
exclude – If True, return everything except the given elements.
- Returns:
A new instance for the selection.
- Raises:
KeyError – If any of the chosen elements do not exist and exclude=False.
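The selection semantics, including the KeyError that is raised only when exclude=False, can be sketched on a plain dict. The function below is illustrative and covers the left side only; the right side would work the same way on the inverted mapping.

```python
from typing import Dict, Hashable, Iterable, Mapping, Tuple


def select_left(
    left_to_right: Mapping[Hashable, Tuple[Hashable, ...]],
    elements: Iterable[Hashable],
    exclude: bool = False,
) -> Dict[Hashable, Tuple[Hashable, ...]]:
    """Hypothetical helper mirroring the documented selection behaviour."""
    chosen = set(elements)

    if exclude:
        # Unknown elements are harmless here: there is nothing to remove.
        return {k: v for k, v in left_to_right.items() if k not in chosen}

    missing = chosen.difference(left_to_right)
    if missing:
        raise KeyError("Unknown elements: {}".format(sorted(map(repr, missing))))
    return {k: v for k, v in left_to_right.items() if k in chosen}


mapping = {"a": ("x",), "b": ("y",), "c": ("y",)}
```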
- class Mapper(score_function: str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Iterable[float]] = 'equality', score_function_kwargs: Dict[str, Any] = None, filter_functions: Iterable[Tuple[str | Callable[[ValueType, Iterable[CandidateType], ContextType | None], Set[CandidateType]], Dict[str, Any]]] = (), min_score: float = 0.9, overrides: InheritedKeysDict[ContextType, ValueType, CandidateType] | Dict[ValueType, CandidateType] = None, unmapped_values_action: Literal['ignore', 'warn', 'raise', 'IGNORE', 'WARN', 'RAISE'] | ActionLevel = ActionLevel.IGNORE, unknown_user_override_action: Literal['ignore', 'warn', 'raise', 'IGNORE', 'WARN', 'RAISE'] | ActionLevel = ActionLevel.RAISE, cardinality: str | Cardinality | None = Cardinality.ManyToOne, verbose_logging: bool = False)[source]#
Bases: Generic[ValueType, CandidateType, ContextType]
Optimal value-candidate matching.
For an introduction to mapping, see the Mapping primer page.
- Parameters:
score_function – A callable which accepts a value k and an ordered collection of candidates c, returning a score s_i for each candidate c_i in c. Default: s_i = float(k == c_i). Higher scores indicate better matches.
score_function_kwargs – Keyword arguments for score_function.
filter_functions – Function-kwargs pairs of filters to apply before scoring.
min_score – Minimum score s_i, as given by score(k, c_i), to consider k a match for c_i.
overrides – If a dict, assumed to be 1:1 mappings (value to candidate) which override the scoring logic. If InheritedKeysDict, the context passed to apply() is used to retrieve specific overrides.
unmapped_values_action – Action to take if mapping fails for any values.
unknown_user_override_action – Action to take if a UserOverrideFunction returns an unknown candidate. Unknown candidates, i.e. candidates not in the input candidates collection, will not be used unless ‘ignore’ is chosen. As such, ‘ignore’ should rather be interpreted as ‘allow’.
cardinality – Desired cardinality for mapped values. Derive for each matching if None.
verbose_logging – If True, enable verbose logging for the apply() function. Has no effect when the log level is above logging.DEBUG.
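To make the default score function and the min_score threshold concrete, here is a small sketch. The function names are hypothetical, and treating the threshold as inclusive (>=) is an assumption; with 0/1 equality scores and the default 0.9 it makes no difference.

```python
from typing import List, Sequence


def equality_scores(value: str, candidates: Sequence[str]) -> List[float]:
    # The documented default: s_i = float(k == c_i).
    return [float(value == c) for c in candidates]


def matching_candidates(
    value: str, candidates: Sequence[str], min_score: float = 0.9
) -> List[str]:
    scores = equality_scores(value, candidates)
    # ASSUMPTION: a score equal to min_score counts as a match.
    return [c for c, s in zip(candidates, scores) if s >= min_score]
```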
- apply(values: Iterable[ValueType], candidates: Iterable[CandidateType], context: ContextType = None, override_function: Callable[[ValueType, Set[CandidateType], ContextType | None], CandidateType | None] = None, **kwargs: Any) DirectionalMapping[ValueType, CandidateType][source]#
Map values to candidates.
- Parameters:
values – Iterable of elements to match to candidates.
candidates – Iterable of candidates to match with value. Duplicate elements will be discarded.
context – Context in which mapping is being done. Required when using context-sensitive overrides.
override_function – A callable that takes inputs (value, candidates, context) and returns either None (let the regular mapping logic decide) or one of the candidates. How returned non-candidates are handled is determined by the unknown_user_override_action property.
**kwargs – Runtime keyword arguments for score and filter functions. May be used to add information which is not known when the Mapper is initialized.
- Returns:
A DirectionalMapping of the form {value: [matched_candidates, ...]}. May be turned into a plain dict {value: candidate} by using the DirectionalMapping.flatten() function (only if DirectionalMapping.cardinality is of type Cardinality.one_right).
- Raises:
MappingError – If any values failed to match and unmapped_values_action='raise'.
BadFilterError – If a filter returns candidates that are not a subset of the original candidates.
UserMappingError – If override_function returns an unknown candidate and unknown_user_override_action != 'ignore'.
ValueError – If passing context=None (the default) when context_sensitive_overrides is True.
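The interplay between override_function and the regular scoring logic can be sketched as a plain loop. This is a hypothetical simplification, not the Mapper.apply implementation: it uses equality scoring, keeps at most one candidate per value, silently skips unmapped values, and raises ValueError where the library would raise UserMappingError.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

OverrideFn = Callable[[str, Set[str], Optional[object]], Optional[str]]


def apply_sketch(
    values: Iterable[str],
    candidates: Iterable[str],
    context: Optional[object] = None,
    override_function: Optional[OverrideFn] = None,
    min_score: float = 0.9,
) -> Dict[str, Tuple[str, ...]]:
    # Duplicate candidates are discarded while preserving order.
    unique = list(dict.fromkeys(candidates))
    result: Dict[str, Tuple[str, ...]] = {}

    for value in values:
        if override_function is not None:
            forced = override_function(value, set(unique), context)
            if forced is not None:
                if forced not in unique:
                    # Stand-in for UserMappingError.
                    raise ValueError("Unknown candidate: {!r}".format(forced))
                result[value] = (forced,)
                continue  # The override wins; skip scoring.

        # None from the override: let the regular scoring logic decide.
        scores = [float(value == c) for c in unique]
        if unique:
            best = max(range(len(unique)), key=scores.__getitem__)
            if scores[best] >= min_score:
                result[value] = (unique[best],)
    return result


def send_z_to_c(value, candidates, context):
    return "c" if value == "z" else None


mapped = apply_sketch(
    ["a", "b", "z"], ["a", "b", "b", "c"], override_function=send_z_to_c
)
```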
- compute_scores(values: Iterable[ValueType], candidates: Iterable[CandidateType], context: ContextType = None, override_function: Callable[[ValueType, Set[CandidateType], ContextType | None], CandidateType | None] = None, **kwargs: Any) DataFrame[source]#
Compute likeness scores.
- Parameters:
values – Iterable of elements to match to candidates.
candidates – Iterable of candidates to match with value. Duplicate elements will be discarded.
context – Context in which mapping is being done.
override_function – A callable that takes inputs (value, candidates, context) and returns either None (let the regular mapping logic decide) or one of the candidates. How returned non-candidates are handled is determined by the unknown_user_override_action property.
**kwargs – Runtime keyword arguments for score and filter functions. May be used to add information which is not known when the Mapper is initialized.
- Returns:
A DataFrame of value-candidate match scores, with DataFrame.index=values and DataFrame.columns=candidates.
- Raises:
BadFilterError – If a filter returns candidates that are not a subset of the original candidates.
UserMappingError – If override_function returns an unknown candidate and unknown_user_override_action != 'ignore'.
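The shape of the score matrix can be illustrated without pandas. In the sketch below a nested dict stands in for the DataFrame: outer keys play the role of the index (values) and inner keys the role of the columns (candidates). The function name is hypothetical, and overrides and filters are not modelled.

```python
from typing import Callable, Dict, Iterable, Sequence

ScoreFn = Callable[[str, Sequence[str]], Iterable[float]]


def score_matrix(
    values: Iterable[str], candidates: Iterable[str], score_function: ScoreFn
) -> Dict[str, Dict[str, float]]:
    # Duplicate candidates are discarded while preserving order.
    unique = list(dict.fromkeys(candidates))
    return {
        value: dict(zip(unique, score_function(value, unique))) for value in values
    }


def equality_scores(value: str, candidates: Sequence[str]):
    return [float(value == c) for c in candidates]


matrix = score_matrix(["a", "b"], ["a", "c", "a"], equality_scores)
```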
- to_directional_mapping(scores: DataFrame) DirectionalMapping[ValueType, CandidateType][source]#
Create a DirectionalMapping from match scores.
- Parameters:
scores – A score matrix, where scores.index are values and scores.columns are treated as the candidates.
- Returns:
A DirectionalMapping.
See also
- property cardinality: Cardinality | None#
Return upper cardinality bound during mapping.
- property unmapped_values_action: ActionLevel#
Return the action to take if mapping fails for any values.
- property unknown_user_override_action: ActionLevel#
Return the action to take if an override function returns an unknown candidate.
Unknown candidates, i.e. candidates not in the input candidates collection, will not be used unless ‘ignore’ is chosen. As such, ‘ignore’ should rather be interpreted as ‘allow’.
- Returns:
Action to take if a user-defined override function returns an unknown candidate.
- copy(**overrides: Any) Mapper[ValueType, CandidateType, ContextType][source]#
Make a copy of this Mapper.
- Parameters:
overrides – Keyword arguments to use when instantiating the copy. Options that aren’t given will be taken from the current instance. See the Mapper class documentation for possible choices.
- Returns:
A copy of this Mapper with overrides applied.
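The copy-with-overrides pattern described above resembles dataclasses.replace. The class below is a hypothetical miniature with only three of the documented options; it is not how Mapper is implemented.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class SketchMapper:
    """Illustrative stand-in carrying a few Mapper-like options."""

    min_score: float = 0.9
    cardinality: str = "N:1"
    verbose_logging: bool = False

    def copy(self, **overrides: Any) -> "SketchMapper":
        # Options that are not given are taken from the current instance.
        return replace(self, **overrides)


original = SketchMapper()
relaxed = original.copy(min_score=0.5)
```

The original instance is left untouched, so a base configuration can be reused with per-call variations.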
Modules
Mapping errors. |
|
Functions that remove candidates. |
|
Functions which perform heuristics for score functions. |
|
Functions which return a likeness score. |
|
Functions and classes used by the |
|
Types used for mapping. |