rics.ml.time_split.support

Supporting functions.

These functions are used internally, but are also exposed here so that users may build their own logic on top of them, or just test things out.

Warning

Not part of the stable API.

This module may change without notice. Stick to the top-level rics.ml.time_split module, or pin your dependencies if you need to use the support module.

Functions

expand_limits(limits, *[, flex])

Derive the "real" bounds of limits.

fold_weight(splits, *[, unit, available])

Compute fold weights.

to_string()

Pretty-print a fold.

expand_limits(limits: tuple[Timestamp, Timestamp], *, flex: bool | Literal['auto'] | str | tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64] | Iterable[tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64]] = 'auto') -> tuple[Timestamp, Timestamp]

Derive the “real” bounds of limits.

Parameters:
  • limits – A tuple (lo, hi) of timestamps.

  • flex – Flex arguments as described in the User guide. Also supports level-tuples [(start_at, round_to, tolerance)...]. Passing flex=[settings.auto_flex.day, settings.auto_flex.hour] is equivalent to flex='auto'.

Returns:

Limits rounded according to the flex-argument.

Raises:

ValueError – For invalid limits.

Examples

>>> from pandas import Timestamp
>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 22:05:30")

Basic usage.

>>> expand_limits(limits, flex="d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-12 00:00:00'))

You may specify a maximum “distance” that limits may be expanded.

>>> expand_limits(limits, flex="d<1h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 22:05:30'))

Limits will never be rounded in the “wrong” direction…

>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 11:05:30")
>>> expand_limits(limits, flex="d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))

…even if you make the tolerance large enough.

>>> expand_limits(limits, flex="d<14h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))
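
The examples above are consistent with a simple rule: round each limit to the nearest multiple of the unit, then keep the rounded value only if it widens the range and lies within the tolerance. The following is a minimal standard-library sketch of that rule for day-level rounding. It is an interpretation of the documented behaviour, not the library's implementation; `expand_limits_sketch`, `nearest_day` and the `tolerance` parameter are hypothetical names used only for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta

DAY = timedelta(days=1)


def nearest_day(ts: datetime) -> datetime:
    """Round ``ts`` to the nearest midnight (exact ties go forward)."""
    floor = datetime(ts.year, ts.month, ts.day)
    return floor if ts - floor < DAY / 2 else floor + DAY


def expand_limits_sketch(
    limits: tuple[datetime, datetime],
    tolerance: timedelta | None = None,
) -> tuple[datetime, datetime]:
    """Hypothetical day-level version of the rounding rule (assumption)."""
    lo, hi = limits

    def accept(distance: timedelta) -> bool:
        # A negative distance means the candidate would shrink the range,
        # i.e. round in the "wrong" direction, so it is always rejected.
        if distance < timedelta(0):
            return False
        return tolerance is None or distance <= tolerance

    lo_candidate, hi_candidate = nearest_day(lo), nearest_day(hi)
    new_lo = lo_candidate if accept(lo - lo_candidate) else lo
    new_hi = hi_candidate if accept(hi_candidate - hi) else hi
    return new_lo, new_hi
```

In this sketch, `tolerance=timedelta(hours=1)` plays the role of the `<1h` suffix in `flex="d<1h"`, and `tolerance=None` corresponds to plain `flex="d"`.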

fold_weight(splits: list[DatetimeSplitBounds], *, unit: str | Literal['rows', 'hours', 'days'] = 'hours', available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> list[DatetimeSplitCounts]

Compute fold weights.

Parameters:
  • splits – List of DatetimeSplitBounds.

  • unit – Time unit of the returned count, or ‘rows’ (requires available data).

  • available – Available data. Required when unit='rows'.

Returns:

A list of tuples [(n_data_units, n_future_data_units), ...].

Raises:

ValueError – If unit='rows' and available=None.
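
To make the return value concrete, here is a minimal standard-library sketch of how such weights could be computed. It assumes that each fold is a (start, mid, end) tuple, that "data" covers [start, mid) and "future data" covers [mid, end), and that time-based units are counted in whole units. These are assumptions made for illustration; `fold_weight_sketch` is a hypothetical name and not the library's implementation.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Iterable

# Time-based units supported by this sketch (assumption: whole units only).
UNITS = {"hours": timedelta(hours=1), "days": timedelta(days=1)}


def fold_weight_sketch(
    splits: list[tuple[datetime, datetime, datetime]],
    *,
    unit: str = "hours",
    available: Iterable[datetime] | None = None,
) -> list[tuple[int, int]]:
    """Return [(n_data_units, n_future_data_units), ...] for each fold."""
    if unit == "rows":
        if available is None:
            raise ValueError("unit='rows' requires available data")
        rows = list(available)
        # Count timestamps in the half-open ranges [start, mid) and [mid, end).
        return [
            (
                sum(start <= ts < mid for ts in rows),
                sum(mid <= ts < end for ts in rows),
            )
            for start, mid, end in splits
        ]

    step = UNITS[unit]
    # Floor division of timedeltas yields the number of whole units.
    return [((mid - start) // step, (end - mid) // step) for start, mid, end in splits]
```

For the fold `(2021-12-30, 2022-01-04, 2022-01-04 18:00)`, this sketch yields 120 data hours and 18 future hours.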

to_string(bounds: str | Timestamp | datetime | date | datetime64 | DatetimeSplitBounds | tuple[str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64], mid: str | Timestamp | datetime | date | datetime64 | None = None, end: str | Timestamp | datetime | date | datetime64 | None = None, /, *, format: str | None = None) -> str

Pretty-print a fold.

Sample output:

('2021-12-30' <= [schedule: '2022-01-04' (Tuesday)] < '2022-01-04 18:00:00')

Parameters:
  • bounds – A fold tuple (start, mid, end), or just start (followed by mid and end).

  • mid – Datetime-like. Must be None when bounds is a tuple.

  • end – Datetime-like. Must be None when bounds is a tuple.

  • format – A custom format to use. Defaults to FOLD_FORMAT if None, but note that only the start, mid and end keys are available to this function.

Returns:

Formatted bounds string.

Raises:

TypeError – If an incorrect number of timestamps is given.
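
The two calling conventions and the format keys can be illustrated with a small standard-library sketch. The value of FOLD_FORMAT is not shown in this section, so `SKETCH_FORMAT` below is a hypothetical stand-in chosen to reproduce the sample output; `to_string_sketch` and `_Pretty` are likewise illustrative names, and the sketch only accepts `datetime` objects.

```python
from __future__ import annotations

from datetime import datetime, time

# Hypothetical stand-in for FOLD_FORMAT; uses only the start, mid and end keys.
SKETCH_FORMAT = "('{start}' <= [schedule: '{mid}' ({mid:%A})] < '{end}')"


class _Pretty:
    """Wrapper that hides the time part of midnight timestamps."""

    def __init__(self, ts: datetime) -> None:
        self.ts = ts

    def __format__(self, spec: str) -> str:
        if spec:  # e.g. "%A" gives the weekday name
            return self.ts.strftime(spec)
        if self.ts.time() == time(0):
            return self.ts.date().isoformat()
        return self.ts.isoformat(sep=" ")


def to_string_sketch(bounds, mid=None, end=None, /, *, format=None) -> str:
    """Format a fold given as one (start, mid, end) tuple or three arguments."""
    if isinstance(bounds, tuple):
        if mid is not None or end is not None or len(bounds) != 3:
            raise TypeError("pass either one 3-tuple or three timestamps")
        start, mid, end = bounds
    else:
        if mid is None or end is None:
            raise TypeError("pass either one 3-tuple or three timestamps")
        start = bounds

    template = SKETCH_FORMAT if format is None else format
    return template.format(start=_Pretty(start), mid=_Pretty(mid), end=_Pretty(end))
```

With the default locale, calling `to_string_sketch(datetime(2021, 12, 30), datetime(2022, 1, 4), datetime(2022, 1, 4, 18))` reproduces the sample output shown earlier; passing the same three values as a single tuple gives the same string.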