rics.ml.time_split.support

Supporting functions.

These functions are used internally, but are also exposed here so that users can build their own logic on top of them, or simply try things out.

Warning

Not part of the stable API.

This module may change without notice. Stick to the top-level rics.ml.time_split module, or pin your dependencies if you need to use the support module.

Functions

expand_limits(limits, *, flex)

Derive the "real" bounds of limits.

fold_weight(splits, *[, unit, available])

Compute fold weights.

to_string()

Pretty-print a fold.

expand_limits(limits: Tuple[Timestamp, Timestamp], *, flex: bool | Literal['auto'] | str | Tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64] | Iterable[Tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64]]) -> Tuple[Timestamp, Timestamp]

Derive the “real” bounds of limits.

Flex options, listed as type – description.

  • True or 'auto' – Auto-flex using auto_flex-settings.

  • False – Do nothing; return limits unchanged.

  • str – A string round_to or round_to<tolerance, where round_to is the desired frequency of the limits and tolerance is the maximum amount by which to change the input limits.

  • list[tuple] – Passing tuples (start_at, round_to, tolerance) will use the largest tuple such that start_at >= limits[1] - limits[0]. Other parameters are interpreted as above.

  • tuple – Like list[tuple], but with just one level.

Note

Passing flex=[auto_flex.day, auto_flex.hour] is equivalent to flex='auto'.
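
The string form of flex described above packs two values into one argument. A minimal sketch of how such a string could be split into its parts is shown below. The helper split_flex_string is hypothetical and not part of rics; how the library parses the string internally is not documented here.

```python
from typing import Optional, Tuple


def split_flex_string(flex: str) -> Tuple[str, Optional[str]]:
    """Split 'round_to' or 'round_to<tolerance' into (round_to, tolerance).

    Hypothetical helper for illustration only; not part of rics.
    """
    round_to, sep, tolerance = flex.partition("<")
    round_to, tolerance = round_to.strip(), tolerance.strip()
    # Reject an empty frequency, or a '<' that is not followed by a tolerance.
    if not round_to or (sep and not tolerance):
        raise ValueError(f"Malformed flex string: {flex!r}")
    return round_to, (tolerance if sep else None)


print(split_flex_string("d"))     # ('d', None)
print(split_flex_string("d<1h"))  # ('d', '1h')
```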

Parameters:
  • limits – A tuple (lo, hi) of timestamps.

  • flex – See the table above.

Returns:

Limits rounded according to the flex-argument.

Raises:

ValueError – For invalid limits.

Examples

>>> from pandas import Timestamp
>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 22:05:30")

Basic usage.

>>> expand_limits(limits, flex="d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-12 00:00:00'))

You may specify a maximum “distance” that limits may be expanded.

>>> expand_limits(limits, flex="d<1h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 22:05:30'))

Limits will never be rounded in the “wrong” direction…

>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 11:05:30")
>>> expand_limits(limits, flex="d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))

…even if you make the tolerance large enough.

>>> expand_limits(limits, flex="d<14h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))
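
The behaviour shown in the examples above can be approximated with a short standard-library sketch. This is not the implementation in rics; it is an interpretation inferred from the example outputs only: each limit is rounded to the nearest multiple of round_to, and the rounded value is kept only if it widens the interval and moves the limit by no more than tolerance. The names round_nearest and expand_limits_sketch are hypothetical, and plain datetime objects stand in for pandas Timestamp.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

EPOCH = datetime(1970, 1, 1)


def round_nearest(ts: datetime, step: timedelta) -> datetime:
    """Round ts to the nearest multiple of step, counted from the Unix epoch."""
    remainder = (ts - EPOCH) % step
    floored = ts - remainder
    return floored if remainder * 2 < step else floored + step


def expand_limits_sketch(
    limits: Tuple[datetime, datetime],
    round_to: timedelta,
    tolerance: Optional[timedelta] = None,
) -> Tuple[datetime, datetime]:
    """Hypothetical re-creation of the documented examples; not the rics code."""
    lo, hi = limits
    new_lo = round_nearest(lo, round_to)
    new_hi = round_nearest(hi, round_to)
    # Assumption: a limit is left unchanged if rounding would shrink the
    # interval, or if it would move the limit by more than the tolerance.
    if new_lo > lo or (tolerance is not None and lo - new_lo > tolerance):
        new_lo = lo
    if new_hi < hi or (tolerance is not None and new_hi - hi > tolerance):
        new_hi = hi
    return new_lo, new_hi


day = timedelta(days=1)
late = (datetime(2019, 5, 11), datetime(2019, 5, 11, 22, 5, 30))
early = (datetime(2019, 5, 11), datetime(2019, 5, 11, 11, 5, 30))

print(expand_limits_sketch(late, day)[1])   # 2019-05-12 00:00:00
print(expand_limits_sketch(early, day)[1])  # 2019-05-11 11:05:30
```

Under this reading, the "d<14h" example leaves the upper limit unchanged because the nearest day boundary lies before the limit, so the tolerance is never consulted.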

fold_weight(splits: List[DatetimeSplitBounds], *, unit: str | Literal['rows', 'hours', 'days'] = 'hours', available: Iterable[str | Timestamp | datetime | date | datetime64] = None) -> List[DatetimeSplitCounts]

Compute fold weights.

Parameters:
  • splits – List of DatetimeSplitBounds.

  • unit – Time unit of the returned count, or 'rows' (requires available data).

  • available – Available data. Required when unit='rows'.

Returns:

A list of tuples [(n_data_units, n_future_data_units), ...].

Raises:

ValueError – If unit='rows' and available=None.
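
To make the return value concrete, here is a minimal standard-library sketch of the idea. It is not the rics implementation. The function fold_weight_sketch is hypothetical, folds are assumed to be plain (start, mid, end) tuples, whole units are counted by floor division, and intervals are assumed to be half-open (start <= t < mid for data, mid <= t < end for future data).

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

Fold = Tuple[datetime, datetime, datetime]  # assumed layout: (start, mid, end)
UNITS = {"hours": timedelta(hours=1), "days": timedelta(days=1)}


def fold_weight_sketch(
    splits: List[Fold],
    unit: str = "hours",
    available: Optional[Iterable[datetime]] = None,
) -> List[Tuple[int, int]]:
    """Hypothetical illustration; returns [(n_data_units, n_future_data_units), ...]."""
    if unit == "rows":
        if available is None:
            raise ValueError("unit='rows' requires available data.")
        times = list(available)
        # Assumption: count timestamps in half-open intervals.
        return [
            (sum(start <= t < mid for t in times), sum(mid <= t < end for t in times))
            for start, mid, end in splits
        ]
    step = UNITS[unit]  # this sketch only knows 'hours' and 'days'
    return [((mid - start) // step, (end - mid) // step) for start, mid, end in splits]


folds = [(datetime(2022, 1, 1), datetime(2022, 1, 3), datetime(2022, 1, 4))]
print(fold_weight_sketch(folds))               # [(48, 24)]
print(fold_weight_sketch(folds, unit="days"))  # [(2, 1)]
```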

to_string(bounds: str | Timestamp | datetime | date | datetime64 | DatetimeSplitBounds | Tuple[str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64], mid: str | Timestamp | datetime | date | datetime64 | None = None, end: str | Timestamp | datetime | date | datetime64 | None = None, /, *, format: str = None) -> str

Pretty-print a fold.

Sample output.

('2021-12-30' <= [schedule: '2022-01-04' (Tuesday)] < '2022-01-04 18:00:00')

Parameters:
  • bounds – A fold tuple (start, mid, end), or just start (followed by mid and end).

  • mid – Datetime-like. Must be None when bounds is a tuple.

  • end – Datetime-like. Must be None when bounds is a tuple.

  • format – A custom format to use. If None, FOLD_FORMAT is used. Note that only the start, mid and end keys are available to this function.

Returns:

Formatted bounds string.

Raises:

TypeError – If an incorrect number of timestamps is given.
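
The two calling conventions and the sample output can be illustrated with a small standard-library sketch. It is not the rics implementation and does not use FOLD_FORMAT, whose actual value is not shown here. The names fold_to_string_sketch and _short are hypothetical, and dropping the time part at midnight is an assumption inferred from the sample output.

```python
from datetime import datetime
from typing import Optional, Tuple, Union

Fold = Tuple[datetime, datetime, datetime]  # assumed layout: (start, mid, end)


def _short(ts: datetime) -> str:
    """Drop the time part when it is exactly midnight (assumed convention)."""
    if ts == ts.replace(hour=0, minute=0, second=0, microsecond=0):
        return ts.strftime("%Y-%m-%d")
    return ts.strftime("%Y-%m-%d %H:%M:%S")


def fold_to_string_sketch(
    bounds: Union[datetime, Fold],
    mid: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> str:
    """Hypothetical illustration of the documented call patterns."""
    if isinstance(bounds, tuple):
        # A fold tuple must be passed alone and hold exactly three timestamps.
        if mid is not None or end is not None or len(bounds) != 3:
            raise TypeError("Pass either one (start, mid, end) tuple or three timestamps.")
        start, mid, end = bounds
    else:
        if mid is None or end is None:
            raise TypeError("Pass either one (start, mid, end) tuple or three timestamps.")
        start = bounds
    # '%A' yields the weekday name of mid under the default locale.
    return f"('{_short(start)}' <= [schedule: '{_short(mid)}' ({mid:%A})] < '{_short(end)}')"


fold = (datetime(2021, 12, 30), datetime(2022, 1, 4), datetime(2022, 1, 4, 18))
print(fold_to_string_sketch(fold))
print(fold_to_string_sketch(*fold))  # same result with three positional arguments
```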