rics.ml.time_split.integration.split_data

Base implementations for splitting generic data types.

Users can add splitting support for any data type by providing suitable as_available and select functions.

Module Attributes

DataT

Type of data to split.

DataAsAvailableFn

A callable (data: DataT) -> DatetimeIterable.

DataSelectFn

A callable (data: DataT, left_inclusive: datetime, end_exclusive: datetime) -> DataT.

Functions

split_data(data, *[, log_progress])

Base implementation for splitting integrated data types.

Classes

DatetimeSplit(data, future_data, bounds)

Time-based split of a generic data type.

class DataT

Type of data to split.

alias of TypeVar('DataT')

DataAsAvailableFn

A callable (data: DataT) -> DatetimeIterable.

alias of Callable[[DataT], Iterable[str | Timestamp | datetime | date | datetime64]]
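As a sketch, assume a hypothetical data type Records: a plain list of (timestamp, value) tuples. An as_available function for it only has to report when each record became available; here that is assumed to be the record's own timestamp. The names Records and records_as_available are illustrative and not part of the library.

```python
from collections.abc import Iterable
from datetime import datetime

# Hypothetical data type used for illustration only: (timestamp, value) records.
Records = list[tuple[datetime, float]]


def records_as_available(data: Records) -> Iterable[datetime]:
    """Report when each record became available.

    Assumption: a record is available at its own timestamp.
    """
    return [timestamp for timestamp, _ in data]
```

Any iterable of supported datetime-like values satisfies the DataAsAvailableFn signature; a list of datetime objects is simply the easiest choice for plain Python data.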

DataSelectFn

A callable (data: DataT, left_inclusive: datetime, end_exclusive: datetime) -> DataT.

alias of Callable[[DataT, datetime, datetime], DataT]
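A matching select function for the same hypothetical Records type (a list of (timestamp, value) tuples, assumed for illustration) keeps the records whose timestamp falls in the half-open interval [left_inclusive, end_exclusive) and returns the same type it received, as DataSelectFn requires.

```python
from datetime import datetime

# Hypothetical data type used for illustration only: (timestamp, value) records.
Records = list[tuple[datetime, float]]


def select_records(data: Records, left_inclusive: datetime, end_exclusive: datetime) -> Records:
    """Return the records with left_inclusive <= timestamp < end_exclusive."""
    return [(t, v) for t, v in data if left_inclusive <= t < end_exclusive]
```

Because the interval is half-open, a record stamped exactly at end_exclusive is left out, so two selections that share a boundary never contain the same record.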

class DatetimeSplit(data: DataT, future_data: DataT, bounds: DatetimeSplitBounds)

Bases: NamedTuple, Generic[DataT]

Time-based split of a generic data type.

data: DataT

Data before bounds.mid.

future_data: DataT

Data after bounds.mid.

bounds: DatetimeSplitBounds

The underlying bounds that produced this split.
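Since DatetimeSplit is a NamedTuple, a split can be read by field name or unpacked positionally. The sketch below uses hypothetical stand-in classes (Bounds, Split) so that it runs without the library; the start and end field names on Bounds are an assumption, since this page only names bounds.mid.

```python
from datetime import datetime
from typing import NamedTuple


class Bounds(NamedTuple):
    """Hypothetical stand-in for DatetimeSplitBounds; field names are assumed."""

    start: datetime
    mid: datetime
    end: datetime


class Split(NamedTuple):
    """Hypothetical stand-in mirroring the three fields of DatetimeSplit."""

    data: list
    future_data: list
    bounds: Bounds


fold = Split(
    data=["a", "b"],
    future_data=["c"],
    bounds=Bounds(datetime(2024, 1, 1), datetime(2024, 1, 3), datetime(2024, 1, 4)),
)

# Fields can be accessed by name...
mid = fold.bounds.mid
# ...or unpacked positionally, in declaration order.
data, future_data, bounds = fold
```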

split_data(data: DataT, *, log_progress: str | bool | dict[str, Any] | Logger | LoggerAdapter = False, as_available: Callable[[DataT], Iterable[str | Timestamp | datetime | date | datetime64]], select: Callable[[DataT, datetime, datetime], DataT], **kwargs: Unpack[DatetimeIndexSplitterKwargs]) -> Iterable[DatetimeSplit[DataT]]

Base implementation for splitting integrated data types.

The required as_available and select callables perform the actual integration.

Parameters:
  • data – The data to split.

  • log_progress – Controls logging of fold progress. See log_split_progress() for details.

  • as_available – A callable (data: DataT) -> DatetimeIterable.

  • select – A callable (data: DataT, left_inclusive: datetime, end_exclusive: datetime) -> DataT.

  • **kwargs – Keyword arguments for the split() function.

Yields:

DatetimeSplit tuples (data, future_data, bounds), one per fold.
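The following is a simplified, hypothetical sketch of how the select callable could turn each set of fold bounds into a (data, future_data, bounds) tuple. It is not the library's implementation: the real split_data presumably derives the bounds itself from as_available(data) and **kwargs, whereas here they are passed in precomputed. Bounds, sketch_split_data and select_records are illustrative names, and the start/mid/end layout of Bounds is assumed.

```python
from collections.abc import Callable, Iterable, Iterator
from datetime import datetime
from typing import NamedTuple, TypeVar

DataT = TypeVar("DataT")


class Bounds(NamedTuple):
    """Hypothetical stand-in for DatetimeSplitBounds (field names assumed)."""

    start: datetime
    mid: datetime
    end: datetime


def sketch_split_data(
    data: DataT,
    bounds_list: Iterable[Bounds],
    select: Callable[[DataT, datetime, datetime], DataT],
) -> Iterator[tuple[DataT, DataT, Bounds]]:
    """Yield (data, future_data, bounds) for each precomputed set of bounds."""
    for bounds in bounds_list:
        past = select(data, bounds.start, bounds.mid)  # data before bounds.mid
        future = select(data, bounds.mid, bounds.end)  # data from bounds.mid onward
        yield past, future, bounds


def select_records(data, left_inclusive, end_exclusive):
    """Keep (timestamp, value) records in [left_inclusive, end_exclusive)."""
    return [(t, v) for t, v in data if left_inclusive <= t < end_exclusive]


# Usage: five daily records and a single hand-written fold.
records = [(datetime(2024, 1, d), float(d)) for d in range(1, 6)]
folds = list(
    sketch_split_data(
        records,
        [Bounds(datetime(2024, 1, 1), datetime(2024, 1, 3), datetime(2024, 1, 5))],
        select_records,
    )
)
```

With these bounds the fold holds the records for January 1 and 2 as past data and January 3 and 4 as future data; the January 5 record falls on the exclusive end and is not selected.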

See also

To get started with your own integration, copy split_pandas() or split_polars() and use it as a baseline (click [source] on the linked function to view its implementation).