rics.ml.time_split.integration.sklearn
Integration with the scikit-learn library.
Classes
| ScikitLearnSplitter | A scikit-learn compatible datetime splitter. |
- class ScikitLearnSplitter(schedule: DatetimeIndex | Iterable[str | Timestamp | datetime | date | datetime64] | str | Timedelta | timedelta | timedelta64, *, before: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = '7d', after: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = 1, n_splits: int | None = None, flex: bool | Literal['auto'] | str = 'auto', step: int = 1, log_progress: str | bool | Dict[str, Any] | Logger | LoggerAdapter = False, verify_xy: bool = True)
Bases: object
A scikit-learn compatible datetime splitter.
This class may be used to create temporal folds from heterogeneous/unaggregated data, typically used for training models (e.g. on raw transaction data). If your data is a well-formed time series, consider using the TimeSeriesSplit class from scikit-learn instead.
Note
Test coverage is limited for this integration. Please report issues to rsundqvist/rics#new.
If a pandas type is passed to the ScikitLearnSplitter.split() method, the index will be used.
- Parameters:
schedule – A collection of timestamps, a pandas offset alias, or a cron expression.
before – Range before schedule timestamps. Either a pandas offset alias, an integer (schedule-based offsets), or ‘all’ (requires available data).
after – Range after schedule timestamps. Either a pandas offset alias, an integer (schedule-based offsets), or ‘all’ (requires available data).
step – Select a subset of folds, preferring folds later in the schedule.
n_splits – Maximum number of folds, preferring folds later in the schedule.
flex – A pandas offset alias used to expand available data to its likely “true” limits. Pass False to disable.
log_progress – Controls logging of fold progress. See log_split_progress() for details.
verify_xy – If True, split X and y independently and verify that the resulting splits are equal.
For more information about the schedule, before/after and flex-arguments, see the User guide.
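The idea of cutting folds around schedule timestamps can be illustrated with a minimal standard-library sketch. It is not the rics implementation: the function name `temporal_folds` is hypothetical, and the half-open boundaries (training data in `[t - before, t)`, test data in `[t, t + after)`) are an assumption. Integer offsets, `'all'`, cron schedules and `flex` are not modelled here.

```python
from datetime import datetime, timedelta
from typing import Iterator, List, Sequence, Tuple


def temporal_folds(
    timestamps: Sequence[datetime],
    schedule: Sequence[datetime],
    before: timedelta,
    after: timedelta,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) row positions for each schedule timestamp.

    Assumption: train rows fall in [t - before, t), test rows in [t, t + after).
    """
    for t in schedule:
        train = [i for i, ts in enumerate(timestamps) if t - before <= ts < t]
        test = [i for i, ts in enumerate(timestamps) if t <= ts < t + after]
        if train and test:  # Skip folds that lack data on either side.
            yield train, test


# Unaggregated rows: several events per day, irregular spacing.
rows = [
    datetime(2022, 1, 1, 9),
    datetime(2022, 1, 1, 17),
    datetime(2022, 1, 2, 8),
    datetime(2022, 1, 3, 12),
    datetime(2022, 1, 3, 23),
    datetime(2022, 1, 4, 6),
]
schedule = [datetime(2022, 1, 2), datetime(2022, 1, 3), datetime(2022, 1, 4)]

folds = list(
    temporal_folds(rows, schedule, before=timedelta(days=2), after=timedelta(days=1))
)
for train, test in folds:
    print("train:", train, "test:", test)
```

Because the rows are matched against time ranges rather than counted, folds may contain different numbers of rows, which is the main difference from splitting a regularly spaced time series by position.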
- get_n_splits(X: Iterable[str | Timestamp | datetime | date | datetime64] = None, y: Iterable[str | Timestamp | datetime | date | datetime64] = None, groups: Any = None) → int
Returns the number of splitting iterations in the cross-validator.
Equivalent to len(list(split(X, y, groups))).
- Parameters:
X – Training data (features).
y – Target variable.
groups – Always ignored, exists for compatibility.
- Returns:
Number of splits with given arguments.
- Raises:
ValueError – If both X and y are None.
ValueError – If splits of X and y are not equal when verify_xy=True.
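The documented equivalence between get_n_splits and exhausting split can be sketched with a toy splitter. `ToySplitter` and its `test_size` argument are hypothetical stand-ins that use a simple expanding window; only the method names and the `X`, `y`, `groups` arguments follow the signatures above.

```python
from typing import Any, Iterator, List, Optional, Sequence, Tuple


class ToySplitter:
    """Hypothetical stand-in for a splitter; not the rics implementation."""

    def __init__(self, test_size: int = 1) -> None:
        self.test_size = test_size

    def split(
        self,
        X: Optional[Sequence[Any]] = None,
        y: Optional[Sequence[Any]] = None,
        groups: Any = None,  # Ignored, kept for scikit-learn compatibility.
    ) -> Iterator[Tuple[List[int], List[int]]]:
        # This is a generator, so the check runs on the first iteration.
        if X is None and y is None:
            raise ValueError("At least one of X and y must be given.")
        n = len(X if X is not None else y)
        # Expanding window: train on everything before each test block.
        for start in range(self.test_size, n, self.test_size):
            stop = min(start + self.test_size, n)
            yield list(range(start)), list(range(start, stop))

    def get_n_splits(
        self,
        X: Optional[Sequence[Any]] = None,
        y: Optional[Sequence[Any]] = None,
        groups: Any = None,
    ) -> int:
        # Counting by exhausting the generator mirrors the documented equivalence.
        return len(list(self.split(X, y, groups)))


splitter = ToySplitter(test_size=2)
data = list("abcdefg")
n_splits = splitter.get_n_splits(data)
print(n_splits)
```

Counting this way means the number of folds depends on the data passed in, so in this sketch calling `get_n_splits()` without `X` or `y` raises `ValueError`, as described for the real method.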
- split(X: Iterable[str | Timestamp | datetime | date | datetime64] = None, y: Iterable[str | Timestamp | datetime | date | datetime64] = None, groups: Any = None) → Iterable[Tuple[Sequence[int], Sequence[int]]]
Generate indices to split data into training and test set.
- Parameters:
X – Training data (features).
y – Target variable.
groups – Always ignored, exists for compatibility.
- Yields:
The training/test set indices for that split.
- Raises:
ValueError – If both X and y are None.
ValueError – If splits of X and y are not equal when verify_xy=True.
TypeError – If X or y have an index-attribute, but index elements are not datetime-like.
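The three error conditions listed above can be sketched as follows. This is an illustration under assumptions, not the library's code: `Indexed` is a hypothetical stand-in for a pandas object, and `times_of`, `verified_folds` and `cutoff_fold` are made-up helper names.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Any, Callable, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]


@dataclass
class Indexed:
    """Hypothetical stand-in for a pandas object: values plus an index."""

    index: Sequence[Any]
    values: Sequence[Any]


def times_of(data: Any) -> List[Any]:
    """Return the index if the data has one, otherwise the data itself."""
    index = getattr(data, "index", None)
    if index is None or callable(index):  # e.g. list.index is a method
        return list(data)
    if not all(isinstance(t, date) for t in index):  # datetime subclasses date
        raise TypeError("Index elements must be datetime-like.")
    return list(index)


def verified_folds(
    X: Any, y: Any, make_folds: Callable[[List[Any]], List[Fold]]
) -> List[Fold]:
    """Split X and y independently and require identical folds."""
    if X is None and y is None:
        raise ValueError("At least one of X and y must be given.")
    x_folds = None if X is None else make_folds(times_of(X))
    y_folds = None if y is None else make_folds(times_of(y))
    if x_folds is not None and y_folds is not None and x_folds != y_folds:
        raise ValueError("Splits of X and y are not equal.")
    return x_folds if x_folds is not None else y_folds


def cutoff_fold(times: List[datetime]) -> List[Fold]:
    """A single fold around a fixed, arbitrary cutoff."""
    cutoff = datetime(2022, 1, 3)
    train = [i for i, t in enumerate(times) if t < cutoff]
    test = [i for i, t in enumerate(times) if t >= cutoff]
    return [(train, test)]


days = [datetime(2022, 1, d) for d in range(1, 6)]
X = Indexed(index=days, values=[10, 20, 30, 40, 50])
y = Indexed(index=days, values=[0, 1, 0, 1, 1])
folds = verified_folds(X, y, cutoff_fold)
print(folds)

# A target whose index is shifted by one day produces different folds.
y_shifted = Indexed(index=[d + timedelta(days=1) for d in days], values=y.values)
```

In this sketch, passing `y_shifted` together with `X` raises `ValueError`, and an `Indexed` object with an integer index raises `TypeError`.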