Fold sampling using the step-argument.

Filtering every other Thursday, preferring later folds.

import pandas
from rics import configure_stuff
from rics.ml.time_split import log_split_progress, plot, split

configure_stuff(datefmt="")

data = pandas.date_range("2022-01", "2022-03")
config = dict(schedule="0 0 * * THU", after="5d", step=2, n_splits=3, available=data)

plot(**config, bar_labels="h", show_removed=True)
time_split.split(schedule='0 0 * * THU', after='5d', step=2, n_splits=3, available=pandas.DatetimeIndex)
👻 Configured some stuff just the way I like it!

<Axes: title={'center': "time_split.split(schedule='0 0 * * THU', after='5d', step=2, n_splits=3, available=pandas.DatetimeIndex)"}, ylabel='Fold'>

Specifying a step is especially useful when working with non-stationary but slow-moving distributions. The step argument never reduces the number of folds below 1. The n_splits argument sets an upper limit on the number of folds; fewer may be produced, depending on the outer range of the available data.

for fold in log_split_progress(split(**config), logger="my-logger"):
    print("Doing work..")
[my-logger:INFO] Begin fold 1/3: '2022-01-20' <= [schedule: '2022-01-27' (Thursday)] < '2022-02-01'.
Doing work..
[my-logger:INFO] Finished fold 1/3: [schedule: '2022-01-27' (Thursday)] after 23μs.
[my-logger:INFO] Begin fold 2/3: '2022-02-03' <= [schedule: '2022-02-10' (Thursday)] < '2022-02-15'.
Doing work..
[my-logger:INFO] Finished fold 2/3: [schedule: '2022-02-10' (Thursday)] after 23μs.
[my-logger:INFO] Begin fold 3/3: '2022-02-17' <= [schedule: '2022-02-24' (Thursday)] < '2022-03-01'.
Doing work..
[my-logger:INFO] Finished fold 3/3: [schedule: '2022-02-24' (Thursday)] after 23μs.
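
The fold dates in the log above can be reproduced with a small standard-library sketch. It does not use or mirror the internals of time_split; the helper name `sample_folds`, the 7-day "before" window (inferred from the logged fold bounds), and the order of operations (step first, then n_splits, both counted from the latest fold) are assumptions made for illustration.

```python
from datetime import date, timedelta


def sample_folds(candidates, step, n_splits):
    """Hypothetical helper: keep every `step`-th candidate counting back
    from the latest one, then keep at most `n_splits` of those."""
    kept = candidates[::-1][::step][:n_splits]
    return kept[::-1]  # restore chronological order


# Same outer range as pandas.date_range("2022-01", "2022-03").
start, end = date(2022, 1, 1), date(2022, 3, 1)
# after="5d" is from the config; the 7-day "before" window is an assumption
# read off the logged bounds ('2022-01-20' <= '2022-01-27').
before, after = timedelta(days=7), timedelta(days=5)

days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
thursdays = [d for d in days if d.weekday() == 3]  # Monday is 0, Thursday is 3

# A Thursday is a usable fold only if both windows fit inside the data.
candidates = [t for t in thursdays if t - before >= start and t + after <= end]

selected = sample_folds(candidates, step=2, n_splits=3)
print([d.isoformat() for d in selected])
# ['2022-01-27', '2022-02-10', '2022-02-24']
```

Counting back from the latest candidate is what makes the selection prefer later folds: 2022-02-24 is always kept, and 2022-01-13 is dropped only because n_splits caps the result at three.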

You may also use cron syntax directly to select a weekday by its position in the month. For example, '0 0 * * THU#2' will select the second Thursday of every month.
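
To make the difference from step=2 concrete, the following standard-library sketch computes the dates that a "second Thursday of the month" rule would pick. The function `nth_weekday` is a hypothetical helper written for this illustration; it is not part of rics or of any cron library.

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday() numbering: Monday is 0


def nth_weekday(year, month, weekday, n):
    """Hypothetical helper: return the n-th given weekday of a month."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first occurrence of `weekday`.
    offset = (weekday - first.weekday()) % 7
    result = first + timedelta(days=offset + 7 * (n - 1))
    if result.month != month:
        raise ValueError(f"{year}-{month:02d} has no occurrence number {n}")
    return result


second_thursdays = [nth_weekday(2022, month, THURSDAY, 2) for month in (1, 2)]
print([d.isoformat() for d in second_thursdays])
# ['2022-01-13', '2022-02-10']
```

Unlike step=2, which keeps every other Thursday counting back from the latest fold, this rule yields exactly one date per calendar month regardless of where the data ends.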

