DataLoader

 DataLoader (max_records:int=None)

*A class for loading and processing time series data with adjustable parameters.

This class handles loading CSV data, computing returns, and calculating rolling variance.

Attributes: max_records: Maximum number of records to keep (from the end of the dataset)*

# Create a data loader that keeps the last 9000 records, then load the data
data_loader = DataLoader(max_records=9000)
source_df = data_loader.load_data("./data/ng_daily.csv")
source_df.head()

shape: (5, 3)
date        price  ret
date        f64    f64
1997-01-07  3.82   null
1997-01-08  3.8    0.994764
1997-01-09  3.61   0.95
1997-01-10  3.92   1.085873
1997-01-13  4.0    1.020408
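The `ret` column in this output is the ratio of each price to the previous one, not a percentage change, which is why the first row is null. The sketch below reproduces it in plain Python. The helper names `price_ratios` and `keep_last` are hypothetical and are not part of `DataLoader`, and treating `max_records=None` as "keep everything" is an assumption.

```python
def price_ratios(prices):
    """Ratio of each price to the previous one; the first entry has no predecessor."""
    return [None] + [curr / prev for prev, curr in zip(prices, prices[1:])]


def keep_last(records, max_records=None):
    """Keep only the most recent records (assumption: None keeps everything)."""
    if max_records is None:
        return records
    return records[max(len(records) - max_records, 0):]


# Prices taken from the output above
prices = [3.82, 3.8, 3.61, 3.92, 4.0]
rets = price_ratios(prices)
recent = keep_last(prices, max_records=3)
```

Here `rets[1]` is `3.8 / 3.82`, which rounds to the 0.994764 shown for 1997-01-08.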

FeatureEngineer

 FeatureEngineer (transforms:list[__main__.DFFeature], n_shifts=3,
                  drop_nulls:bool=True)

*A class for creating lagged features from time series data.

This class handles the creation of lagged (shifted) features that can be used for GARCH-like models and other time series forecasting tasks.

Attributes: transforms: List of DFFeature transforms that define the features to create; n_shifts: Number of lag periods to create; drop_nulls: Whether to drop rows containing nulls after lagging and rolling-window calculations*
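To make the lagging concrete, here is a minimal plain-Python sketch of shifted features. It is not the `FeatureEngineer` implementation: `add_lags` and `complete_rows` are hypothetical helpers, and the `prev_<name>_<k>` naming is inferred from column names such as `prev_log_ret_1` in the example output.

```python
def add_lags(column, name, n_shifts=3):
    """Return {lag column name: values shifted down by k rows}, padded with None."""
    lags = {}
    for k in range(1, n_shifts + 1):
        shifted = ([None] * k + column)[: len(column)]
        lags[f"prev_{name}_{k}"] = shifted
    return lags


def complete_rows(columns):
    """Indices of rows with no None in any column (what dropping nulls would keep)."""
    n_rows = len(next(iter(columns.values())))
    return [i for i in range(n_rows) if all(col[i] is not None for col in columns.values())]


log_ret = [0.1, 0.2, 0.3, 0.4, 0.5]
lags = add_lags(log_ret, "log_ret", n_shifts=2)
kept = complete_rows(lags)
```

With two shifts, the first two rows have at least one missing lag, so only rows 2 to 4 survive the null drop.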


Derivative

 Derivative (source_field:str, feature_name:str,
             requested_lag:int|None=None, step_size:int=1)

Identity

 Identity (source_field:str, feature_name:str,
           requested_lag:int|None=None, step_size:int=1)

LogReturn

 LogReturn (source_field:str, feature_name:str,
            requested_lag:int|None=None, step_size:int=1)

Square

 Square (source_field:str, feature_name:str, requested_lag:int|None=None,
         step_size:int=1)

Variance

 Variance (source_field:str, feature_name:str,
           requested_lag:int|None=None, step_size:int=1,
           rolling_variance_window:int=3)

QuantileTransformer

 QuantileTransformer (source_field:str, feature_name:str,
                      requested_lag:int|None=None, step_size:int=1,
                      n_quantiles:int=1000,
                      output_distribution:str='uniform')

A stateful quantile transformer using sklearn’s QuantileTransformer.
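"Stateful" here means the transformer must be fitted on reference data before it can map new values. The sketch below illustrates that fit-then-transform pattern with a simplified mid-rank empirical CDF in plain Python. It is a stand-in for illustration only: `EmpiricalQuantile` is a hypothetical class, and sklearn's `QuantileTransformer` instead interpolates between `n_quantiles` landmarks.

```python
from bisect import bisect_left, bisect_right


class EmpiricalQuantile:
    """Maps a value to its mid-rank position in [0, 1] among fitted reference values."""

    def __init__(self):
        self._ref = None  # sorted reference values, set by fit()

    def fit(self, values):
        self._ref = sorted(v for v in values if v is not None)
        return self

    def transform(self, x):
        if self._ref is None:
            raise RuntimeError("call fit() before transform()")
        lo = bisect_left(self._ref, x)   # reference values strictly below x
        hi = bisect_right(self._ref, x)  # reference values at or below x
        return (lo + hi) / 2 / len(self._ref)


eq = EmpiricalQuantile().fit([1.0, 2.0, 3.0, 4.0, 5.0])
median_q = eq.transform(3.0)
```

Because the reference values are stored at fit time, values seen later are ranked against the original distribution rather than against themselves.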


ZeroBasedMonth

 ZeroBasedMonth (source_field:str, feature_name:str,
                 requested_lag:int|None=None, step_size:int=1)

DFFeature

 DFFeature (source_field:str, feature_name:str,
            requested_lag:int|None=None, step_size:int=1)

feature_engineer = FeatureEngineer(
    transforms=[
        LogReturn(source_field="ret", feature_name="log_ret"),
        Variance(source_field="price", feature_name="var", requested_lag=0),
        QuantileTransformer(
            source_field="var", feature_name="var_quantile", requested_lag=0
        ),
    ],
    n_shifts=3,
)
df_with_features = feature_engineer.create_features(source_df)
df_with_features.head()

shape: (5, 9)
date        price  ret       log_ret    var       var_quantile  prev_log_ret_1  prev_log_ret_2  prev_log_ret_3
date        f64    f64       f64        f64       f64           f64             f64             f64
1997-01-13  4.0    1.020408  0.020203   0.042433  0.860861      0.082384        -0.051293       -0.005249
1997-01-14  4.01   1.0025    0.002497   0.002433  0.316111      0.020203        0.082384        -0.051293
1997-01-15  4.34   1.082294  0.079083   0.037433  0.847429      0.002497        0.020203        0.082384
1997-01-16  4.71   1.085253  0.081814   0.122633  0.944662      0.079083        0.002497        0.020203
1997-01-17  3.91   0.830149  -0.186151  0.1603    0.95773       0.081814        0.079083        0.002497
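The `log_ret` and `var` values in the 1997-01-13 row can be checked by hand against the prices from the loader output. The plain-Python sketch below assumes that `log_ret` is the natural log of `ret` and that `var` is the sample variance (dividing by n - 1) of the last three prices. Both assumptions are inferred from the numbers, not from the source code.

```python
import math
from statistics import variance

# Prices for 1997-01-09, 1997-01-10 and 1997-01-13 from the loader output
prices = [3.61, 3.92, 4.0]

# ret is the ratio of consecutive prices; log_ret is its natural log
ret = prices[-1] / prices[-2]
log_ret = math.log(ret)

# statistics.variance computes the sample variance (n - 1 in the denominator)
var = variance(prices)

print(round(ret, 6), round(log_ret, 6), round(var, 6))
```

Rounded to six decimals, the three values agree with the `ret`, `log_ret`, and `var` columns of that row, which is consistent with a default `rolling_variance_window` of 3.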

# Fit a quantile transformer on the price column, then apply it to the same data
qt = QuantileTransformer(source_field="price", feature_name="price_quantile")
qt.fit(source_df)
source_df = qt.extract(source_df)
source_df.head()

shape: (5, 4)
date        price  ret       price_quantile
date        f64    f64       f64
1997-01-07  3.82   null      0.573073
1997-01-08  3.8    0.994764  0.569069
1997-01-09  3.61   0.95      0.535536
1997-01-10  3.92   1.085873  0.591091
1997-01-13  4.0    1.020408  0.607107

append_from_log_ret

 append_from_log_ret (df:polars.dataframe.frame.DataFrame,
                      new_log_ret:float, inherit_vals:list[str],
                      add_variables:dict[str,float])

*Adds a new record to the dataframe based on a log return value.

Args: df: Input DataFrame containing time series data; new_log_ret: The new log return value to add

Returns: DataFrame with a new row appended*
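One plausible reading of this function, sketched in plain Python on a list of dicts rather than a polars DataFrame: the new price is the last price multiplied by `exp(new_log_ret)`. The behavior shown for `inherit_vals` (copy those columns from the last row) and `add_variables` (set those columns explicitly) is an assumption, since the docstring does not describe them. `append_row`, `regime`, and `shock` are hypothetical names.

```python
import math


def append_row(rows, new_log_ret, inherit_vals, add_variables):
    """Return a new list with one row appended; the input list is left unchanged."""
    last = rows[-1]
    ratio = math.exp(new_log_ret)  # log return back to a price ratio
    new_row = {
        "price": last["price"] * ratio,
        "ret": ratio,
        "log_ret": new_log_ret,
    }
    # Assumption: inherited columns carry the last row's value forward
    for col in inherit_vals:
        new_row[col] = last[col]
    # Assumption: add_variables supplies explicit values for extra columns
    new_row.update(add_variables)
    return rows + [new_row]


rows = [{"price": 4.0, "ret": 1.020408, "log_ret": 0.020203, "regime": 1}]
extended = append_row(rows, math.log(1.0025), ["regime"], {"shock": 0.0})
```

With a log return of `log(1.0025)`, the appended price is about 4.01, matching the step from 1997-01-13 to 1997-01-14 in the example data.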


binary_feature_from_date_ranges

 binary_feature_from_date_ranges (date_range:tuple[datetime.date,datetime.date],
                                  periods:list[tuple[datetime.date,datetime.date]],
                                  feature_name:str='feature')

from datetime import date

binary_feature_from_date_ranges(
    date_range=(date(2010, 1, 1), date(2026, 1, 1)),
    periods=[
        (date(2022, 2, 24), date(2026, 1, 1)),
    ],
    feature_name="RU/UA_war",
)

shape: (5_845, 2)
date        RU/UA_war
date        i64
2010-01-01  0
2010-01-02  0
2010-01-03  0
2010-01-04  0
2010-01-05  0
…           …
2025-12-28  1
2025-12-29  1
2025-12-30  1
2025-12-31  1
2026-01-01  1
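The output suggests one row per calendar day across `date_range`, with both endpoints included, and a flag of 1 for any day that falls inside one of the `periods`. A minimal plain-Python sketch of that logic follows. `binary_flags` is a hypothetical helper that returns a list of dicts rather than a polars DataFrame, and the inclusive period bounds are an assumption based on the 2026-01-01 row.

```python
from datetime import date, timedelta


def binary_flags(date_range, periods, feature_name="feature"):
    """One row per day in date_range; flag is 1 if the day lies in any period."""
    start, end = date_range
    n_days = (end - start).days + 1  # both endpoints included
    rows = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        flag = int(any(lo <= day <= hi for lo, hi in periods))
        rows.append({"date": day, feature_name: flag})
    return rows


flags = binary_flags(
    date_range=(date(2010, 1, 1), date(2026, 1, 1)),
    periods=[(date(2022, 2, 24), date(2026, 1, 1))],
    feature_name="RU/UA_war",
)
```

The resulting list has 5,845 rows, the same count as the `shape` reported above.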