DataLoader
DataLoader (max_records:int=None)
*A class for loading and processing time series data with adjustable parameters.
This class handles loading CSV data, computing returns, and calculating rolling variance.
Attributes:
- max_records: maximum number of records to keep (from the end of the dataset)*
# Create a data loader with default parameters and load the data
data_loader = DataLoader(max_records=9000)
source_df = data_loader.load_data("./data/ng_daily.csv")
source_df.head()
shape: (5, 3)

| date       | price (f64) | ret (f64) |
|------------|-------------|-----------|
| 1997-01-07 | 3.82        | null      |
| 1997-01-08 | 3.8         | 0.994764  |
| 1997-01-09 | 3.61        | 0.95      |
| 1997-01-10 | 3.92        | 1.085873  |
| 1997-01-13 | 4.0         | 1.020408  |
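From the values above, the `ret` column appears to be the simple price ratio p_t / p_(t-1) (an inference from the table, not a documented formula); the first row is null because there is no prior price. A quick check:

```python
# Illustrative check of the return values shown above, assuming
# ret_t = price_t / price_{t-1} (a hypothesis based on the table).
prices = [3.82, 3.8, 3.61, 3.92, 4.0]  # closing prices from the head() output

# First entry has no predecessor, so it stays None (null in polars)
returns = [None] + [p_cur / p_prev for p_prev, p_cur in zip(prices, prices[1:])]
print([None if r is None else round(r, 6) for r in returns])
# → [None, 0.994764, 0.95, 1.085873, 1.020408]
```

The rounded ratios reproduce the `ret` column exactly.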
FeatureEngineer
FeatureEngineer (transforms:list[__main__.DFFeature], n_shifts=3,
drop_nulls:bool=True)
*A class for creating lagged features from time series data.
This class handles the creation of lagged (shifted) features that can be used for GARCH-like models and other time series forecasting tasks.
Attributes:
- columns: list of column names to create lags for
- n_shifts: number of lag periods to create
- drop_nulls: whether to drop the nulls after rolling window calculations*
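The lagging logic can be sketched in plain Python (the library itself works on polars DataFrames; `make_lags` here is a hypothetical helper for illustration only):

```python
def make_lags(values, n_shifts=3, drop_nulls=True):
    """Build lagged copies of a series: lag k holds the value from k steps back."""
    lags = {}
    for k in range(1, n_shifts + 1):
        # shift forward by k positions, padding the front with nulls
        lags[f"lag_{k}"] = [None] * k + values[:-k]
    if drop_nulls:
        # keep only rows where every lag is populated (drops the first n_shifts rows)
        keep = range(n_shifts, len(values))
        lags = {name: [col[i] for i in keep] for name, col in lags.items()}
    return lags

lags = make_lags([1.0, 2.0, 3.0, 4.0, 5.0], n_shifts=2)
print(lags)  # → {'lag_1': [2.0, 3.0, 4.0], 'lag_2': [1.0, 2.0, 3.0]}
```

Dropping the first `n_shifts` rows is why the feature-engineered output below starts at 1997-01-13 rather than 1997-01-07.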
Derivative
Derivative (source_field:str, feature_name:str,
requested_lag:int|None=None, step_size:int=1)
Identity
Identity (source_field:str, feature_name:str,
requested_lag:int|None=None, step_size:int=1)
LogReturn
LogReturn (source_field:str, feature_name:str,
requested_lag:int|None=None, step_size:int=1)
Square
Square (source_field:str, feature_name:str, requested_lag:int|None=None,
step_size:int=1)
Variance
Variance (source_field:str, feature_name:str,
requested_lag:int|None=None, step_size:int=1,
rolling_variance_window:int=3)
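A rolling variance over a 3-observation trailing window can be sketched with the standard library. Whether the library uses sample or population variance is an assumption here; this sketch uses sample variance, which reproduces the `var` values shown below:

```python
from statistics import variance

def rolling_variance(values, window=3):
    """Sample variance over a trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough observations yet
        else:
            out.append(variance(values[i + 1 - window : i + 1]))
    return out

rv = rolling_variance([3.82, 3.8, 3.61, 3.92, 4.0])
print([None if v is None else round(v, 6) for v in rv])
# → [None, None, 0.013433, 0.024433, 0.042433]
```

The final value, 0.042433, matches the `var` column for 1997-01-13 in the feature-engineered output below.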
ZeroBasedMonth
ZeroBasedMonth (source_field:str, feature_name:str,
requested_lag:int|None=None, step_size:int=1)
DFFeature
DFFeature (source_field:str, feature_name:str,
requested_lag:int|None=None, step_size:int=1)
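All the transforms above share the `DFFeature` constructor signature. The pattern can be sketched as a small base class; the class and method names here (`DFFeatureSketch`, `transform`) are assumptions for illustration, not the library's actual API:

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class DFFeatureSketch:
    """Hypothetical sketch of the shared transform interface."""
    source_field: str
    feature_name: str
    requested_lag: Optional[int] = None
    step_size: int = 1

    def transform(self, values):
        raise NotImplementedError

class LogReturnSketch(DFFeatureSketch):
    """Example concrete transform: natural log of a ratio-return series."""
    def transform(self, values):
        return [math.log(v) for v in values]

lr = LogReturnSketch(source_field="ret", feature_name="log_ret")
print(round(lr.transform([1.020408])[0], 6))  # → 0.020203
```

Each concrete transform reads `source_field`, writes `feature_name`, and can optionally be lagged via `requested_lag` and `step_size`.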
feature_engineer = FeatureEngineer(
    transforms=[
        LogReturn(source_field="ret", feature_name="log_ret"),
        Variance(source_field="price", feature_name="var", requested_lag=0),
        QuantileTransformer(
            source_field="var", feature_name="var_quantile", requested_lag=0
        ),
    ],
    n_shifts=3,
)
df_with_features = feature_engineer.create_features(source_df)
df_with_features.head()
shape: (5, 9)

| date       | price (f64) | ret (f64) | log_ret (f64) | var (f64) | var_quantile (f64) | log_ret lag 1 (f64) | log_ret lag 2 (f64) | log_ret lag 3 (f64) |
|------------|-------------|-----------|---------------|-----------|--------------------|---------------------|---------------------|---------------------|
| 1997-01-13 | 4.0         | 1.020408  | 0.020203      | 0.042433  | 0.860861           | 0.082384            | -0.051293           | -0.005249           |
| 1997-01-14 | 4.01        | 1.0025    | 0.002497      | 0.002433  | 0.316111           | 0.020203            | 0.082384            | -0.051293           |
| 1997-01-15 | 4.34        | 1.082294  | 0.079083      | 0.037433  | 0.847429           | 0.002497            | 0.020203            | 0.082384            |
| 1997-01-16 | 4.71        | 1.085253  | 0.081814      | 0.122633  | 0.944662           | 0.079083            | 0.002497            | 0.020203            |
| 1997-01-17 | 3.91        | 0.830149  | -0.186151     | 0.1603    | 0.95773            | 0.081814            | 0.079083            | 0.002497            |
qt = QuantileTransformer(source_field="price", feature_name="price_quantile")
qt.fit(source_df)
source_df = qt.extract(source_df)
source_df.head()
shape: (5, 4)

| date       | price (f64) | ret (f64) | price_quantile (f64) |
|------------|-------------|-----------|----------------------|
| 1997-01-07 | 3.82        | null      | 0.573073             |
| 1997-01-08 | 3.8         | 0.994764  | 0.569069             |
| 1997-01-09 | 3.61        | 0.95      | 0.535536             |
| 1997-01-10 | 3.92        | 1.085873  | 0.591091             |
| 1997-01-13 | 4.0         | 1.020408  | 0.607107             |
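The quantile transform appears to map each value to its position in the empirical distribution of the fitted data (an assumption based on the output above; the library's exact estimator may differ, e.g. with interpolation). A minimal empirical-CDF sketch:

```python
from bisect import bisect_right

class ECDFSketch:
    """Hypothetical empirical-CDF transformer: fit() stores sorted values,
    transform() maps x to the fraction of fitted values <= x."""
    def fit(self, values):
        self.sorted_vals = sorted(values)
        return self

    def transform(self, values):
        n = len(self.sorted_vals)
        return [bisect_right(self.sorted_vals, x) / n for x in values]

ecdf = ECDFSketch().fit([1.0, 2.0, 3.0, 4.0])
print(ecdf.transform([2.5, 4.0]))  # → [0.5, 1.0]
```

Fitting once and then transforming keeps the mapping stable, so new data is ranked against the original fitted distribution rather than against itself.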
append_from_log_ret
append_from_log_ret (df:polars.dataframe.frame.DataFrame,
new_log_ret:float, inherit_vals:list[str],
add_variables:dict[str,float])
*Adds a new record to the dataframe based on a log return value.
Args:
- df: input DataFrame containing time series data
- new_log_ret: the new log return value to add
Returns: DataFrame with a new row appended*
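Recovering the new price from a log return follows from the definition log_ret = ln(p_t / p_(t-1)), which inverts to p_t = p_(t-1) · exp(log_ret). A sketch, assuming that is the convention in use:

```python
import math

def next_price(prev_price: float, new_log_ret: float) -> float:
    """Invert log_ret = ln(p_t / p_{t-1}) to recover the new price."""
    return prev_price * math.exp(new_log_ret)

# Using values from the tables above: the 1997-01-10 price and the
# 1997-01-13 log return recover the 1997-01-13 price.
print(round(next_price(3.92, 0.020203), 4))  # → 4.0
```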
binary_feature_from_date_ranges
binary_feature_from_date_ranges (date_range:tuple[datetime.date,datetime.date], periods:list[tuple[datetime.date,datetime.date]], feature_name:str='feature')
binary_feature_from_date_ranges(
    date_range=(date(2010, 1, 1), date(2026, 1, 1)),
    periods=[
        (date(2022, 2, 24), date(2026, 1, 1)),
    ],
    feature_name="RU/UA_war",
)
shape: (5_845, 2)

| date       | RU/UA_war (i64) |
|------------|-----------------|
| 2010-01-01 | 0               |
| 2010-01-02 | 0               |
| 2010-01-03 | 0               |
| 2010-01-04 | 0               |
| 2010-01-05 | 0               |
| …          | …               |
| 2025-12-28 | 1               |
| 2025-12-29 | 1               |
| 2025-12-30 | 1               |
| 2025-12-31 | 1               |
| 2026-01-01 | 1               |
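The flag logic amounts to checking, for each day in the range, whether it falls inside any of the given periods. A stdlib sketch of the same idea (the actual function returns a polars DataFrame, and whether the period endpoints are inclusive is an assumption based on the output above):

```python
from datetime import date, timedelta

def binary_feature(date_range, periods, feature_name="feature"):
    """For each day in date_range (inclusive), emit 1 if the day falls
    inside any [start, end] period (inclusive), else 0."""
    start, end = date_range
    rows = []
    d = start
    while d <= end:
        flag = int(any(p_start <= d <= p_end for p_start, p_end in periods))
        rows.append((d, flag))
        d += timedelta(days=1)
    return rows

rows = binary_feature(
    (date(2010, 1, 1), date(2026, 1, 1)),
    [(date(2022, 2, 24), date(2026, 1, 1))],
)
print(len(rows))  # → 5845, matching the shape above
print(rows[0][1], rows[-1][1])  # → 0 1
```

The 5,845 rows are exactly the inclusive day count from 2010-01-01 to 2026-01-01, matching the output shape.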
1