Association

TemporalSlowAssociation

Bases: Association

Temporal data mixed-type association.

Notes

For continuous data, we use Pearson Correlation with mass implementation. For categorical data, we use an ANOVA test between the residuals and the tested series.
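The categorical case can be illustrated with a minimal sketch, assuming the test is a one-way comparison of the residuals grouped by the levels of the categorical series. The helper name `categorical_pvalue` is hypothetical and is not part of this library; only the SciPy functions named in the config (`f_oneway`, `kruskal`, `alexandergovern`) are real.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Test names accepted by the "categorical_method" config entry.
ALLOWED_METHODS = ("f_oneway", "kruskal", "alexandergovern")


def categorical_pvalue(residuals: pd.Series, categories: pd.Series,
                       method: str = "f_oneway") -> float:
    """Hypothetical helper: p-value of a one-way test of residuals by category."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown method: {method!r}")
    test = getattr(stats, method)
    # One array of residuals per category level.
    groups = [group.to_numpy() for _, group in residuals.groupby(categories)]
    return float(test(*groups).pvalue)


rng = np.random.default_rng(0)
categories = pd.Series(rng.integers(0, 3, size=1000))
noise = pd.Series(rng.normal(size=1000))

# Residuals unrelated to the category versus residuals whose mean depends on it.
p_independent = categorical_pvalue(noise, categories)
p_dependent = categorical_pvalue(noise + categories, categories)
```

A small p-value indicates that the mean of the residuals differs between category levels, which is what the class reports (negated) as a strong association.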

`__init__(config: dict)`

Parameters:

Name	Type	Description	Default
`config`	`dict`	Must contain an entry for: - "lags": int, the number of lags to compute the correlation over. - "categorical_method": str, any of 'f_oneway', 'kruskal', 'alexandergovern'. This specifies the kind of test used for categorical data. - "numerical_method": str, any of 'pearsonr', 'spearmanr'. This specifies the kind of test used for numerical data. - "variable_types": dict, for each variable name, whether it is "numerical" or "categorical". See examples. - "n_jobs": int, the number of processors used in parallel. Must be different from 0. See joblib.Parallel for more information. - "check_na": bool, if True, checks that there is no NaN in the variables and residuals DataFrames.	required

Returns:

Type	Description
`None`

Examples:

>>> data = pd.DataFrame(np.random.random(size=(1000,5)),columns=["target","1","2","3","4"])
>>> variable_types = dict([(column, "numerical") for column in data.columns])
>>> asso = TemporalSlowAssociation({"lags":10,"categorical_method":"f_oneway","variable_types":variable_types})
>>> asso

Or with mixed types:

>>> numerical = pd.DataFrame(np.random.random(size=(1000,3)),columns=["target","1","2"])
>>> categorical = pd.DataFrame(np.random.randint(0,3,size=(1000,2)),columns=["3","4"])
>>> data = pd.concat([numerical,categorical], axis="columns")
>>> variable_types = {"target":"numerical","1":"numerical","2":"numerical","3":"categorical","4":"categorical"}
>>> asso = TemporalSlowAssociation({"lags":10,"categorical_method":"f_oneway","variable_types":variable_types})
>>> asso

`association(residuals_df: pd.DataFrame, variables_df: pd.DataFrame) -> np.array`

Computes the association score between the residuals and candidate time series.

Parameters:

Name	Type	Description	Default
`residuals_df`	`DataFrame`	DataFrame of shape (ntimesteps, 1) containing the model residuals of a learning model. The index must be aligned with variables_df.	required
`variables_df`	`DataFrame`	DataFrame of shape (ntimesteps, D) containing the D time series to test for association with the residuals. The index must be aligned with residuals_df.	required

Returns:

Name	Type	Description
`pvalues`	`array`	A 1D numpy array containing, for each of the D time series to test, the negated minimal p-value across lags. The values are in the same order as variables_df.columns. By convention we return the negated p-value, so that the largest returned value corresponds to the strongest association.

Examples:

>>> rng = np.random.default_rng(0)
>>> data = pd.DataFrame(rng.random(size=(1000,5)),columns=["target","1","2","3","4"])
>>> variable_types = dict([(column, "numerical") for column in data.columns])
>>> asso = TemporalSlowAssociation({"lags":10,"categorical_method":"f_oneway","variable_types":variable_types})
>>> asso.association(data[["target"]], data[["1","2","3","4"]])
array([-0.03384917, -0.02838155, -0.0633841 , -0.15107386])

Or with mixed types:

>>> rng = np.random.default_rng(0)
>>> numerical = pd.DataFrame(rng.random(size=(1000,3)),columns=["target","1","2"])
>>> categorical = pd.DataFrame(rng.integers(0,3,size=(1000,2)),columns=["3","4"])
>>> data = pd.concat([numerical,categorical], axis="columns")
>>> variable_types = {"target":"numerical","1":"numerical","2":"numerical","3":"categorical","4":"categorical"}
>>> asso = TemporalSlowAssociation({"lags":10,"categorical_method":"f_oneway","variable_types":variable_types})
>>> asso.association(data[["target"]], data[["1","2","3","4"]])
array([-0.03111284, -0.04568282, -0.03302831, -0.02551908])
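The returned score for a numerical series can be sketched as follows. This is an illustration only, not the library's implementation: it assumes lags 0 through `lags` inclusive are tested, that each lag shifts the candidate series forward in time, and that `scipy.stats.pearsonr` is used directly. The helper name `neg_min_pvalue_over_lags` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def neg_min_pvalue_over_lags(residuals: pd.Series, series: pd.Series,
                             lags: int) -> float:
    """Hypothetical helper: negated minimal Pearson p-value across lags."""
    pvalues = []
    for lag in range(lags + 1):  # assumption: lag 0 is included
        shifted = series.shift(lag)
        valid = shifted.notna().to_numpy()  # drop rows lost to the shift
        result = stats.pearsonr(residuals.to_numpy()[valid],
                                shifted.to_numpy()[valid])
        pvalues.append(result[1])  # second element is the p-value
    return -min(pvalues)


rng = np.random.default_rng(0)
driver = pd.Series(rng.normal(size=500))
unrelated = pd.Series(rng.normal(size=500))

# Residuals that follow `driver` with a delay of 3 time steps, plus noise.
residuals = driver.shift(3).fillna(0.0) + 0.1 * pd.Series(rng.normal(size=500))

score_driver = neg_min_pvalue_over_lags(residuals, driver, lags=5)
score_unrelated = neg_min_pvalue_over_lags(residuals, unrelated, lags=5)
```

Because the score is a negated p-value, it lies in [-1, 0], and the series with the score closest to 0 is the one most strongly associated with the residuals.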

CrossSectionalAssociation

Bases: Association

Cross-sectional, mixed-type, grouped data association.

Notes

This class is intended for use with dataframes that have a two-level column index. The first level corresponds to groups of features, over which the association is computed. See the documentation on data format for details.

For continuous data, we use Pearson Correlation. For categorical data, we use an ANOVA test between the residuals and the tested series.

`__init__(config: dict)`

Parameters:

Name	Type	Description	Default
`config`	`dict`	Must contain an entry for: - "categorical_method": str, any of 'f_oneway', 'kruskal', 'alexandergovern'. This specifies the kind of test used for categorical data. - "numerical_method": str, any of 'pearsonr', 'spearmanr'. This specifies the kind of test used for numerical data. - "variable_types": dict, for each group name (first level of the column index), whether it is "numerical" or "categorical". This implies that all columns in a group must belong to the same type (numerical or categorical). See examples. - "n_jobs": int, the number of jobs for parallelism. See joblib.Parallel for details.	required

Returns:

Type	Description
`None`

Examples:

>>> data = pd.DataFrame(np.random.random(size=(1000,5)),columns=pd.MultiIndex.from_tuples([("target",""),("G1","a"),("G1","b"),("G2","a"),("G2","b")]))
>>> variable_types = dict([(group, "numerical") for group in data.columns.get_level_values(0).unique()])
>>> asso = CrossSectionalAssociation({"categorical_method":"f_oneway","variable_types":variable_types})
>>> asso

Or with mixed types:

>>> numerical = pd.DataFrame(np.random.random(size=(1000,3)),columns=pd.MultiIndex.from_tuples([("target",None),("G1","a"),("G1","b")]))
>>> categorical = pd.DataFrame(np.random.randint(0,5,size=(1000,3)),columns=pd.MultiIndex.from_tuples([("G2","a"),("G2","b"),("G2","c")]))
>>> data = pd.concat([numerical,categorical], axis="columns")
>>> variable_types = {"target":"numerical","G1":"numerical","G2":"categorical"}
>>> asso = CrossSectionalAssociation({"categorical_method":"f_oneway","variable_types":variable_types})
>>> asso

`association(residuals_df: pd.DataFrame, variables_df: pd.DataFrame) -> np.ndarray`

Computes the association score between the residuals and each group of candidate features.

Parameters:

Name	Type	Description	Default
`residuals_df`	`DataFrame`	DataFrame of shape (nsamples, 1) containing the model residuals of a learning model. The index must be aligned with variables_df.	required
`variables_df`	`DataFrame`	DataFrame of shape (nsamples, D) containing the D features to test for association with the residuals. The index must be aligned with residuals_df. The columns must be a pd.MultiIndex instance with two levels. See the documentation on data format for details.	required

Returns:

Name	Type	Description
`pvalues`	`array`	A 1D numpy array containing the negated minimal p-value for each group defined by the first level of the column index. The values are in the same order as the first level of the column index. By convention we return the negated p-value, so that the largest returned value corresponds to the strongest association.

Examples:

>>> rng = np.random.default_rng(0)
>>> data = pd.DataFrame(rng.random(size=(1000,5)),columns=pd.MultiIndex.from_tuples([("target",""),("G1","a"),("G1","b"),("G2","a"),("G2","b")]))
>>> variable_types = dict([(column, "numerical") for column in data.columns.get_level_values(0).unique()])
>>> asso = CrossSectionalAssociation({"categorical_method":"f_oneway","variable_types":variable_types})
>>> asso.association(data[["target"]], data[["G1","G2"]])
array([-0.32736175, -0.11320393])

Or with mixed types:

>>> rng = np.random.default_rng(0)
>>> numerical = pd.DataFrame(rng.random(size=(1000,3)),columns=pd.MultiIndex.from_tuples([("target",None),("G1","a"),("G1","b")]))
>>> categorical = pd.DataFrame(rng.integers(0,5,size=(1000,3)),columns=pd.MultiIndex.from_tuples([("G2","a"),("G2","b"),("G2","c")]))
>>> data = pd.concat([numerical,categorical], axis="columns")
>>> variable_types = {"target":"numerical","G1":"numerical","G2":"categorical"}
>>> asso = CrossSectionalAssociation({"categorical_method":"f_oneway","variable_types":variable_types})
>>> asso.association(data[["target"]],data[["G1","G2"]])
array([-0.05543262, -0.0992026 ])
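The grouped score can be sketched in the same spirit. This is a minimal illustration for numerical groups only, assuming one Pearson test per column and a minimum taken over the columns of each first-level group. The helper name `neg_min_pvalue_per_group` is hypothetical and the library's actual computation may differ.

```python
import numpy as np
import pandas as pd
from scipy import stats


def neg_min_pvalue_per_group(residuals: pd.Series,
                             variables_df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: negated minimal Pearson p-value per column group."""
    # unique() keeps the groups in order of first appearance.
    groups = variables_df.columns.get_level_values(0).unique()
    scores = []
    for group in groups:
        block = variables_df[group]  # columns of this group (second level)
        pvalues = [
            stats.pearsonr(residuals.to_numpy(), block[col].to_numpy())[1]
            for col in block.columns
        ]
        scores.append(-min(pvalues))
    return np.array(scores)


rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_tuples(
    [("G1", "a"), ("G1", "b"), ("G2", "a"), ("G2", "b")]
)
features = pd.DataFrame(rng.normal(size=(1000, 4)), columns=columns)

# Residuals driven by one column of G1 only; G2 is unrelated.
residuals = features[("G1", "a")] + 0.5 * pd.Series(rng.normal(size=1000))

group_scores = neg_min_pvalue_per_group(residuals, features)
```

Here `group_scores` has one entry per first-level group, in the order the groups appear in the column index, so the first entry (G1) should be the one closest to 0.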
