utils

lingam.utils.print_causal_directions(cdc, n_sampling, labels=None)

Print causal directions of bootstrap result to stdout.

Parameters:
  • cdc (dict) – Causal directions sorted by count in descending order. This can be set to the value returned by the BootstrapResult.get_causal_direction_counts() method.

  • n_sampling (int) – Number of bootstrapping samples.

  • labels (array-like, optional (default=None)) – List of feature labels. If set, the output uses these labels as feature names.
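For illustration, the sketch below shows how n_sampling turns the counts in cdc into percentages. It assumes cdc is a dict of parallel lists with the keys "from", "to" and "count"; format_causal_directions is a hypothetical helper, and its line layout only approximates what lingam prints.

```python
def format_causal_directions(cdc, n_sampling, labels=None):
    """Hypothetical helper: one line per causal direction in cdc."""
    lines = []
    for frm, to, count in zip(cdc["from"], cdc["to"], cdc["count"]):
        src = labels[frm] if labels is not None else f"x{frm}"
        dst = labels[to] if labels is not None else f"x{to}"
        # Percentage of bootstrap samples in which this direction appeared
        lines.append(f"{dst} <--- {src} ({100 * count / n_sampling:.1f}%)")
    return lines


# Assumed structure of cdc: parallel lists sorted by count (descending)
cdc = {"from": [0, 1], "to": [1, 2], "count": [95, 40]}
for line in format_causal_directions(cdc, n_sampling=100):
    print(line)
# x1 <--- x0 (95.0%)
# x2 <--- x1 (40.0%)
```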

lingam.utils.print_dagc(dagc, n_sampling, labels=None)

Print DAGs of bootstrap result to stdout.

Parameters:
  • dagc (dict) – Directed acyclic graphs sorted by count in descending order. This can be set to the value returned by the BootstrapResult.get_directed_acyclic_graph_counts() method.

  • n_sampling (int) – Number of bootstrapping samples.

  • labels (array-like, optional (default=None)) – List of feature labels. If set, the output uses these labels as feature names.
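A similar sketch for DAG counts, assuming dagc holds a "dag" list (each entry a dict with parallel "from" and "to" lists) and a matching "count" list. format_dagc is a hypothetical helper, not part of lingam, and its output layout is only an approximation.

```python
def format_dagc(dagc, n_sampling, labels=None):
    """Hypothetical helper: a header per DAG followed by its edges."""
    def name(i):
        return labels[i] if labels is not None else f"x{i}"

    lines = []
    for k, (dag, count) in enumerate(zip(dagc["dag"], dagc["count"])):
        # Share of bootstrap samples that produced exactly this DAG
        lines.append(f"DAG[{k}]: {100 * count / n_sampling:.1f}%")
        for frm, to in zip(dag["from"], dag["to"]):
            lines.append(f"    {name(to)} <--- {name(frm)}")
    return lines


# Assumed structure: one edge list per distinct DAG, plus its count
dagc = {
    "dag": [{"from": [0, 1], "to": [1, 2]}, {"from": [0], "to": [1]}],
    "count": [60, 25],
}
print("\n".join(format_dagc(dagc, n_sampling=100)))
```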

lingam.utils.make_prior_knowledge(n_variables, exogenous_variables=None, sink_variables=None, paths=None, no_paths=None)

Create a prior knowledge matrix.

Parameters:
  • n_variables (int) – Number of variables.

  • exogenous_variables (array-like, shape (index, ...), optional (default=None)) – Indices of exogenous variables. The prior knowledge is created with the specified variables as exogenous variables.

  • sink_variables (array-like, shape (index, ...), optional (default=None)) – Indices of sink variables. The prior knowledge is created with the specified variables as sink variables.

  • paths (array-like, shape ((index, index), ...), optional (default=None)) – List of variable index pairs connected by a directed path. For a pair (i, j), the prior knowledge states that x_i has a directed path to x_j.

  • no_paths (array-like, shape ((index, index), ...), optional (default=None)) – List of variable index pairs not connected by a directed path. For a pair (i, j), the prior knowledge states that x_i does not have a directed path to x_j.

Returns:

prior_knowledge – Prior knowledge matrix used for causal discovery.

Return type:

array-like, shape (n_variables, n_variables)
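The sketch below re-implements the idea with NumPy to show how the arguments might map onto matrix entries. It assumes the prior-knowledge encoding used elsewhere in lingam: entry [i, j] is 1 if x_j has a directed path to x_i, 0 if it does not, and -1 if unknown. make_prior_knowledge_sketch is hypothetical, and the order in which overlapping constraints are applied is an assumption.

```python
import numpy as np


def make_prior_knowledge_sketch(n_variables, exogenous_variables=None,
                                sink_variables=None, paths=None, no_paths=None):
    # Assumed encoding: pk[i, j] = 1 (x_j -> ... -> x_i), 0 (no path), -1 (unknown)
    pk = np.full((n_variables, n_variables), -1)
    if no_paths is not None:
        for i, j in no_paths:
            pk[j, i] = 0  # x_i has no directed path to x_j
    if paths is not None:
        for i, j in paths:
            pk[j, i] = 1  # x_i has a directed path to x_j
    if sink_variables is not None:
        for v in sink_variables:
            pk[:, v] = 0  # a sink has no path to any other variable
    if exogenous_variables is not None:
        for v in exogenous_variables:
            pk[v, :] = 0  # no other variable has a path to an exogenous one
    np.fill_diagonal(pk, -1)  # self-relations carry no knowledge
    return pk


pk = make_prior_knowledge_sketch(3, exogenous_variables=[0], paths=[(0, 2)])
print(pk)  # pk[2, 0] == 1 encodes "x0 has a directed path to x2"
```

A matrix like this would typically be passed to an estimator, for example lingam.DirectLiNGAM(prior_knowledge=pk).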

lingam.utils.remove_effect(X, remove_features)

Create a dataset from which the effects of the specified features have been removed by linear regression.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Data, where n_samples is the number of samples and n_features is the number of features.

  • remove_features (array-like) – Indices of the features whose effects are removed.

Returns:

X – Data after removing effects of remove_features.

Return type:

array-like, shape (n_samples, n_features)
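A minimal NumPy sketch of the idea, assuming ordinary least squares with an intercept: each remaining column is regressed on the remove_features columns and replaced by its residuals, while the remove_features columns themselves are left unchanged. remove_effect_sketch is a hypothetical name, not lingam's implementation.

```python
import numpy as np


def remove_effect_sketch(X, remove_features):
    X = np.array(X, dtype=float, copy=True)
    remove_features = list(remove_features)
    others = [j for j in range(X.shape[1]) if j not in remove_features]
    # Design matrix: intercept column plus the features whose effect is removed
    Z = np.column_stack([np.ones(X.shape[0]), X[:, remove_features]])
    for j in others:
        coef, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        X[:, j] = X[:, j] - Z @ coef  # keep only the residual
    return X


rng = np.random.default_rng(0)
x0 = rng.uniform(size=500)
x1 = 2.0 * x0 + rng.uniform(size=500)
X = np.column_stack([x0, x1])
X_removed = remove_effect_sketch(X, [0])
# OLS residuals are uncorrelated with the regressors
print(np.corrcoef(X_removed[:, 0], X_removed[:, 1])[0, 1])
```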

lingam.utils.make_dot(adjacency_matrix, labels=None, lower_limit=0.01, prediction_feature_indices=None, prediction_target_label='Y(pred)', prediction_line_color='red', prediction_coefs=None, prediction_feature_importance=None, path=None, path_color=None, detect_cycle=False, ignore_shape=False)

Create directed graph source code in the DOT language from the specified adjacency matrix.

Parameters:
  • adjacency_matrix (array-like with shape (n_features, n_features)) – Adjacency matrix to make graph, where n_features is the number of features.

  • labels (array-like, optional (default=None)) – Labels to use for the graph nodes.

  • lower_limit (float, optional (default=0.01)) – Threshold for drawing directions. Directions whose coefficients have absolute values less than lower_limit are excluded.

  • prediction_feature_indices (array-like, optional (default=None)) – Indices to use as prediction features.

  • prediction_target_label (string, optional (default='Y(pred)')) – Label to use for the target variable of the prediction.

  • prediction_line_color (string, optional (default='red')) – Line color to use for prediction’s graph.

  • prediction_coefs (array-like, optional (default=None)) – Coefficients to use for prediction’s graph.

  • prediction_feature_importance (array-like, optional (default=None)) – Feature importance to use for prediction’s graph.

  • path (tuple, optional (default=None)) – Path to highlight. Tuple of start index and end index.

  • path_color (string, optional (default=None)) – Color used to highlight the path.

  • detect_cycle (boolean, optional (default=False)) – Highlight simple cycles.

  • ignore_shape (boolean, optional (default=False)) – Whether to skip checking the shape of adjacency_matrix.

Returns:

graph – Directed graph source code in the DOT language. If the causal order between two variables is unknown, a double-headed arrow is drawn.

Return type:

graphviz.Digraph
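To show what the DOT source contains, the following pure-Python sketch emits DOT text for an adjacency matrix. It assumes lingam's convention that entry [i, j] is the coefficient of x_j on x_i, so a nonzero entry [1, 0] becomes the edge x0 -> x1. to_dot_sketch is hypothetical and ignores the prediction, path-highlighting, cycle-detection and unknown-order options.

```python
def to_dot_sketch(adjacency_matrix, labels=None, lower_limit=0.01):
    n = len(adjacency_matrix)
    if labels is None:
        labels = [f"x{i}" for i in range(n)]
    lines = ["digraph {"]
    for name in labels:
        lines.append(f'  "{name}"')
    for to in range(n):
        for frm in range(n):
            # Assumed convention: entry [to][frm] is the effect of x_frm on x_to
            coef = adjacency_matrix[to][frm]
            if coef != 0 and abs(coef) >= lower_limit:
                lines.append(f'  "{labels[frm]}" -> "{labels[to]}" [label="{coef:.2f}"]')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot_sketch([[0, 0], [0.8, 0]])
print(dot)
```

The resulting string can be rendered with graphviz.Source(dot) if the graphviz package is installed.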

lingam.utils.get_sink_variables(adjacency_matrix)

Get the sink variables (indices) in the adjacency matrix.

Parameters:

adjacency_matrix (array-like, shape (n_variables, n_variables)) – Adjacency matrix, where n_variables is the number of variables.

Returns:

sink_variables – List of sink variable indices.

Return type:

array-like

lingam.utils.get_exo_variables(adjacency_matrix)

Get the exogenous variables (indices) in the adjacency matrix.

Parameters:

adjacency_matrix (array-like, shape (n_variables, n_variables)) – Adjacency matrix, where n_variables is the number of variables.

Returns:

exogenous_variables – List of exogenous variable indices.

Return type:

array-like
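Assuming lingam's convention that a nonzero entry [i, j] means x_j -> x_i, a sink variable has an all-zero column (no children) and an exogenous variable has an all-zero row (no parents). The two hypothetical helpers below sketch both checks; lingam's own get_sink_variables and get_exo_variables may treat near-zero or NaN entries differently.

```python
import numpy as np


def sink_variables_sketch(adjacency_matrix):
    B = np.asarray(adjacency_matrix)
    # Column j all zero: x_j is not a cause of any variable (no children)
    return [j for j in range(B.shape[1]) if not np.any(B[:, j])]


def exo_variables_sketch(adjacency_matrix):
    B = np.asarray(adjacency_matrix)
    # Row i all zero: no variable is a cause of x_i (no parents)
    return [i for i in range(B.shape[0]) if not np.any(B[i, :])]


# Chain x0 -> x1 -> x2 under the assumed convention
B = [[0.0, 0.0, 0.0],
     [1.5, 0.0, 0.0],
     [0.0, -0.7, 0.0]]
print(sink_variables_sketch(B))  # [2]
print(exo_variables_sketch(B))   # [0]
```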

lingam.utils.find_all_paths(dag, from_index, to_index, min_causal_effect=0.0)

Find all directed paths from one variable to another in a DAG.

Parameters:
  • dag (array-like, shape (n_features, n_features)) – The adjacency matrix in which to find all paths, where n_features is the number of features.

  • from_index (int) – Index of the variable at the start of the path.

  • to_index (int) – Index of the variable at the end of the path.

  • min_causal_effect (float, optional (default=0.0)) – Threshold for detecting causal direction. Causal directions with absolute values of causal effects less than min_causal_effect are excluded.

Returns:

  • paths (array-like, shape (n_paths)) – List of found paths, where n_paths is the number of paths.

  • effects (array-like, shape (n_paths)) – Causal effect along each path, where n_paths is the number of paths.
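A depth-first-search sketch, assuming dag[i][j] is the coefficient of x_j on x_i. The effect along a path is taken as the product of its edge coefficients, the usual path-tracing rule for linear models. find_all_paths_sketch is hypothetical and may order paths differently from lingam.

```python
def find_all_paths_sketch(dag, from_index, to_index, min_causal_effect=0.0):
    n = len(dag)
    paths, effects = [], []

    def dfs(node, path, effect):
        if node == to_index:
            paths.append(path)
            effects.append(effect)
            return
        for child in range(n):
            # Assumed convention: dag[child][node] is the effect of x_node on x_child
            coef = dag[child][node]
            if coef != 0 and abs(coef) >= min_causal_effect and child not in path:
                dfs(child, path + [child], effect * coef)

    dfs(from_index, [from_index], 1.0)
    return paths, effects


# Edges: x0 -> x1 (2.0), x1 -> x2 (3.0), x0 -> x2 (0.5)
dag = [[0.0, 0.0, 0.0],
       [2.0, 0.0, 0.0],
       [0.5, 3.0, 0.0]]
paths, effects = find_all_paths_sketch(dag, 0, 2)
print(paths, effects)  # [[0, 1, 2], [0, 2]] [6.0, 0.5]
```

Summing the path effects (6.0 + 0.5 = 6.5 here) gives the total effect of x0 on x2 in a linear acyclic model.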

lingam.utils.predict_adaptive_lasso(X, predictors, target, gamma=1.0)

Predict with Adaptive Lasso.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • predictors (array-like, shape (n_predictors)) – Indices of the predictor variables.

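  • target (int) – Index of the target variable.

  • gamma (float, optional (default=1.0)) – Exponent applied to the absolute values of the initial least-squares coefficients to form the adaptive weights.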

Returns:

coef – Coefficients of the predictor variables.

Return type:

array-like, shape (n_predictors)
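Adaptive Lasso rescales each predictor by the magnitude of an initial least-squares coefficient before applying an ordinary Lasso, so weak predictors are penalized more heavily. The NumPy sketch below assumes standardized data, weights |b_ols|**gamma, a fixed hypothetical penalty alpha solved by coordinate descent, and an OLS refit on the original scale for the selected predictors. adaptive_lasso_sketch and lasso_cd are hypothetical names; lingam may choose the penalty strength differently.

```python
import numpy as np


def soft_threshold(z, t):
    # Proximal operator of the L1 penalty
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(Z, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - Zb||^2 + alpha * ||b||_1."""
    n, p = Z.shape
    b = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Partial residual with feature j's contribution added back
            r = y - Z @ b + Z[:, j] * b[j]
            b[j] = soft_threshold(Z[:, j] @ r / n, alpha) / col_sq[j]
    return b


def adaptive_lasso_sketch(X, predictors, target, gamma=1.0, alpha=0.05):
    """Hypothetical adaptive-lasso fit; returns one coefficient per predictor."""
    X = np.asarray(X, dtype=float)
    predictors = np.asarray(predictors)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
    A, y = Xs[:, predictors], Xs[:, target]
    # Adaptive weights from an initial least-squares fit
    b_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
    w = np.abs(b_ols) ** gamma
    # A plain Lasso on reweighted predictors equals a weighted L1 penalty
    selected = np.abs(lasso_cd(A * w, y, alpha) * w) > 0.0
    # Refit the selected predictors by OLS (with intercept) on the original scale
    coef = np.zeros(len(predictors))
    if selected.any():
        D = np.column_stack([np.ones(len(X)), X[:, predictors[selected]]])
        beta, *_ = np.linalg.lstsq(D, X[:, target], rcond=None)
        coef[selected] = beta[1:]
    return coef


rng = np.random.default_rng(0)
x0, x1 = rng.uniform(size=1000), rng.uniform(size=1000)
y = 3.0 * x0 + 0.1 * rng.uniform(size=1000)  # x1 is irrelevant
X = np.column_stack([x0, x1, y])
coef = adaptive_lasso_sketch(X, predictors=[0, 1], target=2)
print(coef)  # roughly [3.0, 0.0]
```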

lingam.utils.likelihood_i(x, i, b_i, bi_0)

Compute local log-likelihood of component i.

Parameters:
  • x (array-like, shape (n_features, n_samples)) – Data, where n_samples is the number of samples and n_features is the number of features.

  • i (int) – Variable index.

  • b_i (array-like) – The i-th row of the adjacency matrix, B[i].

  • bi_0 (float) – Constant term (intercept) for the i-th variable.

Returns:

ll – Local log-likelihood of component i.

Return type:

float

lingam.utils.log_p_super_gaussian(s)

Compute the log-density of the normalized independent components.

Parameters:

s (array-like, shape (1, n_samples)) – Data, where n_samples is the number of samples.

Returns:

x – Log-density of the normalized independent components, assuming super-Gaussian disturbances.

Return type:

float

lingam.utils.variance_i(X, i, b_i)

Compute empirical variance of component i.

Parameters:
  • X (array-like, shape (n_features, n_samples)) – Data, where n_samples is the number of samples and n_features is the number of features.

  • i (int) – Variable index.

  • b_i (array-like) – The i-th row of the adjacency matrix, B[i].

Returns:

variance – Empirical variance of component i.

Return type:

float
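likelihood_i, log_p_super_gaussian and variance_i fit together: the residual of x_i given its parents is normalized by its empirical standard deviation, scored with a super-Gaussian log-density, and corrected by the change-of-variables term -log(sigma_i) per sample. The sketch below assumes a unit-variance Laplace density as the super-Gaussian model and takes b_i as the row of B holding the coefficients on x_i; all three function names are hypothetical, and lingam's constants may differ.

```python
import numpy as np


def log_p_laplace(s):
    # Unit-variance Laplace log-density: one common super-Gaussian model
    return -np.sqrt(2.0) * np.abs(s) - 0.5 * np.log(2.0)


def variance_i_sketch(x, i, b_i):
    # x: (n_features, n_samples); b_i: coefficients of the parents of x_i
    return np.var(x[i] - b_i @ x)


def likelihood_i_sketch(x, i, b_i, bi_0):
    n_samples = x.shape[1]
    var_i = variance_i_sketch(x, i, b_i)
    # Normalized disturbance of x_i given its parents
    s = (x[i] - b_i @ x - bi_0) / np.sqrt(var_i)
    # Change of variables adds -log(sigma_i) for every sample
    return np.sum(log_p_laplace(s)) - n_samples * 0.5 * np.log(var_i)


rng = np.random.default_rng(0)
x0 = rng.laplace(size=5000)
x1 = 0.8 * x0 + rng.laplace(size=5000)
x = np.vstack([x0, x1])  # shape (n_features, n_samples)
ll_true = likelihood_i_sketch(x, 1, np.array([0.8, 0.0]), 0.0)
ll_empty = likelihood_i_sketch(x, 1, np.array([0.0, 0.0]), 0.0)
print(ll_true > ll_empty)  # the true parent set should score higher
```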

lingam.utils.extract_ancestors(X, max_explanatory_num=2, cor_alpha=0.01, ind_alpha=0.01, shapiro_alpha=0.01, MLHSICR=True, bw_method='mdbs')

Extract the set of ancestors of each variable. Implementation of Algorithm 1 of RCD [1].

References

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • max_explanatory_num (int, optional (default=2)) – Maximum number of explanatory variables.

  • cor_alpha (float, optional (default=0.01)) – Alpha level for Pearson correlation.

  • ind_alpha (float, optional (default=0.01)) – Alpha level for HSIC.

  • shapiro_alpha (float, optional (default=0.01)) – Alpha level for Shapiro-Wilk test.

  • MLHSICR (bool, optional (default=True)) – If True, use MLHSICR for multiple regression; if False, use OLS for multiple regression.

  • bw_method (str, optional (default='mdbs')) –

    The method used to calculate the bandwidth of the HSIC.

    • mdbs : Median distance between samples.

    • scott : Scott’s Rule of Thumb.

    • silverman : Silverman’s Rule of Thumb.

lingam.utils.f_correlation(x, y)

Implementation of F-correlation [2]

References

Parameters:
  • x (array-like, shape (n_samples)) – Data, where n_samples is the number of samples.

  • y (array-like, shape (n_samples)) – Data, where n_samples is the number of samples.

Returns:

The value of the F-correlation.

Return type:

float
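F-correlation is the maximal correlation between functions f(x) and g(y) drawn from a reproducing kernel Hilbert space; a regularized estimate is the first kernel canonical correlation. The NumPy sketch below computes it directly from centered Gaussian Gram matrices as the largest singular value of R_x R_y, with R = K (K + (n kappa / 2) I)^(-1). The kernel width sigma=1 and kappa=2e-2 are assumptions, the O(n^3) linear algebra is only practical for small n, and f_correlation_sketch is hypothetical; lingam may use a different approximation.

```python
import numpy as np


def gaussian_gram(v, sigma=1.0):
    d = v[:, None] - v[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))


def f_correlation_sketch(x, y, sigma=1.0, kappa=2e-2):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kx = H @ gaussian_gram(x, sigma) @ H
    Ky = H @ gaussian_gram(y, sigma) @ H
    reg = (n * kappa / 2) * np.eye(n)
    # K and (K + reg)^-1 commute, so solve() yields the symmetric R = K (K + reg)^-1
    Rx = np.linalg.solve(Kx + reg, Kx)
    Ry = np.linalg.solve(Ky + reg, Ky)
    # First regularized kernel canonical correlation = largest singular value
    return np.linalg.norm(Rx @ Ry, 2)


rng = np.random.default_rng(0)
x = rng.normal(size=200)
rho_ind = f_correlation_sketch(x, rng.normal(size=200))  # independent pair
rho_dep = f_correlation_sketch(x, x ** 2)                 # nonlinear dependence
print(rho_ind, rho_dep)
```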

lingam.utils.visualize_nonlinear_causal_effect(X, cd_result, estimator, cause_name, effect_name, cause_positions=None, percentile=None, fig=None, boxplot=False)

Visualize non-linear causal effect.

Parameters:
  • X (pandas.DataFrame, shape (n_samples, n_features)) – Training data used to obtain cd_result.

  • cd_result (array-like with shape (n_features, n_features) or BootstrapResult) – Adjacency matrix or BootstrapResult. These are the results of a causal discovery.

  • estimator (estimator object) – Estimator used for non-linear regression. It regresses effect_name on cause_name and its covariates, where the covariates are identified from cd_result.

  • cause_name (str) – The name of the cause variable.

  • effect_name (str) – The name of the effect variable.

  • cause_positions (array-like, optional (default=None)) – List of positions at which causal effects are calculated. By default, the positions that divide the value range of X into 10 equal parts are used.

  • percentile (array-like, optional (default=None)) – A tuple consisting of three percentile values. Each value must be greater than 0 and less than 100. By default, (95, 50, 5) is set.

  • fig (plt.Figure, optional (default=None)) – If given, the plot is drawn in fig; otherwise, a new figure is created internally.

  • boxplot (boolean, optional (default=False)) – If True, draw a box plot instead of a scatter plot at each of the cause_positions.

Returns:

fig – Plotted figure.

Return type:

plt.Figure

lingam.utils.evaluate_model_fit(adjacency_matrix, X, is_ordinal=None)

Evaluate how well the given adjacency matrix fits the data and return fit indices.

Parameters:
  • adjacency_matrix (array-like, shape (n_features, n_features)) – Adjacency matrix representing a causal graph. The i-th column and row correspond to the i-th column of X.

  • X (array-like, shape (n_samples, n_features)) – Training data.

  • is_ordinal (array-like, shape (n_features,), optional (default=None)) – Binary list whose i-th element indicates whether the i-th column of X is ordinal: 0 means not ordinal, any other value means ordinal.

Returns:

fit_indices – Fit indices. This API uses semopy’s calc_stats(). See semopy’s reference for details.

Return type:

pandas.DataFrame