tools

lingam.tools.bootstrap_with_imputation(X, n_sampling, n_repeats=10, imp=None, cd_model=None, prior_knowledge=None, apply_prior_knowledge_softly=False, random_state=None)

Discovering causal relations in data with missing values.

bootstrap_with_imputation performs causal discovery on a dataset with missing values. It draws n_sampling bootstrap samples from the dataset. For each bootstrap sample, it creates n_repeats copies and completes (imputes) the missing values in each copy. It then runs causal discovery on the n_repeats completed copies together, assuming that they share a common causal structure.
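The resample-then-impute data flow described above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the library's implementation. The names hot_deck_impute and resample_and_impute are hypothetical. The column-wise random-draw imputer is a swapped-in stand-in, not lingam's _DefaultMultipleImputation. The causal-discovery step is left as a comment.

```python
import numpy as np


def hot_deck_impute(X, rng):
    """Hypothetical stand-in imputer (NOT lingam's _DefaultMultipleImputation):
    fill each NaN with a value drawn at random from the observed values
    of the same column."""
    X = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        observed = X[~missing, j]
        if missing.any() and observed.size > 0:
            X[missing, j] = rng.choice(observed, size=missing.sum())
    return X


def resample_and_impute(X, n_sampling, n_repeats=10, random_state=None):
    """Hypothetical sketch: draw n_sampling bootstrap samples and complete
    each one n_repeats times. Returns the resampled row indices, with shape
    (n_sampling, n_samples), and the completed datasets, with shape
    (n_sampling, n_repeats, n_samples, n_features)."""
    rng = np.random.default_rng(random_state)
    n_samples, n_features = X.shape
    indices = np.empty((n_sampling, n_samples), dtype=int)
    completed = np.empty((n_sampling, n_repeats, n_samples, n_features))
    for i in range(n_sampling):
        # Bootstrap: draw rows with replacement; missing values travel with their rows.
        indices[i] = rng.integers(0, n_samples, size=n_samples)
        X_boot = X[indices[i]]
        for r in range(n_repeats):
            # Each repeat is an independent completion of the same bootstrap sample.
            completed[i, r] = hot_deck_impute(X_boot, rng)
        # The real function would fit a multi-group causal model here on the
        # n_repeats completed datasets, treating them as sharing one structure.
    return indices, completed
```

Observed entries are never overwritten. Only the NaN positions differ between the n_repeats completions of one bootstrap sample, and that variation is what multiple imputation is meant to capture.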

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • n_sampling (int) – The number of bootstraps.

  • n_repeats (int, optional (default=10)) – The number of times to complete missing values for each bootstrap sample. This value is only used when imp is None.

  • imp (object, optional (default=None)) – An instance of a class that inherits from BaseMultipleImputation. If None, this function uses _DefaultMultipleImputation to impute the datasets.

  • cd_model (object, optional (default=None)) – An instance of a class that inherits from BaseMultiGroupCDModel. If None, this function uses MultiGroupDirectLiNGAM to estimate the causal order.

  • prior_knowledge (array-like, shape (n_features, n_features), optional (default=None)) –

    Prior knowledge used for the causal discovery, where n_features is the number of features. prior_knowledge is used only if cd_model is None.

    The elements of prior knowledge matrix are defined as follows:

    • 0 : \(x_i\) does not have a directed path to \(x_j\)

    • 1 : \(x_i\) has a directed path to \(x_j\)

    • -1 : No prior knowledge is available to know if either of the two cases above (0 or 1) is true.

  • apply_prior_knowledge_softly (boolean, optional (default=False)) – If True, apply prior knowledge softly. apply_prior_knowledge_softly is used only if cd_model is None.

  • random_state (int, optional (default=None)) – The seed used by the random number generator.
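The prior_knowledge encoding and the documented input shapes can be illustrated with two hypothetical NumPy helpers. Neither make_pk_matrix nor check_inputs is part of lingam. make_pk_matrix starts from a matrix filled with -1 (no prior knowledge) and sets selected entries to 0 or 1. check_inputs checks the shapes and values listed above. The sketch does not decide which index of an entry corresponds to x_i and which to x_j, so verify that convention against the lingam documentation before relying on it.

```python
import numpy as np


def make_pk_matrix(n_features, known=None):
    """Hypothetical helper: an (n_features, n_features) matrix of -1
    ("no prior knowledge"), with selected entries set to 0 or 1.
    `known` maps (row, col) -> 0 or 1. Which of row/col stands for x_i
    and which for x_j is an assumption to check against lingam's docs."""
    pk = np.full((n_features, n_features), -1, dtype=int)
    for (row, col), value in (known or {}).items():
        pk[row, col] = value
    return pk


def check_inputs(X, n_sampling, prior_knowledge=None):
    """Hypothetical validation of the documented shapes and values."""
    X = np.asarray(X, dtype=float)  # NaN marks a missing value
    if X.ndim != 2:
        raise ValueError("X must have shape (n_samples, n_features)")
    if not isinstance(n_sampling, (int, np.integer)) or n_sampling < 1:
        raise ValueError("n_sampling must be a positive int")
    if prior_knowledge is not None:
        pk = np.asarray(prior_knowledge)
        n_features = X.shape[1]
        if pk.shape != (n_features, n_features):
            raise ValueError("prior_knowledge must have shape (n_features, n_features)")
        if not np.isin(pk, [-1, 0, 1]).all():
            raise ValueError("prior_knowledge entries must be -1, 0 or 1")
    return X
```

Starting from -1 everywhere means that any pair you do not mention stays "unknown". This avoids accidentally asserting the absence of a path (0) for pairs you know nothing about.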

Returns:

  • causal_orders (array-like, shape (n_sampling, n_features)) – The causal order estimated for each bootstrap sample, where n_features is the number of features.

  • adj_matrices_list (array-like, shape (n_sampling, n_repeats, n_features, n_features)) – The adjacency matrices estimated for each of the n_repeats completed datasets of each bootstrap sample.

  • resampled_indices_ (array-like, shape (n_sampling, n_samples)) – For each bootstrap sample, the indices in the original data of the samples that were drawn.

  • imputation_results (array-like, shape (n_sampling, n_repeats, n_samples, n_features)) – The results of the imputation. Elements that are not NaN are imputed values.
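The returned arrays can be summarized with a few hypothetical NumPy helpers that rely only on the shapes documented above. edge_frequencies, mean_imputed_values and imputed_cells are not lingam functions. edge_frequencies counts, for each entry, how often its absolute value exceeds a threshold across all n_sampling × n_repeats adjacency matrices. It does not interpret which variable is the cause. mean_imputed_values averages the imputed values over the n_repeats imputations. imputed_cells uses resampled_indices_ to map the rows of one imputation back to rows of the original data.

```python
import numpy as np


def edge_frequencies(adj_matrices_list, threshold=0.0):
    """Hypothetical helper: fraction of the n_sampling * n_repeats adjacency
    matrices in which |entry| > threshold, computed entry by entry."""
    adj = np.asarray(adj_matrices_list, dtype=float)
    n_features = adj.shape[-1]
    flat = adj.reshape(-1, n_features, n_features)
    return (np.abs(flat) > threshold).mean(axis=0)


def mean_imputed_values(imputation_results):
    """Hypothetical helper: mean imputed value over the n_repeats axis,
    shape (n_sampling, n_samples, n_features); NaN where nothing was imputed."""
    res = np.asarray(imputation_results, dtype=float)
    is_imputed = ~np.isnan(res)
    counts = is_imputed.sum(axis=1)
    sums = np.where(is_imputed, res, 0.0).sum(axis=1)
    # Divide only where at least one value was imputed; avoids empty-slice warnings.
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)


def imputed_cells(imputation_results, resampled_indices, i, r):
    """Hypothetical helper: (original row, feature, imputed value) triples for
    bootstrap sample i and imputation r, mapped back via resampled_indices_."""
    block = np.asarray(imputation_results, dtype=float)[i, r]
    rows, cols = np.nonzero(~np.isnan(block))
    orig_rows = np.asarray(resampled_indices)[i, rows]
    return [(int(o), int(c), float(block[k, c]))
            for o, k, c in zip(orig_rows, rows, cols)]
```

Because bootstrap samples are drawn with replacement, the same original row can appear several times in one bootstrap sample. imputed_cells therefore reports each occurrence separately rather than merging them.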