tools
- lingam.tools.bootstrap_with_imputation(X, n_sampling, n_repeats=10, imp=None, cd_model=None, prior_knowledge=None, apply_prior_knowledge_softly=False, random_state=None)
Discovering causal relations in data with missing values.
bootstrap_with_imputation performs causal discovery on a dataset with missing values. It draws n_sampling bootstrap samples from the dataset and, for each bootstrap sample, creates n_repeats copies whose missing values are completed by imputation. It then runs causal discovery on the n_repeats imputed copies jointly, assuming that they share a common causal structure.
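The resampling-and-imputation loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: rows are lists with None marking a missing value, the function name bootstrap_impute_sketch is hypothetical, and each missing entry is filled by drawing a random observed value from the same column (a simple hot-deck-style imputation swapped in for illustration; it is not necessarily what _DefaultMultipleImputation does). The causal discovery step on the imputed copies is omitted.

```python
import random


def bootstrap_impute_sketch(X, n_sampling, n_repeats=10, random_state=None):
    """Hypothetical sketch: bootstrap resampling plus repeated imputation.

    X is a list of rows (lists of floats); None marks a missing value.
    Returns (resampled_indices, completed), where completed[b][r] is the
    r-th imputed copy of the b-th bootstrap sample.
    """
    rng = random.Random(random_state)
    n_samples, n_features = len(X), len(X[0])
    # Observed values of each column in the full data (fallback donor pool).
    full_pool = [[row[j] for row in X if row[j] is not None]
                 for j in range(n_features)]
    if any(not pool for pool in full_pool):
        raise ValueError("every column needs at least one observed value")

    resampled_indices, completed = [], []
    for _ in range(n_sampling):
        # Bootstrap sample: n_samples row indices drawn with replacement.
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        sample = [X[i] for i in idx]
        # Donor pool per column: observed values inside this bootstrap
        # sample, or the full-data pool if the column is entirely missing.
        pools = [[row[j] for row in sample if row[j] is not None] or full_pool[j]
                 for j in range(n_features)]
        copies = []
        for _ in range(n_repeats):
            # Each copy redraws the missing entries, so the copies differ.
            copies.append([[v if v is not None else rng.choice(pools[j])
                            for j, v in enumerate(row)] for row in sample])
        resampled_indices.append(idx)
        completed.append(copies)
        # bootstrap_with_imputation would now fit a multi-group causal
        # discovery model on the n_repeats copies jointly (omitted here).
    return resampled_indices, completed
```

Unlike the documented imputation_results, this sketch returns fully completed copies rather than an array in which only the imputed positions are non-NaN; observed entries are simply copied through unchanged.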
- Parameters:
X (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
n_sampling (int) – The number of bootstrap samples.
n_repeats (int, optional (default=10)) – The number of times to complete missing values for each bootstrap sample. This value is only used when imp is None.
imp (object, optional (default=None)) – Instance of a class inheriting from the BaseMultipleImputation class. If None, this function uses _DefaultMultipleImputation to impute the datasets.
cd_model (object, optional (default=None)) – Instance of a class inheriting from the BaseMultiGroupCDModel class. If None, this function uses MultiGroupDirectLiNGAM to estimate the causal order.
prior_knowledge (array-like, shape (n_features, n_features), optional (default=None)) – Prior knowledge used for the causal discovery, where n_features is the number of features. prior_knowledge is used only if cd_model is None. The elements of the prior knowledge matrix are defined as follows:
0 : \(x_i\) does not have a directed path to \(x_j\)
1 : \(x_i\) has a directed path to \(x_j\)
-1 : No prior knowledge is available; it is unknown whether either of the two cases above (0 or 1) holds.
apply_prior_knowledge_softly (boolean, optional (default=False)) – If True, apply prior knowledge softly. apply_prior_knowledge_softly is used only if cd_model is None.
random_state (int, optional (default=None)) – random_state is the seed used by the random number generator.
- Returns:
causal_orders (array-like, shape (n_sampling, n_features)) – The causal order estimated for each bootstrap sample, where n_features is the number of features.
adj_matrices_list (array-like, shape (n_sampling, n_repeats, n_features, n_features)) – The adjacency matrices estimated for each of the n_repeats imputed datasets of each bootstrap sample.
resampled_indices_ (array-like, shape (n_sampling, n_samples)) – The original indices of the resampled samples in each bootstrap sample.
imputation_results (array-like, shape (n_sampling, n_repeats, n_samples, n_features)) – The results of the imputation. Elements that are not NaN are the imputed values.
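One common way to summarize adj_matrices_list is to compute, for each ordered pair of variables, the fraction of the n_sampling × n_repeats fitted models in which the corresponding entry is nonzero. The helper below is a hypothetical sketch (edge_frequencies and its threshold argument are not part of lingam). It only iterates over the nested structure, so it accepts either nested lists or an array of the documented shape. It assumes the usual LiNGAM convention that entry [i][j] is the coefficient of \(x_j\) in the equation for \(x_i\), i.e. a nonzero value denotes an edge \(x_j \to x_i\).

```python
def edge_frequencies(adj_matrices_list, threshold=0.0):
    """Hypothetical helper: fraction of fitted models in which each edge appears.

    adj_matrices_list is nested as [n_sampling][n_repeats][n_features][n_features].
    An entry counts as an edge when its absolute value exceeds threshold.
    """
    total = 0
    counts = None
    for per_bootstrap in adj_matrices_list:
        for adj in per_bootstrap:
            if counts is None:
                n = len(adj)
                counts = [[0.0] * n for _ in range(n)]
            for i, row in enumerate(adj):
                for j, w in enumerate(row):
                    if abs(w) > threshold:
                        counts[i][j] += 1
            total += 1
    if total == 0:
        raise ValueError("adj_matrices_list contains no adjacency matrices")
    # Normalize the counts by the number of fitted models.
    return [[c / total for c in row] for row in counts]


# Two bootstrap samples x two imputed copies, two variables.
adjs = [
    [[[0.0, 0.0], [0.5, 0.0]], [[0.0, 0.0], [0.3, 0.0]]],
    [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.2], [0.0, 0.0]]],
]
print(edge_frequencies(adjs))  # [[0.0, 0.25], [0.5, 0.0]]
```

Raising threshold above zero ignores small coefficients, which can be useful when the fitted model does not prune edges exactly to zero.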