In this tutorial, we show how to run LiNGAM algorithms and inspect their results. We also show how to run the bootstrap method and check its results.
The following packages must be installed in order to run this tutorial, and imported where necessary:
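A typical preamble looks like the following sketch. numpy and pandas are exercised directly below; lingam itself and graphviz (used by make_dot) are assumed to be installed but are left commented out here:

```python
# numpy and pandas are used throughout this tutorial.
import numpy as np
import pandas as pd

print(np.__version__)
print(pd.__version__)

# Also assumed installed for the examples in this tutorial:
# import graphviz
# import lingam
# from lingam.utils import make_dot
```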
DirectLiNGAM [1] is a direct method for learning the basic LiNGAM model [2].
It uses an entropy-based measure [3] to evaluate independence between error variables.
The basic LiNGAM model makes the following assumptions.
Linearity
Non-Gaussian continuous error variables (except at most one)
Acyclicity
No hidden common causes
Denote observed variables by \({x}_{i}\), error variables by \({e}_{i}\), and coefficients or connection strengths by \({b}_{ij}\).
Collect them in vectors \({x}\) and \({e}\) and a matrix \({B}\), respectively.
Due to the acyclicity assumption, the adjacency matrix \({B}\) can be permuted to be strictly lower-triangular by a simultaneous row and column permutation.
The error variables \({e}_{i}\) are independent due to the assumption of no hidden common causes.
Then, mathematically, the model for observed variable vector \({x}\) is written as
$$ x = Bx + e. $$
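Because \({B}\) can be permuted to strict lower-triangularity, \({I}-{B}\) is invertible, so data can be generated from the model by solving \(x = (I-B)^{-1}e\). A minimal numpy sketch with an illustrative 3-variable \({B}\) (the coefficients here are made up, not from this tutorial's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# A strictly lower-triangular B (variables already in causal order), for illustration.
B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.3, 0.7, 0.0]])

n = 1000
# Non-Gaussian (uniform) independent errors, as the model assumes.
e = rng.uniform(size=(n, 3))

# x = Bx + e  =>  x = (I - B)^{-1} e, applied row-wise.
x = e @ np.linalg.inv(np.eye(3) - B).T

# Sanity check: the structural equations hold exactly.
assert np.allclose(x[:, 1], 0.5 * x[:, 0] + e[:, 1])
```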
Example applications are found here.
For example, [4] uses the basic LiNGAM model to infer causal relations among health indices including LDL, HDL, and γGT.
m = np.array([[0.0, 0.0, 0.0, 3.0, 0.0, 0.0],
              [3.0, 0.0, 2.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 6.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [8.0, 0.0, -1.0, 0.0, 0.0, 0.0],
              [4.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
dot = make_dot(m)

# Save pdf
dot.render('dag')

# Save png
dot.format = 'png'
dot.render('dag')

dot
To check if the LiNGAM assumption is broken, we can get p-values of
independence between error variables. The value in the i-th row and j-th
column of the obtained matrix shows the p-value of the independence of
the error variables \(e_i\) and \(e_j\).
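In lingam this p-value matrix comes from the fitted model. As a rough conceptual stand-in (a permutation test on correlation, which checks only linear dependence and is not the independence measure lingam actually uses), such a matrix can be computed like this:

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.uniform(size=(500, 3))  # stand-in "error" columns; truly independent here

def corr_perm_pvalue(a, b, n_perm=200, rng=rng):
    """Permutation p-value for |correlation|, a crude linear-dependence check."""
    obs = abs(np.corrcoef(a, b)[0, 1])
    null = [abs(np.corrcoef(rng.permutation(a), b)[0, 1]) for _ in range(n_perm)]
    return (1 + sum(v >= obs for v in null)) / (n_perm + 1)

p = np.ones((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            p[i, j] = corr_perm_pvalue(e[:, i], e[:, j])

# Large p-values are consistent with independence.
print(np.round(p, 2))
```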
Since a BootstrapResult object is returned, we can get the ranking of the extracted causal directions with the get_causal_direction_counts() method. In the following sample code, the n_directions option limits the output to the top 8 causal directions, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.01.
Also, using the get_directed_acyclic_graph_counts() method, we can get the ranking of the extracted DAGs. In the following sample code, the n_dags option limits the output to the top 3 DAGs, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.01.
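The counting behind these rankings can be sketched without lingam: given a stack of bootstrap adjacency matrices (randomly generated here purely for illustration), count how often each directed edge exceeds min_causal_effect and rank the counts:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Hypothetical stack of bootstrap adjacency matrices, shape (n_boot, d, d).
boot_B = rng.normal(0.0, 0.5, size=(100, 3, 3))

min_causal_effect = 0.5
counts = Counter()
for B in boot_B:
    for i in range(3):
        for j in range(3):
            if i != j and abs(B[i, j]) >= min_causal_effect:
                counts[(j, i)] += 1  # edge j -> i, since B[i, j] is the effect of x_j on x_i

# Top-ranked directions, analogous to get_causal_direction_counts(n_directions=8, ...)
top = counts.most_common(8)
print(top)
```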
Using the get_total_causal_effects() method, we can get the list of total causal effects. The total causal effects are returned as a dictionary, which we can display nicely by assigning it to a pandas.DataFrame. We have also replaced the variable indices with labels below.
causal_effects = result.get_total_causal_effects(min_causal_effect=0.01)

# Assign to pandas.DataFrame for pretty display
df = pd.DataFrame(causal_effects)

labels = [f'x{i}' for i in range(X.shape[1])]
df['from'] = df['from'].apply(lambda x: labels[x])
df['to'] = df['to'].apply(lambda x: labels[x])
df
   from  to     effect  probability
0    x3  x0   3.004106         1.00
1    x0  x1   2.963177         1.00
2    x2  x1   2.017539         1.00
3    x3  x1  20.928254         1.00
4    x0  x5   3.997787         1.00
5    x3  x4  18.077943         1.00
6    x3  x5  12.012988         1.00
7    x2  x4  -1.006362         1.00
8    x0  x4   8.011818         0.98
9    x3  x2   5.964879         0.96
10   x2  x5   0.396327         0.09
11   x2  x0   0.487915         0.07
12   x2  x3   0.164565         0.04
13   x5  x4   0.087437         0.03
14   x4  x5   0.496445         0.02
15   x5  x1  -0.064703         0.02
16   x4  x1   0.367100         0.02
17   x4  x0   0.124114         0.02
18   x0  x2   0.056261         0.01
19   x1  x4  -0.097108         0.01
20   x5  x2  -0.111894         0.01
We can easily perform sorting operations with pandas.DataFrame.
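For example, assuming df holds the total causal effects as above (a tiny stand-in DataFrame with made-up rows is constructed here so the snippet is self-contained), sorting by effect looks like:

```python
import pandas as pd

# A small stand-in for the `df` built from get_total_causal_effects().
df = pd.DataFrame({
    'from': ['x3', 'x0', 'x2'],
    'to':   ['x1', 'x1', 'x1'],
    'effect': [20.93, 2.96, 2.02],
    'probability': [1.00, 1.00, 1.00],
})

# Largest total causal effects first.
print(df.sort_values('effect', ascending=False).head())
```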
And with pandas.DataFrame, we can easily filter by keywords. The
following code extracts the causal direction towards x1.
df[df['to'] == 'x1'].head()
   from  to     effect  probability
1    x0  x1   2.963177         1.00
2    x2  x1   2.017539         1.00
3    x3  x1  20.928254         1.00
15   x5  x1  -0.064703         0.02
16   x4  x1   0.367100         0.02
Because it holds the raw data of the total causal effect (the original data
for calculating the median), it is possible to draw a histogram of the
values of the causal effect, as shown below.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

from_index = 3  # index of x3
to_index = 0    # index of x0

plt.hist(result.total_effects_[:, to_index, from_index])
Using the get_paths() method, we can explore all paths from any
variable to any variable and calculate the bootstrap probability for
each path. The path will be output as an array of variable indices. For
example, the array [3,0,1] shows the path from variable X3 through
variable X0 to variable X1.
from_index = 3  # index of x3
to_index = 1    # index of x1
pd.DataFrame(result.get_paths(from_index, to_index))
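Under the hood, path search amounts to enumerating directed paths in the estimated graph and multiplying the coefficients along each path. A self-contained sketch (hypothetical adjacency matrix, not lingam's implementation):

```python
import numpy as np

# Hypothetical adjacency matrix: B[i, j] is the effect of x_j on x_i.
# Edges: 2 -> 0 (3.0), 0 -> 1 (3.0), 2 -> 1 (2.0).
B = np.array([[0.0, 0.0, 3.0],
              [3.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

def all_paths(B, frm, to, path=None):
    """Yield (path, effect) for every directed path frm -> to."""
    path = [frm] if path is None else path + [frm]
    if frm == to:
        effect = np.prod([B[path[k + 1], path[k]] for k in range(len(path) - 1)])
        yield path, effect
        return
    for child in np.flatnonzero(B[:, frm]):
        yield from all_paths(B, int(child), to, path)

for p, eff in all_paths(B, 2, 1):
    print(p, eff)  # [2, 0, 1] with effect 9.0, and [2, 1] with effect 2.0
```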
This algorithm [3] simultaneously analyzes multiple datasets obtained from different sources, e.g., from groups of different ages.
The algorithm is an extension of DirectLiNGAM [1] to multiple-group cases.
The algorithm assumes that each dataset comes from a basic LiNGAM model [2], i.e., makes the following assumptions in each dataset:
Linearity
Non-Gaussian continuous error variables (except at most one)
Acyclicity
No hidden common causes
Further, it assumes the topological causal orders are common to the groups.
Exploiting the similarity in the topological causal orders can give better performance than analyzing each dataset separately, provided the assumption on the causal orders is reasonable.
Denote observed variables of Group \({g}\) by \({x}_{i}^{(g)}\) and error variables by \({e}_{i}^{(g)}\) and coefficients or connection strengths \({b}_{ij}^{(g)}\).
Collect them in vectors \({x}^{(g)}\) and \({e}^{(g)}\) and a matrix \({B}^{(g)}\), respectively.
Due to the assumptions of acyclicity and common topological causal orders, the adjacency matrix \({B}^{(g)}\) can be permuted to be strictly lower-triangular by a simultaneous row and column permutation common to the groups.
The error variables \({e}_{i}^{(g)}\) are independent due to the assumption of no hidden common causes.
Then, mathematically, the model for observed variable vector \({x}^{(g)}\) is written as
$$ x^{(g)} = B^{(g)}x^{(g)} + e^{(g)}. $$
Using the causal_order_ property, we can see the causal ordering as a result of the causal discovery.
model.causal_order_
[3,0,5,2,1,4]
Also, using the adjacency_matrix_ property, we can see the adjacency matrix as a result of the causal discovery. As you can see from the following, the DAG in each dataset is correctly estimated.
To check if the LiNGAM assumption is broken, we can get p-values of
independence between error variables. The value in the i-th row and j-th
column of the obtained matrix shows the p-value of the independence of
the error variables \(e_i\) and \(e_j\).
The bootstrap() method returns a list of BootstrapResult objects, one per dataset, so we specify an index when we access the results. We can then get the ranking of the causal directions extracted by get_causal_direction_counts().
Also, using the get_directed_acyclic_graph_counts() method, we can get the ranking of the extracted DAGs. In the following sample code, the n_dags option limits the output to the top 3 DAGs, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.01.
Using the get_total_causal_effects() method, we can get the list of total causal effects. The total causal effects are returned as a dictionary, which we can display nicely by assigning it to a pandas.DataFrame. We have also replaced the variable indices with labels below.
We can easily perform sorting operations with pandas.DataFrame.
df.sort_values('effect', ascending=False).head()
   from  to     effect  probability
3    x3  x1  20.943993          1.0
7    x3  x4  18.061596          1.0
9    x3  x5  12.026086          1.0
5    x0  x4   8.011896          1.0
4    x3  x2   5.969457          1.0
And with pandas.DataFrame, we can easily filter by keywords. The
following code extracts the causal direction towards x1.
df[df['to'] == 'x1'].head()
   from  to     effect  probability
1    x0  x1   2.963048         1.00
2    x2  x1   2.016516         1.00
3    x3  x1  20.943993         1.00
11   x5  x1   0.092263         0.02
Because it holds the raw data of the causal effect (the original data
for calculating the median), it is possible to draw a histogram of the
values of the causal effect, as shown below.
Using the get_paths() method, we can explore all paths from any
variable to any variable and calculate the bootstrap probability for
each path. The path will be output as an array of variable indices. For
example, the array [3,0,1] shows the path from variable X3 through
variable X0 to variable X1.
from_index = 3  # index of x3
to_index = 1    # index of x1
pd.DataFrame(results[0].get_paths(from_index, to_index))
The following demonstrates a method [1] that analyzes the prediction mechanisms of constructed predictive models based on causality.
This method estimates causal effects, i.e., intervention effects of features or explanatory variables used in constructed predictive models on the predicted variables.
Users can use estimated causal structures, e.g., by a LiNGAM-type method or known causal structures based on domain knowledge.
Next, we create the prediction model. In the following example, a linear regression model is created, but it is also possible to create a logistic regression model or a non-linear regression model.
To identify the feature having the greatest intervention effect on the prediction, we can take the feature that maximizes the values of the obtained list.
To estimate the intervention such that the expectation of the prediction of the post-intervention observations is equal or close to a specified value, we use the estimate_optimal_intervention() method of CausalEffect.
In the following example, we estimate the intervention value at variable index 1 so that the predicted value is close to 15.
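The idea can be illustrated with hypothetical numbers: when the expected prediction is linear in the intervention value (as it is for a linear regression model), the required intervention has a closed form. All quantities below are made up for illustration and are not outputs of CausalEffect:

```python
# Assume E[yhat | do(x1 = c)] = y0 + beta * (c - v0), which holds for a linear model.
y0 = 12.0    # expected prediction under the current mean of x1 (assumed)
v0 = 3.0     # current mean of x1 (assumed)
beta = 1.5   # total causal effect of x1 on the prediction (assumed)
target = 15.0

# Solve y0 + beta * (c - v0) = target for the intervention value c.
c = v0 + (target - y0) / beta
print(c)  # -> 5.0: intervening x1 = 5.0 moves the expected prediction to 15
```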
VARLiNGAM [2] is an extension of the basic LiNGAM model [1] to time series cases.
It combines the basic LiNGAM model with the classic vector autoregressive models (VAR).
It enables analyzing both lagged and contemporaneous (instantaneous) causal relations, whereas the classic VAR only analyzes lagged causal relations.
This VARLiNGAM makes the following assumptions similarly to the basic LiNGAM model [1]:
Linearity
Non-Gaussian continuous error variables (except at most one)
Acyclicity of contemporaneous causal relations
No hidden common causes
Denote observed variables at time point \({t}\) by \({x}_{i}(t)\) and error variables by \({e}_{i}(t)\).
Collect them in vectors \({x}(t)\) and \({e}(t)\), respectively.
Further, denote by matrices \({B}_{\tau}\) adjacency matrices with time lag \({\tau}\).
Due to the acyclicity assumption of contemporaneous causal relations, the coefficient matrix \({B}_{0}\) can be permuted to be strictly lower-triangular by a simultaneous row and column permutation.
The error variables \({e}_{i}(t)\) are independent due to the assumption of no hidden common causes.
Then, mathematically, the model for observed variable vector \({x}(t)\) is written as
$$ x(t) = \sum_{\tau=0}^{k} B_{\tau} x(t-\tau) + e(t). $$
Example applications are found here, especially in the Economics/Finance/Marketing section.
For example, [3] uses the VARLiNGAM model to study the processes of firm growth and firm performance using microeconomic data,
and to analyse the effects of monetary policy using macroeconomic data.
B0 = [
    [0, -0.12, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [-0.41, 0.01, 0, -0.02, 0],
    [0.04, -0.22, 0, 0, 0],
    [0.15, 0, -0.03, 0, 0],
]
B1 = [
    [-0.32, 0, 0.12, 0.32, 0],
    [0, -0.35, -0.1, -0.46, 0.4],
    [0, 0, 0.37, 0, 0.46],
    [-0.38, -0.1, -0.24, 0, -0.13],
    [0, 0, 0, 0, 0],
]
causal_order = [1, 0, 3, 2, 4]

# data generated from B0 and B1
X = pd.read_csv('data/sample_data_var_lingam.csv')
To check if the LiNGAM assumption is broken, we can get p-values of
independence between error variables. The value in the i-th row and j-th
column of the obtained matrix shows the p-value of the independence of
the error variables \(e_i\) and \(e_j\).
Since a BootstrapResult object is returned, we can get the ranking of the extracted causal directions with the get_causal_direction_counts() method. In the following sample code, the n_directions option limits the output to the top 8 causal directions, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.3.
Also, using the get_directed_acyclic_graph_counts() method, we can get the ranking of the extracted DAGs. In the following sample code, the n_dags option limits the output to the top 3 DAGs, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.2.
Using the get_causal_effects() method, we can get the list of total causal effects. The total causal effects are returned as a dictionary, which we can display nicely by assigning it to a pandas.DataFrame. We have also replaced the variable indices with labels below.
We can easily perform sorting operations with pandas.DataFrame.
df.sort_values('effect', ascending=False).head()
   from     to        effect  probability
6   x4(t-1)  x2(t)  0.501859         1.00
2   x4(t-1)  x1(t)  0.427308         1.00
1   x2(t-1)  x2(t)  0.388777         1.00
5   x3(t-1)  x0(t)  0.345461         1.00
24  x0(t)    x3(t)  0.259859         0.64
And with pandas.DataFrame, we can easily filter by keywords. The following code extracts the causal direction towards x1(t).
df[df['to'] == 'x1(t)'].head()
   from     to        effect  probability
2   x4(t-1)  x1(t)  0.427308         1.00
3   x1(t-1)  x1(t) -0.338691         1.00
11  x3(t-1)  x1(t) -0.315462         0.99
13  x2(t-1)  x1(t) -0.172693         0.98
29  x0(t-1)  x1(t) -0.119971         0.53
Because it holds the raw data of the causal effect (the original data
for calculating the median), it is possible to draw a histogram of the
values of the causal effect, as shown below.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

from_index = 7  # index of x2(t-1). (index:2)+(n_features:5)*(lag:1) = 7
to_index = 2    # index of x2(t). (index:2)+(n_features:5)*(lag:0) = 2

plt.hist(result.total_effects_[:, to_index, from_index])
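The index arithmetic in the comments above, (variable index) + (n_features) × (lag), can be wrapped in a tiny helper (hypothetical, not part of lingam) to avoid off-by-one mistakes:

```python
def flat_index(var_index, lag, n_features=5):
    """Map (variable index, time lag) to the flattened index used above."""
    return var_index + n_features * lag

assert flat_index(2, 1) == 7   # x2(t-1)
assert flat_index(2, 0) == 2   # x2(t)
print(flat_index(0, 1))        # x0(t-1) -> 5
```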
VARMALiNGAM [3] is an extension of the basic LiNGAM model [1] to time series cases.
It combines the basic LiNGAM model with the classic vector autoregressive moving average models (VARMA).
It enables analyzing both lagged and contemporaneous (instantaneous) causal relations, whereas the classic VARMA only analyzes lagged causal relations.
This VARMALiNGAM model is also an extension of the VARLiNGAM model [2].
It uses VARMA to analyze lagged causal relations instead of VAR.
This VARMALiNGAM makes the following assumptions similarly to the basic LiNGAM model [1]:
Linearity
Non-Gaussian continuous error variables (except at most one)
Acyclicity of contemporaneous causal relations
No hidden common causes between contemporaneous error variables
Denote observed variables at time point \({t}\) by \({x}_{i}(t)\) and error variables by \({e}_{i}(t)\).
Collect them in vectors \({x}(t)\) and \({e}(t)\), respectively.
Further, denote by matrices \({B}_{\tau}\) and \({\Omega}_{\omega}\) coefficient matrices with time lags \({\tau}\) and \({\omega}\), respectively.
Due to the acyclicity assumption of contemporaneous causal relations, the adjacency matrix \({B}_{0}\) can be permuted to be strictly lower-triangular by a simultaneous row and column permutation.
The error variables \({e}_{i}(t)\) are independent due to the assumption of no hidden common causes.
Then, mathematically, the model for observed variable vector \({x}(t)\) is written as
$$ x(t) = \sum_{\tau=0}^{k} B_{\tau} x(t-\tau) + \sum_{\omega=1}^{\ell} \Omega_{\omega} e(t-\omega) + e(t). $$
Example applications are found here, especially in the Economics/Finance/Marketing section.
For example, [3] uses the VARLiNGAM model to study the processes of firm growth and firm performance using microeconomic data,
and to analyse the effects of monetary policy using macroeconomic data.
psi0 = np.array([
    [0., 0., -0.25, 0., 0.],
    [-0.38, 0., 0.14, 0., 0.],
    [0., 0., 0., 0., 0.],
    [0.44, -0.2, -0.09, 0., 0.],
    [0.07, -0.06, 0., 0.07, 0.]
])
phi1 = np.array([
    [-0.04, -0.29, -0.26, 0.14, 0.47],
    [-0.42, 0.2, 0.1, 0.24, 0.25],
    [-0.25, 0.18, -0.06, 0.15, 0.18],
    [0.22, 0.39, 0.08, 0.12, -0.37],
    [-0.43, 0.09, -0.23, 0.16, 0.25]
])
theta1 = np.array([
    [0.15, -0.02, -0.3, -0.2, 0.21],
    [0.32, 0.12, -0.11, 0.03, 0.42],
    [-0.07, -0.5, 0.03, -0.27, -0.21],
    [-0.17, 0.35, 0.25, 0.24, -0.25],
    [0.09, 0.4, 0.41, 0.24, -0.31]
])
causal_order = [2, 0, 1, 3, 4]

# data generated from psi0, phi1, theta1 and causal_order
X = np.loadtxt('data/sample_data_varma_lingam.csv', delimiter=',')
To check if the LiNGAM assumption is broken, we can get p-values of
independence between error variables. The value in the i-th row and j-th
column of the obtained matrix shows the p-value of the independence of
the error variables \(e_i\) and \(e_j\).
Since a BootstrapResult object is returned, we can get the ranking of the extracted causal directions with the get_causal_direction_counts() method. In the following sample code, the n_directions option limits the output to the top 8 causal directions, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.4.
Also, using the get_directed_acyclic_graph_counts() method, we can get the ranking of the extracted DAGs. In the following sample code, the n_dags option limits the output to the top 3 DAGs, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.3.
Using the get_probabilities() method, we can get the bootstrap probabilities.
prob = result.get_probabilities(min_causal_effect=0.1)
print('Probability of psi0:\n', prob[0])
print('Probability of psi1:\n', prob[1])
print('Probability of omega1:\n', prob[2])
Using the get_total_causal_effects() method, we can get the list of total causal effects. The total causal effects are returned as a dictionary, which we can display nicely by assigning it to a pandas.DataFrame. We have also replaced the variable indices with labels below.
We can easily perform sorting operations with pandas.DataFrame.
df.sort_values('effect', ascending=False).head()
   from     to        effect  probability
23  y0(t)    y4(t)  0.934934         0.70
29  y2(t)    y1(t)  0.815024         0.59
25  y0(t)    y3(t)  0.636926         0.63
5   y4(t-1)  y0(t)  0.573678         1.00
37  y1(t)    y2(t)  0.564823         0.41
And with pandas.DataFrame, we can easily filter by keywords. The
following code extracts the causal direction towards y2(t).
df[df['to'] == 'y2(t)'].head()
   from     to        effect  probability
6   y4(t-1)  y2(t)  0.360151         0.97
12  y0(t-1)  y2(t) -0.267122         0.94
17  y1(t-1)  y2(t)  0.135946         0.90
21  y2(t-1)  y2(t) -0.137767         0.77
24  y3(t-1)  y2(t)  0.141032         0.66
Because it holds the raw data of the causal effect (the original data
for calculating the median), it is possible to draw a histogram of the
values of the causal effect, as shown below.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

from_index = 5  # index of y0(t-1). (index:0)+(n_features:5)*(lag:1) = 5
to_index = 2    # index of y2(t). (index:2)+(n_features:5)*(lag:0) = 2

plt.hist(result.total_effects_[:, to_index, from_index])
This method [2] performs causal discovery on “paired” samples based on longitudinal data that collects samples over time.
The algorithm can analyze causal structures, including topological causal orders, that may change over time.
Similarly to the basic LiNGAM model [1], this method makes the following assumptions:
Linearity
Non-Gaussian continuous error variables (except at most one)
Acyclicity
No hidden common causes
Denote observed variables and error variables of the \({m}\)-th sample at time point \({t}\)
by \({x}_{i}^{m}(t)\) and \({e}_{i}^{m}(t)\).
Collect them from variables in vectors \({x}^{m}(t)\) and \({e}^{m}(t)\).
Further, collect them from samples in matrices \({X}(t)\) and \({E}(t)\).
Further, denote by \({B}(t,t-\tau)\) adjacency matrices with time lag \(\tau\).
Due to the acyclicity assumption, the adjacency matrix \({B}(t,t)\) can be permuted to be strictly lower-triangular by a simultaneous row and column permutation.
The error variables \({e}_{i}^{m}(t)\) are independent due to the assumption of no hidden common causes.
Then, mathematically, the model for observed variable matrix \({X}(t)\) is written as
$$ X(t) = \sum_{\tau=0}^{k} B(t, t-\tau) X(t-\tau) + E(t). $$
Using the causal_orders_ property, we can see the causal ordering at each time point as a result of the causal discovery. All elements at t=0 are nan because the causal order of B(t,t) at t=0 is not calculated, so access the time points from t=1 onward.
print(model.causal_orders_[0])  # nan at t=0
print(model.causal_orders_[1])
print(model.causal_orders_[2])
[nan, nan, nan, nan, nan]
[3, 1, 2, 4, 0]
[0, 4, 2, 3, 1]
Also, using the adjacency_matrices_ property, we can see the adjacency matrices as a result of the causal discovery. As with the causal order, all elements of B(t,t) and B(t,t-τ) at t=0 are nan because they are not calculated, so access the time points from t=1 onward. Also, if we run causal discovery with n_lags=2, B(t,t-τ) at t=1 is also not computed, so all its elements are nan.
t = 0  # nan at t=0
print('B(0,0):')
print(model.adjacency_matrices_[t, 0])
print('B(0,-1):')
print(model.adjacency_matrices_[t, 1])

t = 1
print('B(1,1):')
print(model.adjacency_matrices_[t, 0])
print('B(1,0):')
print(model.adjacency_matrices_[t, 1])

t = 2
print('B(2,2):')
print(model.adjacency_matrices_[t, 0])
print('B(2,1):')
print(model.adjacency_matrices_[t, 1])
To check if the LiNGAM assumption is broken, we can get p-values of
independence between error variables. The value in the i-th row and j-th
column of the obtained matrix shows the p-value of the independence of
the error variables \(e_i\) and \(e_j\).
Since a LongitudinalBootstrapResult object is returned, we can get the ranking of the extracted causal directions with the get_causal_direction_counts() method. In the following sample code, the n_directions option limits the output to the top 8 causal directions, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.01.
Also, using the get_directed_acyclic_graph_counts() method, we can get the ranking of the extracted DAGs. In the following sample code, the n_dags option limits the output to the top 3 DAGs, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.01.
Using the get_total_causal_effects() method, we can get the list of total causal effects. The total causal effects are returned as a dictionary, which we can display nicely by assigning it to a pandas.DataFrame. We have also replaced the variable indices with labels below.
We can easily perform sorting operations with pandas.DataFrame.
df.sort_values('effect', ascending=False).head()
   from   to       effect  probability
6   x2(1)  x3(2)  0.495256          1.0
14  x4(2)  x1(2)  0.476373          1.0
9   x3(1)  x2(2)  0.400577          1.0
17  x3(1)  x1(1)  0.388875          1.0
21  x1(1)  x2(1)  0.357268          1.0
And with pandas.DataFrame, we can easily filter by keywords. The
following code extracts the causal direction towards x0(2).
df[df['to'] == 'x0(2)'].head()
   from   to       effect  probability
15  x3(1)  x0(2) -0.495518         1.00
18  x0(1)  x0(2)  0.202197         1.00
19  x1(1)  x0(2)  0.191862         1.00
32  x4(1)  x0(2) -0.005440         0.92
34  x2(1)  x0(2)  0.019144         0.91
Because it holds the raw data of the total causal effect (the original
data for calculating the median), it is possible to draw a histogram of
the values of the causal effect, as shown below.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

from_index = 5   # index of x0(1). (index:0)+(n_features:5)*(timepoint:1) = 5
to_index = 12    # index of x2(2). (index:2)+(n_features:5)*(timepoint:2) = 12

plt.hist(result.total_effects_[:, to_index, from_index])
This method assumes an extension of the basic LiNGAM model [1] to hidden common cause cases.
Specifically, this implements Algorithm 1 of [3] except Step 2.
Similarly to the basic LiNGAM model [1], this method makes the following assumptions:
Linearity
Non-Gaussian continuous error variables (except at most one)
Acyclicity
However, it allows the following hidden common causes:
Only exogenous observed variables may share hidden common causes.
This is a simpler version of the latent variable LiNGAM [2] that extends the basic LiNGAM model to hidden common causes.
Note that the latent variable LiNGAM [2] allows the existence of hidden common causes between any observed variables.
However, this kind of causal graph structure is often assumed in classic structural equation modelling [4].
Denote observed variables by \({x}_{i}\), error variables by \({e}_{i}\), and coefficients or connection strengths by \({b}_{ij}\).
Collect them in vectors \({x}\) and \({e}\) and a matrix \({B}\), respectively.
Due to the acyclicity assumption, the adjacency matrix \({B}\) can be permuted to be strictly lower-triangular by a simultaneous row and column permutation.
The error variables \({e}_{i}\) except those corresponding to exogenous observed variables are independent due to the assumption that only exogenous observed variables may share hidden common causes.
Then, mathematically, the model for observed variable vector \({x}\) is written as
$$ x = Bx + e. $$
First, we generate a causal structure with 7 variables. Then we create a
dataset with 6 variables from x0 to x5, with x6 being the latent
variable for x2 and x3.
np.random.seed(1000)

x6 = np.random.uniform(size=1000)
x3 = 2.0*x6 + np.random.uniform(size=1000)
x0 = 0.5*x3 + np.random.uniform(size=1000)
x2 = 2.0*x6 + np.random.uniform(size=1000)
x1 = 0.5*x0 + 0.5*x2 + np.random.uniform(size=1000)
x5 = 0.5*x0 + np.random.uniform(size=1000)
x4 = 0.5*x0 - 0.5*x2 + np.random.uniform(size=1000)

# The latent variable x6 is not included.
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T,
                 columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
         x0        x1        x2        x3        x4        x5
0  1.505949  2.667827  2.029420  1.463708  0.615387  1.157907
1  1.379130  1.721744  0.965613  0.801952  0.919654  0.957148
2  1.436825  2.845166  2.773506  2.533417 -0.616746  0.903326
3  1.562885  2.205270  1.080121  1.192257  1.240595  1.411295
4  1.940721  2.974182  2.140298  1.886342  0.451992  1.770786
m = np.array([[0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0],
              [0.5, 0.0, -0.5, 0.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
dot = make_dot(m)

# Save pdf
dot.render('dag')

# Save png
dot.format = 'png'
dot.render('dag')

dot
Using the causal_order_ property, we can see the causal ordering as a result of the causal discovery. x2 and x3, which have latent confounders as parents, are stored in a list without causal ordering.
model.causal_order_
[[2,3],0,5,1,4]
Also, using the adjacency_matrix_ property, we can see the adjacency matrix as a result of the causal discovery. The coefficients between variables with latent confounders are np.nan.
To check if the LiNGAM assumption is broken, we can get p-values of
independence between error variables. The value in the i-th row and j-th
column of the obtained matrix shows the p-value of the independence of
the error variables \(e_i\) and \(e_j\).
Since a BootstrapResult object is returned, we can get the ranking of the extracted causal directions with the get_causal_direction_counts() method. In the following sample code, the n_directions option limits the output to the top 8 causal directions, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.01.
Also, using the get_directed_acyclic_graph_counts() method, we can get the ranking of the extracted DAGs. In the following sample code, the n_dags option limits the output to the top 3 DAGs, and the min_causal_effect option restricts it to causal directions with a coefficient of at least 0.01.
Using the get_total_causal_effects() method, we can get the list of total causal effects. The total causal effects are returned as a dictionary, which we can display nicely by assigning it to a pandas.DataFrame. We have also replaced the variable indices with labels below.
causal_effects = result.get_total_causal_effects(min_causal_effect=0.01)

# Assign to pandas.DataFrame for pretty display
df = pd.DataFrame(causal_effects)

labels = [f'x{i}' for i in range(X.shape[1])]
df['from'] = df['from'].apply(lambda x: labels[x])
df['to'] = df['to'].apply(lambda x: labels[x])
df
   from  to    effect  probability
0    x0  x5  0.518064         0.12
1    x0  x1  0.504613         0.11
2    x0  x4  0.479543         0.11
3    x2  x1  0.508531         0.02
4    x2  x4 -0.476555         0.02
5    x3  x0  0.490217         0.01
6    x3  x1  0.630292         0.01
7    x4  x1  0.097063         0.01
8    x3  x2  0.796101         0.01
9    x1  x4  0.089596         0.01
10   x3  x4 -0.151733         0.01
11   x3  x5  0.254280         0.01
We can easily perform sorting operations with pandas.DataFrame.
And with pandas.DataFrame, we can easily filter by keywords. The
following code extracts the causal direction towards x1.
df[df['to'] == 'x1'].head()
   from  to    effect  probability
1    x0  x1  0.504613         0.11
3    x2  x1  0.508531         0.02
6    x3  x1  0.630292         0.01
7    x4  x1  0.097063         0.01
Because it holds the raw data of the total causal effect (the original data
for calculating the median), it is possible to draw a histogram of the
values of the causal effect, as shown below.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

from_index = 0  # index of x0
to_index = 5    # index of x5

plt.hist(result.total_effects_[:, to_index, from_index])
Using the get_paths() method, we can explore all paths from any
variable to any variable and calculate the bootstrap probability for
each path. The path will be output as an array of variable indices. For
example, the array [3,0,1] shows the path from variable X3 through
variable X0 to variable X1.
from_index = 3  # index of x3
to_index = 1    # index of x1
pd.DataFrame(result.get_paths(from_index, to_index))
   path         effect  probability
0  [3, 0, 1]  0.263068         0.11
1  [3, 2, 1]  0.404828         0.02
How to use prior knowledge in BottomUpParceLiNGAM
np.random.seed(1000)

x6 = np.random.uniform(size=1000)
x3 = 2.0*x6 + np.random.uniform(size=1000)
x0 = 0.5*x3 + np.random.uniform(size=1000)
x2 = 2.0*x6 + np.random.uniform(size=1000)
x1 = 0.5*x0 + 0.5*x2 + np.random.uniform(size=1000)
x5 = 0.5*x0 + np.random.uniform(size=1000)
x4 = 0.5*x0 - 0.5*x2 + np.random.uniform(size=1000)

# The latent variable x6 is not included.
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T,
                 columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
This method RCD (Repetitive Causal Discovery) [3] assumes an extension of the basic LiNGAM model [1] to hidden common cause cases, i.e., the latent variable LiNGAM model [2].
Similarly to the basic LiNGAM model [1], this method makes the following assumptions:
Linearity
Non-Gaussian continuous error variables
Acyclicity
However, RCD allows the existence of hidden common causes.
It outputs a causal graph where a bi-directed arc indicates the pair of variables that have the same hidden common causes, and a directed arrow indicates the causal direction of a pair of variables that are not affected by the same hidden common causes.
First, we generate a causal structure with 7 variables. Then we create a dataset with 6 observed variables from x0 to x5, with x6 being the latent variable (a hidden common cause of x2 and x4).
np.random.seed(0)

get_external_effect = lambda n: np.random.normal(0.0, 0.5, n) ** 3

n_samples = 300

x5 = get_external_effect(n_samples)
x6 = get_external_effect(n_samples)
x1 = 0.6*x5 + get_external_effect(n_samples)
x3 = 0.5*x5 + get_external_effect(n_samples)
x0 = 1.0*x1 + 1.0*x3 + get_external_effect(n_samples)
x2 = 0.8*x0 - 0.6*x6 + get_external_effect(n_samples)
x4 = 1.0*x0 - 0.5*x6 + get_external_effect(n_samples)

# The latent variable x6 is not included.
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T,
                 columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
         x0        x1        x2        x3        x4        x5
0 -0.191493 -0.054157  0.014075 -0.047309  0.016311  0.686190
1 -0.967142  0.013890 -1.115854 -0.035899 -1.254783  0.008009
2  0.527409 -0.034960  0.426923  0.064804  0.894242  0.117195
3  1.583826  0.845653  1.265038  0.704166  1.994283  1.406609
4  0.286276  0.141120  0.116967  0.329866  0.257932  0.814202
m = np.array([[0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.0],
              [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, -0.6],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0],
              [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.5],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
dot = make_dot(m, labels=['x0', 'x1', 'x2', 'x3', 'x4', 'x5', 'f1(x6)'])

# Save pdf
dot.render('dag')

# Save png
dot.format = 'png'
dot.render('dag')

dot
Also, using the adjacency_matrix_ property, we can see the adjacency matrix as a result of the causal discovery. The coefficients between variables with latent confounders are np.nan.
To check if the LiNGAM assumption is broken, we can get p-values of
independence between error variables. The value in the i-th row and j-th
column of the obtained matrix shows the p-value of the independence of
the error variables \(e_i\) and \(e_j\).
Since BootstrapResult object is returned, we can get the ranking of
the causal directions extracted by get_causal_direction_counts()
method. In the following sample code, n_directions option is limited
to the causal directions of the top 8 rankings, and
min_causal_effect option is limited to causal directions with a
coefficient of 0.01 or more.
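To illustrate what such a ranking counts, here is a minimal numpy sketch (with made-up bootstrap matrices; this is not the library's internal implementation) that tallies the directions whose absolute coefficient clears the min_causal_effect threshold:

```python
import numpy as np
from collections import Counter

# Three hypothetical bootstrapped adjacency matrices B, where B[i, j]
# is the coefficient of the edge x_j -> x_i.
adjacency_matrices = [
    np.array([[0.0, 0.0], [1.2, 0.0]]),
    np.array([[0.0, 0.005], [1.1, 0.0]]),  # 0.005 falls below the threshold
    np.array([[0.0, 0.0], [0.9, 0.0]]),
]

min_causal_effect = 0.01
counts = Counter()
for B in adjacency_matrices:
    to_idx, from_idx = np.where(np.abs(B) >= min_causal_effect)
    for i, j in zip(to_idx, from_idx):
        counts[(int(j), int(i))] += 1  # direction x_j -> x_i

# Ranking of causal directions by bootstrap count, most frequent first.
ranking = counts.most_common()
print(ranking)  # [((0, 1), 3)]
```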
Also, using the get_directed_acyclic_graph_counts() method, we can
get the ranking of the DAGs extracted. In the following sample code,
n_dags option is limited to the DAGs of the top 3 rankings, and
min_causal_effect option is limited to causal directions with a
coefficient of 0.01 or more.
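To make this counting concrete, here is a minimal numpy sketch (made-up matrices, not the library's internals): each bootstrapped adjacency matrix is thresholded, and its support pattern, which identifies one DAG, is counted:

```python
import numpy as np
from collections import Counter

min_causal_effect = 0.01

# Hypothetical bootstrapped adjacency matrices B, where B[i, j]
# is the coefficient of the edge x_j -> x_i.
adjacency_matrices = [
    np.array([[0.0, 0.0], [1.2, 0.0]]),
    np.array([[0.0, 0.8], [0.0, 0.0]]),
    np.array([[0.0, 0.0], [0.9, 0.0]]),
]

# Each thresholded support pattern identifies one DAG; count how often
# each pattern appears across the bootstrap samples.
dag_counts = Counter(
    tuple(map(tuple, (np.abs(B) >= min_causal_effect).astype(int)))
    for B in adjacency_matrices
)
# The most frequent DAG (x0 -> x1) appears twice.
print(dag_counts.most_common(1))
```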
Using the get_total_causal_effects() method, we can get the list of
total causal effects. The total causal effects are returned as a
dictionary, which we can display nicely by assigning it to a
pandas.DataFrame. We have also replaced the variable indices with labels
below.
causal_effects = result.get_total_causal_effects(min_causal_effect=0.01)

# Assign to pandas.DataFrame for pretty display
df = pd.DataFrame(causal_effects)

labels = [f'x{i}' for i in range(X.shape[1])]
df['from'] = df['from'].apply(lambda x: labels[x])
df['to'] = df['to'].apply(lambda x: labels[x])
df
   from  to    effect  probability
0    x3  x0  1.055784         0.04
1    x3  x2  0.818606         0.04
2    x4  x2  0.484138         0.04
3    x3  x4  1.016508         0.04
4    x1  x0  0.929668         0.03
5    x0  x2  0.781983         0.03
6    x0  x4  1.003733         0.03
7    x1  x4  1.041410         0.03
8    x1  x2  0.730836         0.02
9    x5  x0  0.961161         0.01
10   x5  x1  0.542628         0.01
11   x5  x3  0.559532         0.01
We can easily perform sorting operations with pandas.DataFrame.
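For example, sorting by the estimated effect in descending order is a one-liner; the sketch below uses a two-row stand-in for the DataFrame built above:

```python
import pandas as pd

# A small stand-in frame with the same columns as the one assigned above.
df = pd.DataFrame({'from': ['x3', 'x1'],
                   'to': ['x0', 'x0'],
                   'effect': [1.055784, 0.929668],
                   'probability': [0.04, 0.03]})

# Sort the total causal effects by magnitude, largest first.
df_sorted = df.sort_values('effect', ascending=False)
print(df_sorted.iloc[0]['from'])  # x3
```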
Because BootstrapResult holds the raw causal effect values (the original
data used to calculate the median), it is also possible to draw a
histogram of the causal effect values, as shown below.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

from_index = 5  # index of x5
to_index = 0    # index of x0

plt.hist(result.total_effects_[:, to_index, from_index])
Using the get_paths() method, we can explore all paths from any
variable to any variable and calculate the bootstrap probability for
each path. The path will be output as an array of variable indices. For
example, the array [3,0,1] shows the path from variable X3 through
variable X0 to variable X1.
from_index = 5  # index of x5
to_index = 4    # index of x4

pd.DataFrame(result.get_paths(from_index, to_index))
In this example, we need to import numpy, pandas, and
graphviz in addition to lingam. To draw the causal graph, we also
need to import the make_dot method from lingam.utils.
By specifying prediction_feature_indices and prediction_coefs, which
can be obtained from the prediction model, we can draw the prediction
model together with the causal structure.
We can also change the label of the target variable with
prediction_target_label, omit the coefficients of the prediction model
by leaving out prediction_coefs, and change the color with
prediction_line_color.
Draw the result of LiNGAM with emphasis on descendants and ancestors
make_dot_highlight highlights descendants or ancestors of the graph.
The first argument is the result and the second argument is the index of the target variable. There are four types of cluster names: target, ancestor, descendant, and others. target contains only the node specified in the second argument. Nodes that are ancestors or descendants of target belong to ancestor or descendant. The number appended to the cluster name is the distance from target. Other nodes belong to others.
make_dot_highlight(model.adjacency_matrix_, 0)
It is also possible to disable the display of clusters of ancestors and descendants.
LiNA [1] allows us to locate the latent factors as well as uncover the causal structure between such latent factors of interest.
Causal structure between latent factors is ubiquitous in real-world applications, e.g., the relations between anxiety, depression, and coping in psychology [2][3], etc.
This method is based on the LiNA model:
$$ f^{(m)} = B^{(m)} f^{(m)} + {\varepsilon}^{(m)}, \qquad x^{(m)} = G^{(m)} f^{(m)} + e^{(m)}, $$
where \({\varepsilon}^{(m)}\) and \({e}^{(m)}\) are random vectors that collect external influences, and errors, respectively,
and they are independent with each other. \({f}^{(m)}\) and \({x}^{(m)}\) are random vectors that collect latent factors, and observed data, respectively.
\({B}^{(m)}\) is a matrix that collects causal effects \(b_{ij}^{(m)}\) between \({f}_i^{(m)}\) and \({f}_j^{(m)}\),
while \({G}^{(m)}\) collects factor loadings \(g_{ij}^{(m)}\) between \({f}_j^{(m)}\) and \({x}_i^{(m)}\). \(m\)
stands for the \(m^{th}\) domain.
This method makes the following assumptions.
Linearity
Acyclicity
No causal relations between observed variables
Non-Gaussian continuous disturbance variables (except at most one) for latent factors
Gaussian error variables (except at most one) for observed variables
Each latent factor has at least 2 pure measurement variables.
First, we generate a causal structure with 10 measurement variables and 5 latent factors, where each latent
variable has 2 pure measurement variables.
ut.set_random_seed(1)

noise_ratios = 0.1
n_features = 10  # number of measurement vars.
n_samples, n_features_latent, n_edges, graph_type, sem_type = 1000, 5, 5, 'ER', 'laplace'

B_true = ut.simulate_dag(n_features_latent, n_edges, graph_type)
W_true = ut.simulate_parameter(B_true)  # row to column

f, E, E_weight = ut.simulate_linear_sem(W_true, n_samples, sem_type)
f_nor = np.zeros([n_samples, n_features_latent])
scale = np.zeros([1, n_features_latent])
W_true_scale = np.zeros([n_features_latent, n_features_latent])
for j in range(n_features_latent):
    scale[0, j] = np.std(f[:, j])
    f_nor[:, j] = f[:, j] / np.std(f[:, j])
    W_true_scale[:, j] = W_true[:, j] / scale[0, j]  # scaled W_true

# generate noises ei of xi
e = np.random.random([n_features, n_samples])
for j in range(n_features):
    e[j, :] = e[j, :] - np.mean(e[j, :])
    e[j, :] = e[j, :] / np.std(e[j, :])

G = np.zeros([n_features, n_features_latent])
G[0, 0] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G[1, 0] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G[2, 1] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G[3, 1] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G[4, 2] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G[5, 2] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G[6, 3] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G[7, 3] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G[8, 4] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G[9, 4] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G_sign = np.sign(G)

# normalize G
G_nor = np.zeros([n_features, n_features_latent])
for j in range(n_features):
    e[j, :] = e[j, :] / np.sqrt(np.square(np.sum(G[j, :])) + np.square(noise_ratios))
    G_nor[j, :] = G[j, :] / np.sqrt(np.square(np.sum(G[j, :])) + np.square(noise_ratios))

X = G_nor @ f_nor.T + noise_ratios * e  # X: n_features * n_samples ("e is small or n_features are large")
X = X.T

print('The true adjacency matrix is:\n', W_true)
Next, we generate a causal structure with 2 domains, where each domain has 6 measurement variables and 3 latent factors. Each latent factor has 2 pure measurement variables.
n_features = 6  # number of measurement vars. in each domain
noise_ratios = 0.1
ut.set_random_seed(1)

n_samples, n_features_latent, n_edges, graph_type, sem_type1, sem_type2 = 1000, 3, 3, 'ER', 'subGaussian', 'supGaussian'
# n_edges: number of edges btw. latent factors in a domain
# sem_type1/sem_type2: different distributions of noises from different domains

B_true = ut.simulate_dag(n_features_latent, n_edges, graph_type)  # skeleton btw. latent factors
W_true = ut.simulate_parameter(B_true)  # causal effects matrix btw. latent factors

# 1 domain
f, E, E_weight = ut.simulate_linear_sem(W_true, n_samples, sem_type1)
f_nor1 = np.zeros([n_samples, n_features_latent])
scale1 = np.zeros([1, n_features_latent])
W_true_scale = np.zeros([n_features_latent, n_features_latent])
for j in range(n_features_latent):
    scale1[0, j] = np.std(f[:, j])
    f_nor1[:, j] = f[:, j] / np.std(f[:, j])
    W_true_scale[:, j] = W_true[:, j] / scale1[0, j]

e = np.random.random([n_features, n_samples])
for j in range(n_features):
    e[j, :] = e[j, :] - np.mean(e[j, :])
    e[j, :] = e[j, :] / np.std(e[j, :])

G1 = np.zeros([n_features, n_features_latent])
G1[0, 0] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G1[1, 0] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G1[2, 1] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G1[3, 1] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G1[4, 2] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G1[5, 2] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G_sign1 = np.sign(G1)

# normalize G
G_nor1 = np.zeros([n_features, n_features_latent])
for j in range(n_features):
    e[j, :] = e[j, :] / np.sqrt(np.square(np.sum(G1[j, :])) + np.square(noise_ratios))
    G_nor1[j, :] = G1[j, :] / np.sqrt(np.square(np.sum(G1[j, :])) + np.square(noise_ratios))

X1 = G_nor1 @ f_nor1.T + noise_ratios * e  # "the noise ratio e is small or n_features is large"
X1 = X1.T

# 2 domain
f2, E, E_weight = ut.simulate_linear_sem(W_true, n_samples, sem_type2)
f_nor2 = np.zeros([n_samples, n_features_latent])
scale2 = np.zeros([1, n_features_latent])
W_true_scale = np.zeros([n_features_latent, n_features_latent])
for j in range(n_features_latent):
    scale2[0, j] = np.std(f2[:, j])
    f_nor2[:, j] = f2[:, j] / np.std(f2[:, j])
    W_true_scale[:, j] = W_true[:, j] / scale2[0, j]

e = np.random.random([n_features, n_samples])
for j in range(n_features):
    e[j, :] = e[j, :] - np.mean(e[j, :])
    e[j, :] = e[j, :] / np.std(e[j, :])

G2 = np.zeros([n_features, n_features_latent])
G2[0, 0] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G2[1, 0] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G2[2, 1] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G2[3, 1] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G2[4, 2] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G2[5, 2] = random.choice((-1, 1)) * (0.2 + 0.5 * np.random.rand(1))
G_sign2 = np.sign(G2)

# normalize G
G_nor2 = np.zeros([n_features, n_features_latent])
for j in range(n_features):
    e[j, :] = e[j, :] / np.sqrt(np.square(np.sum(G2[j, :])) + np.square(noise_ratios))
    G_nor2[j, :] = G2[j, :] / np.sqrt(np.square(np.sum(G2[j, :])) + np.square(noise_ratios))

X2 = G_nor2 @ f_nor2.T + noise_ratios * e
X2 = X2.T  # X: n_samples * n_features

# augment the data X
X = scipy.linalg.block_diag(X1, X2)
G_sign = scipy.linalg.block_diag(G_sign1, G_sign2)
scale = scipy.linalg.block_diag(scale1, scale2)

print('The true adjacency matrix is:\n', W_true)
RESIT [2] is an estimation algorithm for the Additive Noise Model [1].
This method makes the following assumptions.
Continuous variables
Nonlinearity
Additive noise
Acyclicity
No hidden common causes
Denote observed variables by \({x}_{i}\) and error variables by \({e}_{i}\).
The error variables \({e}_{i}\) are independent due to the assumption of no hidden common causes.
Then, mathematically, the model for observed variables \({x}_{i}\) is written as
$$ x_i = f_i (pa(x_i))+e_i, $$
where \({f}_{i}\) are some nonlinear (differentiable) functions and \({pa}({x}_{i})\) are the parents of \({x}_{i}\).
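As a minimal sketch of this model (the nonlinear function and coefficients below are illustrative, not those behind the dataset used later), data can be generated as:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# x0 has no parents; x1 = f1(x0) + e1 with an illustrative nonlinear f1.
e0 = rng.uniform(-1.0, 1.0, n)
e1 = rng.uniform(-1.0, 1.0, n)
x0 = e0
x1 = 3.0 * x0 ** 2 + e1  # f1(x0) = 3 * x0^2, a nonlinear parent function

X = np.column_stack([x0, x1])
print(X.shape)  # (1000, 2)
```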
First, we generate a causal structure with 7 variables. Then we create a
dataset with 6 variables from x0 to x5, with x6 being the latent
variable for x2 and x3.
X=pd.read_csv('nonlinear_data.csv')
m = np.array([[0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 0, 1, 0]])
dot = make_dot(m)

# Save pdf
dot.render('dag')

# Save png
dot.format = 'png'
dot.render('dag')

dot
Using the causal_order_ properties, we can see the causal ordering
as a result of the causal discovery. x2 and x3, which have latent
confounders as parents, are stored in a list without causal ordering.
model.causal_order_
[0, 1, 2, 3, 4]
Also, using the adjacency_matrix_ properties, we can see the
adjacency matrix as a result of the causal discovery. The coefficients
between variables with latent confounders are np.nan.
Since BootstrapResult object is returned, we can get the ranking of
the causal directions extracted by get_causal_direction_counts()
method. In the following sample code, n_directions option is limited
to the causal directions of the top 8 rankings, and
min_causal_effect option is limited to causal directions with a
coefficient of 0.01 or more.
Also, using the get_directed_acyclic_graph_counts() method, we can
get the ranking of the DAGs extracted. In the following sample code,
n_dags option is limited to the DAGs of the top 3 rankings, and
min_causal_effect option is limited to causal directions with a
coefficient of 0.01 or more.
Using the get_paths() method, we can explore all paths from any
variable to any variable and calculate the bootstrap probability for
each path. The path will be output as an array of variable indices. For
example, the array [0,1,3] shows the path from variable X0 through
variable X1 to variable X3.
from_index = 0  # index of x0
to_index = 3   # index of x3

pd.DataFrame(result.get_paths(from_index, to_index))
The Linear Mixed (LiM) causal discovery algorithm [1] extends LiNGAM to handle mixed data consisting of both continuous and discrete variables.
The estimation is performed by first globally optimizing the log-likelihood function on the joint distribution of data with the acyclicity constraint, and then applying a local combinatorial search to output a causal graph.
This method is based on the LiM model as shown below,
i) As for a continuous variable, the value assigned to each \(x_i\) is a linear function of its parent variables denoted
by \(x_{\mathrm{pa}(i)}\) plus a non-Gaussian error term \(e_i\), that is,
$$ x_i = \sum_{j \in \mathrm{pa}(i)} b_{ij} x_j + c_i + e_i, $$
where the error terms \(e_i\) are continuous random variables with non-Gaussian densities, and the error variables \(e_i\) are independent of
each other. The coefficients \(b_{ij}\) and intercepts \(c_i\) are constants.
ii) As for a discrete variable, its value equals 1 if the linear function of its parent variables \(x_{\mathrm{pa}(i)}\) plus a Logistic error
term \(e_i\) is larger than 0; otherwise, its value equals 0. That is,
$$ x_i = \begin{cases} 1, & \text{if } \sum_{j \in \mathrm{pa}(i)} b_{ij} x_j + c_i + e_i > 0, \\ 0, & \text{otherwise,} \end{cases} $$
CAM-UV (Causal Additive Models with Unobserved Variables) [2] is an extension of the basic CAM model [1] that allows for unobserved variables.
This method makes the following assumptions:
The effects of direct causes on a variable form generalized additive models (GAMs).
The causal structures form directed acyclic graphs (DAGs).
CAM-UV allows the existence of unobserved variables.
It outputs a causal graph where an undirected edge indicates a pair of variables that have an unobserved causal path (UCP) or an unobserved backdoor path (UBP), and a directed edge indicates the causal direction of a pair of variables that do not have a UCP or UBP.
Definition of UCPs and UBPs: As shown in the figure below, a causal path from \(x_j\) to \(x_i\) is called a UCP if it ends with the directed edge connecting \(x_i\) and its unobserved direct cause. A backdoor path between \(x_i\) and \(x_j\) is called a UBP if it starts with the edge connecting \(x_i\) and its unobserved direct cause, and ends with the edge connecting \(x_j\) and its unobserved direct cause.
First, we generate a causal structure with 2 unobserved variables (y6 and y7) and 6 observed variables (x0–x5), as shown in the figure below.
def get_noise(n):
    noise = ((np.random.rand(1, n) - 0.5) * 5).reshape(n)
    mean = get_random_constant(0.0, 2.0)
    noise += mean
    return noise

def causal_func(cause):
    a = get_random_constant(-5.0, 5.0)
    b = get_random_constant(-1.0, 1.0)
    c = int(random.uniform(2, 3))
    return ((cause + a) ** c) + b

def get_random_constant(s, b):
    constant = random.uniform(-1.0, 1.0)
    if constant > 0:
        constant = random.uniform(s, b)
    else:
        constant = random.uniform(-b, -s)
    return constant

def create_data(n):
    causal_pairs = [[0, 1], [0, 3], [2, 4]]
    intermediate_pairs = [[2, 5]]
    confounder_pairs = [[3, 4]]

    n_variables = 6

    data = np.zeros((n, n_variables))  # observed data
    confounders = np.zeros((n, len(confounder_pairs)))  # data of unobserved common causes

    # Adding external effects
    for i in range(n_variables):
        data[:, i] = get_noise(n)
    for i in range(len(confounder_pairs)):
        confounders[:, i] = get_noise(n)
        confounders[:, i] = confounders[:, i] / np.std(confounders[:, i])

    # Adding the effects of unobserved common causes
    for i, cpair in enumerate(confounder_pairs):
        cpair = list(cpair)
        cpair.sort()
        data[:, cpair[0]] += causal_func(confounders[:, i])
        data[:, cpair[1]] += causal_func(confounders[:, i])

    for i1 in range(n_variables):
        data[:, i1] = data[:, i1] / np.std(data[:, i1])
        for i2 in range(i1 + 1, n_variables):
            # Adding direct effects between observed variables
            if [i1, i2] in causal_pairs:
                data[:, i2] += causal_func(data[:, i1])
            # Adding undirected effects between observed variables mediated through unobserved variables
            if [i1, i2] in intermediate_pairs:
                interm = causal_func(data[:, i1]) + get_noise(n)
                interm = interm / np.std(interm)
                data[:, i2] += causal_func(interm)

    return data

sample_size = 2000
X = create_data(sample_size)
To run causal discovery, we create a CAMUV object and call the fit
method.
model = lingam.CAMUV()
model.fit(X)
Using the adjacency_matrix_ properties, we can see the adjacency matrix as a result of the causal discovery. When the value of a variable pair is np.nan, the variables have a UCP or UBP.
B1 = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, get_coef(), 0.0],
               [0.0, 0.0, 0.0, 0.0, 0.0, get_coef(), 0.0],
               [get_coef(), get_coef(), 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, get_coef(), 0.0, 0.0, 0.0, get_coef()],
               [0.0, 0.0, get_coef(), 0.0, 0.0, 0.0, get_coef()],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

samples = 1000
x5 = get_external_effect(samples)
x6 = get_external_effect(samples)
x0 = x5 * B1[0, 5] + get_external_effect(samples)
x1 = x5 * B1[1, 5] + get_external_effect(samples)
x2 = x0 * B1[2, 0] + x1 * B1[2, 1] + get_external_effect(samples)
x3 = x2 * B1[3, 2] + x6 * B1[3, 6] + get_external_effect(samples)
x4 = x2 * B1[4, 2] + x6 * B1[4, 6] + get_external_effect(samples)

# x5, x6 is a latent variable.
X1 = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T,
                  columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
B2 = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, get_coef(), 0.0],
               [0.0, 0.0, 0.0, 0.0, 0.0, get_coef(), 0.0],
               [get_coef(), get_coef(), 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, get_coef(), 0.0, 0.0, 0.0, get_coef()],
               [0.0, 0.0, get_coef(), 0.0, 0.0, 0.0, get_coef()],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

samples = 1000
x5 = get_external_effect(samples)
x6 = get_external_effect(samples)
x0 = x5 * B2[0, 5] + get_external_effect(samples)
x1 = x5 * B2[1, 5] + get_external_effect(samples)
x2 = x0 * B2[2, 0] + x1 * B2[2, 1] + get_external_effect(samples)
x3 = x2 * B2[3, 2] + x6 * B2[3, 6] + get_external_effect(samples)
x4 = x2 * B2[4, 2] + x6 * B2[4, 6] + get_external_effect(samples)

# x5, x6 is a latent variable.
X2 = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T,
                  columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
Also, using the adjacency_matrix_ properties, we can see the adjacency matrix as a result of the causal discovery. The coefficients between variables with latent confounders are np.nan.
To compare, we run MultiGroupRCD on a single dataset that concatenates the two datasets. You can see that the causal structure cannot be estimated correctly from the single dataset.
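Such a single concatenated dataset can be built with pandas; the frames below are small stand-ins for the tutorial's per-group datasets:

```python
import pandas as pd

# Stand-ins for the two per-group datasets with matching columns.
X1 = pd.DataFrame({'x0': [0.1, 0.2], 'x1': [0.3, 0.4]})
X2 = pd.DataFrame({'x0': [0.5, 0.6], 'x1': [0.7, 0.8]})

# Stack the two datasets row-wise into one dataset for the single-dataset run.
X_all = pd.concat([X1, X2], ignore_index=True)
print(X_all.shape)  # (4, 2)
```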
By using utils.extract_ancestors, which implements Algorithm 1 of the RCD method [1], we can extract the ancestors of variables.
Since RCD allows for the existence of unobserved common causes, we can search for ancestors even when there are unobserved common causes, as in the following example.
def get_coef():
    coef = random.random()
    return coef if coef >= 0.5 else coef - 1.0

get_external_effect = lambda n: np.random.normal(0.0, 0.5, n) ** 3

B = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, get_coef(), 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, get_coef(), 0.0],
              [get_coef(), get_coef(), 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, get_coef(), 0.0, 0.0, 0.0, get_coef()],
              [0.0, 0.0, get_coef(), 0.0, 0.0, 0.0, get_coef()],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

samples = 500
f0 = get_external_effect(samples)
f1 = get_external_effect(samples)
x0 = f0 * B[0, 5] + get_external_effect(samples)
x1 = f0 * B[1, 5] + get_external_effect(samples)
x2 = x0 * B[2, 0] + x1 * B[2, 1] + get_external_effect(samples)
x3 = x2 * B[3, 2] + f1 * B[3, 6] + get_external_effect(samples)
x4 = x2 * B[4, 2] + f1 * B[4, 6] + get_external_effect(samples)

# f0 and f1 are latent confounders
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4]).T,
                 columns=['x0', 'x1', 'x2', 'x3', 'x4'])

make_dot(B, labels=['x0', 'x1', 'x2', 'x3', 'x4', 'f0', 'f1'])
M = extract_ancestors(X)

for i in range(X.shape[1]):
    if len(M[i]) == 0:
        print(f'x{i} has no ancestors.')
    else:
        print(f'The ancestors of x{i} are ' + ', '.join([f'x{n}' for n in M[i]]))
x6 = np.random.uniform(size=1000)
x3 = 2.0 * x6 + np.random.uniform(size=1000)
x0 = 0.5 * x3 + np.random.uniform(size=1000)
x2 = 2.0 * x6 + np.random.uniform(size=1000)
x1 = 0.5 * x0 + 0.5 * x2 + np.random.uniform(size=1000)
x5 = 0.5 * x0 + np.random.uniform(size=1000)
x4 = 0.5 * x0 - 0.5 * x2 + np.random.uniform(size=1000)

# The latent variable x6 is not included.
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T,
                 columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
measure ({'pwling', 'kernel', 'pwling_fast'}, optional (default='pwling')) – Measure to evaluate independence: ‘pwling’ [2] or ‘kernel’ [1].
For fast execution with GPU, ‘pwling_fast’ can be used (culingam is required).
adjacency_matrices_ – The list of adjacency matrices B for multiple datasets.
The shape of B is (n_features, n_features), where
n_features is the number of features.
Evaluate the statistical reliability of DAG based on the bootstrapping.
Parameters:
X_list (array-like, shape (X, ...)) – Multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
n_sampling (int) – Number of bootstrapping samples.
Returns:
results – Returns the results of bootstrapping for multiple datasets.
X_list (array-like, shape (X, ...)) – Multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
from_index – Index of source variable to estimate total effect.
to_index – Index of destination variable to estimate total effect.
X_list (list, shape [X, ...]) – Multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
Calculate the p-value matrix of independence between error variables.
Parameters:
X_list (array-like, shape (X, ...)) – Multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
Returns:
independence_p_values – p-value matrix of independence between error variables.
lags (int, optional (default=1)) – Number of lags.
criterion ({‘aic’, ‘fpe’, ‘hqic’, ‘bic’, None}, optional (default='bic')) – Criterion to decide the best lags within lags.
Searching the best lags is disabled if criterion is None.
prune (boolean, optional (default=True)) – Whether to prune the adjacency matrix of lags.
ar_coefs (array-like, optional (default=None)) – Coefficients of the AR model. Estimating the AR model is skipped if ar_coefs is specified.
Shape must be (lags, n_features, n_features).
lingam_model (lingam object inherits 'lingam._BaseLiNGAM', optional (default=None)) – LiNGAM model for causal discovery. If None, DirectLiNGAM algorithm is selected.
random_state (int, optional (default=None)) – random_state is the seed used by the random number generator.
order (tuple, length = 2, optional (default=(1, 1))) – Number of lags for the AR and MA models.
criterion ({'aic', 'bic', 'hqic', None}, optional (default='bic')) – Criterion to decide the best order among all combinations of order.
Searching the best order is disabled if criterion is None.
prune (boolean, optional (default=True)) – Whether to prune the adjacency matrix of lags.
max_iter (int, optional (default=100)) – Maximum number of iterations to estimate the VARMA model.
ar_coefs (array-like, optional (default=None)) – AR coefficients of the ARMA model. Estimating the ARMA model is skipped if both ar_coefs and ma_coefs are specified.
Shape must be (order[0], n_features, n_features).
ma_coefs (array-like, optional (default=None)) – MA coefficients of the ARMA model. Estimating the ARMA model is skipped if both ar_coefs and ma_coefs are specified.
Shape must be (order[1], n_features, n_features).
lingam_model (lingam object inherits 'lingam._BaseLiNGAM', optional (default=None)) – LiNGAM model for causal discovery. If None, DirectLiNGAM algorithm is selected.
random_state (int, optional (default=None)) – random_state is the seed used by the random number generator.
causal_order_ – The causal order of fitted model, where
n_features is the number of features.
Return type:
array-like, shape (n_features)
estimate_total_effect(X, E, from_index, to_index, from_lag=0)
Estimate total effect using causal model.
Parameters:
X (array-like, shape (n_samples, n_features)) – Original data, where n_samples is the number of samples
and n_features is the number of features.
E (array-like, shape (n_samples, n_features)) – Original error data, where n_samples is the number of samples
and n_features is the number of features.
from_index – Index of source variable to estimate total effect.
to_index – Index of destination variable to estimate total effect.
adjacency_matrices_ – The list of adjacency matrix B(t,t) and B(t,t-τ) for longitudinal datasets.
The shape of B(t,t) and B(t,t-τ) is (n_features, n_features), where
n_features is the number of features.
If the previous data required for the calculation are not available,
such as B(t,t) or B(t,t-τ) at t=0, all elements of the matrix are nan.
Evaluate the statistical reliability of DAG based on the bootstrapping.
Parameters:
X_list (array-like, shape (X, ...)) – Longitudinal multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
n_sampling (int) – Number of bootstrapping samples.
Returns:
results – Returns the results of bootstrapping for multiple datasets.
X_t (array-like, shape (timepoint, n_samples, n_features)) – Original data, where n_samples is the number of samples
and n_features is the number of features.
from_t – The timepoint of the source variable.
from_index – Index of source variable to estimate total effect.
to_t – The timepoint of destination variable.
to_index – Index of destination variable to estimate total effect.
X_list (list, shape [X, ...]) – Longitudinal multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
residuals_ – Residuals of regression, where E is a dataset.
The shape of E is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
Get causal direction count as a result of bootstrapping.
Parameters:
n_directions (int, optional (default=None)) – If int, then the top n_directions items are included in the result.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects
less than min_causal_effect are excluded.
split_by_causal_effect_sign (boolean, optional (default=False)) – If True, then causal directions are split depending on the sign of the causal effect.
Returns:
causal_direction_counts – List of causal directions sorted by count in descending order.
The dictionary has the following format:
n_dags (int, optional (default=None)) – If int, then the top n_dags items are included in the result.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
split_by_causal_effect_sign (boolean, optional (default=False)) – If True, then causal directions are split depending on the sign of the causal effect.
Returns:
directed_acyclic_graph_counts – List of directed acyclic graphs sorted by count in descending order.
The dictionary has the following format:
{'dag':[n_dags],'count':[n_dags]}.
where n_dags is the number of directed acyclic graphs.
Get all paths from the start variable to the end variable and their bootstrap probabilities.
Parameters:
from_index (int) – Index of the variable at the start of the path.
to_index (int) – Index of the variable at the end of the path.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
Causal directions with absolute values of causal effects less than min_causal_effect are excluded.
Returns:
paths – List of path and bootstrap probability.
The dictionary has the following format:
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
Returns:
probabilities – List of bootstrap probability matrix.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
Returns:
total_causal_effects – List of bootstrap total causal effect sorted by probability in descending order.
The dictionary has the following format:
resampled_indices_ – The list of original indices of the resampled samples,
where n_sampling is the number of bootstrap samplings
and resample_size is the size of each subsample set.
Get causal direction count as a result of bootstrapping.
Parameters:
n_directions (int, optional (default=None)) – If int, then the top n_directions items are included in the result.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects
less than min_causal_effect are excluded.
split_by_causal_effect_sign (boolean, optional (default=False)) – If True, then causal directions are split depending on the sign of the causal effect.
Returns:
causal_direction_counts – List of causal directions sorted by count in descending order.
The dictionary has the following format:
n_dags (int, optional (default=None)) – If int, then the top n_dags items are included in the result.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
split_by_causal_effect_sign (boolean, optional (default=False)) – If True, then causal directions are split depending on the sign of the causal effect.
Returns:
directed_acyclic_graph_counts – List of directed acyclic graphs sorted by count in descending order.
The dictionary has the following format:
{'dag':[n_dags],'count':[n_dags]}.
where n_dags is the number of directed acyclic graphs.
Get all paths from the start variable to the end variable and their bootstrap probabilities.
Parameters:
from_index (int) – Index of the variable at the start of the path.
to_index (int) – Index of the variable at the end of the path.
from_lag (int) – Number of lag at the start of the path.
from_lag should be greater than or equal to to_lag.
to_lag (int) – Number of lag at the end of the path.
from_lag should be greater than or equal to to_lag.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
Causal directions with absolute values of causal effects less than min_causal_effect are excluded.
Returns:
paths – List of path and bootstrap probability.
The dictionary has the following format:
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
Returns:
probabilities – List of bootstrap probability matrix.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
Returns:
total_causal_effects – List of bootstrap total causal effect sorted by probability in descending order.
The dictionary has the following format:
resampled_indices_ – The list of original indices of the resampled samples,
where n_sampling is the number of bootstrap samplings
and resample_size is the size of each subsample set.
Get causal direction count as a result of bootstrapping.
Parameters:
n_directions (int, optional (default=None)) – If int, then the top n_directions items are included in the result.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects
less than min_causal_effect are excluded.
split_by_causal_effect_sign (boolean, optional (default=False)) – If True, then causal directions are split depending on the sign of the causal effect.
Returns:
causal_direction_counts – List of causal directions sorted by count in descending order.
The dictionary has the following format:
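The counting behind this method can be sketched with plain numpy: tally, across hypothetical bootstrap adjacency matrices, every direction whose coefficient exceeds the threshold, then keep the most frequent ones. The matrices and the (from, to) tuple representation below are illustrative assumptions, not the library's actual return format:

```python
import numpy as np
from collections import Counter

# Hypothetical bootstrap adjacency matrices for 3 variables.
adjacency_matrices = [
    np.array([[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, 0.5, 0.0]]),
    np.array([[0.0, 0.0, 0.0], [0.7, 0.0, 0.0], [0.4, 0.0, 0.0]]),
]

min_causal_effect = 0.01
counter = Counter()
for B in adjacency_matrices:
    # B[i, j] != 0 encodes a causal direction x_j -> x_i.
    for i, j in zip(*np.where(np.abs(B) > min_causal_effect)):
        counter[(int(j), int(i))] += 1

n_directions = 8  # keep only the top-ranked directions
top = counter.most_common(n_directions)
print(top)  # x0 -> x1 is counted in both runs, the others in one run each
```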
n_dags (int, optional (default=None)) – If int, then the top n_dags items are included in the result.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
split_by_causal_effect_sign (boolean, optional (default=False)) – If True, then causal directions are split depending on the sign of the causal effect.
Returns:
directed_acyclic_graph_counts – List of directed acyclic graphs sorted by count in descending order.
The dictionary has the following format:
{'dag':[n_dags],'count':[n_dags]}.
where n_dags is the number of directed acyclic graphs.
Get all paths from the start variable to the end variable and their bootstrap probabilities.
Parameters:
from_index (int) – Index of the variable at the start of the path.
to_index (int) – Index of the variable at the end of the path.
from_lag (int) – Number of lags at the start of the path.
from_lag should be greater than or equal to to_lag.
to_lag (int) – Number of lags at the end of the path.
from_lag should be greater than or equal to to_lag.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
Causal directions with absolute values of causal effects less than min_causal_effect are excluded.
Returns:
paths – List of path and bootstrap probability.
The dictionary has the following format:
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
Returns:
probabilities – List of bootstrap probability matrix.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
Returns:
total_causal_effects – List of bootstrap total causal effect sorted by probability in descending order.
The dictionary has the following format:
resampled_indices_ – The list of original indices of the resampled samples,
where n_sampling is the number of bootstrap samplings
and resample_size is the size of each subsample set.
Get causal direction count as a result of bootstrapping.
Parameters:
n_directions (int, optional (default=None)) – If int, then the top n_directions items are included in the result.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal
effects less than min_causal_effect are excluded.
split_by_causal_effect_sign (boolean, optional (default=False)) – If True, then causal directions are split depending on the sign of the causal effect.
Returns:
causal_direction_counts – List of causal directions sorted by count in descending order.
The dictionary has the following format:
n_dags (int, optional (default=None)) – If int, then the top n_dags items are included in the result.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
split_by_causal_effect_sign (boolean, optional (default=False)) – If True, then causal directions are split depending on the sign of the causal effect.
Returns:
directed_acyclic_graph_counts – List of directed acyclic graphs sorted by count in descending order.
The dictionary has the following format:
{'dag':[n_dags],'count':[n_dags]}.
where n_dags is the number of directed acyclic graphs.
Get all paths from the start variable to the end variable and their bootstrap probabilities.
Parameters:
from_index (int) – Index of the variable at the start of the path.
to_index (int) – Index of the variable at the end of the path.
from_t (int) – The starting timepoint of the path.
to_t (int) – The end timepoint of the path.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
Causal directions with absolute values of causal effects less than min_causal_effect are excluded.
Returns:
paths – List of path and bootstrap probability.
The dictionary has the following format:
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
Returns:
probabilities – List of bootstrap probability matrix.
min_causal_effect (float, optional (default=None)) – Threshold for detecting causal direction.
If float, then causal directions with absolute values of causal effects less than
min_causal_effect are excluded.
Returns:
total_causal_effects – List of bootstrap total causal effect sorted by probability in descending order.
The dictionary has the following format:
random_state (int, optional (default=None)) – random_state is the seed used by the random number generator.
alpha (float, optional (default=0.1)) – Significance level of the statistical test. If alpha=0.0, rejection does not occur in statistical tests.
regressor (regressor object implementing 'fit' and 'predict' function (default=None)) – Regressor to compute residuals.
This regressor object must have fit and predict methods, like scikit-learn's models.
Prior knowledge used for causal discovery, where n_features is the number of features.
The elements of prior knowledge matrix are defined as follows [1]:
0 : \(x_i\) does not have a directed path to \(x_j\)
1 : \(x_i\) has a directed path to \(x_j\)
-1 : No prior knowledge is available to know if either of the two cases above (0 or 1) is true.
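As a sketch of these semantics, a prior-knowledge matrix for three variables can be built by hand with numpy. It is assumed here that the matrix follows the same orientation as the adjacency matrix (element (j, i) encodes knowledge about a directed path from x_i to x_j); the concrete entries are illustrative only:

```python
import numpy as np

NO_PATH, HAS_PATH, UNKNOWN = 0, 1, -1

n_features = 3
# Assumed orientation: prior_knowledge[j, i] describes a path from x_i to x_j.
prior_knowledge = np.full((n_features, n_features), UNKNOWN)
np.fill_diagonal(prior_knowledge, NO_PATH)  # no variable causes itself
prior_knowledge[1, 0] = HAS_PATH   # x0 is known to have a directed path to x1
prior_knowledge[0, 1] = NO_PATH    # x1 is known to have no directed path to x0
print(prior_knowledge)
```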
independence ({'hsic', 'fcorr'}, optional (default='hsic')) – Method used to determine independence. If 'hsic' is set, independence is tested by HSIC.
If 'fcorr' is set, independence is determined by F-correlation.
ind_corr (float, optional (default=0.5)) – The threshold value for determining independence by F-correlation;
independence is determined when the value of F-correlation is below this threshold value.
max_explanatory_num (int, optional (default=2)) – Maximum number of explanatory variables.
cor_alpha (float, optional (default=0.01)) – Alpha level for Pearson correlation.
ind_alpha (float, optional (default=0.01)) – Alpha level for HSIC.
shapiro_alpha (float, optional (default=0.01)) – Alpha level for Shapiro-Wilk test.
MLHSICR (bool, optional (default=False)) – If True, use MLHSICR for multiple regression, if False, use OLS for multiple regression.
bw_method (str, optional (default='mdbs')) –
The method used to calculate the bandwidth of the HSIC.
mdbs : Median distance between samples.
scott : Scott’s Rule of Thumb.
silverman : Silverman’s Rule of Thumb.
independence ({'hsic', 'fcorr'}, optional (default='hsic')) – Method used to determine independence. If 'hsic' is set, independence is tested by HSIC.
If 'fcorr' is set, independence is determined by F-correlation.
ind_corr (float, optional (default=0.5)) – The threshold value for determining independence by F-correlation;
independence is determined when the value of F-correlation is below this threshold value.
causal_model (lingam object inherits 'lingam._BaseLiNGAM' or array-like with shape (n_features, n_features)) – Causal model for calculating causal effects.
The lingam object is lingam.DirectLiNGAM or lingam.ICALiNGAM, and its fit method
must already have been executed.
For array-like, adjacency matrix to estimate causal effect, where n_features is the number of features.
Estimate the intervention effect with the prediction model.
Parameters:
X (array-like, shape (n_samples, n_features)) – Original data, where n_samples is the number of samples
and n_features is the number of features.
target_index (int) – Index of target variable.
pred_model (model object implementing 'predict' or 'predict_proba') – Model to predict the expectation. For linear regression or non-linear regression,
the model object must have a predict method.
For logistic regression, model object must have predict_proba method.
Returns:
intervention_effects – Estimated values of intervention effect.
The first column of the list is the value of ‘E[Y|do(Xi=mean)]-E[Y|do(Xi=mean+std)]’,
and the second column is the value of 'E[Y|do(Xi=mean)]-E[Y|do(Xi=mean-std)]'.
The maximum value in this array is the feature having the greatest intervention effect.
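In a linear SEM x = Bx + e with zero-mean errors, E[Y|do(Xi=c)] can be computed in closed form by severing the incoming edges of x_i and solving the resulting linear system. A toy sketch of the two mean-shift comparisons described above (the matrix is made up, and this is not the library's implementation):

```python
import numpy as np

def do_expectation(B, i, c):
    """Expected variable values in x = Bx + e (zero-mean e) under do(x_i = c)."""
    B_do = B.copy()
    B_do[i, :] = 0.0           # cut the incoming edges of x_i
    v = np.zeros(len(B))
    v[i] = c                   # clamp x_i to c
    return np.linalg.solve(np.eye(len(B)) - B_do, v)

# Toy model: x1 = 2*x0 + e1; take y = x1 (target index 1).
B = np.array([[0.0, 0.0],
              [2.0, 0.0]])
mean, std = 0.0, 1.0
up = do_expectation(B, 0, mean)[1] - do_expectation(B, 0, mean + std)[1]
down = do_expectation(B, 0, mean)[1] - do_expectation(B, 0, mean - std)[1]
print(up, down)  # raising x0 by one std raises y by 2, lowering it lowers y by 2
```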
Estimate the intervention such that the expectation of
the prediction of the post-intervention observations is equal
to or close to a specified value.
Parameters:
X (array-like, shape (n_samples, n_features)) – Original data, where n_samples is the number of samples
and n_features is the number of features.
target_index (int) – Index of target variable.
pred_model (model object.) – Model to predict the expectation. Only linear regression model can be specified.
Model object must have coef_ and intercept_ attributes.
intervention_index (int) – Index of variable to apply intervention.
desired_output – Desired expected post-intervention output of prediction.
Returns:
optimal_intervention – Optimal intervention on intervention_index variable.
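Because E[Y|do(Xi=c)] is affine in c for a linear model, the optimal intervention can be found by solving a single linear equation: evaluate the do-expectation at two points to recover intercept and slope, then invert. A hypothetical sketch with a toy coefficient matrix (not the library's implementation):

```python
import numpy as np

def do_expectation(B, i, c):
    """Expected variable values in x = Bx + e (zero-mean e) under do(x_i = c)."""
    B_do = B.copy()
    B_do[i, :] = 0.0
    v = np.zeros(len(B))
    v[i] = c
    return np.linalg.solve(np.eye(len(B)) - B_do, v)

# Toy model: y = x1 = 2 * x0; intervene on x0 to reach a desired output.
B = np.array([[0.0, 0.0],
              [2.0, 0.0]])
target_index, intervention_index = 1, 0
y0 = do_expectation(B, intervention_index, 0.0)[target_index]  # intercept
y1 = do_expectation(B, intervention_index, 1.0)[target_index]  # intercept + slope
slope = y1 - y0
desired_output = 10.0
optimal_intervention = (desired_output - y0) / slope
print(optimal_intervention)  # 5.0, since y = 2 * x0
```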
Print causal directions of bootstrap result to stdout.
Parameters:
cdc (dict) – List of causal directions sorted by count in descending order.
This can be set to the value returned by the BootstrapResult.get_causal_direction_counts() method.
n_sampling (int) – Number of bootstrapping samples.
labels (array-like, optional (default=None)) – List of feature labels.
If labels are set, the output feature names will be the specified labels.
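A hedged sketch of what such a printout could look like, expressing each count as a bootstrap percentage. The 'from'/'to'/'count' keys and the labels are assumptions made for illustration, not a documented layout:

```python
# Hypothetical causal-direction counts from 100 bootstrap samplings.
cdc = {'from': [0, 2], 'to': [1, 1], 'count': [95, 80]}
n_sampling = 100
labels = ['x0', 'x1', 'x2']

lines = []
for f, t, c in zip(cdc['from'], cdc['to'], cdc['count']):
    # Each direction is printed with its bootstrap probability (count / n_sampling).
    lines.append(f'{labels[t]} <--- {labels[f]} ({100 * c / n_sampling:.1f}%)')
print('\n'.join(lines))
# x1 <--- x0 (95.0%)
# x1 <--- x2 (80.0%)
```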
dagc (dict) – List of directed acyclic graphs sorted by count in descending order.
This can be set to the value returned by the BootstrapResult.get_directed_acyclic_graph_counts() method.
n_sampling (int) – Number of bootstrapping samples.
labels (array-like, optional (default=None)) – List of feature labels.
If labels are set, the output feature names will be the specified labels.
exogenous_variables (array-like, shape (index, ...), optional (default=None)) – List of exogenous variables (indices).
Prior knowledge is created with the specified variables as exogenous variables.
sink_variables (array-like, shape (index, ...), optional (default=None)) – List of sink variables (indices).
Prior knowledge is created with the specified variables as sink variables.
paths (array-like, shape ((index, index), ...), optional (default=None)) – List of variable (index) pairs with a directed path.
If (i, j), prior knowledge is created that xi has a directed path to xj.
no_paths (array-like, shape ((index, index), ...), optional (default=None)) – List of variable (index) pairs without a directed path.
If (i, j), prior knowledge is created that xi does not have a directed path to xj.
Returns:
prior_knowledge – Return matrix of prior knowledge used for causal discovery.
Directed graph source code in the DOT language with specified adjacency matrix.
Parameters:
adjacency_matrix (array-like with shape (n_features, n_features)) – Adjacency matrix to make graph, where n_features is the number of features.
labels (array-like, optional (default=None)) – Label to use for graph features.
lower_limit (float, optional (default=0.01)) – Threshold for drawing direction.
If float, then directions with absolute values of coefficients less than lower_limit are excluded.
prediction_feature_indices (array-like, optional (default=None)) – Indices to use as prediction features.
prediction_target_label (string, optional (default='Y(pred)')) – Label to use for target variable of prediction.
prediction_line_color (string, optional (default='red')) – Line color to use for prediction’s graph.
prediction_coefs (array-like, optional (default=None)) – Coefficients to use for prediction’s graph.
prediction_feature_importance (array-like, optional (default=None)) – Feature importance to use for prediction’s graph.
path (tuple, optional (default=None)) – Path to highlight. Tuple of start index and end index.
path_color (string, optional (default=None)) – Colors to highlight a path.
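The effect of lower_limit can be sketched by generating DOT source by hand from an adjacency matrix. This is only an approximation of the idea; the library's actual output format may differ:

```python
import numpy as np

def adjacency_to_dot(B, labels, lower_limit=0.01):
    """Build minimal DOT source: an edge x_j -> x_i for each |b_ij| >= lower_limit."""
    lines = ['digraph {']
    for i in range(len(B)):
        for j in range(len(B)):
            if abs(B[i, j]) >= lower_limit:
                lines.append(f'  {labels[j]} -> {labels[i]} [label="{B[i, j]:.2f}"];')
    lines.append('}')
    return '\n'.join(lines)

# Toy matrix: x0 -> x1 with coefficient 3.0.
B = np.array([[0.0, 0.0],
              [3.0, 0.0]])
dot_src = adjacency_to_dot(B, ['x0', 'x1'])
print(dot_src)
```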
dag (array-like, shape (n_features, n_features)) – The adjacency matrix to find all paths, where n_features is the number of features.
from_index (int) – Index of the variable at the start of the path.
to_index (int) – Index of the variable at the end of the path.
min_causal_effect (float, optional (default=0.0)) – Threshold for detecting causal direction.
Causal directions with absolute values of causal effects less than min_causal_effect are excluded.
Returns:
paths (array-like, shape (n_paths)) – List of found path, where n_paths is the number of paths.
effects (array-like, shape (n_paths)) – List of causal effect, where n_paths is the number of paths.
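The path search can be sketched as a depth-first enumeration over the adjacency matrix, with the effect of a path taken as the product of its coefficients. This is an illustrative sketch, not the library's implementation:

```python
import numpy as np

def find_all_paths(dag, from_index, to_index, min_causal_effect=0.0):
    """Depth-first enumeration of directed paths; the effect of a path is the
    product of its edge coefficients."""
    paths, effects = [], []

    def visit(node, path, effect):
        if node == to_index:
            paths.append(path)
            effects.append(effect)
            return
        # dag[i, j] != 0 means an edge x_j -> x_i, so scan column `node`.
        for i in range(len(dag)):
            if abs(dag[i, node]) > min_causal_effect and i not in path:
                visit(i, path + [i], effect * dag[i, node])

    visit(from_index, [from_index], 1.0)
    return paths, effects

# Toy DAG: x0 -> x1 (2.0), x1 -> x2 (3.0), x0 -> x2 (1.0).
dag = np.array([[0.0, 0.0, 0.0],
                [2.0, 0.0, 0.0],
                [1.0, 3.0, 0.0]])
paths, effects = find_all_paths(dag, 0, 2)
print(paths, effects)  # [[0, 1, 2], [0, 2]] with effects [6.0, 1.0]
```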
X (pandas.DataFrame, shape (n_samples, n_features)) – Training data used to obtain cd_result.
cd_result (array-like with shape (n_features, n_features) or BootstrapResult) – Adjacency matrix or BootstrapResult. These are the results of a causal discovery.
estimator (estimator object) – Estimator used for non-linear regression.
A regression is run with the estimator using cause_name and the covariates
as explanatory variables and effect_name as the objective variable.
The covariates are searched for in cd_result.
cause_name (str) – The name of the cause variable.
effect_name (str) – The name of the effect variable.
cause_positions (array-like, optional (default=None)) – List of positions from which causal effects are calculated.
By default, cause_positions stores the positions that divide the value range of X
into 10 equal parts.
percentile (array-like, optional (default=None)) – A tuple consisting of three percentile values. Each value must be greater
than 0 and less than 100. By default, (95, 50, 5) is set.
fig (plt.Figure, optional (default=None)) – If fig is given, draw the figure in fig. If not given, a figure
is prepared internally.
boxplot (boolean, optional (default=False)) – If True, draw a box plot instead of a scatter plot for each cause_positions.
Evaluate the given adjacency matrix and return fit indices.
Parameters:
adjacency_matrix (array-like, shape (n_features, n_features)) – Adjacency matrix representing a causal graph.
The i-th column and row correspond to the i-th column of X.
X (array-like, shape (n_samples, n_features)) – Training data.
is_ordinal (array-like, shape (n_features,)) – Binary list. The i-th element indicates whether the i-th column of X is ordinal.
0 means not ordinal; otherwise ordinal.
Returns:
fit_indices – Fit indices. This API uses semopy’s calc_stats(). See semopy’s reference for details.
Fit the model to X with measurement structure and latent factors’ scales.
Parameters:
X (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples
and n_features is the number of measurement features.
G_sign (array-like, shape (n_features, n_features_latent)) – Measurement structure matrix, where n_features_latent is
the number of latent factors and n_features is the
number of measurement features.
scale (array-like, shape (1, n_features_latent)) – Scales of the latent factors.
Estimated measurement matrix between measurement variables and
latent factors.
Returns:
measurement_matrix_ – The measurement matrix between measurement variables and
latent factors, where n_features_latent is the
number of latent factors and n_features is the
number of measurement variables.
Fit the model to X with measurement structure and latent factors’ scales.
Parameters:
X (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples
of all domains and n_features is the number of features
of all domains.
G_sign (array-like, shape (n_features, n_features_latent)) – Measurement structure matrix, where n_features_latent is
the number of latent factors of all domains and n_features
is the number of measurement variables of all domains.
scale (array-like, shape (1, n_features_latent)) – Scales of the latent factors.
Estimated measurement matrix between measurement variables and
latent factors from all domains.
Returns:
measurement_matrix_ – The measurement matrix between measurement variables and
latent factors, where n_features_latent is the
number of latent factors and n_features is the
number of measurement variables from all domains.
Implementation of the RESIT (regression with subsequent independence test) algorithm [1]
References
Notes
RESIT algorithm returns an adjacency matrix consisting of zeros or ones,
rather than an adjacency matrix consisting of causal coefficients,
in order to estimate nonlinear causality.
regressor (regressor object implementing 'fit' and 'predict' function (default=None)) – Regressor to compute residuals.
This regressor object must have fit method and predict function like scikit-learn’s model.
random_state (int, optional (default=None)) – random_state is the seed used by the random number generator.
alpha (float, optional (default=0.01)) – Alpha level for HSIC independence test when removing superfluous edges.
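The overall idea of RESIT can be sketched with numpy: repeatedly pick as a sink the variable whose regression residual is most independent of the remaining variables, then reverse the removal order. Below, a biased HSIC statistic with Gaussian kernels stands in for the real independence test; the data-generating model and all names are made up for illustration:

```python
import numpy as np

def hsic(a, b):
    """Biased HSIC statistic with Gaussian kernels (median-distance bandwidth)."""
    def gram(v):
        d = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d / np.median(d[d > 0]))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ H @ gram(b) @ H) / n ** 2

def resit_order(X):
    """Sketch of RESIT's ordering phase: the variable whose residual looks most
    independent of the other variables is removed as a sink each round."""
    remaining = list(range(X.shape[1]))
    order = []
    while len(remaining) > 1:
        scores = {}
        for k in remaining:
            preds = [j for j in remaining if j != k]
            beta, *_ = np.linalg.lstsq(X[:, preds], X[:, k], rcond=None)
            resid = X[:, k] - X[:, preds] @ beta
            scores[k] = max(hsic(resid, X[:, j]) for j in preds)
        sink = min(scores, key=scores.get)
        order.append(sink)
        remaining.remove(sink)
    order.append(remaining[0])
    return order[::-1]  # causal order: sources first

# Toy non-Gaussian model: x0 -> x1.
rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, 500)
x1 = 2.0 * x0 + rng.uniform(-1, 1, 500)
order = resit_order(np.column_stack([x0, x1]))
print(order)  # expected to recover the causal order [0, 1]
```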
X (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples
and n_features is the number of observed variables.
dis_con (array-like, shape (1, n_features)) – Indicators of discrete or continuous variables, where “1”
indicates a continuous variable, while “0” a discrete
variable.
only_global (boolean, optional (default=False)) – If True, then the method will only perform the global optimization
to estimate the causal structure, without the local search phase.
is_poisson (boolean, optional (default=False)) – If True, then the method will use a Poisson regression model to compute
the log-likelihood in the local search phase.
num_explanatory_vals (int, optional (default=2)) – Maximum number of explanatory variables.
independence ({'hsic', 'fcorr'}, optional (default='hsic')) – Method used to determine independence. If 'hsic' is set, independence is tested by HSIC.
If 'fcorr' is set, independence is determined by F-correlation.
ind_corr (float, optional (default=0.5)) – The threshold value for determining independence by F-correlation;
independence is determined when the value of F-correlation is below this threshold value.
max_explanatory_num (int, optional (default=2)) – Maximum number of explanatory variables.
cor_alpha (float, optional (default=0.01)) – Alpha level for Pearson correlation.
ind_alpha (float, optional (default=0.01)) – Alpha level for HSIC.
shapiro_alpha (float, optional (default=0.01)) – Alpha level for Shapiro-Wilk test.
MLHSICR (bool, optional (default=True)) – If True, use MLHSICR for multiple regression, if False, use OLS for multiple regression.
bw_method (str, optional (default='mdbs')) –
The method used to calculate the bandwidth of the HSIC.
mdbs : Median distance between samples.
scott : Scott’s Rule of Thumb.
silverman : Silverman’s Rule of Thumb.
independence ({'hsic', 'fcorr'}, optional (default='hsic')) – Method used to determine independence. If 'hsic' is set, independence is tested by HSIC.
If 'fcorr' is set, independence is determined by F-correlation.
ind_corr (float, optional (default=0.5)) – The threshold value for determining independence by F-correlation;
independence is determined when the value of F-correlation is below this threshold value.
adjacency_matrices_ – The list of adjacency matrix B for multiple datasets.
The shape of B is (n_features, n_features), where
n_features is the number of features.
Evaluate the statistical reliability of the DAG based on bootstrapping.
Parameters:
X_list (array-like, shape (X, ...)) – Multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
n_sampling (int) – Number of bootstrapping samples.
Returns:
results – Returns the results of bootstrapping for multiple datasets.
X_list (array-like, shape (X, ...)) – Multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
from_index – Index of source variable to estimate total effect.
to_index – Index of destination variable to estimate total effect.
X_list (list, shape [X, ...]) – Multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
Calculate the p-value matrix of independence between error variables.
Parameters:
X_list (array-like, shape (X, ...)) – Multiple datasets for training, where X is a dataset.
The shape of X is (n_samples, n_features),
where n_samples is the number of samples and n_features is the number of features.
Returns:
independence_p_values – p-value matrix of independence between error variables.
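The computation can be sketched as recovering the error estimates e = (I - B)x from a fitted adjacency matrix and testing each pair of error variables for independence. Below, a Fisher z-test on the Pearson correlation stands in for the library's HSIC-style test; it is a crude, illustrative substitute:

```python
import numpy as np
from math import erf, sqrt

def error_independence_pvalues(X, B):
    """Sketch: recover error estimates e = (I - B) x per sample and test each
    pair for independence with a correlation z-test (a rough stand-in for the
    independence test actually used)."""
    E = X @ (np.eye(len(B)) - B).T       # error estimates for each sample
    n, p = E.shape
    pvals = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            if i == j:
                continue
            r = np.corrcoef(E[:, i], E[:, j])[0, 1]
            z = abs(np.arctanh(r)) * sqrt(n - 3)      # Fisher z-transform
            pvals[i, j] = 1 - erf(z / sqrt(2))        # two-sided normal p-value
    return pvals

# Toy model x = Bx + e with independent uniform errors.
rng = np.random.default_rng(0)
e = rng.uniform(-1, 1, (1000, 2))
B = np.array([[0.0, 0.0],
              [2.0, 0.0]])
X = e @ np.linalg.inv(np.eye(2) - B).T   # generate observations
pvals = error_independence_pvalues(X, B)
print(pvals)
```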