VARMALiNGAM
Model
VARMALiNGAM [3] is an extension of the basic LiNGAM model [1] to time series cases. It combines the basic LiNGAM model with the classic vector autoregressive moving average models (VARMA). It enables analyzing both lagged and contemporaneous (instantaneous) causal relations, whereas the classic VARMA only analyzes lagged causal relations. This VARMALiNGAM model also is an extension of the VARLiNGAM model [2]. It uses VARMA to analyze lagged causal relations instead of VAR. This VARMALiNGAM makes the following assumptions similarly to the basic LiNGAM model [1]:
Linearity
Non-Gaussian continuous error variables (except at most one)
Acyclicity of contemporaneous causal relations
No hidden common causes between contempraneous error variables
Denote observed variables at time point \({t}\) by \({x}_{i}(t)\) and error variables by \({e}_{i}(t)\). Collect them in vectors \({x}(t)\) and \({e}(t)\), respectivelly. Further, denote by matrices \({B}_{\tau}\) and \({\Omega}_{\omega}\) coefficient matrices with time lags \({\tau}\) and \({\omega}\), respectivelly.
Due to the acyclicity assumption of contemporaneous causal relations, the adjacency matrix $B_0$ can be permuted to be strictly lower-triangular by a simultaneous row and column permutation. The error variables \({e}_{i}(t)\) are independent due to the assumption of no hidden common causes.
Then, mathematically, the model for observed variable vector \({x}(t)\) is written as
$$ x(t) = \sum_{ \tau = 0}^k B_{ \tau } x(t - \tau) + e(t) - \sum_{ \omega = 1}^{\ell} \Omega_{ \omega } e(t- \omega).$$
Example applications are found here, especially in Section. Economics/Finance/Marketing. For example, [3] uses the VARLiNGAM model to to study the processes of firm growth and firm performance using microeconomic data and to analyse the effects of monetary policy using macroeconomic data.
References
Import and settings
In this example, we need to import numpy
, pandas
, and
graphviz
in addition to lingam
.
import numpy as np
import pandas as pd
import graphviz
import lingam
from lingam.utils import make_dot, print_causal_directions, print_dagc
import warnings
warnings.filterwarnings('ignore')
print([np.__version__, pd.__version__, graphviz.__version__, lingam.__version__])
np.set_printoptions(precision=3, suppress=True)
np.random.seed(0)
['1.24.4', '2.0.3', '0.20.1', '1.8.3']
Test data
We create test data consisting of 5 variables.
psi0 = np.array([
[ 0. , 0. , -0.25, 0. , 0. ],
[-0.38, 0. , 0.14, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0.44, -0.2 , -0.09, 0. , 0. ],
[ 0.07, -0.06, 0. , 0.07, 0. ]
])
phi1 = np.array([
[-0.04, -0.29, -0.26, 0.14, 0.47],
[-0.42, 0.2 , 0.1 , 0.24, 0.25],
[-0.25, 0.18, -0.06, 0.15, 0.18],
[ 0.22, 0.39, 0.08, 0.12, -0.37],
[-0.43, 0.09, -0.23, 0.16, 0.25]
])
theta1 = np.array([
[ 0.15, -0.02, -0.3 , -0.2 , 0.21],
[ 0.32, 0.12, -0.11, 0.03, 0.42],
[-0.07, -0.5 , 0.03, -0.27, -0.21],
[-0.17, 0.35, 0.25, 0.24, -0.25],
[ 0.09, 0.4 , 0.41, 0.24, -0.31]
])
causal_order = [2, 0, 1, 3, 4]
# data generated from psi0 and phi1 and theta1, causal_order
X = np.loadtxt('data/sample_data_varma_lingam.csv', delimiter=',')
Causal Discovery
To run causal discovery, we create a VARMALiNGAM
object and call the fit()
method.
model = lingam.VARMALiNGAM(order=(1, 1), criterion=None)
model.fit(X)
<lingam.varma_lingam.VARMALiNGAM at 0x1acfc3fa6d8>
Using the causal_order_
properties, we can see the causal ordering as a result of the causal discovery.
model.causal_order_
[2, 0, 1, 3, 4]
Also, using the adjacency_matrices_
properties, we can see the adjacency matrix as a result of the causal discovery.
# psi0
model.adjacency_matrices_[0][0]
array([[ 0. , 0. , -0.194, 0. , 0. ],
[-0.354, 0. , 0.191, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0.558, -0.228, 0. , 0. , 0. ],
[ 0.115, 0. , 0. , 0. , 0. ]])
# psi1
model.adjacency_matrices_[0][1]
array([[ 0. , -0.394, -0.509, 0. , 0.659],
[-0.3 , 0. , 0. , 0.211, 0.404],
[-0.281, 0.21 , 0. , 0.118, 0.25 ],
[ 0.082, 0.762, 0.178, 0.137, -0.819],
[-0.507, 0. , -0.278, 0. , 0.336]])
# omega0
model.adjacency_matrices_[1][0]
array([[ 0. , 0. , 0. , 0. , 0. ],
[ 0.209, 0.365, 0. , 0. , 0.531],
[ 0. , -0.579, -0.105, -0.298, -0.235],
[ 0. , 0.171, 0.414, 0.302, 0. ],
[ 0.297, 0.435, 0.482, 0.376, -0.438]])
Using DirectLiNGAM
for the residuals_
properties, we can
calculate psi0 matrix.
dlingam = lingam.DirectLiNGAM()
dlingam.fit(model.residuals_)
dlingam.adjacency_matrix_
array([[ 0. , 0. , -0.238, 0. , 0. ],
[-0.392, 0. , 0.182, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0.587, -0.209, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ]])
We can draw a causal graph by utility funciton
labels = ['y0(t)', 'y1(t)', 'y2(t)', 'y3(t)', 'y4(t)', 'y0(t-1)', 'y1(t-1)', 'y2(t-1)', 'y3(t-1)', 'y4(t-1)']
make_dot(np.hstack(model.adjacency_matrices_[0]), lower_limit=0.3, ignore_shape=True, labels=labels)
Independence between error variables
To check if the LiNGAM assumption is broken, we can get p-values of independence between error variables. The value in the i-th row and j-th column of the obtained matrix shows the p-value of the independence of the error variables \(e_i\) and \(e_j\).
p_values = model.get_error_independence_p_values()
print(p_values)
[[0. 0.622 0.388 0. 0.539]
[0.622 0. 0.087 0.469 0.069]
[0.388 0.087 0. 0.248 0.229]
[0. 0.469 0.248 0. 0.021]
[0.539 0.069 0.229 0.021 0. ]]
Bootstrap
Bootstrapping
We call bootstrap()
method instead of fit()
. Here, the second argument specifies the number of bootstrap sampling.
model = lingam.VARMALiNGAM()
result = model.bootstrap(X, n_sampling=100)
Causal Directions
Since BootstrapResult
object is returned, we can get the ranking of the causal directions extracted by get_causal_direction_counts()
method. In the following sample code, n_directions
option is limited to the causal directions of the top 8 rankings, and min_causal_effect
option is limited to causal directions with a coefficient of 0.4 or more.
cdc = result.get_causal_direction_counts(n_directions=8, min_causal_effect=0.4, split_by_causal_effect_sign=True)
We can check the result by utility function.
labels = ['y0(t)', 'y1(t)', 'y2(t)', 'y3(t)', 'y4(t)', 'y0(t-1)', 'y1(t-1)', 'y2(t-1)', 'y3(t-1)', 'y4(t-1)', 'e0(t-1)', 'e1(t-1)', 'e2(t-1)', 'e3(t-1)', 'e4(t-1)']
print_causal_directions(cdc, 100, labels=labels)
y3(t) <--- y4(t-1) (b<0) (98.0%)
y3(t) <--- y1(t-1) (b>0) (98.0%)
y0(t) <--- y4(t-1) (b>0) (96.0%)
y1(t) <--- e4(t-1) (b>0) (91.0%)
y2(t) <--- e1(t-1) (b<0) (80.0%)
y4(t) <--- e2(t-1) (b>0) (71.0%)
y1(t) <--- e0(t-1) (b>0) (64.0%)
y2(t) <--- e4(t-1) (b<0) (62.0%)
Directed Acyclic Graphs
Also, using the get_directed_acyclic_graph_counts()
method, we can get the ranking of the DAGs extracted. In the following sample code, n_dags
option is limited to the dags of the top 3 rankings, and min_causal_effect
option is limited to causal directions with a coefficient of 0.3 or more.
dagc = result.get_directed_acyclic_graph_counts(n_dags=3, min_causal_effect=0.3, split_by_causal_effect_sign=True)
We can check the result by utility function.
print_dagc(dagc, 100, labels=labels)
DAG[0]: 1.0%
y0(t) <--- y1(t) (b<0)
y0(t) <--- y1(t-1) (b<0)
y0(t) <--- y2(t-1) (b<0)
y0(t) <--- y4(t-1) (b>0)
y0(t) <--- e0(t-1) (b>0)
y0(t) <--- e1(t-1) (b<0)
y0(t) <--- e3(t-1) (b>0)
y0(t) <--- e4(t-1) (b>0)
y1(t) <--- y2(t) (b>0)
y1(t) <--- y2(t-1) (b>0)
y1(t) <--- y3(t-1) (b>0)
y1(t) <--- y4(t-1) (b<0)
y1(t) <--- e0(t-1) (b>0)
y1(t) <--- e2(t-1) (b>0)
y1(t) <--- e3(t-1) (b>0)
y1(t) <--- e4(t-1) (b>0)
y2(t) <--- y4(t-1) (b>0)
y2(t) <--- e0(t-1) (b<0)
y2(t) <--- e2(t-1) (b<0)
y2(t) <--- e3(t-1) (b<0)
y2(t) <--- e4(t-1) (b<0)
y3(t) <--- y1(t) (b<0)
y3(t) <--- y2(t) (b>0)
y3(t) <--- y1(t-1) (b>0)
y3(t) <--- y2(t-1) (b>0)
y3(t) <--- y3(t-1) (b>0)
y3(t) <--- y4(t-1) (b<0)
y3(t) <--- e0(t-1) (b>0)
y3(t) <--- e2(t-1) (b>0)
y3(t) <--- e3(t-1) (b>0)
y3(t) <--- e4(t-1) (b>0)
y4(t) <--- y0(t) (b>0)
y4(t) <--- y1(t) (b>0)
y4(t) <--- y3(t) (b>0)
y4(t) <--- y1(t-1) (b>0)
y4(t) <--- y2(t-1) (b>0)
y4(t) <--- y3(t-1) (b>0)
y4(t) <--- y4(t-1) (b<0)
y4(t) <--- e0(t-1) (b<0)
y4(t) <--- e1(t-1) (b>0)
y4(t) <--- e2(t-1) (b>0)
y4(t) <--- e3(t-1) (b<0)
y4(t) <--- e4(t-1) (b<0)
DAG[1]: 1.0%
y0(t) <--- y1(t-1) (b<0)
y0(t) <--- y2(t-1) (b<0)
y0(t) <--- y4(t-1) (b>0)
y0(t) <--- e0(t-1) (b>0)
y1(t) <--- y3(t) (b<0)
y1(t) <--- y0(t-1) (b<0)
y1(t) <--- y1(t-1) (b>0)
y1(t) <--- e0(t-1) (b>0)
y1(t) <--- e1(t-1) (b>0)
y1(t) <--- e4(t-1) (b>0)
y2(t) <--- y1(t) (b>0)
y2(t) <--- y3(t) (b>0)
y2(t) <--- y4(t-1) (b>0)
y2(t) <--- e0(t-1) (b<0)
y2(t) <--- e1(t-1) (b<0)
y2(t) <--- e3(t-1) (b>0)
y2(t) <--- e4(t-1) (b<0)
y3(t) <--- y0(t) (b>0)
y3(t) <--- y1(t-1) (b>0)
y3(t) <--- y4(t-1) (b<0)
y3(t) <--- e0(t-1) (b<0)
y3(t) <--- e1(t-1) (b>0)
y3(t) <--- e2(t-1) (b>0)
y4(t) <--- y0(t) (b>0)
y4(t) <--- y0(t-1) (b<0)
y4(t) <--- y1(t-1) (b>0)
y4(t) <--- y4(t-1) (b<0)
y4(t) <--- e0(t-1) (b<0)
y4(t) <--- e1(t-1) (b>0)
y4(t) <--- e2(t-1) (b>0)
DAG[2]: 1.0%
y0(t) <--- y1(t-1) (b<0)
y0(t) <--- y2(t-1) (b<0)
y0(t) <--- y4(t-1) (b>0)
y0(t) <--- e0(t-1) (b>0)
y1(t) <--- y0(t) (b<0)
y1(t) <--- y2(t) (b>0)
y1(t) <--- y4(t-1) (b>0)
y1(t) <--- e0(t-1) (b>0)
y1(t) <--- e1(t-1) (b>0)
y1(t) <--- e2(t-1) (b>0)
y1(t) <--- e3(t-1) (b>0)
y1(t) <--- e4(t-1) (b>0)
y2(t) <--- y0(t) (b<0)
y2(t) <--- y0(t-1) (b<0)
y2(t) <--- y4(t-1) (b>0)
y2(t) <--- e1(t-1) (b<0)
y2(t) <--- e2(t-1) (b<0)
y2(t) <--- e3(t-1) (b<0)
y2(t) <--- e4(t-1) (b<0)
y3(t) <--- y1(t) (b<0)
y3(t) <--- y2(t) (b>0)
y3(t) <--- y1(t-1) (b>0)
y3(t) <--- y2(t-1) (b>0)
y3(t) <--- y3(t-1) (b>0)
y3(t) <--- y4(t-1) (b<0)
y3(t) <--- e1(t-1) (b>0)
y3(t) <--- e2(t-1) (b>0)
y3(t) <--- e3(t-1) (b>0)
y3(t) <--- e4(t-1) (b>0)
y4(t) <--- y0(t) (b>0)
y4(t) <--- y1(t) (b>0)
y4(t) <--- y3(t) (b>0)
y4(t) <--- y1(t-1) (b>0)
y4(t) <--- y2(t-1) (b>0)
y4(t) <--- y4(t-1) (b<0)
y4(t) <--- e0(t-1) (b<0)
y4(t) <--- e2(t-1) (b>0)
y4(t) <--- e4(t-1) (b<0)
Probability
Using the get_probabilities()
method, we can get the probability of bootstrapping.
prob = result.get_probabilities(min_causal_effect=0.1)
print('Probability of psi0:\n', prob[0])
print('Probability of psi1:\n', prob[1])
print('Probability of omega1:\n', prob[2])
Probability of psi0:
[[0. 0.26 0.46 0.26 0.3 ]
[0.53 0. 0.54 0.37 0.33]
[0.22 0.41 0. 0.38 0.12]
[0.6 0.62 0.35 0. 0.38]
[0.69 0.28 0.27 0.43 0. ]]
Probability of psi1:
[[0.41 1. 1. 0.2 1. ]
[0.64 0.63 0.6 0.83 0.85]
[0.76 0.69 0.64 0.47 0.95]
[0.55 1. 0.71 0.73 1. ]
[0.94 0.79 0.71 0.53 0.74]]
Probability of omega1:
[[0.76 0.35 0.54 0.43 0.47]
[0.87 0.77 0.63 0.5 1. ]
[0.63 0.95 0.66 0.84 0.93]
[0.41 0.85 0.88 0.68 0.49]
[0.66 0.81 0.92 0.58 0.69]]
Total Causal Effects
Using the get_total causal_effects()
method, we can get the list of
total causal effect. The total causal effects we can get are dictionary
type variable. We can display the list nicely by assigning it to
pandas.DataFrame. Also, we have replaced the variable index with a label
below.
causal_effects = result.get_total_causal_effects(min_causal_effect=0.01)
df = pd.DataFrame(causal_effects)
df['from'] = df['from'].apply(lambda x : labels[x])
df['to'] = df['to'].apply(lambda x : labels[x])
df
from | to | effect | probability | |
---|---|---|---|---|
0 | y0(t-1) | y4(t) | -0.454092 | 1.00 |
1 | y4(t-1) | y3(t) | -0.593869 | 1.00 |
2 | y1(t-1) | y3(t) | 0.514145 | 1.00 |
3 | y1(t-1) | y0(t) | -0.357521 | 1.00 |
4 | y2(t-1) | y0(t) | -0.443562 | 1.00 |
5 | y4(t-1) | y0(t) | 0.573678 | 1.00 |
6 | y4(t-1) | y2(t) | 0.360151 | 0.97 |
7 | y4(t-1) | y4(t) | 0.213879 | 0.96 |
8 | y1(t-1) | y1(t) | 0.206331 | 0.96 |
9 | y4(t-1) | y1(t) | 0.235616 | 0.95 |
10 | y0(t-1) | y1(t) | -0.364361 | 0.95 |
11 | y3(t-1) | y1(t) | 0.250602 | 0.94 |
12 | y0(t-1) | y2(t) | -0.267122 | 0.94 |
13 | y2(t-1) | y1(t) | 0.221743 | 0.93 |
14 | y2(t-1) | y4(t) | -0.213938 | 0.92 |
15 | y1(t-1) | y4(t) | 0.095743 | 0.92 |
16 | y0(t-1) | y3(t) | 0.177089 | 0.91 |
17 | y1(t-1) | y2(t) | 0.135946 | 0.90 |
18 | y3(t-1) | y3(t) | 0.150796 | 0.88 |
19 | y2(t-1) | y3(t) | -0.021971 | 0.84 |
20 | y3(t-1) | y4(t) | 0.170749 | 0.79 |
21 | y2(t-1) | y2(t) | -0.137767 | 0.77 |
22 | y0(t-1) | y0(t) | -0.094192 | 0.75 |
23 | y0(t) | y4(t) | 0.934934 | 0.70 |
24 | y3(t-1) | y2(t) | 0.141032 | 0.66 |
25 | y0(t) | y3(t) | 0.636926 | 0.63 |
26 | y1(t) | y3(t) | -0.296396 | 0.63 |
27 | y3(t-1) | y0(t) | -0.027274 | 0.63 |
28 | y0(t) | y1(t) | -0.469409 | 0.61 |
29 | y2(t) | y1(t) | 0.815024 | 0.59 |
30 | y2(t) | y3(t) | -0.102868 | 0.57 |
31 | y2(t) | y0(t) | -0.180943 | 0.53 |
32 | y2(t) | y4(t) | -0.054386 | 0.49 |
33 | y4(t) | y3(t) | 0.132928 | 0.45 |
34 | y3(t) | y4(t) | 0.453095 | 0.44 |
35 | y0(t) | y2(t) | -0.149761 | 0.42 |
36 | y4(t) | y1(t) | 0.119746 | 0.41 |
37 | y1(t) | y2(t) | 0.564823 | 0.41 |
38 | y3(t) | y1(t) | -0.706491 | 0.37 |
39 | y1(t) | y4(t) | -0.038562 | 0.37 |
40 | y3(t) | y2(t) | 0.111094 | 0.35 |
41 | y3(t) | y0(t) | 0.311717 | 0.34 |
42 | y1(t) | y0(t) | -0.300326 | 0.33 |
43 | y4(t) | y2(t) | 0.139237 | 0.32 |
44 | y4(t) | y0(t) | 0.405747 | 0.30 |
We can easily perform sorting operations with pandas.DataFrame.
df.sort_values('effect', ascending=False).head()
from | to | effect | probability | |
---|---|---|---|---|
23 | y0(t) | y4(t) | 0.934934 | 0.70 |
29 | y2(t) | y1(t) | 0.815024 | 0.59 |
25 | y0(t) | y3(t) | 0.636926 | 0.63 |
5 | y4(t-1) | y0(t) | 0.573678 | 1.00 |
37 | y1(t) | y2(t) | 0.564823 | 0.41 |
And with pandas.DataFrame, we can easily filter by keywords. The following code extracts the causal direction towards y2(t).
df[df['to']=='y2(t)'].head()
from | to | effect | probability | |
---|---|---|---|---|
6 | y4(t-1) | y2(t) | 0.360151 | 0.97 |
12 | y0(t-1) | y2(t) | -0.267122 | 0.94 |
17 | y1(t-1) | y2(t) | 0.135946 | 0.90 |
21 | y2(t-1) | y2(t) | -0.137767 | 0.77 |
24 | y3(t-1) | y2(t) | 0.141032 | 0.66 |
Because it holds the raw data of the causal effect (the original data for calculating the median), it is possible to draw a histogram of the values of the causal effect, as shown below.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline
from_index = 5 # index of y0(t-1). (index:0)+(n_features:5)*(lag:1) = 5
to_index = 2 # index of y2(t). (index:2)+(n_features:5)*(lag:0) = 2
plt.hist(result.total_effects_[:, to_index, from_index])