EvaluateModelFit
This example explains how to use lingam.utils.evaluate_model_fit. This function returns the model fit of a given adjacency matrix to the data.
Import and settings
import numpy as np
import pandas as pd
from scipy.special import expit
import lingam
from lingam.utils import make_dot
print([np.__version__, pd.__version__, lingam.__version__])
import warnings
warnings.filterwarnings("ignore")
np.set_printoptions(precision=3, suppress=True)
np.random.seed(100)
['1.24.4', '2.0.3', '1.8.2']
When all variables are continuous
Test data
x3 = np.random.uniform(size=1000)
x0 = 3.0*x3 + np.random.uniform(size=1000)
x2 = 6.0*x3 + np.random.uniform(size=1000)
x1 = 3.0*x0 + 2.0*x2 + np.random.uniform(size=1000)
x5 = 4.0*x0 + np.random.uniform(size=1000)
x4 = 8.0*x0 - 1.0*x2 + np.random.uniform(size=1000)
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
|   | x0 | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|
| 0 | 1.657947 | 12.090323 | 3.519873 | 0.543405 | 10.182785 | 7.401408 |
| 1 | 1.217345 | 7.607388 | 1.693219 | 0.278369 | 8.758949 | 4.912979 |
| 2 | 2.226804 | 13.483555 | 3.201513 | 0.424518 | 15.398626 | 9.098729 |
| 3 | 2.756527 | 20.654225 | 6.037873 | 0.844776 | 16.795156 | 11.147294 |
| 4 | 0.319283 | 3.340782 | 0.727265 | 0.004719 | 2.343100 | 2.037974 |
Causal Discovery
Perform causal discovery to obtain the adjacency matrix.
model = lingam.DirectLiNGAM()
model.fit(X)
model.adjacency_matrix_
array([[ 0. , 0. , 0. , 2.994, 0. , 0. ],
[ 2.995, 0. , 1.993, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 5.957, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 7.998, 0. , -1.005, 0. , 0. , 0. ],
[ 3.98 , 0. , 0. , 0. , 0. , 0. ]])
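In lingam's convention, entry (i, j) of the adjacency matrix is the estimated coefficient of the edge from x_j to x_i. As a minimal sketch (using the matrix printed above, hard-coded here so it runs standalone), the discovered edges can be listed from the nonzero entries:

```python
import numpy as np

# Adjacency matrix estimated above: B[i, j] is the coefficient of x_j -> x_i.
B = np.array([[0.   , 0., 0.    , 2.994, 0., 0.],
              [2.995, 0., 1.993 , 0.   , 0., 0.],
              [0.   , 0., 0.    , 5.957, 0., 0.],
              [0.   , 0., 0.    , 0.   , 0., 0.],
              [7.998, 0., -1.005, 0.   , 0., 0.],
              [3.98 , 0., 0.    , 0.   , 0., 0.]])

# Collect directed edges (parent, child, coefficient) from the nonzero entries.
edges = [(f"x{j}", f"x{i}", B[i, j]) for i, j in zip(*np.nonzero(B))]
for parent, child, coef in edges:
    print(f"{parent} -> {child} (coef={coef:.3f})")
```

The recovered edges match the data-generating equations above (e.g. x3 -> x0 with coefficient close to the true 3.0).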
Evaluation
Calculate the model fit of the estimated adjacency matrix to the given data.
lingam.utils.evaluate_model_fit(model.adjacency_matrix_, X)
|   | DoF | DoF Baseline | chi2 | chi2 p-value | chi2 Baseline | CFI | GFI | AGFI | NFI | TLI | RMSEA | AIC | BIC | LogLik |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 16 | 16 | 997.342767 | 0.0 | 22997.243286 | 0.957298 | 0.956632 | 0.956632 | 0.956632 | 0.957298 | 0.247781 | 8.005314 | 32.544091 | 0.997343 |
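The reported indices follow the standard SEM definitions, so some of them can be checked by hand from chi2, the baseline chi2, the degrees of freedom, and the sample size (n = 1000 here). For example, CFI = 1 - (chi2 - DoF) / (chi2_Baseline - DoF_Baseline) and RMSEA = sqrt((chi2 - DoF) / (DoF * (n - 1))). A small sketch reproducing the values in the table above:

```python
import math

# Values reported by evaluate_model_fit above.
chi2, dof = 997.342767, 16
chi2_b, dof_b = 22997.243286, 16
n = 1000  # sample size

# CFI: relative improvement over the baseline (independence) model.
cfi = 1 - (chi2 - dof) / (chi2_b - dof_b)

# RMSEA: root mean square error of approximation.
rmsea = math.sqrt((chi2 - dof) / (dof * (n - 1)))

print(round(cfi, 6), round(rmsea, 6))
```

Both values agree with the CFI (0.957298) and RMSEA (0.247781) columns of the table.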
When the data has hidden common causes
Test data
x6 = np.random.uniform(size=1000)
x3 = 2.0*x6 + np.random.uniform(size=1000)
x0 = 0.5*x3 + np.random.uniform(size=1000)
x2 = 2.0*x6 + np.random.uniform(size=1000)
x1 = 0.5*x0 + 0.5*x2 + np.random.uniform(size=1000)
x5 = 0.5*x0 + np.random.uniform(size=1000)
x4 = 0.5*x0 - 0.5*x2 + np.random.uniform(size=1000)
# The latent variable x6 is not included.
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
|   | x0 | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|
| 0 | 0.978424 | 1.966955 | 1.219048 | 1.746943 | 0.761499 | 0.942972 |
| 1 | 1.164124 | 2.652780 | 2.153412 | 2.317986 | 0.427684 | 1.144585 |
| 2 | 1.160532 | 1.978590 | 0.919055 | 1.066110 | 0.603656 | 1.329139 |
| 3 | 1.502959 | 1.833784 | 1.748939 | 1.234851 | 0.447353 | 1.188017 |
| 4 | 1.948636 | 2.457468 | 1.535006 | 2.073317 | 0.501208 | 1.155161 |
Causal Discovery
In the estimated adjacency matrix, nan indicates a pair of variables that has a hidden common cause.
model = lingam.BottomUpParceLiNGAM()
model.fit(X)
model.adjacency_matrix_
array([[ 0. , nan, 0. , nan, 0. , 0. ],
[ nan, 0. , 0. , nan, 0. , 0. ],
[-0.22 , 0.593, 0. , 0.564, 0. , 0. ],
[ nan, nan, 0. , 0. , 0. , 0. ],
[ 0.542, 0. , -0.529, 0. , 0. , 0. ],
[ 0.506, 0. , 0. , 0. , 0. , 0. ]])
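Since the nan entries mark variable pairs judged to have a hidden common cause, they can be extracted from the matrix with np.isnan. A minimal sketch (the matrix printed above is hard-coded here so the snippet runs standalone):

```python
import numpy as np

nan = np.nan
# Adjacency matrix estimated above; nan marks a hidden common cause.
B = np.array([[ 0.   , nan  , 0.    , nan  , 0., 0.],
              [ nan  , 0.   , 0.    , nan  , 0., 0.],
              [-0.22 , 0.593, 0.    , 0.564, 0., 0.],
              [ nan  , nan  , 0.    , 0.   , 0., 0.],
              [ 0.542, 0.   , -0.529, 0.   , 0., 0.],
              [ 0.506, 0.   , 0.    , 0.   , 0., 0.]])

# Collect the unordered variable pairs flagged as confounded; the nan pattern
# is symmetric, so deduplicate via sorted index pairs.
pairs = sorted({tuple(sorted((int(i), int(j))))
                for i, j in zip(*np.where(np.isnan(B)))})
print(pairs)
```

The flagged pairs (x0, x1), (x0, x3), and (x1, x3) are exactly the variables that depend on the omitted latent variable x6 through x3 in the data-generating process above.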
Evaluation
lingam.utils.evaluate_model_fit(model.adjacency_matrix_, X)
|   | DoF | DoF Baseline | chi2 | chi2 p-value | chi2 Baseline | CFI | GFI | AGFI | NFI | TLI | RMSEA | AIC | BIC | LogLik |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 3 | 15 | 1673.491434 | 0.0 | 4158.502617 | 0.596841 | 0.597574 | -1.012132 | 0.597574 | -1.015796 | 0.746584 | 32.653017 | 120.992612 | 1.673491 |
When the data has ordinal variables
Test data
x3 = np.random.uniform(size=1000)
x0 = 0.6*x3 + np.random.uniform(size=1000)
# discrete
x2 = 1.2*x3 + np.random.uniform(size=1000)
x2 = expit(x2 - np.mean(x2))
vec_func = np.vectorize(lambda p: np.random.choice([0, 1], p=[p, 1 - p]))
x2 = vec_func(x2)
x1 = 0.6*x0 + 0.4*x2 + np.random.uniform(size=1000)
x5 = 0.8*x0 + np.random.uniform(size=1000)
x4 = 1.6*x0 - 0.2*x2 + np.random.uniform(size=1000)
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
|   | x0 | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|
| 0 | 0.471823 | 1.426239 | 1.0 | 0.129133 | 1.535926 | 0.567324 |
| 1 | 0.738933 | 1.723219 | 1.0 | 0.327512 | 1.806484 | 1.056211 |
| 2 | 1.143877 | 1.962664 | 1.0 | 0.538189 | 2.075554 | 1.865132 |
| 3 | 0.326486 | 0.946426 | 1.0 | 0.302415 | 0.675984 | 0.857528 |
| 4 | 0.942822 | 0.882616 | 0.0 | 0.529399 | 2.002522 | 1.063416 |
adjacency_matrix = np.array([
[0.0, 0.0, 0.0, 0.6, 0.0, 0.0],
[0.6, 0.0, 0.4, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 1.2, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[1.6, 0.0,-0.2, 0.0, 0.0, 0.0],
[0.8, 0.0, 0.0, 0.0, 0.0, 0.0]]
)
Here the true adjacency matrix defined above is evaluated directly, without running causal discovery. Specify which variables are ordinal via the is_ordinal argument (1 for ordinal variables, 0 otherwise).
lingam.utils.evaluate_model_fit(adjacency_matrix, X, is_ordinal=[0, 0, 1, 0, 0, 0])
|   | DoF | DoF Baseline | chi2 | chi2 p-value | chi2 Baseline | CFI | GFI | AGFI | NFI | TLI | RMSEA | AIC | BIC | LogLik |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 16 | 16 | 2239.89739 | 0.0 | 2733.058196 | 0.181505 | 0.180443 | 0.180443 | 0.180443 | 0.181505 | 0.373005 | 5.520205 | 30.058982 | 2.239897 |
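As a rough reading aid, the SEM literature commonly cites cutoffs of CFI >= 0.95 and RMSEA <= 0.06 for acceptable fit; these thresholds are conventions from that literature, not part of the lingam API. A hypothetical helper applying them to the results above:

```python
# Hypothetical helper: check each index against commonly cited SEM cutoffs
# (CFI >= 0.95, RMSEA <= 0.06); these conventions are not part of lingam.
def read_fit(cfi, rmsea, cfi_cut=0.95, rmsea_cut=0.06):
    # Returns (CFI acceptable?, RMSEA acceptable?) under the given cutoffs.
    return cfi >= cfi_cut, rmsea <= rmsea_cut

# Indices from the result tables above.
print(read_fit(cfi=0.957298, rmsea=0.247781))  # continuous example: (True, False)
print(read_fit(cfi=0.181505, rmsea=0.373005))  # ordinal example: (False, False)
```

By CFI, the continuous example fits acceptably while this ordinal example does not; note that RMSEA exceeds the cutoff in all the runs above, so the indices should be read together rather than in isolation.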