EvaluateModelFit

This example explains how to use lingam.utils.evaluate_model_fit. This function evaluates how well a given adjacency matrix fits the data and returns standard model-fit indices (chi-square statistics, CFI, GFI, RMSEA, AIC, and so on).

Import and settings

import numpy as np
import pandas as pd
from scipy.special import expit
import lingam
from lingam.utils import make_dot

print([np.__version__, pd.__version__, lingam.__version__])

import warnings
warnings.filterwarnings("ignore")

np.set_printoptions(precision=3, suppress=True)
np.random.seed(100)
['1.24.4', '2.0.3', '1.8.2']

When all variables are continuous

Test data

x3 = np.random.uniform(size=1000)
x0 = 3.0*x3 + np.random.uniform(size=1000)
x2 = 6.0*x3 + np.random.uniform(size=1000)
x1 = 3.0*x0 + 2.0*x2 + np.random.uniform(size=1000)
x5 = 4.0*x0 + np.random.uniform(size=1000)
x4 = 8.0*x0 - 1.0*x2 + np.random.uniform(size=1000)
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
x0 x1 x2 x3 x4 x5
0 1.657947 12.090323 3.519873 0.543405 10.182785 7.401408
1 1.217345 7.607388 1.693219 0.278369 8.758949 4.912979
2 2.226804 13.483555 3.201513 0.424518 15.398626 9.098729
3 2.756527 20.654225 6.037873 0.844776 16.795156 11.147294
4 0.319283 3.340782 0.727265 0.004719 2.343100 2.037974

Causal Discovery

Perform causal discovery to obtain the adjacency matrix.

model = lingam.DirectLiNGAM()
model.fit(X)
model.adjacency_matrix_
array([[ 0.   ,  0.   ,  0.   ,  2.994,  0.   ,  0.   ],
       [ 2.995,  0.   ,  1.993,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.   ,  5.957,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 7.998,  0.   , -1.005,  0.   ,  0.   ,  0.   ],
       [ 3.98 ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ]])
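As a supplementary step, the estimated structure can also be drawn with make_dot, which is imported above (rendering the graph requires Graphviz).

make_dot(model.adjacency_matrix_)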

Evaluation

Calculate the model fit of the estimated adjacency matrix to the given data.

lingam.utils.evaluate_model_fit(model.adjacency_matrix_, X)
               Value
DoF            16
DoF Baseline   16
chi2           997.342767
chi2 p-value   0.0
chi2 Baseline  22997.243286
CFI            0.957298
GFI            0.956632
AGFI           0.956632
NFI            0.956632
TLI            0.957298
RMSEA          0.247781
AIC            8.005314
BIC            32.544091
LogLik         0.997343
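For comparison, the same function can be applied to the true data-generating structure. The matrix below is a sketch assembled from the coefficients used to generate the test data above.

true_adjacency_matrix = np.array([
    [0.0, 0.0, 0.0, 3.0, 0.0, 0.0],
    [3.0, 0.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 6.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [8.0, 0.0, -1.0, 0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
lingam.utils.evaluate_model_fit(true_adjacency_matrix, X)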

When the data has hidden common causes

Test data

x6 = np.random.uniform(size=1000)
x3 = 2.0*x6 + np.random.uniform(size=1000)
x0 = 0.5*x3 + np.random.uniform(size=1000)
x2 = 2.0*x6 + np.random.uniform(size=1000)
x1 = 0.5*x0 + 0.5*x2 + np.random.uniform(size=1000)
x5 = 0.5*x0 + np.random.uniform(size=1000)
x4 = 0.5*x0 - 0.5*x2 + np.random.uniform(size=1000)

# The latent variable x6 is not included.
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])

X.head()
x0 x1 x2 x3 x4 x5
0 0.978424 1.966955 1.219048 1.746943 0.761499 0.942972
1 1.164124 2.652780 2.153412 2.317986 0.427684 1.144585
2 1.160532 1.978590 0.919055 1.066110 0.603656 1.329139
3 1.502959 1.833784 1.748939 1.234851 0.447353 1.188017
4 1.948636 2.457468 1.535006 2.073317 0.501208 1.155161

Causal Discovery

In the estimated adjacency matrix, nan indicates that the corresponding pair of variables is considered to have a hidden common cause.

model = lingam.BottomUpParceLiNGAM()
model.fit(X)
model.adjacency_matrix_
array([[ 0.   ,    nan,  0.   ,    nan,  0.   ,  0.   ],
       [   nan,  0.   ,  0.   ,    nan,  0.   ,  0.   ],
       [-0.22 ,  0.593,  0.   ,  0.564,  0.   ,  0.   ],
       [   nan,    nan,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 0.542,  0.   , -0.529,  0.   ,  0.   ,  0.   ],
       [ 0.506,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ]])
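As a quick supplementary check, the nan entries can be listed with NumPy to see which variable pairs are considered to have a hidden common cause.

# list each affected pair once
rows, cols = np.where(np.isnan(model.adjacency_matrix_))
for i, j in zip(rows, cols):
    if i < j:
        print(f'x{i} and x{j} may share a hidden common cause')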
Evaluation

Calculate the model fit of the estimated adjacency matrix, which contains nan entries, to the data.

lingam.utils.evaluate_model_fit(model.adjacency_matrix_, X)
               Value
DoF            3
DoF Baseline   15
chi2           1673.491434
chi2 p-value   0.0
chi2 Baseline  4158.502617
CFI            0.596841
GFI            0.597574
AGFI           -1.012132
NFI            0.597574
TLI            -1.015796
RMSEA          0.746584
AIC            32.653017
BIC            120.992612
LogLik         1.673491
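Compared with the fully observed case above, the fit indices are clearly worse (for example, RMSEA rises from about 0.25 to about 0.75, and AGFI and TLI are negative), which is consistent with the hidden common cause x6 being absent from the evaluated model.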

When the data has ordinal variables

Test data

x3 = np.random.uniform(size=1000)
x0 = 0.6*x3 + np.random.uniform(size=1000)

# x2 is a discrete (binary) variable: pass the centered effect of x3 through a
# sigmoid and draw 0 with probability p and 1 with probability 1 - p
x2 = 1.2*x3 + np.random.uniform(size=1000)
x2 = expit(x2 - np.mean(x2))
vec_func = np.vectorize(lambda p: np.random.choice([0, 1], p=[p, 1 - p]))
x2 = vec_func(x2)

x1 = 0.6*x0 + 0.4*x2 + np.random.uniform(size=1000)
x5 = 0.8*x0 + np.random.uniform(size=1000)
x4 = 1.6*x0 - 0.2*x2 + np.random.uniform(size=1000)
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
x0 x1 x2 x3 x4 x5
0 0.471823 1.426239 1.0 0.129133 1.535926 0.567324
1 0.738933 1.723219 1.0 0.327512 1.806484 1.056211
2 1.143877 1.962664 1.0 0.538189 2.075554 1.865132
3 0.326486 0.946426 1.0 0.302415 0.675984 0.857528
4 0.942822 0.882616 0.0 0.529399 2.002522 1.063416

Here, the adjacency matrix is specified directly from the coefficients of the data-generating process above, rather than being estimated by causal discovery.

adjacency_matrix = np.array([
    [0.0, 0.0, 0.0, 0.6, 0.0, 0.0],
    [0.6, 0.0, 0.4, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.2, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.6, 0.0, -0.2, 0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0, 0.0, 0.0]])

Use the is_ordinal argument to specify which variables are ordinal (1 for ordinal, 0 for continuous).

lingam.utils.evaluate_model_fit(adjacency_matrix, X, is_ordinal=[0, 0, 1, 0, 0, 0])
               Value
DoF            16
DoF Baseline   16
chi2           2239.89739
chi2 p-value   0.0
chi2 Baseline  2733.058196
CFI            0.181505
GFI            0.180443
AGFI           0.180443
NFI            0.180443
TLI            0.181505
RMSEA          0.373005
AIC            5.520205
BIC            30.058982
LogLik         2.239897
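If a single index is needed, it can be read from the returned table. The sketch below assumes the result is a pandas DataFrame with a single row labeled 'Value', as the output above indicates.

fit = lingam.utils.evaluate_model_fit(adjacency_matrix, X, is_ordinal=[0, 0, 1, 0, 0, 0])
print(fit.loc['Value', 'RMSEA'])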