EvaluateModelFit
This example explains how to use lingam.utils.evaluate_model_fit. This function returns the model fit of a given adjacency matrix to the data.
Import and settings
import numpy as np
import pandas as pd
from scipy.special import expit
import lingam
from lingam.utils import make_dot
print([np.__version__, pd.__version__, lingam.__version__])
import warnings
warnings.filterwarnings("ignore")
np.set_printoptions(precision=3, suppress=True)
np.random.seed(100)
['1.24.4', '2.0.3', '1.8.2']
When all variables are continuous
Test data
x3 = np.random.uniform(size=1000)
x0 = 3.0*x3 + np.random.uniform(size=1000)
x2 = 6.0*x3 + np.random.uniform(size=1000)
x1 = 3.0*x0 + 2.0*x2 + np.random.uniform(size=1000)
x5 = 4.0*x0 + np.random.uniform(size=1000)
x4 = 8.0*x0 - 1.0*x2 + np.random.uniform(size=1000)
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
|   | x0 | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|
| 0 | 1.657947 | 12.090323 | 3.519873 | 0.543405 | 10.182785 | 7.401408 |
| 1 | 1.217345 | 7.607388 | 1.693219 | 0.278369 | 8.758949 | 4.912979 |
| 2 | 2.226804 | 13.483555 | 3.201513 | 0.424518 | 15.398626 | 9.098729 |
| 3 | 2.756527 | 20.654225 | 6.037873 | 0.844776 | 16.795156 | 11.147294 |
| 4 | 0.319283 | 3.340782 | 0.727265 | 0.004719 | 2.343100 | 2.037974 |
Causal Discovery
Perform causal discovery to obtain the adjacency matrix.
model = lingam.DirectLiNGAM()
model.fit(X)
model.adjacency_matrix_
array([[ 0. , 0. , 0. , 2.994, 0. , 0. ],
[ 2.995, 0. , 1.993, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 5.957, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 7.998, 0. , -1.005, 0. , 0. , 0. ],
[ 3.98 , 0. , 0. , 0. , 0. , 0. ]])
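In lingam's convention, entry (i, j) of the adjacency matrix is the estimated coefficient of the edge from x_j to x_i. As a minimal sketch (using the matrix printed above, hard-coded here so it runs standalone), the discovered edges can be listed from the nonzero entries:

```python
import numpy as np

# Adjacency matrix estimated above: B[i, j] is the coefficient of x_j -> x_i.
B = np.array([[0.   , 0., 0.    , 2.994, 0., 0.],
              [2.995, 0., 1.993 , 0.   , 0., 0.],
              [0.   , 0., 0.    , 5.957, 0., 0.],
              [0.   , 0., 0.    , 0.   , 0., 0.],
              [7.998, 0., -1.005, 0.   , 0., 0.],
              [3.98 , 0., 0.    , 0.   , 0., 0.]])

# Collect directed edges (parent, child, coefficient) from the nonzero entries.
edges = [(f"x{j}", f"x{i}", B[i, j]) for i, j in zip(*np.nonzero(B))]
for parent, child, coef in edges:
    print(f"{parent} -> {child} (coef={coef:.3f})")
```

The recovered edges match the data-generating equations above (e.g. x3 -> x0 with coefficient close to the true 3.0).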
Evaluation
Calculate the model fit of the estimated adjacency matrix to the given data.
lingam.utils.evaluate_model_fit(model.adjacency_matrix_, X)
|   | DoF | DoF Baseline | chi2 | chi2 p-value | chi2 Baseline | CFI | GFI | AGFI | NFI | TLI | RMSEA | AIC | BIC | LogLik |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 16 | 16 | 997.342767 | 0.0 | 22997.243286 | 0.957298 | 0.956632 | 0.956632 | 0.956632 | 0.957298 | 0.247781 | 8.005314 | 32.544091 | 0.997343 |
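The reported indices follow the standard SEM definitions, so some of them can be checked by hand from chi2, the baseline chi2, the degrees of freedom, and the sample size (n = 1000 here). For example, CFI = 1 - (chi2 - DoF) / (chi2_Baseline - DoF_Baseline) and RMSEA = sqrt((chi2 - DoF) / (DoF * (n - 1))). A small sketch reproducing the values in the table above:

```python
import math

# Values reported by evaluate_model_fit above.
chi2, dof = 997.342767, 16
chi2_b, dof_b = 22997.243286, 16
n = 1000  # sample size

# CFI: relative improvement over the baseline (independence) model.
cfi = 1 - (chi2 - dof) / (chi2_b - dof_b)

# RMSEA: root mean square error of approximation.
rmsea = math.sqrt((chi2 - dof) / (dof * (n - 1)))

print(round(cfi, 6), round(rmsea, 6))
```

Both values agree with the CFI (0.957298) and RMSEA (0.247781) columns of the table.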
When the data has hidden common causes
Test data
x6 = np.random.uniform(size=1000)
x3 = 2.0*x6 + np.random.uniform(size=1000)
x0 = 0.5*x3 + np.random.uniform(size=1000)
x2 = 2.0*x6 + np.random.uniform(size=1000)
x1 = 0.5*x0 + 0.5*x2 + np.random.uniform(size=1000)
x5 = 0.5*x0 + np.random.uniform(size=1000)
x4 = 0.5*x0 - 0.5*x2 + np.random.uniform(size=1000)
# The latent variable x6 is not included.
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
|   | x0 | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|
| 0 | 0.978424 | 1.966955 | 1.219048 | 1.746943 | 0.761499 | 0.942972 |
| 1 | 1.164124 | 2.652780 | 2.153412 | 2.317986 | 0.427684 | 1.144585 |
| 2 | 1.160532 | 1.978590 | 0.919055 | 1.066110 | 0.603656 | 1.329139 |
| 3 | 1.502959 | 1.833784 | 1.748939 | 1.234851 | 0.447353 | 1.188017 |
| 4 | 1.948636 | 2.457468 | 1.535006 | 2.073317 | 0.501208 | 1.155161 |
Causal Discovery
In the estimated adjacency matrix, nan indicates a pair of variables that has a hidden common cause.
model = lingam.BottomUpParceLiNGAM()
model.fit(X)
model.adjacency_matrix_
array([[ 0. , nan, 0. , nan, 0. , 0. ],
[ nan, 0. , 0. , nan, 0. , 0. ],
[-0.22 , 0.593, 0. , 0.564, 0. , 0. ],
[ nan, nan, 0. , 0. , 0. , 0. ],
[ 0.542, 0. , -0.529, 0. , 0. , 0. ],
[ 0.506, 0. , 0. , 0. , 0. , 0. ]])
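Since the nan entries mark variable pairs judged to have a hidden common cause, they can be extracted from the matrix with np.isnan. A minimal sketch (the matrix printed above is hard-coded here so the snippet runs standalone):

```python
import numpy as np

nan = np.nan
# Adjacency matrix estimated above; nan marks a hidden common cause.
B = np.array([[ 0.   , nan  , 0.    , nan  , 0., 0.],
              [ nan  , 0.   , 0.    , nan  , 0., 0.],
              [-0.22 , 0.593, 0.    , 0.564, 0., 0.],
              [ nan  , nan  , 0.    , 0.   , 0., 0.],
              [ 0.542, 0.   , -0.529, 0.   , 0., 0.],
              [ 0.506, 0.   , 0.    , 0.   , 0., 0.]])

# Collect the unordered variable pairs flagged as confounded; the nan pattern
# is symmetric, so deduplicate via sorted index pairs.
pairs = sorted({tuple(sorted((int(i), int(j))))
                for i, j in zip(*np.where(np.isnan(B)))})
print(pairs)
```

The flagged pairs (x0, x1), (x0, x3), and (x1, x3) are exactly the variables that depend on the omitted latent variable x6 through x3 in the data-generating process above.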
Evaluation
lingam.utils.evaluate_model_fit(model.adjacency_matrix_, X)
|   | DoF | DoF Baseline | chi2 | chi2 p-value | chi2 Baseline | CFI | GFI | AGFI | NFI | TLI | RMSEA | AIC | BIC | LogLik |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 3 | 15 | 1673.491434 | 0.0 | 4158.502617 | 0.596841 | 0.597574 | -1.012132 | 0.597574 | -1.015796 | 0.746584 | 32.653017 | 120.992612 | 1.673491 |
When the data has ordinal variables
Test data
x3 = np.random.uniform(size=1000)
x0 = 0.6*x3 + np.random.uniform(size=1000)
# discrete
x2 = 1.2*x3 + np.random.uniform(size=1000)
x2 = expit(x2 - np.mean(x2))
vec_func = np.vectorize(lambda p: np.random.choice([0, 1], p=[p, 1 - p]))
x2 = vec_func(x2)
x1 = 0.6*x0 + 0.4*x2 + np.random.uniform(size=1000)
x5 = 0.8*x0 + np.random.uniform(size=1000)
x4 = 1.6*x0 - 0.2*x2 + np.random.uniform(size=1000)
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T, columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
|   | x0 | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|
| 0 | 0.471823 | 1.426239 | 1.0 | 0.129133 | 1.535926 | 0.567324 |
| 1 | 0.738933 | 1.723219 | 1.0 | 0.327512 | 1.806484 | 1.056211 |
| 2 | 1.143877 | 1.962664 | 1.0 | 0.538189 | 2.075554 | 1.865132 |
| 3 | 0.326486 | 0.946426 | 1.0 | 0.302415 | 0.675984 | 0.857528 |
| 4 | 0.942822 | 0.882616 | 0.0 | 0.529399 | 2.002522 | 1.063416 |
adjacency_matrix = np.array([
[0.0, 0.0, 0.0, 0.6, 0.0, 0.0],
[0.6, 0.0, 0.4, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 1.2, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[1.6, 0.0,-0.2, 0.0, 0.0, 0.0],
[0.8, 0.0, 0.0, 0.0, 0.0, 0.0]]
)
Here the true adjacency matrix defined above is evaluated directly, without running causal discovery. Specify which variables are ordinal via the is_ordinal argument (1 for ordinal variables, 0 otherwise).
lingam.utils.evaluate_model_fit(adjacency_matrix, X, is_ordinal=[0, 0, 1, 0, 0, 0])
|   | DoF | DoF Baseline | chi2 | chi2 p-value | chi2 Baseline | CFI | GFI | AGFI | NFI | TLI | RMSEA | AIC | BIC | LogLik |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 16 | 16 | 2239.89739 | 0.0 | 2733.058196 | 0.181505 | 0.180443 | 0.180443 | 0.180443 | 0.181505 | 0.373005 | 5.520205 | 30.058982 | 2.239897 |
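As a rough reading aid, the SEM literature commonly cites cutoffs of CFI >= 0.95 and RMSEA <= 0.06 for acceptable fit; these thresholds are conventions from that literature, not part of the lingam API. A hypothetical helper applying them to the results above:

```python
# Hypothetical helper: check each index against commonly cited SEM cutoffs
# (CFI >= 0.95, RMSEA <= 0.06); these conventions are not part of lingam.
def read_fit(cfi, rmsea, cfi_cut=0.95, rmsea_cut=0.06):
    # Returns (CFI acceptable?, RMSEA acceptable?) under the given cutoffs.
    return cfi >= cfi_cut, rmsea <= rmsea_cut

# Indices from the result tables above.
print(read_fit(cfi=0.957298, rmsea=0.247781))  # continuous example: (True, False)
print(read_fit(cfi=0.181505, rmsea=0.373005))  # ordinal example: (False, False)
```

By CFI, the continuous example fits acceptably while this ordinal example does not; note that RMSEA exceeds the cutoff in all the runs above, so the indices should be read together rather than in isolation.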