LiM

Model

The Linear Mixed (LiM) causal discovery algorithm [1] extends LiNGAM to handle mixed data consisting of both continuous and discrete variables. The estimation is performed by first globally optimizing the log-likelihood function of the joint distribution of the data under an acyclicity constraint, and then applying a local combinatorial search to output a causal graph.
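
As a rough illustration only (an assumption on our part, not code taken from the package), acyclicity constraints in this kind of continuous optimization are often made differentiable with the NOTEARS-style measure tr(exp(B * B)) - d, where * denotes the elementwise product; the measure is zero exactly when the weighted adjacency matrix B encodes a DAG.

import numpy as np
from scipy.linalg import expm

def acyclicity_measure(B):
    # NOTEARS-style smooth acyclicity term: tr(exp(B * B)) - d, where * is
    # the elementwise product and d the number of variables. It equals 0
    # if and only if the weighted graph B is acyclic. (Illustrative sketch;
    # an assumption about the general approach, not LiM's actual code.)
    d = B.shape[0]
    return np.trace(expm(B * B)) - d

# A 2-variable DAG (x0 -> x1) gives 0; adding the reverse edge does not.
B_dag = np.array([[0.0, 0.0], [1.3, 0.0]])
B_cyc = np.array([[0.0, 0.7], [1.3, 0.0]])
print(acyclicity_measure(B_dag), acyclicity_measure(B_cyc))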

This method makes the following assumptions.

  1. Continuous variables and binary variables.
  2. Linearity
  3. Acyclicity
  4. No hidden common causes
  5. Baselines are the same when predicting one binary variable from the other for every pair of binary variables.

References

[1] Y. Zeng, S. Shimizu, H. Matsui, F. Sun. Causal discovery for linear mixed data. In Proc. First Conference on Causal Learning and Reasoning (CLeaR2022). PMLR 177, pp. 994-1009, 2022.

Import and settings

In this example, we need to import numpy and random in addition to lingam.

import numpy as np
import random
import lingam
import lingam.utils as ut

print([np.__version__, lingam.__version__])
['1.20.3', '1.6.0']

Test data

First, we generate a causal structure with 2 variables, where one of them is randomly set to be a discrete variable.

ut.set_random_seed(1)
n_samples, n_features, n_edges, graph_type, sem_type = 1000, 2, 1, 'ER', 'mixed_random_i_dis'
B_true = ut.simulate_dag(n_features, n_edges, graph_type)
W_true = ut.simulate_parameter(B_true)  # row to column

no_dis = np.random.randint(1, n_features)  # number of discrete vars.
print('There are %d discrete variable(s).' % (no_dis))
nodes = list(range(n_features))
dis_var = random.sample(nodes, no_dis) # randomly select no_dis discrete variables
dis_con = np.full((1, n_features), np.inf)
for iii in range(n_features):
    if iii in dis_var:
        dis_con[0, iii] = 0  # 1:continuous;   0:discrete
    else:
        dis_con[0, iii] = 1

X = ut.simulate_linear_mixed_sem(W_true, n_samples, sem_type, dis_con)

print('The true adjacency matrix is:\n', W_true)
There are 1 discrete variable(s).
The true adjacency matrix is:
[[0.        0.       ]
 [1.3082251 0.       ]]
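
In this simulated example, dis_con was filled in from the sampled ground truth. For your own dataset you would need to supply it yourself; the following minimal sketch (with a hypothetical array X_own, simply treating columns that take at most two distinct values as discrete) builds the indicator in the same 1: continuous / 0: discrete convention used above.

# Hypothetical mixed data: column 0 continuous, column 1 binary.
X_own = np.column_stack([np.random.randn(100),
                         np.random.randint(0, 2, 100)])

dis_con_own = np.ones((1, X_own.shape[1]))      # default: continuous (1)
for j in range(X_own.shape[1]):
    if np.unique(X_own[:, j]).size <= 2:        # at most two values -> discrete (0)
        dis_con_own[0, j] = 0

print(dis_con_own)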

Causal discovery for linear mixed data

To run causal discovery, we create a LiM object and call the fit method.

model = lingam.LiM()
model.fit(X, dis_con)
<lingam.lim.LiM at 0x174d475f850>

Using the _adjacency_matrix property, we can see the estimated adjacency matrix between the mixed variables.

print('The estimated adjacency matrix is:\n', model._adjacency_matrix)
The estimated adjacency matrix is:
[[ 0.        ,  0.        ],
 [-1.09938457,  0.        ]]
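
Because the ground-truth matrix W_true is available in this simulation, we can check whether the nonzero pattern of the estimate matches the truth. A minimal sketch; the 0.1 cutoff below is an arbitrary choice for this illustration, not a parameter of the method.

B_est = np.array(model._adjacency_matrix, dtype=float)
B_est[np.abs(B_est) < 0.1] = 0.0                # drop negligible coefficients

print('True support:\n', (W_true != 0).astype(int))
print('Estimated support:\n', (B_est != 0).astype(int))
print('Supports match:', np.array_equal(W_true != 0, B_est != 0))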