# LiM

## Model

The Linear Mixed (LiM) causal discovery algorithm [1] extends LiNGAM to handle mixed data consisting of both continuous and discrete variables. Estimation proceeds in two steps: it first globally optimizes the log-likelihood of the joint distribution of the data under an acyclicity constraint, and then applies a local combinatorial search to output a causal graph.

This method is based on the LiM model, defined as follows.

i) For a continuous variable \(x_i\), its value is a linear function of its parent variables \(x_{\mathrm{pa}(i)}\) plus a non-Gaussian error term \(e_i\), that is,

\[ x_i = e_i + c_i + \sum_{j \in \mathrm{pa}(i)} b_{ij} x_j, \]

where the error terms \(e_i\) are continuous random variables with non-Gaussian densities and are independent of each other. The coefficients \(b_{ij}\) and intercepts \(c_i\) are constants.

ii) For a discrete variable \(x_i\), its value equals 1 if the linear function of its parent variables \(x_{\mathrm{pa}(i)}\) plus a Logistic error term \(e_i\) is larger than 0, and 0 otherwise. That is,

\[ x_i = \begin{cases} 1, & e_i + c_i + \sum_{j \in \mathrm{pa}(i)} b_{ij} x_j > 0, \\ 0, & \text{otherwise}, \end{cases} \]

where the error terms \(e_i\) follow the Logistic distribution, and the other notation is the same as for continuous variables.
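To make the two cases concrete, the following is a minimal sketch that draws samples from a two-variable LiM model using only NumPy. The coefficient `b_10`, the intercepts `c_0` and `c_1`, and the choice of a Laplace distribution for the non-Gaussian error are illustrative assumptions, not values taken from the library or the paper.

```
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Hypothetical parameters, chosen for illustration only.
b_10 = 1.3             # coefficient of x_0 in the equation for x_1
c_0, c_1 = 0.5, -0.2   # intercepts

# i) Continuous variable without parents: intercept plus a non-Gaussian
#    error (Laplace is one possible non-Gaussian choice).
e_0 = rng.laplace(loc=0.0, scale=1.0, size=n_samples)
x_0 = c_0 + e_0

# ii) Binary variable: 1 if the linear function of its parent plus a
#     Logistic error exceeds 0, otherwise 0.
e_1 = rng.logistic(loc=0.0, scale=1.0, size=n_samples)
x_1 = (b_10 * x_0 + c_1 + e_1 > 0).astype(float)

X_sketch = np.column_stack([x_0, x_1])
print(X_sketch.shape, np.unique(x_1))
```

Thresholding a linear predictor with Logistic noise at 0 is equivalent to a logistic regression of the binary variable on its parents, which is why the Logistic error appears in case ii).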

This method makes the following assumptions:

- The variables are continuous or binary.
- Linearity
- Acyclicity
- No hidden common causes
- For every pair of binary variables, the baselines are the same when predicting one binary variable from the other.

References

## Import and settings

In this example, we need to import `numpy` and `random`, in addition to `lingam`.

```
import numpy as np
import random
import lingam
import lingam.utils as ut
print([np.__version__, lingam.__version__])
```

```
['1.20.3', '1.6.0']
```

## Test data

First, we generate a causal structure with 2 variables, where one of them is randomly set to be a discrete variable.

```
ut.set_random_seed(1)
n_samples, n_features, n_edges, graph_type, sem_type = 1000, 2, 1, 'ER', 'mixed_random_i_dis'
B_true = ut.simulate_dag(n_features, n_edges, graph_type)
W_true = ut.simulate_parameter(B_true) # row to column
no_dis = np.random.randint(1, n_features) # number of discrete vars.
print('There are %d discrete variable(s).' % (no_dis))
nodes = [iii for iii in range(n_features)]
dis_var = random.sample(nodes, no_dis) # randomly select no_dis discrete variables
dis_con = np.full((1, n_features), np.inf)
for iii in range(n_features):
    if iii in dis_var:
        dis_con[0, iii] = 0 # 1:continuous; 0:discrete
    else:
        dis_con[0, iii] = 1
X = ut.simulate_linear_mixed_sem(W_true, n_samples, sem_type, dis_con)
print('The true adjacency matrix is:\n', W_true)
```

```
There are 1 discrete variable(s).
The true adjacency matrix is:
[[0. 0. ]
[1.3082251 0. ]]
```
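The `dis_con` indicator built in the loop above is a `1 x n_features` array with 1 for continuous and 0 for discrete variables. As a sketch, the same layout can be produced with NumPy indexing. Here `dis_var = [0]` is a hypothetical choice for illustration; in the example above the discrete variable is picked at random.

```
import numpy as np

n_features = 2
dis_var = [0]  # hypothetical: treat variable 0 as the discrete one

# Mark every variable as continuous (1), then flag the discrete ones (0).
dis_con_sketch = np.ones((1, n_features))
dis_con_sketch[0, dis_var] = 0
print(dis_con_sketch)
```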

## Causal discovery for linear mixed data

To run causal discovery, we create a `LiM` object and call the `fit` method.

```
model = lingam.LiM()
model.fit(X, dis_con, only_global=True)
```

```
<lingam.lim.LiM at 0x174d475f850>
```

Using the `_adjacency_matrix` attribute, we can see the estimated adjacency matrix between the mixed variables.

```
print('The estimated adjacency matrix is:\n', model._adjacency_matrix)
```

```
The estimated adjacency matrix is:
[[ 0. , 0. ],
[-1.09938457, 0. ]]
```
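One simple way to compare the estimate with the ground truth is to check whether the two matrices have nonzero entries in the same positions. The sketch below hard-codes the two matrices printed above and assumes they use the same row/column convention, which this example does not state explicitly. The names `W_true_printed` and `W_est_printed` are introduced here for illustration only.

```
import numpy as np

# Values copied from the printed outputs above.
W_true_printed = np.array([[0.0, 0.0],
                           [1.3082251, 0.0]])
W_est_printed = np.array([[0.0, 0.0],
                          [-1.09938457, 0.0]])

# Same edge pattern: nonzero entries occur in the same positions.
same_support = np.array_equal(W_true_printed != 0, W_est_printed != 0)

# Same signs: a stricter check that also compares the direction of each effect.
same_sign = np.array_equal(np.sign(W_true_printed), np.sign(W_est_printed))

print(same_support, same_sign)
```

In this run the nonzero positions agree, while the sign and magnitude of the single coefficient differ between the two printed matrices.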