
# Generating real data with a known causal structure

The [Light Tunnel Mk2](/the-chambers/light-tunnel-mk2.md) and the [Wind Tunnel Mk2](#wind-tunnel-mk2) come with a causal ground truth, built from background knowledge and empirically validated using [randomized experiments](https://cchamber-box.s3.eu-central-2.amazonaws.com/nature_paper_appendices.pdf#page=21). We express this ground truth as a *map of effects*, a directed graph showing how the chamber variables affect each other.

{% tabs %}
{% tab title="Light Tunnel Mk2" %}

<figure><img src="/files/iB6bZdiW4lSaqPJbS5xu" alt=""><figcaption><p>Right click to download the image (available under a <a href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a> non-commercial license).</p></figcaption></figure>

For an exhaustive description of each edge and a visualization of the corresponding effect, see the [documentation](/the-chambers/light-tunnel-mk2.md#red-green-blue-ir_1-2-3-vis_1-2-3) for this chamber.
{% endtab %}

{% tab title="Wind Tunnel Mk2" %}

<figure><img src="/files/4DeGPD0e257KvY2sPis3" alt=""><figcaption><p>Right click to download the image (available under a <a href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC 4.0</a> non-commercial license).</p></figcaption></figure>

For an exhaustive description of each edge and a visualization of the corresponding effect, see the [documentation](/the-chambers/wind-tunnel-mk2.md#load_in-out-rpm_in-out-current_in-out-current_in-out_raw) for this chamber.
{% endtab %}
{% endtabs %}

### Using the ground truth graph

{% hint style="warning" %}
Because these graphs describe effects in a real system, there are some things you need to consider before using them as a causal ground truth. Read below!
{% endhint %}

The graphs above can be interpreted as a causal ground truth, as formalized in Gamella et al. (2025, [Appendix V](https://cchamber-box.s3.eu-central-2.amazonaws.com/nature_paper_appendices.pdf#page=21)), i.e., an edge X $$\to$$ Y signifies that—for some value of the other chamber inputs—an intervention on X will change the distribution of subsequent measurements of Y. This is the statement that we empirically validate, giving us a common framework for both instantaneous and time-lagged effects. There are two things you should consider:

1. **The graph should not be taken as a graphical model** of statistical dependencies, as external influences on the system may create additional dependencies between variables. These are documented [here](/the-chambers/light-tunnel-mk2.md#external-influences) and [here](/the-chambers/wind-tunnel-mk2.md#external-influences).
2. We empirically validate each edge in the graph using a randomized controlled trial (RCT) with large sample sizes. This allows us to verify that an edge exists, i.e., that one variable has a statistically significant effect on the other. On the other hand, the absence of an edge between two variables does not preclude the existence of a causal effect between them; it simply means we could not detect a significant effect.

Ultimately, such issues arise with any real, non-simulated system. They are a symptom of evaluating learned causal models by comparing them with a "gold-standard model", i.e., validating a model with another model. This only makes sense in a computer simulation, where the data-generating model *is* *the truth*. A more robust (and natural) way to evaluate a learned causal model is to verify its interventional predictions directly—see the section on [generating interventional data](#generating-interventional-data).
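To make the second point concrete, here is a minimal sketch of testing for an edge with randomized data. It runs on synthetic data and uses a simple permutation test on the difference in means; this is an illustrative assumption, not the validation procedure used in the paper, and the helper `permutation_pvalue` is a hypothetical name. A difference in means is only one way in which two distributions can differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_pvalue(y_a, y_b, n_perm=2000, rng=rng):
    """Two-sided permutation test for a difference in means (hypothetical helper)."""
    observed = abs(y_a.mean() - y_b.mean())
    pooled = np.concatenate([y_a, y_b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = abs(perm[:len(y_a)].mean() - perm[len(y_a):].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)

# Synthetic stand-in for an RCT: X is randomly assigned to one of two levels
n = 500
x = rng.integers(0, 2, size=n)
y_effect = 2.0 * x + rng.normal(size=n)   # X has an effect on Y
y_null = rng.normal(size=n)               # X has no effect on Y

p_effect = permutation_pvalue(y_effect[x == 1], y_effect[x == 0])
p_null = permutation_pvalue(y_null[x == 1], y_null[x == 0])
print(f"effect present: p = {p_effect:.4f}")
print(f"effect absent:  p = {p_null:.4f}")
```

A small p-value supports the presence of an edge. A large one only says that no effect was detected at this sample size, which is why a missing edge should not be read as proof of no effect.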

### Generating data: an example

So, how do we generate data from these causal structures? Here is a general recipe to sample from the complete graph or any subgraph of your choice. As the running example for the rest of this guide, we will focus on the following subgraph from the [Light Tunnel Mk2](/the-chambers/light-tunnel-mk2.md).

<figure><img src="/files/LdYjubSyTDlndfmx8XL6" alt="" width="563"><figcaption><p>The running example for this section.</p></figcaption></figure>

Here's how it works.

{% stepper %}
{% step %}

#### **Select your variables**

Choose the subset of variables that should be part of the dataset. If you are sampling from one of the standard [hardware configurations](/the-chambers/how-they-work.md#hardware-configurations) (i.e., exogenous inputs & sensor parameters), then the map of effects directly becomes the [induced subgraph](https://en.wikipedia.org/wiki/Induced_subgraph) of these variables. See [Adding causal effects](#adding-causal-effects) to sample from more complex causal structures.

> Example: in the graph above, our variables are `green`, `pol_1`, `ir_1` and `ir_3`.

{% endstep %}

{% step %}

#### **Define a distribution over your inputs**

Now, define the distribution or stochastic process from which you will sample your inputs. You can sample them independently or from an SCM, introducing additional dependencies. Check the variable table of the corresponding [hardware configuration](/the-chambers/how-they-work.md#hardware-configurations) for the valid values.

> Example: after checking the [variables table](https://cchamber-box.s3.eu-central-2.amazonaws.com/config_doc_lt_mk2_standard.pdf), we will sample our inputs `green` and `pol_1` independently and uniformly at random from `{0,1,...,255}` and `[0, 90]`, respectively.

{% endstep %}

{% step %}

#### **Set your inputs and take measurements**

Now, for each draw of your inputs, use the [SET instruction](/the-chambers/how-they-work.md#set-instruction) to set them in the hardware, and the [MEASURE instruction](/the-chambers/how-they-work.md#measure-instruction) to take a measurement. If the chamber has lagged effects (e.g., `load_in` $$\to$$ `rpm_in` in the [Wind Tunnel](/the-chambers/wind-tunnel-mk2.md#load_in-out-rpm_in-out-current_in-out-current_in-out_raw)) and you want to measure after the system reaches equilibrium, you can add a [WAIT instruction](/the-chambers/how-they-work.md#wait-instruction).
{% endstep %}
{% endstepper %}
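The induced subgraph from the first step can be computed mechanically. The sketch below assumes a partial edge list containing only the effects mentioned on this page (it is not the complete map of effects), and `induced_subgraph` is a hypothetical helper name.

```python
# Partial edge list, restricted to effects mentioned on this page
# (NOT the complete map of effects for the Light Tunnel Mk2)
edges = {
    ("red", "ir_1"), ("red", "ir_3"),
    ("green", "ir_1"), ("green", "ir_3"),
    ("pol_1", "ir_3"),
    ("diode_ir_1", "ir_1"),
    ("led_1_uv", "ir_1"),
    ("led_3_uv", "ir_3"),
}

def induced_subgraph(edges, variables):
    """Keep only the edges whose two endpoints are both selected."""
    variables = set(variables)
    return sorted((u, v) for (u, v) in edges if u in variables and v in variables)

selected = ["green", "pol_1", "ir_1", "ir_3"]
subgraph = induced_subgraph(edges, selected)
print(subgraph)
# [('green', 'ir_1'), ('green', 'ir_3'), ('pol_1', 'ir_3')]
```

The result matches the running example: `green` points to both sensors, while `pol_1` points only to `ir_3`.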

#### Putting it together

Here is the complete code for the example above, using the [experiment queue](/remote-lab/using-the-experiment-queue.md) to collect the data.

```python
# Connect to the Remote Lab
from causalchamber import lab
rlab = lab.Lab(credentials_file = 'path/to/file')

# Define a new experiment
experiment = rlab.new_experiment(chamber_id = 'lt-ptdm-fu3p', config = 'standard')

# Sample inputs
from numpy.random import uniform, randint
green = randint(0, 256, size=100)
pol_1 = uniform(0, 90, size=100)

# Set inputs & take measurements
for g,p in zip(green, pol_1):
  experiment.set('green', g)
  experiment.set('pol_1', p)
  experiment.measure(n=1)

# Submit experiment
experiment.submit(tag='example')
```

You can then [monitor the experiment](/remote-lab/using-the-experiment-queue.md#monitoring-your-experiments) and [download the data](/remote-lab/using-the-experiment-queue.md#downloading-the-data) once it's finished. To simplify the above syntax, you can also define the experiment [from a pandas dataframe](/remote-lab/using-the-experiment-queue.md#generating-instructions-from-a-pandas-dataframe).

Let's visualize the results!

{% tabs %}
{% tab title="Figure 1" %}

<figure><img src="/files/9yaSb1JGvEG8OdVDMMlJ" alt="" width="406"><figcaption><p>Visualization of the resulting data, with the inputs <code>green</code> and <code>pol_1</code> on the x-axis, and the sensor measurements <code>ir_1</code> and <code>ir_3</code> on the y-axis.  As expected from the ground-truth graph, <code>green</code> has an effect on both measurements, whereas <code>pol_1</code> affects only <code>ir_3</code>. Following <a href="/spaces/UqYDL9yvLTNUYW7H1Q6t/pages/nfKs9660N39HBagy17gT#pol_1-2-ir_3-vis_3">Malus' law</a>, as <code>pol_1</code> approaches 90 degrees, the polarizer chain blocks most of the light reaching the third sensor, reducing the effect of <code>green</code> on <code>ir_3</code>. See the <a href="/spaces/UqYDL9yvLTNUYW7H1Q6t/pages/nfKs9660N39HBagy17gT#chamber-diagram-and-variables">Chamber diagram</a> for the placement of the different components.</p></figcaption></figure>
{% endtab %}

{% tab title="Code" %}
To reproduce the figure:

```python
# Download experiment into a dataframe
df = rlab.download_data('50fa0624-54b8-4d87-8520-3149984aba44', root='/tmp').dataframe

# Make a pairplot
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

x_vars = ['green', 'pol_1']
y_vars = ['ir_1', 'ir_3']

g = sns.PairGrid(df, x_vars=x_vars, y_vars=y_vars)

def plot_colored(x, y, **kwargs):
    plt.gca().scatter(
        x, y,
        c=df['pol_1'],
        cmap='magma',
        edgecolors='none',
    )

g.map(plot_colored)

g.figure.legends.clear()

# Add colorbar axes manually — outside the grid, no space stolen
cax1 = g.figure.add_axes([1.02, 0.35, 0.02, 0.25])

# pol_1 colorbar (magma, face color)
pol1_norm = mcolors.Normalize(vmin=df['pol_1'].min(), vmax=df['pol_1'].max())
sm_pol1 = plt.cm.ScalarMappable(cmap='magma', norm=pol1_norm)
sm_pol1.set_array([])
cb1 = g.figure.colorbar(sm_pol1, cax=cax1)
cax1.set_title('pol_1', loc='center', fontsize=10)
cb1.set_ticks([])
cb1.outline.set_visible(False)
```

{% endtab %}
{% endtabs %}
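The recipe above also allows sampling the inputs from an SCM instead of independently. As a minimal sketch, the following draws `green` uniformly and makes `pol_1` depend on it. The functional form and noise level are arbitrary assumptions chosen for illustration; the values are clipped to the ranges used in the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical SCM over the inputs: green -> pol_1
green = rng.integers(0, 256, size=n)                # integers in {0, ..., 255}
noise = rng.normal(0, 10, size=n)
pol_1 = np.clip(90 * green / 255 + noise, 0, 90)    # keep pol_1 within [0, 90]

print(np.corrcoef(green, pol_1)[0, 1])
```

The resulting arrays can replace `green` and `pol_1` in the experiment code above. The ground-truth graph for such a dataset then contains the additional edge `green` $$\to$$ `pol_1`.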

### Modifying causal effects

The Chambers are designed so that every effect between two variables can be modified by means of a third variable (i.e., a *mechanism change*). For example, we can manipulate the parameters of all sensors (marked with a <mark style="color:pink;">P</mark> in the graphs), controlling their behavior and the resulting measurements. In particular, we can use `diode_ir_1` to [change the photodiode](/the-chambers/light-tunnel-mk2.md#diode_ir_j-ir_j-diode_vis_j-vis_j-j-1-2-3) used to produce the measurement `ir_1`, altering the incoming effects.

<figure><img src="/files/qCn6iVbBszJHzoqL51Uc" alt="" width="563"><figcaption></figcaption></figure>

To see this, let's repeat the previous experiment, but set `diode_ir_1=1` to use a smaller photodiode instead.

<pre class="language-python"><code class="lang-python"># Connect to the Remote Lab
from causalchamber import lab
rlab = lab.Lab(credentials_file = 'path/to/file')

# Define a new experiment
experiment = rlab.new_experiment(chamber_id = 'lt-ptdm-fu3p', config = 'standard')

# Sample inputs
from numpy.random import uniform, randint
green = randint(0, 256, size=100)
pol_1 = uniform(0, 90, size=100)

# Use diode_ir_1 to modify the effect between green and ir_1
<strong>experiment.set('diode_ir_1', 1)
</strong>
# Set inputs &#x26; take measurements
for g,p in zip(green, pol_1):
  experiment.set('green', g)
  experiment.set('pol_1', p)
  experiment.measure(n=1)

# Submit experiment
experiment.submit(tag='example-diode-1')
</code></pre>

By using a smaller photodiode, we have reduced the sensor's sensitivity. If we plot the data, we can see that the effect of `green` on `ir_1` is now "weaker" when compared to the original experiment (shown in gray).

{% tabs %}
{% tab title="Figure 2" %}

<figure><img src="/files/z0JvUbxOfwQGRkRclauu" alt="" width="406"><figcaption><p>Repeating the experiment in <a href="#figure-1">Figure 1</a> (data shown in gray) but using a smaller photodiode (<code>diode_ir_1=1</code>) to produce the measurement <code>ir_1</code>. The effect of <code>green</code> on <code>ir_1</code> is weaker (smaller slope), while the other variables remain unaffected.</p></figcaption></figure>
{% endtab %}

{% tab title="Code" %}
To reproduce the figure:

```python
# Download the experiments into dataframes
df_orig = rlab.download_data('50fa0624-54b8-4d87-8520-3149984aba44', root='/tmp').dataframe
df = rlab.download_data('e6ef4a1d-6d68-46d6-b802-94191ee6ec62', root='/tmp').dataframe

# Overlay the two pairplots
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

x_vars = ['green', 'pol_1']
y_vars = ['ir_1', 'ir_3']

g = sns.PairGrid(df, x_vars=x_vars, y_vars=y_vars)

def plot_gray(x, y, **kwargs):
    plt.gca().scatter(df_orig[x.name], df_orig[y.name], color='#aaaaaa', alpha=1, edgecolor='#a8a8a8')

def plot_colored(x, y, **kwargs):
    plt.gca().scatter(
        x, y,
        c=df['pol_1'],
        cmap='magma',
        edgecolors='none',
    )

g.map(plot_gray)
g.map(plot_colored)

# Remove the default legend seaborn adds
g.figure.legends.clear()

# Add colorbar axes manually — outside the grid, no space stolen
cax1 = g.figure.add_axes([1.02, 0.35, 0.02, 0.25])

# pol_1 colorbar (magma, face color)
pol1_norm = mcolors.Normalize(vmin=df['pol_1'].min(), vmax=df['pol_1'].max())
sm_pol1 = plt.cm.ScalarMappable(cmap='magma', norm=pol1_norm)
sm_pol1.set_array([])
cb1 = g.figure.colorbar(sm_pol1, cax=cax1)
cax1.set_title('pol_1', loc='center', fontsize=10)
cb1.set_ticks([])
cb1.outline.set_visible(False)
```

{% endtab %}
{% endtabs %}
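One way to put a number on "weaker" is to compare least-squares slopes of `ir_1` on `green` in the two datasets. The sketch below uses synthetic stand-ins for the two dataframes, with made-up slopes and noise; with real data you would pass the downloaded `df_orig` and `df` to the hypothetical helper `fitted_slope` instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_fake_run(slope, n=100):
    """Synthetic stand-in for a downloaded dataframe (values are made up)."""
    green = rng.integers(0, 256, size=n)
    ir_1 = slope * green + rng.normal(0, 50, size=n)
    return pd.DataFrame({"green": green, "ir_1": ir_1})

df_orig = make_fake_run(slope=40.0)   # stands in for the original experiment
df = make_fake_run(slope=10.0)        # stands in for the run with diode_ir_1=1

def fitted_slope(frame, x="green", y="ir_1"):
    """Least-squares slope of y on x."""
    slope, _intercept = np.polyfit(frame[x], frame[y], deg=1)
    return slope

slope_orig = fitted_slope(df_orig)
slope_new = fitted_slope(df)
print(f"original: {slope_orig:.1f}, smaller photodiode: {slope_new:.1f}")
```

A linear fit is only a rough summary; it is a reasonable choice here because the sensor response to `green` looks close to linear in the figure.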

### Removing causal effects

In some cases, we can also completely remove a causal effect. For example, in the [Light Tunnel Mk2](/the-chambers/light-tunnel-mk2.md):

* We can **disable the polarizer motors** by setting the variables `mot_1_enabled` and `mot_2_enabled` to zero; changes in `pol_1` and `pol_2` will no longer affect the actual polarizer positions, eliminating the outgoing edges from these variables.
* We can use the variables [`res_*`](#user-content-fn-1)[^1] and `offset_*` to [saturate the analog sensors](/the-chambers/light-tunnel-mk2.md#offset-sps-res_current_ls-current_ls-current_ls_raw) that produce the measurements [`current_*`](#user-content-fn-2)[^2] and `angle_*`, removing the edges coming into these variables.

As an example, let's repeat the experiment from [Figure 1](#figure-1), but disable the polarizer motor to cancel the effect of `pol_1` on `ir_3`.

<figure><img src="/files/chkCTvxMxvovg7YRwHvG" alt="" width="563"><figcaption></figcaption></figure>

To do this, we need to set `mot_1_enabled = 0` before starting the experiment:

<pre class="language-python"><code class="lang-python"># Connect to the Remote Lab
from causalchamber import lab
rlab = lab.Lab(credentials_file = 'path/to/file')

# Define a new experiment
experiment = rlab.new_experiment(chamber_id = 'lt-ptdm-fu3p', config = 'standard')

# Sample inputs
from numpy.random import uniform, randint
green = randint(0, 256, size=100)
pol_1 = uniform(0, 90, size=100)

# Use mot_1_enabled to disable the polarizer motor
<strong>experiment.set('mot_1_enabled', 0)
</strong>
# Set inputs &#x26; take measurements
for g,p in zip(green, pol_1):
  experiment.set('green', g)
  experiment.set('pol_1', p)
  experiment.measure(n=1)

# Submit experiment
experiment.submit(tag='example-motor-disabled')
</code></pre>

Let's visualize the result. As expected, `pol_1` no longer has an effect on `ir_3`. All other variables remain the same.

{% tabs %}
{% tab title="Figure 3" %}

<figure><img src="/files/GnN7cHORg8Dq0Jn6Hbfn" alt="" width="406"><figcaption><p>Repeating the experiment in <a href="#figure-1">Figure 1</a> (data shown in gray) but disabling the motor of the first polarizer by setting <code>mot_1_enabled=0</code>. As a result, <code>pol_1</code> no longer has an effect on the polarizer position, removing its effect on <code>ir_3</code>.</p></figcaption></figure>
{% endtab %}

{% tab title="Code" %}
To create the figure:

```python
# Download the experiments into dataframes
df_orig = rlab.download_data('50fa0624-54b8-4d87-8520-3149984aba44', root='/tmp').dataframe
df = rlab.download_data('a0cde850-faf8-4b42-b5bc-94b2eae783e6', root='/tmp').dataframe

# Overlay the two pairplots
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

x_vars = ['green', 'pol_1']
y_vars = ['ir_1', 'ir_3']

g = sns.PairGrid(df, x_vars=x_vars, y_vars=y_vars)

def plot_gray(x, y, **kwargs):
    plt.gca().scatter(df_orig[x.name], df_orig[y.name], color='#aaaaaa', alpha=1, edgecolor='#a8a8a8')

def plot_colored(x, y, **kwargs):
    plt.gca().scatter(
        x, y,
        c=df['pol_1'],
        cmap='magma',
        edgecolors='none',
    )

g.map(plot_gray)
g.map(plot_colored)

# Remove the default legend seaborn adds
g.figure.legends.clear()

# Add colorbar axes manually — outside the grid, no space stolen
cax1 = g.figure.add_axes([1.02, 0.35, 0.02, 0.25])

# pol_1 colorbar (magma, face color)
pol1_norm = mcolors.Normalize(vmin=df['pol_1'].min(), vmax=df['pol_1'].max())
sm_pol1 = plt.cm.ScalarMappable(cmap='magma', norm=pol1_norm)
sm_pol1.set_array([])
cb1 = g.figure.colorbar(sm_pol1, cax=cax1)
cax1.set_title('pol_1', loc='center', fontsize=10)
cb1.set_ticks([])
cb1.outline.set_visible(False)
```

{% endtab %}
{% endtabs %}
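Beyond inspecting the plot, a rank correlation between `pol_1` and `ir_3` gives a quick numerical check that the edge is gone. The sketch below runs on synthetic data: the sensor model (Malus' law with made-up gain and noise) and the assumption that the disabled polarizer rests at 0 degrees are illustrative only.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 500
green = rng.integers(0, 256, size=n)
pol_1 = rng.uniform(0, 90, size=n)

def fake_ir_3(green, angle_deg):
    """Made-up sensor model: intensity follows Malus' law in the polarizer angle."""
    signal = 20 * green * np.cos(np.radians(angle_deg)) ** 2
    return signal + rng.normal(0, 50, size=len(green))

ir_3_enabled = fake_ir_3(green, pol_1)          # polarizer follows pol_1
ir_3_disabled = fake_ir_3(green, np.zeros(n))   # polarizer does not move

rho_enabled, _ = spearmanr(pol_1, ir_3_enabled)
rho_disabled, _ = spearmanr(pol_1, ir_3_disabled)
print(f"motor enabled:  rho = {rho_enabled:.2f}")
print(f"motor disabled: rho = {rho_disabled:.2f}")
```

Because `green` and `pol_1` are sampled independently, a marginal correlation close to zero is what we expect once `pol_1` stops affecting `ir_3`.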

### Adding causal effects

In the standard [hardware configurations](/the-chambers/how-they-work.md#hardware-configurations), all the inputs and sensor parameters (marked with <mark style="color:orange;">I</mark> and <mark style="color:pink;">P</mark> in the graphs) must be set by the user. As a result, these are exogenous variables, and the ground truth is a [bipartite graph](https://en.wikipedia.org/wiki/Bipartite_graph), with all edges directed from an input or sensor parameter to a sensor measurement (marked with <mark style="color:violet;">M</mark> in the graph).

In addition to sampling the inputs and parameters from an SCM, we can introduce additional effects by setting inputs or sensor parameters as functions **of sensor measurements.** This is done automatically by the chamber in some [hardware configurations](/the-chambers/how-they-work.md#hardware-configurations); for example, in the [`linked_leds`](https://cchamber-box.s3.eu-central-2.amazonaws.com/config_doc_lt_mk2_linked_leds.pdf) and [`linked_leds_sigmoid`](https://cchamber-box.s3.eu-central-2.amazonaws.com/config_doc_lt_mk2_linked_leds_sigmoid.pdf) configurations, the chamber sets `led_2_uv` and `led_3_uv` as a function of `ir_1` and `ir_2`, respectively.

By operating the chamber in [interactive (real-time) mode](/remote-lab/running-a-real-time-experiment.md), you can also perform these operations on your end, allowing you to add arbitrary effects between variables. As an illustration, let's use this technique to add an additional effect from `ir_1` to `led_3_uv` in the running example of this section:

<figure><img src="/files/O6nqpKotZjuYvBgmJynV" alt="" width="375"><figcaption></figcaption></figure>

The code is similar to the previous experiments, but we connect to a chamber in [real-time](/remote-lab/running-a-real-time-experiment.md) and split our measurement step into two parts: first, we measure `ir_1`, and then we set `led_3_uv` and measure `ir_3`.

{% tabs %}
{% tab title="Experiment code" %}

```python
# Connect to a chamber in real-time (interactive) mode
from causalchamber import lab
chamber = lab.Chamber(chamber_id = 'lt-demo-ch4lu',
                      config='standard',
                      credentials_file='path/to/file')

# Sample inputs
from numpy.random import uniform, randint
green = randint(0, 256, size=100)
pol_1 = uniform(0, 90, size=100)

# Measurement loop
measurements = []
for i,(g,p) in enumerate(zip(green, pol_1)):
  print(f'Collecting measurement: {i+1} / {len(green)}', end='\r')  
  chamber.set('green', g)
  chamber.set('pol_1', p)
  # Step 1: measure ir_1
  ir_1 = chamber.measure(n=1).iloc[0].ir_1
  # Step 2: set led_3_uv as a function of ir_1, measure ir_3
  led_3_uv = int(max(0, min(4095, ir_1 * 0.0624)))
  chamber.set('led_3_uv', led_3_uv)
  ir_3 = chamber.measure(n=1).iloc[0].ir_3
  measurements.append((g, p, ir_1, led_3_uv, ir_3))

# Concatenate measurements into a dataframe
import pandas as pd
df = pd.DataFrame(measurements,
                  columns = ['green', 'pol_1', 'ir_1', 'led_3_uv', 'ir_3'])
```

{% endtab %}

{% tab title="With batched instructions" %}
You can speed up the experiment by [submitting multiple instructions in a single request](/remote-lab/running-a-real-time-experiment.md#submitting-multiple-instructions-at-once):

```python
# Connect to a chamber in real-time (interactive) mode
from causalchamber import lab
chamber = lab.Chamber(chamber_id = 'lt-demo-ch4lu',
                      config='standard',
                      credentials_file='path/to/file')

# Sample inputs
from numpy.random import uniform, randint
green = randint(0, 256, size=100)
pol_1 = uniform(0, 90, size=100)

# Measurement loop
measurements = []
for i,(g,p) in enumerate(zip(green, pol_1)):
  print(f'Collecting measurement: {i+1} / {len(green)}', end='\r')
  batch = chamber.new_batch()
  batch.set('green', g)
  batch.set('pol_1', p)
  # Step 1: measure ir_1
  batch.measure(n=1)
  ir_1 = batch.submit().iloc[0].ir_1
  # Step 2: set led_3_uv as a function of ir_1, measure ir_3
  led_3_uv = int(max(0, min(4095, ir_1 * 0.0624)))
  batch = chamber.new_batch()
  batch.set('led_3_uv', led_3_uv)
  batch.measure(n=1)
  ir_3 = batch.submit().iloc[0].ir_3
  measurements.append((g, p, ir_1, led_3_uv, ir_3))

# Concatenate measurements into a dataframe
import pandas as pd
df = pd.DataFrame(measurements,
                  columns = ['green', 'pol_1', 'ir_1', 'led_3_uv', 'ir_3'])
```

{% endtab %}
{% endtabs %}

Let's visualize the results. As before, we plot the original experiment from [Figure 1](#figure-1) in gray. To visualize the new causal effect from `ir_1` to `ir_3` (resulting from the new edge `ir_1` $$\to$$ `led_3_uv` and the existing effect `led_3_uv` $$\to$$ `ir_3`), we color the new datapoints as follows: the fill color corresponds to the value of `pol_1`, and the edge color to the value of `ir_1`.

{% tabs %}
{% tab title="Figure 4" %}

<figure><img src="/files/Rn6WXKwOHPwnKjR6paRA" alt="" width="406"><figcaption><p>Repeating the experiment in <a href="#figure-1">Figure 1</a> (data shown in gray) with an additional edge from <code>ir_1</code> to <code>led_3_uv</code>. This creates a causal relationship between <code>ir_1</code> and <code>ir_3</code>. To visualize this new dependency, we color the edge of each datapoint according to the value of <code>ir_1</code>. </p></figcaption></figure>
{% endtab %}

{% tab title="Code" %}
To create the figure:

```python
# Download the original experiment
df_orig = rlab.download_data('50fa0624-54b8-4d87-8520-3149984aba44', root='/tmp').dataframe

# Overlay the two pairplots
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

x_vars = ['green', 'pol_1']
y_vars = ['ir_1', 'ir_3']

g = sns.PairGrid(df, x_vars=x_vars, y_vars=y_vars)

def plot_gray(x, y, **kwargs):
    plt.gca().scatter(df_orig[x.name], df_orig[y.name], color='#aaaaaa', alpha=1, edgecolor='#a8a8a8')

ir1_norm = (df['ir_1'] - df['ir_1'].min()) / (df['ir_1'].max() - df['ir_1'].min())
edge_colors = np.column_stack([
    ir1_norm,
    np.zeros(len(ir1_norm)),
    ir1_norm,
])

def plot_colored_edges(x, y, **kwargs):
    plt.gca().scatter(
        x, y,
        c=df['pol_1'],
        cmap='cividis',
        edgecolors=edge_colors,
        linewidths=0.8,
    )

g.map(plot_gray)
g.map(plot_colored_edges)

g.figure.legends.clear()

# Add colorbar axes manually — outside the grid, no space stolen
# [left, bottom, width, height] in figure coordinates
cax1 = g.figure.add_axes([1.02, 0.55, 0.02, 0.25])  # upper colorbar
cax2 = g.figure.add_axes([1.02, 0.20, 0.02, 0.25])  # lower colorbar

# pol_1 colorbar (cividis, face color)
pol1_norm = mcolors.Normalize(vmin=df['pol_1'].min(), vmax=df['pol_1'].max())
sm_pol1 = plt.cm.ScalarMappable(cmap='cividis', norm=pol1_norm)
sm_pol1.set_array([])
cb1 = g.figure.colorbar(sm_pol1, cax=cax1)
cax1.set_title('pol_1', loc='center', fontsize=10)
cb1.set_ticks([])
cb1.outline.set_visible(False)

# ir_1 colorbar (black→magenta, edge color; matches edge_colors above)
ir1_cmap = mcolors.LinearSegmentedColormap.from_list('black_magenta', ['black', '#ff00ff'])
ir1_norm_obj = mcolors.Normalize(vmin=df['ir_1'].min(), vmax=df['ir_1'].max())
sm_ir1 = plt.cm.ScalarMappable(cmap=ir1_cmap, norm=ir1_norm_obj)
sm_ir1.set_array([])
cb2 = g.figure.colorbar(sm_ir1, cax=cax2)
cax2.set_title('ir_1', loc='center', fontsize=10)
cb2.set_ticks([])
cb2.outline.set_visible(False)
```

{% endtab %}
{% endtabs %}

#### A note of caution

When adding new causal effects, you need to be careful not to create dependencies between successive measurements. For example, let's try to add the edge `ir_1` $$\to$$ `red` to our running example:

<figure><img src="/files/bJScrVhjcxfkVCH3y5J7" alt="" width="375"><figcaption></figcaption></figure>

If we [naively implement](#naive-implementation) this in our two-step procedure above, we will create a dependency between successive measurements, breaking the i.i.d. assumption. Because `red` also affects `ir_1`, the value of `ir_1` in each measurement step will depend on the value of `red` set in the previous step.

<figure><img src="/files/wetSJvvJn2s7pYPyCwbs" alt="" width="563"><figcaption></figcaption></figure>

As a simple [workaround](#workaround), you can set the relevant inputs (`red` in this case) to a constant value at the beginning of each measurement cycle, breaking the dependency.

{% tabs %}
{% tab title="Naive implementation" %}

<pre class="language-python"><code class="lang-python"># Connect to a chamber in real-time (interactive) mode
from causalchamber import lab
chamber = lab.Chamber(chamber_id = 'lt-demo-ch4lu',
                      config='standard',
                      credentials_file='path/to/file')

# Sample inputs
from numpy.random import uniform, randint
green = randint(0, 256, size=100)
pol_1 = uniform(0, 90, size=100)

# Measurement loop
measurements = []
for i,(g,p) in enumerate(zip(green, pol_1)):
  print(f'Collecting measurement: {i+1} / {len(green)}', end='\r')  
  chamber.set('green', g)
  chamber.set('pol_1', p)
  # Step 1: measure ir_1
  ir_1 = chamber.measure(n=1).iloc[0].ir_1
  # Step 2: set red as a function of ir_1, measure ir_3
  red = int(max(0, min(255, ir_1 * 0.0038)))
<strong>  chamber.set('red', red) # WARNING: dependency between measurements
</strong>  ir_3 = chamber.measure(n=1).iloc[0].ir_3
  measurements.append((g, p, ir_1, red, ir_3))

# Concatenate measurements into a dataframe
import pandas as pd
df = pd.DataFrame(measurements,
                  columns = ['green', 'pol_1', 'ir_1', 'red', 'ir_3'])
</code></pre>

{% endtab %}

{% tab title="Workaround" %}

<pre class="language-python"><code class="lang-python"># Connect to a chamber in real-time (interactive) mode
from causalchamber import lab
chamber = lab.Chamber(chamber_id = 'lt-demo-ch4lu',
                      config='standard',
                      credentials_file='path/to/file')

# Sample inputs
from numpy.random import uniform, randint
green = randint(0, 256, size=100)
pol_1 = uniform(0, 90, size=100)

# Measurement loop
measurements = []
for i,(g,p) in enumerate(zip(green, pol_1)):
  print(f'Collecting measurement: {i+1} / {len(green)}', end='\r')
<strong>  chamber.set('red', 0) # Removes the dependency
</strong>  chamber.set('green', g)
  chamber.set('pol_1', p)
  # Step 1: measure ir_1
  ir_1 = chamber.measure(n=1).iloc[0].ir_1
  # Step 2: set red as a function of ir_1, measure ir_3
  red = int(max(0, min(255, ir_1 * 0.0038)))
  chamber.set('red', red)
  ir_3 = chamber.measure(n=1).iloc[0].ir_3
  measurements.append((g, p, ir_1, red, ir_3))

# Concatenate measurements into a dataframe
import pandas as pd
df = pd.DataFrame(measurements,
                  columns = ['green', 'pol_1', 'ir_1', 'red', 'ir_3'])
</code></pre>

{% endtab %}
{% endtabs %}
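The difference between the two implementations can be previewed offline. The following sketch simulates the measurement loop with a made-up linear sensor model (the gains and noise level are assumptions; only the clamp on `red` is taken from the code above) and compares the lag-1 autocorrelation of `ir_1` with and without the reset.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, reset_red):
    """Simulate the two-step loop with a made-up linear sensor model."""
    green = rng.integers(0, 256, size=n)
    ir_1 = np.empty(n)
    red = 0   # current state of the red input in the chamber
    for t in range(n):
        if reset_red:
            red = 0   # workaround: reset red before measuring ir_1
        # Step 1: ir_1 responds to green and to whatever red currently is
        ir_1[t] = 40 * green[t] + 200 * red + rng.normal(0, 100)
        # Step 2: set red as a function of ir_1 (same clamp as in the code above)
        red = int(max(0, min(255, ir_1[t] * 0.0038)))
    return ir_1

def lag1_autocorr(x):
    """Correlation between each value and its predecessor."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

naive = lag1_autocorr(simulate(2000, reset_red=False))
fixed = lag1_autocorr(simulate(2000, reset_red=True))
print(f"naive:      {naive:.2f}")
print(f"workaround: {fixed:.2f}")
```

In the naive loop, the value of `red` left over from the previous cycle leaks into the next reading of `ir_1`, which shows up as a clearly positive autocorrelation. With the reset, successive readings are uncorrelated up to sampling noise.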

### Generating interventional data

There are several ways to generate interventional data, depending on which variable receives the intervention.

* You can **intervene on inputs and sensor parameters** (marked by <mark style="color:orange;">I</mark> and <mark style="color:pink;">P</mark> in the graphs), provided they are exogenous, by changing the distribution or the process from which you sample them.
* To **intervene on sensor measurements** (marked by <mark style="color:violet;">M</mark> in the graphs), you can modify an underlying sensor parameter—or another third variable—while excluding it from the dataset. You can also intervene on hidden confounders (see [Introducing confounders](#introducing-confounders) below).
* You can also perform **mechanism changes** by [modifying](#modifying-causal-effects) or [removing](#removing-causal-effects) causal effects.
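Whichever route you choose, a quick sanity check is to compare the distribution of a downstream variable between the observational and interventional datasets. The sketch below does this with a two-sample Kolmogorov-Smirnov test on synthetic data; the sensor model and the size of the shift are made-up assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
n = 500
green = rng.integers(0, 256, size=n)

# Made-up sensor model; the +3000 shift mimics an additive intervention on ir_1
ir_1_obs = 40 * green + rng.normal(0, 100, size=n)
ir_1_int = 40 * green + 3000 + rng.normal(0, 100, size=n)

result = ks_2samp(ir_1_obs, ir_1_int)
print(f"KS statistic = {result.statistic:.2f}, p = {result.pvalue:.2g}")
```

With real data, the two arrays would be the `ir_1` columns of the dataframes downloaded for each experiment.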

#### An example

Let's see how this works for the [running example](#generating-data-an-example) of this section. We will perform interventions on the input `green` and the sensor measurement `ir_1`.

{% tabs %}
{% tab title="Intervention on green" %}
In this case, the intervention consists of changing the distribution from which we sample the input `green`, e.g., from a uniform to a truncated normal.

<pre class="language-python"><code class="lang-python"># Connect to the Remote Lab
from causalchamber import lab
rlab = lab.Lab(credentials_file = 'path/to/file')

# Define a new experiment
experiment = rlab.new_experiment(chamber_id = 'lt-ptdm-fu3p', config = 'standard')

# Sample inputs
from numpy.random import uniform
from scipy.stats import truncnorm
<strong>green = truncnorm.rvs(-128 / 20, 127 / 20, loc=128, scale=20, size=100).astype(int)
</strong>pol_1 = uniform(0, 90, size=100)

# Set inputs &#x26; take measurements
for g,p in zip(green, pol_1):
  experiment.set('green', g)
  experiment.set('pol_1', p)
  experiment.measure(n=1)

# Submit experiment
experiment.submit(tag='example-int-green')
</code></pre>

{% endtab %}

{% tab title="ir\_1" %}
For the intervention on `ir_1`, we will set the variable `led_1_uv` from `0` (the default) to `2048`. This increases the brightness of the UV LED placed by the 1<sup>st</sup> light sensor (see the [Chamber diagram](/the-chambers/light-tunnel-mk2.md#chamber-diagram-and-variables)), creating an additive shift in the values of `ir_1`. In the [`standard`](https://cchamber-box.s3.eu-central-2.amazonaws.com/config_doc_lt_mk2_standard.pdf) configuration, the LED only turns on when the sensor is taking a measurement, avoiding interference with other variables.

<pre class="language-python"><code class="lang-python"># Connect to the Remote Lab
from causalchamber import lab
rlab = lab.Lab(credentials_file = 'path/to/file')

# Define a new experiment
experiment = rlab.new_experiment(chamber_id = 'lt-ptdm-fu3p', config = 'standard')

# Sample inputs
from numpy.random import uniform, randint
green = randint(0, 256, size=100)
pol_1 = uniform(0, 90, size=100)

# Intervene on ir_1 through led_1_uv
<strong>experiment.set('led_1_uv', 2048)
</strong>
# Set inputs &#x26; take measurements
for g,p in zip(green, pol_1):
  experiment.set('green', g)
  experiment.set('pol_1', p)
  experiment.measure(n=1)

# Submit experiment
experiment.submit(tag='example-int-ir_1')
</code></pre>

{% endtab %}
{% endtabs %}

Let's visualize the results!

{% tabs %}
{% tab title="Figure 5" %}

<figure><img src="/files/rtVu9J973uV4TMqklExu" alt="" width="503"><figcaption><p>Repeating the experiment in <a href="#figure-1">Figure 1</a> (data shown in gray) with an intervention on <code>green</code> and an intervention on <code>ir_1</code> (through the variable <code>led_1_uv</code>).</p></figcaption></figure>
{% endtab %}

{% tab title="Figure code" %}
To create the figure:

```python
# Download experiment into a dataframe
df_orig = rlab.download_data('50fa0624-54b8-4d87-8520-3149984aba44', root='/tmp').dataframe
df_green = rlab.download_data('02bd34ca-7497-442c-ac74-ea9dc3645b58', root='/tmp').dataframe
df_ir_1 = rlab.download_data('6ec1a885-9a9f-45f1-a4d6-1a2f4a852882', root='/tmp').dataframe

# Make a pairplot
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x_vars = ['green', 'pol_1']
y_vars = ['ir_1', 'ir_3']

# Tag each dataframe with its source
df_orig['_source'] = 'orig'
df_green['_source'] = 'green'
df_ir_1['_source'] = 'ir_1'

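# Pool the three experiments and shuffle the rows, so that no
# single source is always drawn on top of the others
df = pd.concat([df_orig, df_green, df_ir_1], ignore_index=True)
df = df.sample(n=len(df), replace=False)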

color_map = {
    'orig':  '#aaaaaa',
    'green': '#6BCB77',
    'ir_1':  '#785ef0',
}

label_map = {
    'orig':  'None',
    'green': 'green',
    'ir_1':  'ir_1 (led_1_uv)',
}

g = sns.PairGrid(df, x_vars=x_vars, y_vars=y_vars)

def plot_colored(x, y, **kwargs):
    colors = df.loc[x.index, '_source'].map(color_map)
    plt.gca().scatter(x, y, c=colors, edgecolors='none')

g.map(plot_colored)

# Add legend
legend_handles = [
    plt.scatter([], [], color=color_map[key], label=label_map[key], edgecolors='none')
    for key in color_map
]
g.figure.legend(
    handles=legend_handles,
    title='Intervention',
    bbox_to_anchor=(1.02, 0.65),
    loc='upper left',
    frameon=False,
)
```

{% endtab %}
{% endtabs %}
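
The additive shift in `ir_1` described above can also be checked numerically. The following is a minimal sketch on synthetic data, so that it runs without Remote Lab credentials: the helper `toy_run`, the linear response and the shift of `300` are all made-up assumptions and not the chamber's actual behavior. With real data, you would replace the two toy dataframes with the downloaded `df_orig` and `df_ir_1`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500

def toy_run(shift):
    """Hypothetical stand-in for one experiment (made-up linear response)."""
    green = rng.integers(0, 256, size=n)
    ir_1 = 2.0 * green + shift + rng.normal(0, 5, size=n)
    return pd.DataFrame({'green': green, 'ir_1': ir_1})

df_orig = toy_run(shift=0.0)    # no intervention
df_ir_1 = toy_run(shift=300.0)  # assumed additive shift on ir_1

# Tag each run with an intervention indicator and pool them
df_orig['intervened'] = 0
df_ir_1['intervened'] = 1
df = pd.concat([df_orig, df_ir_1], ignore_index=True)

# Least-squares fit of ir_1 ~ intercept + green + intervened
X = np.column_stack([np.ones(len(df)), df['green'], df['intervened']]).astype(float)
coef, *_ = np.linalg.lstsq(X, df['ir_1'].to_numpy(), rcond=None)
slope, shift_estimate = coef[1], coef[2]

print(f"effect of green on ir_1: {slope:.2f}")
print(f"estimated shift from the intervention: {shift_estimate:.1f}")
```

If the intervention is purely additive, the slope with respect to `green` should be the same in both runs, and the coefficient of the indicator estimates the size of the shift.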

### Introducing confounders

To introduce a hidden confounder or a latent variable, you can sample an input from a distribution or process of your choice, but exclude it from the dataset. This will introduce confounding between the sensor measurements affected by this input.

In our [running example](#generating-data-an-example), we can sample the input `red` to create a confounder between the measurements `ir_1` and `ir_3`.

<figure><img src="/files/fHqWBCLvT7LSVJWG8kWV" alt="" width="375"><figcaption></figcaption></figure>

Let's run the experiment and visualize the results.

{% tabs %}
{% tab title="Figure 6" %}

<figure><img src="/files/JfLakVv5jEA4T0rca7zC" alt="" width="406"><figcaption><p>Repeating the experiment in <a href="#figure-1">Figure 1</a> (data shown in gray) with the additional input <code>red</code> acting as a latent common cause between <code>ir_1</code> and <code>ir_3</code>.</p></figcaption></figure>
{% endtab %}

{% tab title="Experiment" %}

<pre class="language-python"><code class="lang-python"># Connect to the Remote Lab
from causalchamber import lab
rlab = lab.Lab(credentials_file = 'path/to/file')

# Define a new experiment
experiment = rlab.new_experiment(chamber_id = 'lt-ptdm-fu3p', config = 'standard')

# Sample inputs
from numpy.random import uniform, randint
green = randint(0, 256, size=100)
pol_1 = uniform(0, 90, size=100)
<strong>red = randint(0, 256, size=100)
</strong>
# Set inputs &#x26; take measurements
for g,p,r in zip(green, pol_1, red):
  experiment.set('green', g)
  experiment.set('pol_1', p)
<strong>  experiment.set('red', r)
</strong>  experiment.measure(n=1)

# Submit experiment
experiment.submit(tag='example-confounder')
</code></pre>

{% endtab %}

{% tab title="Figure code" %}
To reproduce the figure:

```python
# Download the original experiment and the experiment with the confounder
df_orig = rlab.download_data('50fa0624-54b8-4d87-8520-3149984aba44', root='/tmp').dataframe
df = rlab.download_data('3d940c83-9a99-4bc8-a17b-bfc552ffd43f', root='/tmp').dataframe

# Overlay the two pairplots
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

x_vars = ['green', 'pol_1']
y_vars = ['ir_1', 'ir_3']

g = sns.PairGrid(df, x_vars=x_vars, y_vars=y_vars)

def plot_gray(x, y, **kwargs):
    plt.gca().scatter(df_orig[x.name], df_orig[y.name], color='#aaaaaa', alpha=1, edgecolor='#a8a8a8')

# Normalize red channel by dividing by 255 (natural 8-bit range)
red_norm = df['red'] / 255
edge_colors = np.column_stack([
    red_norm,
    np.zeros(len(red_norm)),
    np.zeros(len(red_norm)),
])

def plot_colored_edges(x, y, **kwargs):
    plt.gca().scatter(
        x, y,
        c=df['pol_1'],
        cmap='viridis',
        edgecolors=edge_colors,
        linewidths=0.8,
    )

g.map(plot_gray)
g.map(plot_colored_edges)

g.figure.legends.clear()

# Add colorbar axes manually — outside the grid, no space stolen
# [left, bottom, width, height] in figure coordinates
cax1 = g.figure.add_axes([1.02, 0.55, 0.02, 0.25])  # upper colorbar
cax2 = g.figure.add_axes([1.02, 0.20, 0.02, 0.25])  # lower colorbar

# pol_1 colorbar (viridis, face color)
pol1_norm = mcolors.Normalize(vmin=df['pol_1'].min(), vmax=df['pol_1'].max())
sm_pol1 = plt.cm.ScalarMappable(cmap='viridis', norm=pol1_norm)
sm_pol1.set_array([])
cb1 = g.figure.colorbar(sm_pol1, cax=cax1)
cax1.set_title('pol_1', loc='center', fontsize=10)
cb1.set_ticks([])
cb1.outline.set_visible(False)

# red colorbar (black→red, edge color)
red_cmap = mcolors.LinearSegmentedColormap.from_list('black_red', ['black', 'red'])
red_norm_obj = mcolors.Normalize(vmin=0, vmax=255)
sm_red = plt.cm.ScalarMappable(cmap=red_cmap, norm=red_norm_obj)
sm_red.set_array([])
cb2 = g.figure.colorbar(sm_red, cax=cax2)
cax2.set_title('red', loc='center', fontsize=10)
cb2.set_ticks([])
cb2.outline.set_visible(False)
```

{% endtab %}
{% endtabs %}
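
To make `red` a latent variable in practice, drop its column before passing the data to your method. The sketch below illustrates the effect on synthetic data: the linear model and its coefficients are made up for illustration and are not the chamber's physics, and `residual_correlation` is a hypothetical helper. It compares the correlation between `ir_1` and `ir_3`, after regressing out the observed inputs, with and without access to `red`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Inputs sampled as in the experiment above
green = rng.integers(0, 256, size=n)
pol_1 = rng.uniform(0, 90, size=n)
red = rng.integers(0, 256, size=n)

# Toy stand-in for the chamber: made-up linear effects, NOT the real system
ir_1 = 2.0 * green + 3.0 * red + rng.normal(0, 5, size=n)
ir_3 = 1.0 * green + 1.5 * red - 0.5 * pol_1 + rng.normal(0, 5, size=n)

df_full = pd.DataFrame(
    {'green': green, 'pol_1': pol_1, 'red': red, 'ir_1': ir_1, 'ir_3': ir_3}
)

# Hide the confounder by excluding it from the dataset
df_observed = df_full.drop(columns=['red'])

def residual_correlation(df, covariates):
    """Correlation of ir_1 and ir_3 after linearly regressing out the covariates."""
    X = np.column_stack(
        [np.ones(len(df))] + [df[c].to_numpy(dtype=float) for c in covariates]
    )
    residuals = []
    for target in ['ir_1', 'ir_3']:
        y = df[target].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals.append(y - X @ coef)
    return float(np.corrcoef(residuals[0], residuals[1])[0, 1])

corr_hidden = residual_correlation(df_observed, ['green', 'pol_1'])
corr_adjusted = residual_correlation(df_full, ['green', 'pol_1', 'red'])

print(f"red hidden:   {corr_hidden:.2f}")
print(f"red included: {corr_adjusted:.2f}")
```

In this toy model, the dependence between `ir_1` and `ir_3` that remains after accounting for `green` and `pol_1` disappears once `red` is included, which is the signature of a latent common cause.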

### Citation

If you use this documentation, our [open-source datasets](https://github.com/juangamella/causal-chamber), or the [Remote Lab](/remote-lab/quickstart.md) in your scientific work, please consider citing:

{% code overflow="wrap" %}

```bibtex
@article{gamella2025chamber,
  author={Gamella, Juan L. and Peters, Jonas and B{\"u}hlmann, Peter},
  title={Causal chambers as a real-world physical testbed for {AI} methodology},
  journal={Nature Machine Intelligence},
  doi={10.1038/s42256-024-00964-x},
  year={2025},
}
```

{% endcode %}

To reference this guide directly, you can cite:

```bibtex
@misc{chambers2026realdata,
    author = {Gamella, Juan L.},
    title = {Generating real-world data with a known causal structure},
    howpublished = {Causal Chamber®, Research Guides},
    month = {June 8,},
    year = {2026},
    url = {https://docs.causalchamber.ai/case-studies/causal-inference/sampling-real-world-data-from-a-known-causal-structure},
    note = {Accessed: YYYY-MM-DD}
}

```

### References

> \[Gamella 2025] \[[PDF](https://www.nature.com/articles/s42256-024-00964-x)] Gamella, Juan L., Peters, Jonas & Bühlmann, Peter. Causal chambers as a real-world physical testbed for AI methodology. *Nat Mach Intell* 7, 107–118 (2025).

[^1]: Here we use `*` as a wildcard, i.e., to symbolize the variables `res_current_ls`, `res_current_mot_1`, etc.

[^2]: Here we use `*` as a wildcard, i.e., to symbolize the variables `current_ls`, `current_mot_1`, etc.

