# Graduate seminar course

<figure><img src="/files/3BevyNkWUcNfk7tETf14" alt=""><figcaption></figcaption></figure>

In this graduate seminar, students implement core ML methods from scratch and evaluate them on experimental data they collect from a physical system. Along the way, the course highlights common misconceptions and gives students firsthand experience of how theory can both fail and succeed under real-world conditions.

The course, originally called "*From ML Theory to Practice*", was run at the [University of Potsdam](https://www.uni-potsdam.de/en/university-of-potsdam) in the Fall 2025 semester. It was designed by [Juan L. Gamella](https://github.com/juangamella) and [Simon Bing](https://simonbing.github.io/), with Simon as the instructor.

The course gives graduate (or advanced undergraduate) students in CS, Statistics, or Data Science a taste of what machine learning research looks like in practice. Over the semester, students independently read the research literature, implement a selection of core methods from scratch (classifier two-sample tests, VAEs, Gaussian processes, and Bayesian optimization), and apply them to real experimental data they collect from the [Causal Chambers](https://causalchamber.ai/), a set of physical devices designed for ML and causal-inference research.

{% hint style="info" %}
See the [course repository](https://github.com/juangamella/seminar-course) for the complete details, schedule, and materials.
{% endhint %}

### How the course is run

The course follows a [flipped classroom](https://en.wikipedia.org/wiki/Flipped_classroom) approach: students read the assigned literature and work through the project notebooks at home, and meet with the instructor once a week for a 2-hour session. In each session:

* Students present their solution to the previous project.
* Open questions about the literature or the project are discussed with the instructor.
* The next topic and project are introduced.

Between sessions, take-home work alternates between reading (in preparation for a new topic) and implementing (working through the corresponding project notebook).

{% hint style="info" %}
The instructor solutions and additional support material are hosted in a separate, private repository. You can request access through [this form](https://forms.causalchamber.ai/seminar-course-solutions).
{% endhint %}

### Course outline

The course is split into 8 projects of varying length. Some are done in session with the instructor, while others are meant for students to work on at home.

Where possible, projects use the existing [open-source datasets](https://github.com/juangamella/causal-chamber) and need no live access to a Causal Chamber® (see *Needs API* below). The remaining projects have students collect their own data through the [Remote Lab](/remote-lab/quickstart.md).

#### Projects

{% stepper %}
{% step %}

#### Understanding linear models on synthetic data

<table data-header-hidden><thead><tr><th width="126.06298828125"></th><th width="172.9813232421875"></th><th width="143.293701171875"></th></tr></thead><tbody><tr><td><a href="https://github.com/juangamella/seminar-course/tree/main/project_11">Project page</a></td><td><a href="https://github.com/juangamella/seminar-course/blob/main/project_11/project_11_linear_models_synthetic.ipynb">Exercise notebook</a></td><td>Needs API: no</td></tr></tbody></table>

The goal is to familiarize students with the linear model and expose them to **common misconceptions** about p-values, confidence and prediction intervals, and related concepts. The project also serves as a warm-up for the course and an introduction to its submission system.
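The difference between confidence and prediction intervals, one of the misconceptions this project targets, can be made concrete with a short simulation. The sketch below is an illustration, not taken from the course notebook: it fits ordinary least squares with NumPy on hypothetical synthetic data and compares, at a new input `x0`, the interval for the mean response with the interval for a single new observation. The helper name `ols_intervals` is our own.

```python
import numpy as np
from scipy import stats

def ols_intervals(x, y, x0, alpha=0.05):
    """Fit y = b0 + b1*x by OLS and return (CI, PI) at x0.

    CI: interval for the mean response E[y | x0].
    PI: interval for a single new observation y0 at x0.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n, p = X.shape
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)            # unbiased noise-variance estimate
    XtX_inv = np.linalg.inv(X.T @ X)
    x0_vec = np.array([1.0, x0])
    y0_hat = x0_vec @ beta
    leverage = x0_vec @ XtX_inv @ x0_vec        # x0^T (X^T X)^{-1} x0
    t = stats.t.ppf(1 - alpha / 2, df=n - p)
    se_mean = np.sqrt(sigma2 * leverage)        # uncertainty of the fitted line only
    se_pred = np.sqrt(sigma2 * (1 + leverage))  # plus the noise of a new draw
    ci = (y0_hat - t * se_mean, y0_hat + t * se_mean)
    pi = (y0_hat - t * se_pred, y0_hat + t * se_pred)
    return ci, pi

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=200)  # true noise sd = 1
ci, pi = ols_intervals(x, y, x0=5.0)
print("95% CI for the mean response:", ci)
print("95% PI for a new observation:", pi)
```

As the sample size grows, the confidence interval shrinks toward zero width, while the prediction interval stays at roughly ±1.96σ because it must also cover the noise of the new observation. Reading a confidence interval as if it described where individual observations fall is exactly the kind of mistake the project is meant to expose.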
{% endstep %}

{% step %}

#### **Intermezzo:** collecting data from a Causal Chamber®

<table data-header-hidden><thead><tr><th width="126.06298828125"></th><th width="172.9813232421875"></th><th width="143.293701171875"></th></tr></thead><tbody><tr><td><a href="https://github.com/juangamella/seminar-course/tree/main/intermezzo">Project page</a></td><td><a href="https://github.com/juangamella/seminar-course/blob/main/intermezzo/intermezzo.ipynb">Exercise notebook</a></td><td>Needs API: yes</td></tr></tbody></table>

Together with the instructor, the students set up their credentials and learn to collect data from the [Chambers](/the-chambers/how-they-work.md) using the [Remote Lab](/remote-lab/quickstart.md).
{% endstep %}

{% step %}

#### Linear models and real-world data

<table data-header-hidden><thead><tr><th width="126.06298828125"></th><th width="172.9813232421875"></th><th width="143.293701171875"></th></tr></thead><tbody><tr><td><a href="https://github.com/juangamella/seminar-course/tree/main/project_12">Project page</a></td><td><a href="https://github.com/juangamella/seminar-course/blob/main/project_12/project_12_linear_models_real.ipynb">Exercise notebook</a></td><td>Needs API: yes</td></tr></tbody></table>

The students apply the linear model to experimental data collected from the [Chambers](/the-chambers/how-they-work.md) and experience how the model breaks down when its assumptions are violated. They witness the effect of **multicollinearity** and collect data to observe the principle of **causal invariance** and the minimax formulation of causality.
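The effect of multicollinearity can be previewed on synthetic data before seeing it in the Chambers. The simulation below is a hypothetical illustration, not course material: it refits OLS on many fresh samples and compares the spread of one coefficient estimate when the two predictors are independent versus strongly correlated. The correlation values, sample size, and the helper name `coef_spread` are arbitrary choices.

```python
import numpy as np

def coef_spread(rho, n=100, reps=2000, seed=0):
    """Std. dev. of the OLS estimate of b1 across repeated samples, where
    (x1, x2) are standard normal with correlation rho and
    y = 1*x1 + 1*x2 + standard normal noise."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
        y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
        A = np.column_stack([np.ones(n), X])  # intercept + two predictors
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        estimates[r] = beta[1]                # coefficient of x1
    return estimates.std()

for rho in (0.0, 0.9, 0.99):
    print(f"rho={rho:>4}: sd(b1_hat) = {coef_spread(rho):.3f}")
```

The model is correctly specified in every case, yet the spread of the estimate grows roughly like 1/√(1 − ρ²), the square root of the variance inflation factor, so individual coefficients become unreliable even though predictions remain accurate.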
{% endstep %}

{% step %}

#### Causality, RCTs, and two-sample testing

<table data-header-hidden><thead><tr><th width="126.06298828125"></th><th width="172.9813232421875"></th><th width="143.293701171875"></th></tr></thead><tbody><tr><td><a href="https://github.com/juangamella/seminar-course/tree/main/project_2">Project page</a></td><td><a href="https://github.com/juangamella/seminar-course/blob/main/project_2/project_2_rcts_testing.ipynb">Exercise notebook</a></td><td>Needs API: yes</td></tr></tbody></table>

The students learn the basics of experiment design, randomized controlled trials, and statistical hypothesis testing. They apply what they've learned to **test a real causal hypothesis** in the physical system of the [Chambers](/the-chambers/how-they-work.md), and repeat the experiment under different conditions to witness the problems with **p-value peeking**.
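Why peeking at p-values is a problem can be checked by simulation. The sketch below is an illustration under assumed settings (Gaussian data, Welch's t-test from SciPy, 10 interim looks), not the course's experiment. Both groups come from the same distribution, so every rejection is a false positive. Stopping at the first look with p < 0.05 rejects far more often than a single test at the final sample size.

```python
import numpy as np
from scipy import stats

def false_positive_rates(n_sims=1000, batch=20, n_looks=10, alpha=0.05, seed=0):
    """Return (fixed-n rate, peeking rate) of rejections when H0 is true."""
    rng = np.random.default_rng(seed)
    fixed_rejections = 0
    peeking_rejections = 0
    for _ in range(n_sims):
        # Both arms come from the same distribution: the null hypothesis holds.
        a = rng.normal(size=batch * n_looks)
        b = rng.normal(size=batch * n_looks)
        # Peeking: test after every batch and stop at the first "significant" result.
        for k in range(1, n_looks + 1):
            m = k * batch
            if stats.ttest_ind(a[:m], b[:m], equal_var=False).pvalue < alpha:
                peeking_rejections += 1
                break
        # Fixed design: a single test once all data are in.
        if stats.ttest_ind(a, b, equal_var=False).pvalue < alpha:
            fixed_rejections += 1
    return fixed_rejections / n_sims, peeking_rejections / n_sims

fixed, peeking = false_positive_rates()
print(f"fixed sample size: {fixed:.3f}")   # expected near alpha = 0.05
print(f"with peeking:      {peeking:.3f}")  # expected well above alpha
```

Because the last look uses the full data, peeking can only add rejections to those of the fixed design; each extra look is another chance for noise to cross the threshold.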
{% endstep %}

{% step %}

#### Classifier two-sample tests

<table data-header-hidden><thead><tr><th width="126.06298828125"></th><th width="172.9813232421875"></th><th width="143.293701171875"></th></tr></thead><tbody><tr><td><a href="https://github.com/juangamella/seminar-course/tree/main/project_3">Project page</a></td><td><a href="https://github.com/juangamella/seminar-course/blob/main/project_3/project_3_c2st.ipynb">Exercise notebook</a></td><td>Needs API: no</td></tr></tbody></table>

As a follow-up to the previous project, the students learn about classifier two-sample tests as a tool for hypothesis testing on high-dimensional data. They build a complete classifier and test from scratch, and apply it to an image dataset from the [Chambers](/the-chambers/how-they-work.md).
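The core of a classifier two-sample test fits in a few lines. The sketch below follows the common recipe (label the two samples 0 and 1, train a classifier on one half, and test whether held-out accuracy exceeds chance), here assuming equal sample sizes, a logistic-regression classifier trained by plain gradient descent, and a one-sided binomial test from SciPy for the p-value. Function names are hypothetical and the course notebook may differ; for image data a stronger classifier would typically replace the logistic regression.

```python
import numpy as np
from scipy.stats import binomtest

def train_logreg(X, y, lr=0.1, n_iter=500):
    """Logistic regression with a bias term, fitted by gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

def c2st(P, Q, test_frac=0.5, seed=0):
    """Return (held-out accuracy, p-value) for H0: P and Q share a distribution."""
    rng = np.random.default_rng(seed)
    X = np.vstack([P, Q])
    y = np.concatenate([np.zeros(len(P)), np.ones(len(Q))])
    idx = rng.permutation(len(X))
    n_test = int(test_frac * len(X))
    test, train = idx[:n_test], idx[n_test:]
    w = train_logreg(X[train], y[train])
    correct = int((predict(w, X[test]) == y[test]).sum())
    # Under H0 (and balanced classes), each test prediction is right w.p. ~1/2.
    pval = binomtest(correct, n_test, p=0.5, alternative="greater").pvalue
    return correct / n_test, pval

rng = np.random.default_rng(1)
P = rng.normal(0.0, 1.0, size=(500, 2))
Q_same = rng.normal(0.0, 1.0, size=(500, 2))
Q_shift = rng.normal(0.5, 1.0, size=(500, 2))
print(c2st(P, Q_same))   # accuracy near 0.5; p-value typically not small
print(c2st(P, Q_shift))  # accuracy clearly above 0.5; small p-value
```

Held-out data matter here: measuring accuracy on the training half would be optimistically biased and would break the null distribution of the test statistic.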
{% endstep %}

{% step %}

#### Generative models: VAEs

<table data-header-hidden><thead><tr><th width="126.06298828125"></th><th width="172.9813232421875"></th><th width="143.293701171875"></th></tr></thead><tbody><tr><td><a href="https://github.com/juangamella/seminar-course/tree/main/project_4">Project page</a></td><td><a href="https://github.com/juangamella/seminar-course/blob/main/project_4/project_4_vaes.ipynb">Exercise notebook</a></td><td>Needs API: no</td></tr></tbody></table>

The students implement an autoencoder and a variational autoencoder (VAE) from scratch, and apply them to a representation-learning problem using experimental data from the [Chambers](/the-chambers/how-they-work.md), for which a ground truth is available.
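Two ingredients set the VAE apart from a plain autoencoder: the reparameterization trick and the KL term of the evidence lower bound (ELBO). The NumPy sketch below shows both for the standard setup of a diagonal-Gaussian encoder and a standard-normal prior. The unit-variance Gaussian decoder (squared-error reconstruction) is an assumption, and in practice these functions would be written in an autodiff framework so gradients can flow through the sampled `z`.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, diag(exp(logvar))) as a deterministic function of
    (mu, logvar) plus independent noise, so gradients can reach the encoder."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar, axis=-1)

def negative_elbo(x, x_recon, mu, logvar):
    """Per-example loss: squared-error reconstruction (unit-variance Gaussian
    decoder, up to additive constants) plus the KL regularizer."""
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, logvar)

# Sanity check: the closed-form KL should match a Monte Carlo estimate.
rng = np.random.default_rng(0)
mu, logvar = np.array([0.5, -1.0]), np.array([0.2, -0.7])
z = reparameterize(np.tile(mu, (200_000, 1)), np.tile(logvar, (200_000, 1)), rng)
log_q = -0.5 * np.sum((z - mu) ** 2 / np.exp(logvar) + logvar + np.log(2 * np.pi), axis=1)
log_p = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=1)
print(kl_to_standard_normal(mu, logvar), np.mean(log_q - log_p))
```

Dropping the KL term recovers an ordinary (deterministic-bottleneck-free) autoencoder objective, which is one way to compare the two models on the same data.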
{% endstep %}

{% step %}

#### Gaussian processes (GPs)

<table data-header-hidden><thead><tr><th width="126.06298828125"></th><th width="172.9813232421875"></th><th width="143.293701171875"></th></tr></thead><tbody><tr><td><a href="https://github.com/juangamella/seminar-course/tree/main/project_51">Project page</a></td><td><a href="https://github.com/juangamella/seminar-course/blob/main/project_51/project_51_gps.ipynb">Exercise notebook</a></td><td>Needs API: no</td></tr></tbody></table>

The students build kernels and the machinery for Gaussian process regression and sampling from scratch. They apply it to synthetic data and learn to incorporate background knowledge by combining kernels. The goal is to familiarize students with GPs in preparation for the final project on Bayesian optimization.
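The core GP computations are compact. The sketch below implements a squared-exponential (RBF) kernel, a periodic kernel, exact GP regression via a Cholesky factorization, and sampling from a Gaussian over function values. Summing kernels is one way to encode background knowledge, here a slow trend plus a known period-1 oscillation; the hyperparameters, synthetic data, and function names are illustrative assumptions, not values from the course.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def periodic(a, b, period=1.0, lengthscale=1.0, variance=1.0):
    """Periodic kernel: correlation repeats every `period` units."""
    d = np.abs(a[:, None] - b[None, :])
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, kernel, noise=1e-2):
    """Posterior mean and covariance of f(x_test) under a zero-mean GP prior
    and Gaussian observation noise with variance `noise`."""
    K = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    K_s = kernel(x_train, x_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = kernel(x_test, x_test) - v.T @ v
    return mean, cov

def sample_gp(mean, cov, n_samples, rng, jitter=1e-6):
    """Draw function samples from N(mean, cov); jitter keeps Cholesky stable."""
    L = np.linalg.cholesky(cov + jitter * np.eye(len(mean)))
    return mean + (L @ rng.standard_normal((len(mean), n_samples))).T

def trend_plus_cycle(a, b):
    # Background knowledge as a kernel sum: slow trend + period-1 oscillation.
    return rbf(a, b, lengthscale=5.0, variance=4.0) + periodic(a, b, period=1.0)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 30))
y = 0.3 * x + np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)
x_new = np.linspace(0.0, 12.0, 200)

mean, cov = gp_posterior(x, y, x_new, trend_plus_cycle)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # pointwise uncertainty
prior_draws = sample_gp(np.zeros(len(x_new)), trend_plus_cycle(x_new, x_new), 3, rng)
posterior_draws = sample_gp(mean, cov, 3, rng)
```

Solving with the Cholesky factor instead of inverting `K` directly is the usual choice for numerical stability; plotting `prior_draws` against `posterior_draws` shows how the data pin down the trend while the periodic component carries into the extrapolation region beyond x = 10.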
{% endstep %}

{% step %}

#### Bayesian optimization

<table data-header-hidden><thead><tr><th width="126.06298828125"></th><th width="172.9813232421875"></th><th width="143.293701171875"></th></tr></thead><tbody><tr><td><a href="https://github.com/juangamella/seminar-course/tree/main/project_52">Project page</a></td><td><a href="https://github.com/juangamella/seminar-course/blob/main/project_52/project_52_BayesOpt.ipynb">Exercise notebook</a></td><td>Needs API: yes</td></tr></tbody></table>

The students apply what they've learned about GPs to build a complete Bayesian optimization pipeline and solve an optimization problem in the real, physical system of the [Chambers](/the-chambers/how-they-work.md).
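The structure of a Bayesian optimization loop can be sketched independently of the physical system. In the hypothetical example below, a cheap synthetic function stands in for the expensive measurement (in the course, each evaluation would be an experiment on a Chamber). The GP surrogate uses an RBF kernel with fixed hyperparameters, and the acquisition function is expected improvement (EI), maximized over a grid. All names and settings are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def rbf(a, b, lengthscale=0.15):
    """Unit-variance squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_posterior(x, y, x_grid, noise=1e-4):
    """Posterior mean and std of a zero-mean GP surrogate on x_grid."""
    K = rbf(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf(x, x_grid)
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)  # prior variance is 1
    return K_s.T @ alpha, np.sqrt(var)

def expected_improvement(mean, std, best, xi=0.01):
    """EI for maximization: E[max(f - best - xi, 0)] with f ~ N(mean, std^2)."""
    gap = mean - best - xi
    z = gap / std
    return gap * norm.cdf(z) + std * norm.pdf(z)

def objective(x):
    # Hypothetical stand-in for an expensive experiment: a local maximum near
    # x = 0.2 (value ~0.5) and the global maximum near x = 0.7 (value ~1.0).
    return np.exp(-((x - 0.7) ** 2) / 0.02) + 0.5 * np.exp(-((x - 0.2) ** 2) / 0.01)

def bayes_opt(n_init=3, n_iter=12, seed=0):
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 501)      # candidate inputs
    x = rng.uniform(0.0, 1.0, n_init)        # initial random design
    y = objective(x)
    for _ in range(n_iter):
        offset = y.mean()                    # center targets for the zero-mean GP
        mean, std = gp_posterior(x, y - offset, x_grid)
        ei = expected_improvement(mean + offset, std, y.max())
        x_next = x_grid[np.argmax(ei)]       # query the most promising input
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))  # "run the experiment"
    return x[np.argmax(y)], y.max()

best_x, best_y = bayes_opt()
print(f"best input {best_x:.3f}, best value {best_y:.3f}")
```

In a real setup the kernel hyperparameters would usually be fitted (for example by maximizing the marginal likelihood) rather than fixed, and measurement noise on the physical system would call for a larger `noise` term than the near-noiseless value used here.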
{% endstep %}
{% endstepper %}

### For instructors

Interested in running this course at your institution? The course materials are publicly available in the [course repository](https://github.com/juangamella/seminar-course) under a [CC-BY-4.0](https://github.com/juangamella/seminar-course/tree/main#license) license. For instructor solutions, support material, or questions about adapting the course, reach out through [this form](https://forms.causalchamber.ai/seminar-course-solutions).


