Anomaly detection with time-series

How our subscribers at SCANIA use the chambers to test anomaly detection algorithms.

[We believe the setup is useful for anyone testing anomaly detection and root-cause analysis, and we've written this research guide so you can do the same]

The problem

[A live system (a truck, an engine, a bioreactor, etc.) produces a stream (time series) of data from different variables: actuators, control inputs and sensor outputs. There is some causal relationship between these variables: actuators and control inputs drive the sensor measurements, which are also affected by external, unmeasured influences. The goal of an anomaly detection algorithm is to monitor the stream of data and detect when an anomaly has happened; for extra credit, it should also identify the cause of the anomaly (root-cause analysis).]

[Testing these algorithms outside of a computer simulation is difficult in practice for two reasons: (1) we don't know when anomalies happened, and (2) we don't know the underlying causal structure. This means we can't really check the answers that our algorithm gave us. As in many other fields, the gap between simulation and real deployment scenarios is huge, making it difficult to properly test and refine our algorithms.]

As always, this is the gap that the chambers fill. They provide a real but controlled environment where (1) we can introduce anomalies at will, (2) we know the causal structure, and (3) there are still external, unmeasured influences that make the problem complicated enough.

Of course, an algorithm that works on the chambers does not necessarily work outside. But, as one of the researchers at SCANIA told us, "if it doesn't work on the Chamber it won't work on the real thing".

A testbed for anomaly detection

Because it produces time-series data from a dynamical system, the Wind Tunnel Mk2 was the obvious choice for this problem.

The chamber exposes data from 44 variables, from sensor measurements to control inputs. We can pick subsets of these to make up our data stream and use the rest to introduce anomalies.

Building a task

[To build a task, we pick our task variables: sensor measurements and the control inputs that drive them. Then we pick other inputs and sensor parameters to act as 3rd variables, which modulate the relationships between our task variables, allowing us to introduce anomalies with a high degree of control.]

scratch

To collect data, we set the control inputs of the chamber to follow a stochastic process of our choice (e.g., a random walk, real trajectories from some human user, etc.) and take measurements of all variables through time. This is our baseline, i.e., the data produced when the machine is operating normally.
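As an illustration of the "stochastic process" step, here is a minimal sketch of a bounded random walk that could serve as the trajectory for a control input. The function name, the step size and the assumption that loads live in the range [0, 1] are ours, chosen for illustration; the sketch only generates the trajectories and does not show how they are sent to the chamber.

```python
import numpy as np

def bounded_random_walk(n_steps, start=0.5, step_std=0.05, low=0.0, high=1.0, seed=0):
    """Gaussian random walk, clipped to [low, high] after every step.

    All defaults are illustrative assumptions, not chamber settings.
    """
    rng = np.random.default_rng(seed)
    walk = np.empty(n_steps)
    walk[0] = start
    for t in range(1, n_steps):
        walk[t] = np.clip(walk[t - 1] + rng.normal(0.0, step_std), low, high)
    return walk

# One independent trajectory per control input of the data stream.
trajectories = {
    "load_in": bounded_random_walk(1000, seed=1),
    "load_out": bounded_random_walk(1000, seed=2),
}
```

Using a different seed per input keeps the trajectories independent, so the baseline covers many combinations of the two loads.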

[Chamber variables dropdown]

It exposes [X variables], including sensor measurements and control inputs / parameters. We will split the latter in two groups: some will be part of our data stream, and some will be used to introduce anomalies (see the relationship map).

[Then, we can introduce anomalies]

[graph]

An example

Let's start with a simple example [produce data for an anomaly detection algorithm: goal -> decide at which point in time the anomaly has occurred] using only a few variables:

  • Control inputs: fan loads load_in and load_out

  • Sensor measurements: fan speeds (rpm_in/out), fan currents (current_in/out_raw), air pressure (pressure_upwind/downwind/intake/ambient) and global chamber current (supply_current).

As anomalies:

  • Leak in the system: we open the hatch, either at random or depending on the rest of the system, e.g., with higher probability when the inner pressure (pressure_downwind) is high.

  • Sensor drift: random fluctuations of the sensors' reference voltages, occurring with low probability.
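The two anomaly types above can be scheduled with a few lines of code. The sketch below is one possible way to do it, not the procedure used for the experimental data: the probabilities, the pressure threshold and the function names are all hypothetical, and `pressure` stands for any array of inner-pressure readings. In a live experiment the same rule would be applied one time step at a time.

```python
import numpy as np

def leak_schedule(pressure, base_prob=0.001, high_prob=0.02, threshold=None, seed=0):
    """Boolean array, True where a leak (hatch opening) starts.

    A leak is more likely when the pressure exceeds `threshold`
    (by default the 90th percentile of the given readings).
    """
    rng = np.random.default_rng(seed)
    pressure = np.asarray(pressure, dtype=float)
    if threshold is None:
        threshold = np.quantile(pressure, 0.9)
    prob = np.where(pressure > threshold, high_prob, base_prob)
    return rng.random(pressure.shape) < prob

def drift_schedule(n_steps, prob=0.002, seed=0):
    """Boolean array, True where a reference-voltage fluctuation starts."""
    rng = np.random.default_rng(seed)
    return rng.random(n_steps) < prob
```

Keeping the schedules as boolean arrays gives us the ground-truth labels for free: the same arrays that drive the anomalies can later be used to score a detection algorithm.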

Below is the experimental data with the code to collect it from the Remote Lab.

Anomaly index

[To help you design different tasks, we find this map useful].

[Reformulation of the map of physical effects (causal ground-truth) of the Wind Tunnel Mk2].

[We reformulate the graph as] relationships (gray lines / arrows) in the Wind Tunnel Mk2 and the 3rd variables that modulate them (green arrows). See the relationship index below for complete details and accompanying experiments and plots.

[TODO: update graph: remove pressure_ambient, add edges from hatch/rpm_* to load_*->rpm_* relationships]

For each of the relationships in the graph (gray lines / arrows), the index below gives a detailed description of the relationship and its modulations, together with a visualization and the code to reproduce it.

See the Map of physical effects for additional descriptions and visualizations of each edge.

Note: Each relationship (gray and green lines/arrows) can be seen independently in the index of physical effects for this chamber.

load_in/out → rpm_in/out

The fan loads load_in and load_out control the speed of the fans in an open-loop fashion. In steady state, and keeping all other variables constant (e.g., the hatch position and the other fan), the relationship between load and speed is linear (Fig. 1).

When the load is set to zero, the fan is completely powered off and no longer produces a tachometer signal, so the corresponding speed measurement (rpm_in/out) stays at the last measured speed (Fig. 2).

[Figure 2: A: powered off fan, see experiment in old paper]

Modulation by res_rpm_in/out

The relationship can be modulated through the variables res_rpm_in/out, which control the resolution of the speed sensor of each fan. At lower resolution, the quantization error is higher, and is larger for higher fan speeds (Fig. 3).

[Figure 3: impulse response + variations of resolution]

The results for rpm_out and res_rpm_out are virtually the same and not shown.
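To build some intuition for why the quantization error grows with the fan speed, here is a toy model. It assumes that the speed is computed from the time between tachometer pulses, measured by a timer with a finite tick length; this mechanism, the two pulses per revolution and the function names are our assumptions for illustration and are not stated in this guide. In this model a lower resolution corresponds to a longer tick (`timer_res_us`).

```python
import numpy as np

def measured_rpm(true_rpm, timer_res_us, pulses_per_rev=2):
    """Speed reconstructed from a pulse period counted in whole timer ticks."""
    true_rpm = np.asarray(true_rpm, dtype=float)
    period_us = 60e6 / (true_rpm * pulses_per_rev)  # time between two pulses
    # The timer only counts whole ticks (at least one).
    ticks = np.maximum(np.floor(period_us / timer_res_us), 1)
    return 60e6 / (ticks * timer_res_us * pulses_per_rev)

def worst_case_error(rpm_low, rpm_high, timer_res_us):
    """Largest absolute error over a grid of true speeds in [rpm_low, rpm_high]."""
    speeds = np.linspace(rpm_low, rpm_high, 500)
    return np.max(np.abs(measured_rpm(speeds, timer_res_us) - speeds))

# Same timer, two speed ranges: the error is larger at high speed.
error_slow = worst_case_error(900, 1100, timer_res_us=1000)
error_fast = worst_case_error(2900, 3100, timer_res_us=1000)
```

At high speed the pulse period is short, so one timer tick is a larger fraction of it and the reconstructed speed jumps in larger steps.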

Modulation by the other fan & hatch

In the Wind Tunnel, the fans work in tandem, i.e., one pushes air in and another out of the chamber. Thus, if considering a single fan, the relationship between its load and speed is modulated by the speed of the other fan (Fig. 4).

[Figure 4: A: time series and B: steady state, for intake fan load and exhaust fan levels]

The strength of this modulation is itself affected by the hatch position: the coupling between fan speeds is reduced when the hatch is open, creating an additional flow of air to/from the outside (Fig. 5).

[Figure 5: impulses experiment, showing how coupling changes between fan speeds given the hatch position]

Note: the relationship between hatch and the actual hatch position (as measured by hatch_angle) is itself modulated by the motor parameters (mot_enabled/max/steps), which control the torque of the motor and its resolution (see the entry for hatch → hatch_angle below).

load_in/out → current_in/out_raw

Because the load controls the fan speed, it also affects the electric current drawn by the fan. As with the speed, the change in current does not occur instantaneously (Fig 1a). In steady state, and keeping all other variables constant, the relationship is cubic (Fig. 1, see also Appendix IV.1.1 of the original paper).

[Figure 6: A: impulse and B: steady state of load_in and current_in_raw]

The results for load_out and current_out_raw are virtually the same and not shown.

Modulation via sensor parameters

The chamber provides calibrated (current_in/out) and uncalibrated (current_in_raw, current_out_raw) measurements of the fan currents. These measurements are affected by the parameters of the underlying analog sensors:

  • offset_current_in/out: the reference voltage of the sensor. Changing it creates an additive shift in the measurements (Fig 1a)

  • sps_current_in/out: controls the oversampling rate of the sensor, controlling the noise-to-signal ratio (i.e., variance, precision) of the underlying measurements (Fig 1b)

  • res_current_in/out: controls the measurement range (resolution) of the sensor. Higher values correspond to smaller measurement ranges, increasing the resolution but risking saturating the sensor if the actual value falls outside this range (Fig 1c).

[Figure 7: constant load, 2 rows (raw / calibrated), 4 columns: A: changes in offset, B: changes in sps, C: changes in res (with saturation)]

Caption: Visualization of how the sensor parameters offset/sps/res_current_in affect the raw measurements (top: current_in_raw) and the calibrated measurements of the fan current (bottom: current_in). The calibrated measurements compensate for changes in the reference voltage (offset_*) and resolution (res_*), as long as saturation does not occur. The effects for *current_out* are the same and not shown.
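The three effects described above (additive shift, change in noise level, saturation) can be mimicked with a toy sensor model. This is a sketch for intuition only, not the chamber's actual sensor electronics: `n_averaged` and `full_scale` are hypothetical stand-ins for whatever sps_current_* and res_current_* select, and the mapping from the real parameter values to these quantities is not specified here.

```python
import numpy as np

def raw_reading(true_value, offset=0.0, n_averaged=1, full_scale=1.0,
                noise_std=0.05, seed=0):
    """Toy raw measurement: shifted by `offset`, noisy, clipped to [0, full_scale]."""
    rng = np.random.default_rng(seed)
    true_value = np.asarray(true_value, dtype=float)
    # Averaging n independent noisy samples shrinks the noise by sqrt(n).
    noise = rng.normal(0.0, noise_std / np.sqrt(n_averaged), size=true_value.shape)
    # Values outside the measurement range saturate.
    return np.clip(true_value + offset + noise, 0.0, full_scale)

def calibrated_reading(raw, offset):
    """Toy calibration: undo the additive shift. It cannot undo saturation."""
    return np.asarray(raw, dtype=float) - offset

# Noise-free example: the second value saturates, so calibration fails for it.
raw = raw_reading([0.2, 0.9], offset=0.3, full_scale=1.0, noise_std=0.0)
calibrated = calibrated_reading(raw, offset=0.3)
```

In this example the first value is recovered after calibration, while the second one is clipped at the top of the range and stays wrong after the offset is removed.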

rpm_out - rpm_in

In the Wind Tunnel, the fans work in tandem, i.e., one pushes air in and another out of the chamber. Thus, their speeds are coupled, i.e., changes in the speed of one fan will affect the other's, and vice versa.

Modulation by res_rpm_in/out

[Copy from above]

Modulation by hatch

[Copy / adapt from above]

  • impulse response for different hatch positions

  • steady state for different hatch positions

  • constant load and hatch opening

rpm_in/out - pressure_upwind/downwind/intake

The fans pump air into and out of the chamber, affecting the air pressure inside the tunnel. Thus, their speeds (as measured by rpm_in/rpm_out) affect the pressure measurements inside the tunnel (pressure_upwind/downwind) and at its intake (pressure_intake).

[Figure 8: A: Impulse response (take one impulse from the dataset); B: steady state fan speeds vs. pressure heatmap]

Note: All pressure measurements are affected by the ambient atmospheric pressure at the location of our lab. To control for this effect, the variable pressure_ambient provides a direct measurement of ambient atmospheric pressure that is unaffected by the other chamber variables.
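One simple way to use pressure_ambient as a control, assuming the ambient pressure enters the other barometer readings additively (our assumption), is to subtract it sample by sample. The function name and the numbers below are illustrative.

```python
import numpy as np

def relative_pressure(pressure, pressure_ambient):
    """Pressure relative to ambient, computed sample by sample."""
    return (np.asarray(pressure, dtype=float)
            - np.asarray(pressure_ambient, dtype=float))

# Made-up readings: the ambient pressure drifts, the fan effect (+5) does not.
ambient = np.array([101325.0, 101310.0, 101300.0])
downwind = ambient + 5.0
fan_effect = relative_pressure(downwind, ambient)
```

After the subtraction, slow weather-driven changes no longer show up in the series, which keeps them from being flagged as anomalies.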

Modulation via hatch

The effect of the fan speeds on the air pressure is modulated by the hatch position, which controls an additional flow of air to/from the outside:

  • when the hatch is open, more air can escape, reducing the maximum possible change in the measurements pressure_upwind/downwind (Fig 2).

  • [CHECK] when the hatch is open, the impedance of the system is reduced, increasing the airspeed over the intake barometer and decreasing the measurement pressure_intake.

[Figure 9: A: data from impulse experiment for different hatch positions, B: maybe steady-state heatmap?]

Note: the relationship between hatch and the actual hatch position (as measured by hatch_angle) is itself modulated by the motor parameters (mot_enabled/max/steps), which control the torque of the motor and its resolution (see the entry for hatch → hatch_angle below).

Modulation via osr_pressure_upwind/downwind/intake

The variables osr_pressure_upwind/downwind/intake/ambient set the oversampling rate of the barometers, affecting the noise-to-signal ratio (i.e., variance, precision) of the resulting measurements.

[Figure 10: time series from original paper figure, repeat the osr_barometers experiment]
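A change in the oversampling rate shows up as a change in the variance of the measurements rather than in their mean. A rolling standard deviation is a simple feature that exposes this. The sketch below uses synthetic data; the window length and the noise levels are arbitrary choices for illustration.

```python
import numpy as np

def rolling_std(x, window):
    """Standard deviation over each length-`window` slice of x."""
    x = np.asarray(x, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    return windows.std(axis=1)

# Synthetic barometer series: the noise level drops halfway through,
# as it would if the precision of the sensor changed.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(0.0, 0.1, 500)])
noise_level = rolling_std(series, window=100)
```

The output has `len(x) - window + 1` entries, one per full window, so it needs to be aligned with the original time axis before it is compared with the anomaly labels.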

hatch → hatch_angle

The variable hatch_angle produces a measurement of the hatch position using a rotary encoder. Under normal functioning of the motor that controls the hatch, the relationship between hatch and hatch_angle is the identity, up to the quantization error of the sensor (Fig 1.a).

[Figure 11, hatch vs hatch angle: A: normal operation, B: different step sizes, C: effect of current (high, standard, low)]

Modulation via the motor parameters mot_enabled/steps/max

The behaviour of the motor that opens the hatch can be controlled via three parameters:

  • mot_enabled: whether the motor is enabled. When it is disabled (mot_enabled=0), changes in hatch have no effect on the actual hatch position, as measured by hatch_angle.

  • mot_steps: controls the resolution of the motor, i.e., the number of steps per revolution.

  • mot_max : controls the amount of electric current that flows through the motor. At very high values, the motor may exhibit oscillation after large movements (Fig 2c). For small values of mot_steps and mot_max , the motor can miss steps, creating a mismatch between
