A Turnstile Solution to Streaming Anomaly Detection

Author: Matt Gordon

Date Published: March 24, 2017

In a vast sea of time series data, it can be difficult to determine which sequences of events constitute anomalies or sequences of interest. Many time series datasets hold millions or billions of events, most of which are individually unremarkable and far too numerous for anyone to inspect one by one. However, when patterns or anomalies can be detected in sequences of events, more concentrated action can be taken on a much smaller subset of data.

Turnstile, a tool developed by Elder Research, ingests streaming event data and fires simple or complex triggers and alerts when interesting combinations of event sequences occur.  Turnstile uses a finite state machine (FSM) to process time series data one event at a time in chronological order. It functions like a smart turnstile for data; every record must pass through and get vetted at its gate.

Turnstile works from a list of criteria, defined by the user or developer, that specify which events are of particular interest. It contains several tools and functions for detecting anomalies. “A+B+C” analysis looks for a particular sequence of events. For example, a Turnstile trigger could fire when a customer buys both beer and diapers within a set amount of time, or whenever the sequence occurs, regardless of how much time passes.
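
The A+B+C idea can be illustrated with a small state machine. The following is a minimal Python sketch, not Turnstile's actual implementation: the class name SequenceTrigger, its process method, and the max_span parameter are all hypothetical, and the sketch assumes an ordered pattern in which unrelated events are simply ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SequenceTrigger:
    """Hypothetical A+B+C trigger: fires when one actor produces the
    events in `pattern` in order.

    If `max_span` is None the sequence may take any amount of time;
    otherwise it must complete within `max_span` time units of its
    first event.
    """
    pattern: List[str]
    max_span: Optional[float] = None
    # Per-actor state: (index of next expected event, time the match began).
    _state: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def process(self, actor: str, event: str, timestamp: float) -> bool:
        """Consume one event in chronological order; return True on a match."""
        position, started = self._state.get(actor, (0, timestamp))

        # Discard a partial match whose time window has expired.
        if (position > 0 and self.max_span is not None
                and timestamp - started > self.max_span):
            position, started = 0, timestamp

        if event == self.pattern[position]:
            if position == 0:
                started = timestamp  # the window starts at the first event
            position += 1
            if position == len(self.pattern):
                self._state.pop(actor, None)  # reset after firing
                return True
        # Events that do not advance the pattern are ignored (an assumption).

        self._state[actor] = (position, started)
        return False


trigger = SequenceTrigger(["beer", "diapers"], max_span=60)
trigger.process("customer-1", "beer", timestamp=0)
trigger.process("customer-1", "milk", timestamp=10)
fired = trigger.process("customer-1", "diapers", timestamp=30)
```

Here fired is True because the diapers purchase follows the beer purchase within the 60-unit window. Leaving max_span as None gives the "regardless of how much time passes" behavior.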

Turnstile also has an exhaustive “bean-counting” capability that tracks an event’s statistics so it can be compared with the distribution of other events. It can track statistics at the level of individual actors or globally across all actors. The statistics are measured over a moving window of any length, such as the past week, two weeks, or month. Anomalies can be flagged based on measures such as the mean, standard deviation, or z-score. For example, a trigger could be set to fire when an actor buys a quantity of fertilizer more than 3 standard deviations above the mean. The moving statistics window can perform many mathematical transformations and calculations on numerical data and a range of simple functions on textual data. Examples include arithmetic calculations and trigonometric functions for numeric fields, and trim and capitalization for text fields.
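
As an illustration of a moving-window statistic, the sketch below fires when a new value lies more than a chosen number of standard deviations above the mean of the values seen inside the window. It is an assumption-laden example rather than Turnstile's code: the class RollingZScoreTrigger and the parameters window, threshold, and min_samples are hypothetical, and it uses the population standard deviation of the prior observations.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Tuple


class RollingZScoreTrigger:
    """Hypothetical trigger: fires when a value's z-score against the
    moving window exceeds `threshold` (upper tail only)."""

    def __init__(self, window: float, threshold: float = 3.0,
                 min_samples: int = 5) -> None:
        self.window = window            # window length, in timestamp units
        self.threshold = threshold      # z-score needed to fire
        self.min_samples = min_samples  # do not fire on too little history
        self._history: Deque[Tuple[float, float]] = deque()

    def process(self, timestamp: float, value: float) -> bool:
        """Consume one observation in chronological order."""
        # Drop observations that have aged out of the moving window.
        while self._history and timestamp - self._history[0][0] > self.window:
            self._history.popleft()

        fired = False
        if len(self._history) >= self.min_samples:
            values = [v for _, v in self._history]
            mu, sigma = mean(values), pstdev(values)
            # Score the new value against the window before adding it,
            # so an outlier does not dilute its own baseline.
            if sigma > 0 and (value - mu) / sigma > self.threshold:
                fired = True

        self._history.append((timestamp, value))
        return fired


# Daily fertilizer purchases (day, quantity) with a 7-day window.
trigger = RollingZScoreTrigger(window=7)
for day, quantity in enumerate([10, 12, 11, 9, 10, 11]):
    trigger.process(day, quantity)
alert = trigger.process(6, 50)  # far above the recent mean of 10.5
```

To track statistics per actor rather than globally, one option is to keep one such object per actor in a dictionary keyed by actor ID.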

 

Functionally, a queue controller streams data through the FSM at the heart of Turnstile. XML configuration files define the states and statistics of interest, which are tracked in cache memory. When a condition of interest is met, Turnstile fires a trigger, logging the anomaly and reporting it to stakeholders.

One example of note is the use of trigonometric functions to detect time anomalies. For instance, say we want to know when customers make purchases at an unusual time. First, plot each purchase time as a point on a 24-hour clock represented by a unit circle. Then, using arithmetic and trigonometric functions, translate each point to the Cartesian coordinate system. There, it is easy to calculate the distance between points, which serves as a measure of the distance between different points in time. Unlike a simple difference in hours, this distance treats 11:30 p.m. and 12:30 a.m. as close together. Within Turnstile, one can define rolling statistics that use the standard deviation, mean, or z-score of the distance to create triggers based on the different times customers made a purchase. This type of anomaly detection could also be used to identify insider threats who are accessing systems outside of normally expected times.
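
The clock transformation can be sketched in a few lines of Python. The function names clock_point and clock_distance are hypothetical and chosen only for illustration; the math is the standard mapping of an hour h to the angle 2πh/24 on the unit circle, followed by the Euclidean (chord) distance between two points.

```python
import math
from typing import Tuple


def clock_point(hour: float) -> Tuple[float, float]:
    """Map an hour of day in [0, 24) to (x, y) on the unit circle."""
    angle = 2 * math.pi * (hour % 24) / 24
    return math.cos(angle), math.sin(angle)


def clock_distance(hour_a: float, hour_b: float) -> float:
    """Straight-line distance between two times of day on the clock.

    Ranges from 0 (same time) to 2 (exactly 12 hours apart).
    """
    ax, ay = clock_point(hour_a)
    bx, by = clock_point(hour_b)
    return math.hypot(ax - bx, ay - by)


# 11:30 p.m. and 12:30 a.m. are one hour apart across midnight,
# so they are exactly as close as 11:30 a.m. and 12:30 p.m.
across_midnight = clock_distance(23.5, 0.5)
across_noon = clock_distance(11.5, 12.5)
```

The resulting distances are plain numbers, so they can be fed into a rolling mean, standard deviation, or z-score in the same way as any other numeric field.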

Turnstile can handle anomaly detection in a number of different ways depending on the specific instance and application.  Using simple A+B+C logic, rolling statistics, or multiple data transformations, Turnstile can provide valuable insights that might otherwise be overlooked in large datasets.

Want to learn more?

Request a consultation to speak to a data analytics consultant about how Elder Research can use Turnstile to help achieve your anomaly detection goals.