上QQ阅读APP看书，第一时间看更新

Defining unusual

Anomaly detection is something almost all of us have a basic intuition on. Humans are quite good at pattern recognition, so it should be of no surprise that if I asked a hundred people on the street "what's unusual?" in the following graph, a vast majority (including non-technical people) would identify the spike in the green line:

Similarly, let's say we asked "what's unusual?" using the following picture:

We will, again, likely get a majority that rightly claim that the seal is the unusual thing. But, people may struggle to articulate in salient terms the actual heuristics that are used in coming to those conclusions.

In the first case, the heuristic used to define the spike as unusual could be stated as follows:

Something is unusual if its behavior has significantly deviated from an established pattern or range based upon its past history

In the second case, the heuristic takes the following form:

Something is unusual if some characteristic of that entity is significantly different than the same characteristic of the other members of a set or population

These key definitions will be relevant to Elastic ML, as they form the two main fundamental modes of operation of the anomaly detection algorithms. As we will see, the user will have control over what mode of operation is employed for a particular use case.