Process Mining


Suppose we have a log of our process, but do not have the model of this process

A process mining algorithm is a function that maps an event log to a process model

play-in = process discovery
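To make the idea of play-in concrete, here is a minimal sketch of a function that takes a log and returns a model. The "model" is a directly-follows graph, which is an illustrative stand-in chosen for brevity and not one of the Petri Net discovery algorithms listed on this page. The log format (a list of traces, each a tuple of activity names) and the name `discover_dfg` are assumptions made for the example.

```python
from collections import Counter

def discover_dfg(log):
    """Count how often activity b directly follows activity a in the log."""
    dfg = Counter()
    for trace in log:
        # Pair each event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

# Hypothetical toy log: three traces over activities a..e.
log = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
    ("a", "e", "d"),
]

dfg = discover_dfg(log)
for (a, b), count in sorted(dfg.items()):
    print(f"{a} -> {b}: {count}")
```

A real discovery algorithm would go further and turn such relations into a Petri Net with places and transitions.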

the "ability to rediscover"

is the property of a process mining algorithm to discover the model of a process
from logs that were generated by that model,
i.e. in the case of Petri Nets, if $N$ is the original model and $N'$ is the discovered model, then $N \equiv N'$

Usually, first a Petri Net model is discovered

Measures

There are four conflicting criteria

Fitness
- the discovered network should allow the behavior seen in the logs
Precision
- the discovered network should not allow behavior that was not seen in the logs
- too precise $\to$ bad generalization
Generalization
- the discovered model should generalize the behavior seen in the logs
Simplicity
- it should be as simple as possible
- too simple $\to$ low fitness

The main challenge of Process Mining is that all these criteria conflict:
it is really hard to satisfy all of them simultaneously.
This makes Process Mining a Multi-Objective Optimization problem
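The tension between fitness and precision can be illustrated with a deliberately simplified sketch. It assumes a model can be represented as a finite set of traces it allows, which is not true of Petri Nets with loops; real tools use replay or alignment techniques instead. The functions `fitness` and `precision` and both toy models are hypothetical and exist only for this example.

```python
def fitness(log, model_traces):
    """Fraction of log traces that the model can replay."""
    return sum(trace in model_traces for trace in log) / len(log)

def precision(log, model_traces):
    """Fraction of traces allowed by the model that were seen in the log."""
    seen = set(log)
    return sum(trace in seen for trace in model_traces) / len(model_traces)

# Hypothetical toy log.
log = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
    ("a", "e", "d"),
]

# A very small model: it allows only one of the observed traces.
model_small = {("a", "b", "c", "d")}

# A loose model: it allows every observed trace plus three unseen ones.
model_loose = {
    ("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d"),
    ("a", "b", "d"), ("a", "c", "d"), ("a", "e", "b", "d"),
}

for name, model in [("small", model_small), ("loose", model_loose)]:
    print(name, round(fitness(log, model), 2), round(precision(log, model), 2))
```

In this toy setting the small model is perfectly precise but replays only a third of the log, while the loose model replays everything at the cost of allowing unseen behavior.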

Can we replay the log?

Do we underfit the log?

we do not underfit if the logs the model can produce $\subseteq$ the original logs

Do we overfit the log?

The simpler the model - the better

So Process Mining is difficult

These algorithms let you find a Petri Net from logs

$\alpha$ and $\alpha^+$ algorithms - simple, but tend to overfit, very susceptible to noise in logs
Region-Based Process Miner - state-based approach, still susceptible to noise
Genetic Process Miner - good performance, much less susceptible to noise
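As an illustration of why the $\alpha$ algorithm is sensitive to noise, the sketch below computes only its first step: the ordering relations between activities (often called the footprint) derived from directly-follows pairs. It is not the full algorithm and does not build a Petri Net. The log format and the function name `footprint` are assumptions made for the example.

```python
def footprint(log):
    """Classify every ordered pair of activities by its ordering relation.

    "->" : a is directly followed by b, but never b by a (causality)
    "<-" : the reverse of the above
    "||" : both orders occur (parallel)
    "#"  : neither order occurs (unrelated)
    """
    follows = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    activities = sorted({a for trace in log for a in trace})
    relation = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in follows, (b, a) in follows
            if ab and ba:
                relation[(a, b)] = "||"
            elif ab:
                relation[(a, b)] = "->"
            elif ba:
                relation[(a, b)] = "<-"
            else:
                relation[(a, b)] = "#"
    return relation

# Hypothetical toy log.
log = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
    ("a", "e", "d"),
]

rel = footprint(log)
print(rel[("a", "b")], rel[("b", "c")], rel[("a", "d")])
```

Because a relation depends only on whether a pair occurs at least once, a single noisy trace can flip a causal pair into a parallel one, which is consistent with the noise sensitivity noted above.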
