RegionBased Process Miner
This is a statebased approach to Process Mining
Motivation
Consider the following Petri Net:
Suppose that for each place $p_i$ we identify the set of nodes $R_i$ in the graph (i.e. the set of markings) where $p_i$ has a token:
 $p_1: R_1 \equiv \{[p_1]\}$
 $p_2: R_2 \equiv \{[p_2, p_3], [p_2, p_5]\}$
 $p_3: R_3 \equiv \{[p_2, p_3], [p_3, p_4]\}$
 $p_4: R_4 \equiv \{[p_3, p_4], [p_4, p_5]\}$
 $p_5: R_5 \equiv \{[p_2, p_5], [p_4, p_5]\}$
 $p_6: R_6 \equiv \{[p_6]\}$

If we look carefully, we see that:
 for place $p_2$ in the region $R_2$
 $a$ brings us into the region $R_2$ and $b$ takes us out this region
 meanwhile transitions $c$ and $d$ are always either completely inside $R_2$ or completely outside $R_2$
 for $p_4$ in $R_4$
 $b$ brings in, $e$ takes out
 again $c$ and $d$ do not cross the boundaries of $R_4$
We can notice the same patterns for all the regions
 there are transitions that bring in and there are transitions that take out from the region
 and such transitions are connected with a place
 there also are transitions that keep us inside the region
Transition System
But usually we do not know where are the tokens, we may know only that there are some states:

 this is called a transition system: we know the states, but don't know which places have tokens
 yet we still can use this idea to recognize the structural properties
 and try to reconstruct the original model
How?
 create a transition system from logs
 find groups of nodes  regions  in the transition system
 add places to capture the behavior
Creating a Transition System
How to create a transition system?
 solution: an educated guess.
A state may depend on:
 (1) actions taken so far, and the number of times they were executed
 (2) actions we will do in the future
 (1) and (2)
So there are many way of "learning" the transition system
 this is about guessing and trial and errors
 and a little bit of knowledge of the domain is also helpful to choose the right abstraction
Past Without Abstraction
Sometimes also called "prefix automaton"
 to define the state we look at what we've seen so far
 we don't make any abstractions:
 i.e. the sequence $\langle a, b \rangle \ne \langle b, a \rangle $
Algo:
 at the beginning the pathsofar is empty
 after $a$ fires, you add it to the sequences of the seen events
For log $L = [abcd, acbd, acd]$ we have:
Downsides:
 not generalizing at all
 end up with one path for each trace
 want to "abstract" from this: want states to join after a while
Future Without Abstraction
Based on things that you will see in the logs
 some states can be merged with this approach
For log $L = [abcd, acbd, acd]$ we have:
Better
 we see that it can generalize a bit
 i.e. recognize the same suffixes of different trances and join them
Past with Multiset Abstraction
Now we add some abstraction:
 order in which we have seen the traces in not important anymore
 two states are considered to be the same if all activities we've seen so far fired equal number of times
 i.e. the tracessofar form a multiset
 and $[a, b, c] \equiv [a, c, b]$
 this works well for Petri Nets as long as there are no transitions with the same name
For log $L = [abcd, acbd, acd]$ we have:
Only Last Event Matters
Another abstraction:
 we keep only one most recently seen event
 so it's a Markov Chain with memory of 1
 and what you can do next depends only on what you've just seen
For log $L = [abcd, acbd, acd]$ we have:
Only Next Event Matters
Analogously,
 but here we care only about the next event
For log $L = [abcd, acbd, acd]$ we have:
Note:
 we see that firing $a$ can bring us to 3 different states
 not deterministic!
 you'll never see such thing in a Reachability Graph of a Petri Net
 most likely wrong abstraction
Examples
Consider this log: $L_2 = [abcd, abcdce, acbe, acdbce, acbdce]$
 for each trace in the log we try build a branch in the transition system
Past with Set Abstraction
$L_2 = [abcd, abcdce, acbe, acdbce, acbdce]$
$abcd$


$abcd$


$acbe$


$acdbce$


$acbdce$


But because of the set abstraction we must have lost some information:

 there's a selfloop that keeps us in the same state
 so there may be something wrong  we must make sure such things are expected
 in this log it's clear that it's not the case (we never see anything like $...cc...$, $...ccc...$, etc)
 so it allows too much behavior that we don't see in the logs
Conclusions
We see that it may be very hard to select the right way of constructing the transition system
 we need to try and see
 select an abstraction
 construct the transition system from it
 try to extract a petri net
 evaluate quality
 see if it makes sense
 some domain knowledge is always helpful
Theory of Regions
Regions
a region $R$ is a set of states of a transition system $T$ s.t.
 if a transition $t$ exits $R$, then all arcs labeled with $t$ must exist $R$
 if $t$ enters $R$, then all arcs labeled with $t$ must enter
 all other transitions $t$ must not cross the boundaries of $R$
Example 1
Consider this transition system:
Identify the following regions:

 (1): noting enters, only $a$ leaves: $a$ is connected to the input place $i$
 (2): $b$ enters, $e$ leaves, others don't cross the boundary: collect $b$ and $e$ via a place
 (3): $a$ enters, $b$ leaves: connect them via a place
 (4): connect $e$ to the output place $o$
 so we have:

Next, find two mode regions:

 (5): $a$ enters, $d$ leaves, connect them
 (6): $d$ enters, $e$ leaves, connect them

And finally:

 (7) $a$ enters, $c$ leaves
 (8) $c$ enters, $e$ leaves

So the discovered network is:
Examples of not regions:

 this is not a region: $c$ is both inside the region and exists it
 the same for $b$ and $d$
 it it corresponded to a place, it would be a very strange place:
 sometimes when a token is in $c$ it cold fire, but it would keep the token
 and sometimes it would consume the token
Example 2
For example, consider the following transition system:

 we see that there are 7 transitions: $A,B,C,D,E,F,G$

how to connect them?

 nothing enters this region
 transitions $A$ and $B$ leave the region
 so we must connect the input place with these transitions

 for this region, $B$ enters, $E$ leaves
 so we connect $B$ and $E$ with a place

 $A$ and $D$ enter; $C$ leave
 so we add one place, and connect $A$ and $D$ to it, and the place to $C$

 analogously we connect $A$ and $D$ via a place

 $C$ enters the region, $G$ and $F$ leave
 we connect $C$ to a place, and the place to $G$ and $F$


 and finally, $F$ and $E$ are connected to the final place
Minimal Regions
It's important to have regions as minimal as possible
 otherwise a region may correspond to several places
Regions

 all nodes together form a region
 all nodes except for the first and last one  also form a region
 enter by doing $a$, exit by doing $e$
 but we want to find minimal nontrivial regions
Trivial regions
 empty region (nothing takes in, nothing takes out  a region as well)
 all nodes together
All other regions:
 combination of two or more minimal regions
Properties
Let $S$ be a set of all states of a transition system
 trivial regions: $S$, $\varnothing$
 compliments: if $R$ is a region, then $S  R$ is also a region
 but the complimentary region is not necessarily minimal
Basic Algorithm
Compute $S^*$  the set of all minimal regions
 for each $R \in S^*$ generate a place:
 add arcs between
 postregion of $R$ (transitions that leave $R$)  output for that place
 preregion of $R$ (transitions that enter $R$)  input for that place
 add a token to each place that correspond to initial regions
The result  is the minimal saturated net
Limits
Loops of Length 1
 in this example $b$ can fire $\infty$ number of times
 but you cannot detect it  $b$ cannot be a preregion and a postregion at the same time
 there are mechanisms to overcome this problem  like in the $\alpha^+$ algorithm
Susceptibility to Noise
Suppose by mistake instead of $b$ you have $bb$
 that can completely spoil your logs
 so first we need to preprocess logs to make sure there are no noise
 or we can weaker the "region" condition  assume that it's enough to have transitions completely inside or outside only in 99% of cases
Sources