http://mlwiki.org/index.php?title=Genetic_Process_Miner&feed=atom&action=historyGenetic Process Miner - Revision history2024-03-29T07:44:38ZRevision history for this page on the wikiMediaWiki 1.25.3http://mlwiki.org/index.php?title=Genetic_Process_Miner&diff=822&oldid=prevAlexey at 11:24, 5 June 20182018-06-05T11:24:42Z<p></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 11:24, 5 June 2018</td>
</tr><tr><td colspan="2" class="diff-lineno" id="L132" >Line 132:</td>
<td colspan="2" class="diff-lineno">Line 132:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Log-level fitness is  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Log-level fitness is  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* fitness, calculated for each trace and aggregated  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* fitness, calculated for each trace and aggregated  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* <del class="diffchange diffchange-inline">$</del>\text{fitness}(L, N) = \cfrac{1}{2} \left( 1 - \cfrac{\sum_{\sigma \in L} L(\sigma) \times m_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times c_{N, \sigma} } \right)</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* <ins class="diffchange diffchange-inline"><math></ins>\text{fitness}(L, N) = \cfrac{1}{2} \left( 1 - \cfrac{\sum_{\sigma \in L} L(\sigma) \times m_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times c_{N, \sigma} } \right)</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>+ \cfrac{1}{2} \left( 1 - \cfrac{\sum_{\sigma \in L} L(\sigma) \times r_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times p_{N, \sigma} } \right)<del class="diffchange diffchange-inline">$</del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>+ \cfrac{1}{2} \left( 1 - \cfrac{\sum_{\sigma \in L} L(\sigma) \times r_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times p_{N, \sigma} } \right)<ins class="diffchange diffchange-inline"></math></ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* $L(\sigma)$ how many times the trace $\sigma$ occurred in log $L$</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* $L(\sigma)$ how many times the trace $\sigma$ occurred in log $L$</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
</table>Alexeyhttp://mlwiki.org/index.php?title=Genetic_Process_Miner&diff=374&oldid=prevAlexey at 14:53, 8 February 20142014-02-08T14:53:55Z<p></p>
<p><b>New page</b></p><div>== Genetic Process Miner ==<br />
This is an algorithm from the family of [[Genetic Algorithms]] for [[Process Mining|mining business processes]]<br />
* it's much more resilient to noise <br />
* allows for incremental improvement <br />
* can be combined with other approaches <br />
<br />
<br />
=== Overview ===<br />
https://raw.github.com/alexeygrigorev/wiki-figures/master/ulb/bpm/pm/gm-overview.png<br />
* first we create the initial population<br />
** can just randomly create some [[Petri Nets]], randomly adding places <br />
** or by applying [[Alpha Algorithm]] or [[Region-Based Process Miner]]<br />
* at each step we compute fitness for all the instances of a population<br />
* elitism - process of keeping the best ones<br />
* then all "survived" instances are considered as "parents"<br />
** cross-over - process of combining different petri nets (these petri-nets are already good solutions)<br />
** mutation adding some random changes for diversifying<br />
<br />
<br />
=== Cross-Over ===<br />
Cross-over<br />
* a process of producing a "child" by two parental instances <br />
* selecting parents<br />
** completely at random<br />
** using fitness to quicker converge to optima<br />
** note that we should not always select only "the best" - we need diversity<br />
<br />
<br />
How to create a petri net from two other petri nets?<br />
* suppose we have two parental petri nets:<br />
* https://raw.github.com/alexeygrigorev/wiki-figures/master/ulb/bpm/pm/gm-crossover-p.png<br />
* we may find some ''minimal cut'', cut the parents and make children from then <br />
** minimal cut is the minimal number of transitions to remove s.t. the net becomes completely disconnected <br />
** the same as in the [[Graph]] Theory: [[Minimal Cut]]<br />
* in this example: <br />
** can cut in $(e, f)$ because in both cases <br />
** from $e$ it goes to $g,h$ and on the left you have $a,b,c,d$ <br />
** so it's ideal <br />
* we take $(e, f)$ out and get two disconnected components <br />
** combine left and right parts of the petri nets to form children <br />
* https://raw.github.com/alexeygrigorev/wiki-figures/master/ulb/bpm/pm/gm-crossover-c.png<br />
<br />
<br />
=== Mutation ===<br />
Examples of mutations:<br />
* remove or add a place<br />
* add an arc<br />
<br />
<br />
=== Parameters ===<br />
So there are quite a few parameters that we have to provide:<br />
* objective function we want to optimize (below)<br />
* the way we do the cross-over<br />
* how do you select parents?<br />
* the way we do the mutations<br />
* elitism - top 10%, certain threshold, etc<br />
* how large the population should be<br />
* how many generations you'll run before termination<br />
* when we kill some instances?<br />
<br />
<br />
Change one parameter<br />
* $\Rightarrow$ different solution in the end<br />
<br />
Also <br />
* a lot of randomness is involved <br />
* it may be computationally expensive<br />
<br />
<br />
<br />
== Cost Function: Fitness ==<br />
=== Trace Level Fitness ===<br />
Define ''trace-level fitness'' as: <br />
* $\text{fitness}(\sigma, N) = \cfrac{1}{2} \left( 1 - \cfrac{m}{c} \right) + \cfrac{1}{2} \left( 1 - \cfrac{r}{p} \right)$<br />
* $m$ - # of missing tokens<br />
* $c$ - # of consumed tokens<br />
* $r$ - # of tokens left over when something reaches the output place<br />
* $p$ - # of produced tokens<br />
<br />
<br />
=== Example 1 ===<br />
Consider this example:<br />
* https://raw.github.com/alexeygrigorev/wiki-figures/master/ulb/bpm/pm/gm-fit-ex1.png<br />
* logtrace: $\sigma_1 = \langle abeg \rangle$<br />
* $m = 0, c = 0, r = 0, p = 1$ <br />
** $p = 1$ because there's a token in the input place<br />
* $a$ fires - it generates 2 tokens <br />
** $p \leftarrow p + 2 = 3$<br />
** $c \leftarrow c + 1 = 1$<br />
* $b$ fires, produces one token, consumes one token<br />
** $p \leftarrow p + 1 = 4$<br />
** $c \leftarrow c + 1 = 2$<br />
* $e$ needs to fire<br />
** but it cannot: one token is missing, so we add it and fire $e$<br />
** $p \leftarrow p + 1 = 5$<br />
** $c \leftarrow c + 2 = 4$<br />
** $m \leftarrow m + 1 = 1$<br />
* $g$ fires and we're done<br />
** $p \leftarrow p + 1 = 6$<br />
** $c \leftarrow c + 1 = 5$<br />
* but there's one remaining token that was left over after firing $a$ <br />
** $r \leftarrow r + 1 = 1$<br />
* and finally we remove one token from the output place<br />
** $c \leftarrow c + 1 = 6$<br />
* $\text{fitness}(\sigma_1, N_1) = \cfrac{1}{2} \left( 1 - \cfrac{1}{6} \right) + \cfrac{1}{2} \left( 1 - \cfrac{1}{6} \right) = \cfrac{5}{6}$<br />
<br />
<br />
=== Example 2 ===<br />
https://raw.github.com/alexeygrigorev/wiki-figures/master/ulb/bpm/pm/gm-fit-ex2.png<br />
* logtrace: $\sigma_2 = \langle adceh \rangle$<br />
* $p = 1, c = 0, m = 0, r = 0$<br />
* $a$ fires, produces 1, consumes 1<br />
** $p \leftarrow 2, c \leftarrow 1$<br />
* $d$ needs to fire<br />
** but there's no token in the place before $d$, so we need to put it there<br />
** $m \leftarrow 1$<br />
** $d$ fires: $p \leftarrow 3, c \leftarrow 2$<br />
* $c$ fires<br />
** $p \leftarrow 4, c \leftarrow 3$<br />
** note that $c$ and $d$ were executed in order different from the order that the model can produce<br />
* $e$ fires <br />
** $p \leftarrow 5, c \leftarrow 4$<br />
* $h$ fires<br />
** $p \leftarrow 6, c \leftarrow 5$<br />
* we're done<br />
** one token is left: $r \leftarrow 1$<br />
** taking the last token from $c$: $c \leftarrow 6$<br />
* $\text{fitness}(\sigma_2, N_2) = \cfrac{1}{2} \left( 1 - \cfrac{1}{6} \right) + \cfrac{1}{2} \left( 1 - \cfrac{1}{6} \right) = \cfrac{5}{6}$<br />
<br />
<br />
=== Log Level Fitness ===<br />
Log-level fitness is <br />
* fitness, calculated for each trace and aggregated <br />
* $\text{fitness}(L, N) = \cfrac{1}{2} \left( 1 - \cfrac{\sum_{\sigma \in L} L(\sigma) \times m_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times c_{N, \sigma} } \right)<br />
+ \cfrac{1}{2} \left( 1 - \cfrac{\sum_{\sigma \in L} L(\sigma) \times r_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times p_{N, \sigma} } \right)$<br />
* $L(\sigma)$ how many times the trace $\sigma$ occurred in log $L$<br />
<br />
<br />
<br />
== Links ==<br />
* http://www.processmining.org/blogs/pub2006/genetic_process_mining<br />
<br />
== Sources ==<br />
* [[Business Process Management (ULB)]]<br />
<br />
[[Category:Business Process Management]]<br />
[[Category:Process Mining]]</div>Alexey