Simulation of a Directly-Follows Graph (transient analysis)

A time-related simulation makes it possible to estimate how probable it is that a process execution has terminated after a given amount of time. This leads to a better estimation of Service Level Agreements, and to a better identification of the process instances that are most likely to have a high throughput time.

All this starts from a “performance” DFG, for example the one discovered from the running-example log:

from pm4py.objects.log.importer.xes import importer as xes_importer
from pm4py.algo.discovery.dfg import algorithm as dfg_discovery
from pm4py.statistics.start_activities.log import get as start_activities
from pm4py.statistics.end_activities.log import get as end_activities

# import the log and discover a performance DFG (edges annotated with average durations)
log = xes_importer.apply("C:/running-example.xes")
dfg_perf = dfg_discovery.apply(log, variant=dfg_discovery.Variants.PERFORMANCE)
# the start and end activities are needed to delimit the simulation model
sa = start_activities.get_start_activities(log)
ea = end_activities.get_end_activities(log)

<Technical Details>

For the simulation model, a CTMC (Continuous-Time Markov Chain) is built from the DFG. This model works well under the assumption that the frequencies of the edges outgoing an activity are similar. If that is not the case (for example, when one outgoing edge has frequency 1 and another has frequency 10000), the model works less well.
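The computation behind the transient analysis of a CTMC is the evaluation of p(t) = p(0) · e^{Qt}, where Q is the rate (generator) matrix. The following minimal, self-contained sketch illustrates this on an invented two-state chain (it does not use pm4py; the states, the rate, and the Taylor-series matrix exponential are just for illustration):

```python
import math

# Toy CTMC with two states: 0 = "work", 1 = "done" (absorbing).
# Q is the generator matrix (rates per second); each row sums to zero.
lam = 1.0 / 3600.0                      # mean sojourn time in "work": one hour
Q = [[-lam, lam],
     [0.0, 0.0]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(M, terms=60):
    """Matrix exponential e^M via its Taylor series (fine for small matrices)."""
    n = len(M)
    acc = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in acc]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, M)]
        acc = [[acc[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return acc

def transient(p0, Q, t):
    """p(t) = p0 @ e^{Q t}: the state distribution after t seconds."""
    P = expm([[q * t for q in row] for row in Q])
    n = len(p0)
    return [sum(p0[i] * P[i][j] for i in range(n)) for j in range(n)]

p = transient([1.0, 0.0], Q, 7200.0)    # start in "work", look 2 hours ahead
# For this chain, P(done after t) has the closed form 1 - exp(-lam * t).
```

pm4py performs the analogous computation on the tangible reachability graph of the DFG, with one row/column of Q per state.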

To ensure that the DFG contains, as much as possible, only frequent arcs, a filtering operation should be applied first. For example, it is possible to use variants-based filtering on the log:

from pm4py.algo.filtering.log.variants import variants_filter

# keep only the most frequent variants of the log
log = variants_filter.apply_auto_filter(log)

Since each edge of the performance DFG carries the average time between the corresponding states, the CTMC assumes that the transition times follow an exponential distribution with that average.
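For instance, if an edge carries an average duration of 3600 seconds, each traversal time is modelled as a draw from an exponential distribution with that mean. A quick sanity check of this assumption, with an invented edge duration (note that Python's random.expovariate takes the rate, i.e. 1/mean, not the mean):

```python
import random

random.seed(42)
mean_seconds = 3600.0                # average time stored on a DFG edge (made up)
rate = 1.0 / mean_seconds            # exponential rate lambda = 1 / mean
samples = [random.expovariate(rate) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)
# the empirical mean of the samples is close to the 3600s average on the edge
```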

</Technical Details>

The simulation model can then be constructed easily:

from pm4py.objects.stochastic_petri import ctmc

# build the tangible reachability graph and the Q (rate) matrix of the CTMC
reach_graph, tang_reach_graph, stochastic_map, q_matrix = ctmc.get_tangible_reachability_and_q_matrix_from_dfg_performance(dfg_perf, parameters={"start_activities": sa, "end_activities": ea})
print(tang_reach_graph.states)

The last line prints the states of the model, which are:

{reinitiaterequest1, sink1, source1, paycompensation1, registerrequest1, decide1, rejectrequest1, examinethoroughly1, checkticket1, examinecasually1}

“source1” is the source state of the model (implicitly connected to the start activity “register request”). “sink1” is the terminal state of the model (implicitly connected to the end activities “pay compensation” and “reject request”). The other states of the model are those reached after executing the corresponding activity (for example, “decide1” is the state reached after executing the “decide” activity).

Starting from “source1”, we would like to know how probable it is that a process execution is already over after 2 days. To do that, we perform a transient analysis starting from the state “source1”, specifying 172800 (2 days) as the number of seconds:

# pick the source state
state = [x for x in tang_reach_graph.states if x.name == "source1"][0]
# distribution over the states of the system after 172800.0 seconds (2 days), starting from the source
transient_result = ctmc.transient_analysis_from_tangible_q_matrix_and_single_state(tang_reach_graph, q_matrix, state,
                                                                                   172800.0)
print(transient_result)

We get the following dictionary as output:

Counter({examinethoroughly1: 0.2962452842059696, sink1: 0.22722259091795302, decide1: 0.21581166958939804, checkticket1: 0.14875795862276098, examinecasually1: 0.10303611911547725, reinitiaterequest1: 0.008919293375341046, registerrequest1: 7.082354169889813e-06, rejectrequest1: 1.399193916663259e-09, paycompensation1: 4.1973640443858166e-10, source1: 9.448511222571739e-23})

This means that there is a 22.72% probability that the process execution has already finished (being in the “sink1” state) after 2 days. Let’s calculate the same for 100 days (8640000 seconds), keeping the same starting state and changing only the time horizon:

transient_result = ctmc.transient_analysis_from_tangible_q_matrix_and_single_state(tang_reach_graph, q_matrix, state,
                                                                                   8640000.0)
print(transient_result)

We get:

Counter({sink1: 0.9999999999459624, examinethoroughly1: 2.439939596108015e-11, decide1: 1.773740724895241e-11, checkticket1: 7.016711086804287e-12, examinecasually1: 4.146293819445992e-12, reinitiaterequest1: 7.377366868580615e-13, rejectrequest1: 1.1499875063552587e-19, paycompensation1: 3.449783588381732e-20, registerrequest1: 3.227610398476988e-258, source1: 1.3702507295618392e-274})

According to the model, the probability of having finished the process after 100 days is 99.999999995%. That is practically 100%.

Suppose now that we want to know how probable it is that, starting from a decision, the end of the process is reached within 10 days (864000 seconds). This can be done as follows:

# pick the state corresponding to the "decide" activity
state = [x for x in tang_reach_graph.states if x.name == "decide1"][0]
transient_result = ctmc.transient_analysis_from_tangible_q_matrix_and_single_state(tang_reach_graph, q_matrix, state,
                                                                                   864000.0)
print(transient_result)

The result is:

Counter({sink1: 0.9293417034466963, examinethoroughly1: 0.03190639167001194, decide1: 0.02319465799554023, checkticket1: 0.009172805258533999, examinecasually1: 0.005419723629171662, reinitiaterequest1: 0.0009647178045538061, rejectrequest1: 1.5038030381048534e-10, paycompensation1: 4.511175132201897e-11, registerrequest1: 0.0, source1: 0.0})

so there is a 92.9% probability of reaching the “sink1” state within 10 days after a decision.
