Simulation of a Directly-Follows Graph (Monte Carlo simulation)

A time-related simulation allows estimating how probable it is that a process execution terminates within a given amount of time. This leads to a better estimation of Service Level Agreements, and a better identification of the process instances that are most likely to have a high throughput time.

Everything starts from a “performance” DFG, for example the one discovered from the running-example log:

from pm4py.objects.log.importer.xes import importer as xes_importer
from pm4py.algo.discovery.dfg import algorithm as dfg_discovery
from pm4py.statistics.start_activities.log import get as start_activities
from pm4py.statistics.end_activities.log import get as end_activities

# import the event log and discover the performance DFG
log = xes_importer.apply("C:/running-example.xes")
dfg_perf = dfg_discovery.apply(log, variant=dfg_discovery.Variants.PERFORMANCE)

# retrieve the start and end activities of the log
sa = start_activities.get_start_activities(log)
ea = end_activities.get_end_activities(log)

and from the knowledge of the case arrival ratio. The case arrival ratio is the amount of time that passes (on average, or as a median) between the arrival of two consecutive cases. It can be provided by the user or inferred from the event log. The inference from the event log is done using the following command:

from pm4py.statistics.traces.log import case_arrival
ratio = case_arrival.get_case_arrival_avg(log)
print(ratio)

Using the DFG mining approach, it is possible to obtain a Petri net model from the DFG. This kind of model is the “default” one for Monte Carlo simulation (because its execution semantics is very clear). Moreover, the Petri net extracted by the DFG mining approach is a sound workflow net (which gives the model other good properties). The DFG mining approach can be applied in the following way:

from pm4py.objects.conversion.dfg import converter
net, im, fm = converter.apply(dfg_perf, variant=converter.Variants.VERSION_TO_PETRI_NET_ACTIVITY_DEFINES_PLACE,
                              parameters={converter.Variants.VERSION_TO_PETRI_NET_ACTIVITY_DEFINES_PLACE.value.Parameters.START_ACTIVITIES: sa,
                                          converter.Variants.VERSION_TO_PETRI_NET_ACTIVITY_DEFINES_PLACE.value.Parameters.END_ACTIVITIES: ea})

To perform a basic Monte Carlo simulation, the following code can be used. This is a sort of resource-constrained simulation, where it is assumed that a place can hold at most one token at a time. Later, we will see how to specify a higher number of tokens that can be hosted by a place.

from pm4py.simulation.montecarlo import simulator as montecarlo_simulation
from pm4py.algo.conformance.tokenreplay.algorithm import Variants

# use the backwards token-based replay, and a case arrival ratio of 10800 seconds (3 hours)
parameters = {}
parameters[montecarlo_simulation.Variants.PETRI_SEMAPH_FIFO.value.Parameters.TOKEN_REPLAY_VARIANT] = Variants.BACKWARDS
parameters[montecarlo_simulation.Variants.PETRI_SEMAPH_FIFO.value.Parameters.PARAM_CASE_ARRIVAL_RATIO] = 10800
simulated_log, res = montecarlo_simulation.apply(log, net, im, fm, parameters=parameters)

During the replay operation, some debug messages are written to the screen. The main outputs of the simulation process are:

  • simulated_log: the log containing the simulated traces.
  • res: the result of the simulation (a Python dictionary).

The res dictionary contains the following keys:

  • places_interval_trees: an interval tree for each place, hosting an interval for each period of time in which the place was “full” according to the specified maximum number of tokens per place.
  • transitions_interval_trees: an interval tree for each transition, containing all the time intervals in which the transition was enabled but not yet fired (i.e., the time between the full enablement of the transition and the consumption of the tokens from its input places).
  • cases_ex_time: a list containing the throughput times of all the cases of the log.
  • median_cases_ex_time: the median throughput time of the cases in the simulated log.
  • input_case_arrival_ratio: the case arrival ratio provided by the user, or automatically calculated from the event log.
  • total_cases_time: the difference between the last timestamp and the first timestamp of the simulated log.

The last four items of the previous list are simple Python objects (floats and lists, specifically). The interval tree objects can be used to get time-specific information. For example, the following code snippet:

import random

# pick a random transition of the model
pick_trans = random.choice(list(res["transitions_interval_trees"]))
print(pick_trans)

# minimum and maximum timestamp of the log
last_timestamp = max(event["time:timestamp"] for trace in log for event in trace).timestamp()
first_timestamp = min(event["time:timestamp"] for trace in log for event in trace).timestamp()

# query the interval tree at 11 equally spaced points (including the extremes)
n_div = 10
for i in range(n_div + 1):
    timestamp = first_timestamp + (last_timestamp - first_timestamp) / n_div * i
    print("\t", timestamp, len(res["transitions_interval_trees"][pick_trans][timestamp]))

prints, for a random transition in the model, the number of intervals that are overlapping at 11 different points (including the minimum and the maximum timestamp of the log) that are uniformly distributed across the time interval of the log.

The following code snippet instead prints, for a random place in the model, the number of intervals that are overlapping at 11 different points (including the minimum and the maximum timestamp of the log) that are uniformly distributed across the time interval of the log:

import random

# pick a random place of the model
pick_place = random.choice(list(res["places_interval_trees"]))
print(pick_place)

# minimum and maximum timestamp of the log
last_timestamp = max(event["time:timestamp"] for trace in log for event in trace).timestamp()
first_timestamp = min(event["time:timestamp"] for trace in log for event in trace).timestamp()

# query the interval tree at 11 equally spaced points (including the extremes)
n_div = 10
for i in range(n_div + 1):
    timestamp = first_timestamp + (last_timestamp - first_timestamp) / n_div * i
    print("\t", timestamp, len(res["places_interval_trees"][pick_place][timestamp]))

The information can be used to build some graphs like these (using external programs such as Microsoft Excel):

In the previous graph, different simulations with different case arrival ratios have been performed to see how the total execution time (recorded in the variable total_cases_time at the end of the simulation) changes.

In the previous graph, different simulations with different case arrival ratios have been performed to see how the median execution time (recorded in the variable median_cases_ex_time at the end of the simulation) changes.

The previous graph highlights which transitions become problematic when cases arrive more frequently (i.e., when the case arrival ratio decreases).

The simulation technique works as follows:

  • An event log and a model (DFG) are considered.
  • Internally in the simulation, a replay operation is done between the log and the model.
  • The replay operation leads to the construction of a stochastic map that associates to each transition a probability distribution (for example, a normal distribution, an exponential distribution, …). The probability distribution that maximizes the likelihood of the values observed during the replay is chosen. The user can also force a specific distribution (such as the exponential) if desired.
  • Moreover, during the replay operation, the frequency of each transition is recorded. This helps to pick, in a “weighted” way, one of the transitions enabled in a marking when the simulation occurs.
  • The simulation process occurs. For each trace that is generated (the distance between the start times of consecutive traces is fixed), a thread is spawned and stochastic choices are made. The possibility to use a given place (depending on the maximum number of resources that can be used) is regulated by a semaphore object in Python.
  • A maximum amount of time is specified for the simulation. If one or more threads exceed that amount of time, they are killed and the corresponding traces are not added to the simulation log.
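The “weighted” choice among enabled transitions described above can be sketched in plain Python. This is a minimal illustration, not the pm4py internals: the replay frequencies and the exponential firing delay (mean of 3600 seconds) are hypothetical values.

```python
import random

# hypothetical transition frequencies obtained from a replay operation:
# when several transitions are enabled in a marking, the more frequent
# ones are more likely to be picked
replay_frequencies = {"decide": 9, "reinitiate request": 3}

# pick one enabled transition, weighted by its replay frequency
enabled = list(replay_frequencies)
weights = [replay_frequencies[t] for t in enabled]
chosen = random.choices(enabled, weights=weights, k=1)[0]

# sample a waiting time for the chosen transition from the distribution
# fitted during the replay (here, an exponential with a mean of 3600 s)
waiting_time = random.expovariate(1.0 / 3600.0)
print(chosen, waiting_time)
```

With the frequencies above, “decide” would be picked roughly three times as often as “reinitiate request”.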

Hence, several parameters are important in order to perform a Monte Carlo simulation. These parameters, which are inside the petri_semaph_fifo class, are (ordered by importance):

  • PARAM_NUM_SIMULATIONS: the number of simulations that are performed (the goal is to obtain that number of traces in the simulated log).
  • PARAM_CASE_ARRIVAL_RATIO: the case arrival ratio that is specified by the user.
  • PARAM_MAP_RESOURCES_PER_PLACE: a map containing for each place of the Petri net the maximum amount of tokens
  • PARAM_DEFAULT_NUM_RESOURCES_PER_PLACE: if the map of resources per place is not specified, this maximum number of resources is used for every place.
  • PARAM_MAX_THREAD_EXECUTION_TIME: specifies the maximum execution time of the simulation (for example, 60 seconds).
  • PARAM_SMALL_SCALE_FACTOR: specifies the ratio between the “real” time scale and the simulation time scale. A higher ratio means that the simulation runs faster but is in general less accurate; a lower ratio means that the simulation runs slower but is in general more accurate (in providing detailed diagnostics). The default value is 864000 seconds (10 days), meaning that one second in the simulation corresponds to 10 days in the real log.
  • PARAM_ENABLE_DIAGNOSTICS: enables the printing of the simulation diagnostics through the Python “logging” module.
  • ACTIVITY_KEY: the attribute of the log that should be used as activity
  • TIMESTAMP_KEY: the attribute of the log that should be used as timestamp
  • TOKEN_REPLAY_VARIANT: the variant of token-based replay to use:
    • token_replay: the classic variant, which cannot handle duplicate transitions;
    • backwards: the backwards token-based replay, which is slower but can handle invisible transitions.
  • PARAM_FORCE_DISTRIBUTION: if specified, the distribution that is forced for the transitions (normal, exponential, …)
  • PARAM_DIAGN_INTERVAL: the time interval in which diagnostics should be printed (for example, diagnostics should be printed every 10 seconds).
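As a small arithmetic sketch of how PARAM_SMALL_SCALE_FACTOR relates the two time scales (assuming the default value of 864000; the helper function name is purely illustrative):

```python
# default small-scale factor: 864000 real seconds (10 days) per simulation second
SMALL_SCALE_FACTOR = 864000.0

def real_to_sim_seconds(real_seconds, scale=SMALL_SCALE_FACTOR):
    """Convert a duration in the real time scale to the simulation time scale."""
    return real_seconds / scale

# a case taking 5 days (432000 s) in the real log corresponds to
# half a second of simulation time
sim_time = real_to_sim_seconds(5 * 24 * 3600)
print(sim_time)  # -> 0.5
```

Lowering the factor stretches the simulation over more wall-clock time, which leaves more room for detailed diagnostics.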