Token-based replay diagnostics

Running token-based replay in PM4Py yields detailed information about transitions that did not execute correctly and about activities that occur in the log but not in the model. In particular, executions that do not match the model are expected to have a longer throughput time.

The diagnostics that are provided by PM4Py are the following:

  • Throughput analysis on the transitions that are executed in an unfit way according to the process model (the Petri net).
  • Throughput analysis on the activities that are not contained in the model.
  • Root Cause Analysis on the causes that lead to an unfit execution of the transitions.
  • Root Cause Analysis on the causes that lead to executing activities that are not contained in the process model.

To provide an execution context for the examples, a log must be loaded and a model that does not fit it perfectly is required. The log can be loaded with the following instructions:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer
log = xes_importer.import_log(os.path.join("tests", "input_data", "receipt.xes"))

To obtain a model that does not fit the log, the log is first filtered so that only part of its behavior is kept:

from pm4py.algo.filtering.log.auto_filter.auto_filter import apply_auto_filter
filtered_log = apply_auto_filter(log)

Then, the Inductive Miner (IMDFB variant) is applied to the filtered log:

from pm4py.algo.discovery.inductive import factory as inductive_miner
net, initial_marking, final_marking = inductive_miner.apply(filtered_log)

The following model is obtained:

We then apply token-based replay to the original log with special settings. Setting disable_variants to True makes the algorithm replay every case, instead of replaying only one case per variant. Setting enable_pltr_fitness to True tells the algorithm to also return localized conformance checking information: the fitness of places and transitions, and the activities that are not in the model.

from pm4py.algo.conformance.tokenreplay import factory as token_based_replay
parameters_tbr = {"disable_variants": True, "enable_pltr_fitness": True}
replayed_traces, place_fitness, trans_fitness, unwanted_activities = token_based_replay.apply(
    log, net, initial_marking, final_marking, parameters=parameters_tbr)
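
To make the effect of disable_variants more concrete, the sketch below groups a few toy traces by variant, that is, by their sequence of activities. It is only an illustration in plain Python: the toy traces and the group_by_variant helper are hypothetical and do not reproduce PM4Py's internal implementation. When replay is done per variant, only one representative per group is replayed; with disable_variants set to True, every case is replayed on its own.

```python
from collections import defaultdict

# Hypothetical toy log: each case is just a list of activity names.
toy_log = {
    "case_1": ["A", "B", "C"],
    "case_2": ["A", "B", "C"],
    "case_3": ["A", "C", "B"],
    "case_4": ["A", "B", "C"],
}


def group_by_variant(cases):
    """Group case identifiers by their activity sequence (the variant)."""
    variants = defaultdict(list)
    for case_id, activities in cases.items():
        variants[tuple(activities)].append(case_id)
    return dict(variants)


variants = group_by_variant(toy_log)

# Replaying per variant needs one replay per distinct activity sequence ...
replays_per_variant = len(variants)
# ... while replaying every case needs one replay per case.
replays_per_case = len(toy_log)

for variant, case_ids in variants.items():
    print(",".join(variant), "->", case_ids)
print(replays_per_variant, "replays per variant,", replays_per_case, "replays per case")
```

In this toy log, three of the four cases share the same variant, so a per-variant replay would process only two traces.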

With the replay results available, we can move on to the diagnostics.

Throughput Analysis

THROUGHPUT ANALYSIS ON UNFIT EXECUTION OF TRANSITIONS

To perform throughput analysis on the transitions that were executed in an unfit way, and print the result to the console, the following code can be used:

from pm4py.algo.conformance.tokenreplay.diagnostics import duration_diagnostics
trans_diagnostics = duration_diagnostics.diagnose_from_trans_fitness(log, trans_fitness)
for trans in trans_diagnostics:
    print(trans, trans_diagnostics[trans])

This produces the following output:

T02 Check confirmation of receipt {'n_fit': 1281, 'n_underfed': 35, 'fit_median_time': 7735.494, 'underfed_median_time': 974758.792, 'relative_throughput': 126.01118842571658}
T06 Determine necessity of stop advice {'n_fit': 788, 'n_underfed': 521, 'fit_median_time': 696.8575, 'underfed_median_time': 102354.892, 'relative_throughput': 146.88066355029545}

In this example, it is clear that unfit executions lead to much higher throughput times: the median time of the unfit executions is 126 to 146 times the median time of the fit ones.
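
The relative_throughput values in this output are consistent with the ratio between the median time of the unfit (underfed) executions and the median time of the fit ones. The following sketch recomputes that ratio from the figures printed above. The relative_throughput helper is hypothetical, and the assumption that PM4Py computes the value this way is inferred from the numbers, not taken from its source code.

```python
import math

# Values copied from the output above.
trans_diagnostics = {
    "T02 Check confirmation of receipt": {
        "fit_median_time": 7735.494,
        "underfed_median_time": 974758.792,
        "relative_throughput": 126.01118842571658,
    },
    "T06 Determine necessity of stop advice": {
        "fit_median_time": 696.8575,
        "underfed_median_time": 102354.892,
        "relative_throughput": 146.88066355029545,
    },
}


def relative_throughput(fit_median, unfit_median):
    """Ratio of the unfit median time to the fit median time (hypothetical helper)."""
    return unfit_median / fit_median


for name, diag in trans_diagnostics.items():
    ratio = relative_throughput(diag["fit_median_time"], diag["underfed_median_time"])
    # Compare the recomputed ratio with the value reported in the output.
    matches = math.isclose(ratio, diag["relative_throughput"], rel_tol=1e-6)
    print(name, round(ratio, 2), matches)
```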

THROUGHPUT ANALYSIS ON EXECUTIONS CONTAINING ACTIVITIES THAT ARE NOT IN THE MODEL

To perform throughput analysis on the process executions containing activities that are not in the model, and then print the result on the screen, the following code could be used:

from pm4py.algo.conformance.tokenreplay.diagnostics import duration_diagnostics
act_diagnostics = duration_diagnostics.diagnose_from_notexisting_activities(log, unwanted_activities)
for act in act_diagnostics:
    print(act, act_diagnostics[act])

This produces the following output:

T03 Adjust confirmation of receipt {'n_containing': 37, 'n_fit': 1282, 'fit_median_time': 1189.2385, 'containing_median_time': 1031740.384, 'relative_throughput': 867.5638940380757}
T16 Report reasons to hold request {'n_containing': 20, 'n_fit': 1282, 'fit_median_time': 1189.2385, 'containing_median_time': 31433.8365, 'relative_throughput': 26.431902852119237}
T17 Check report Y to stop indication {'n_containing': 20, 'n_fit': 1282, 'fit_median_time': 1189.2385, 'containing_median_time': 31433.8365, 'relative_throughput': 26.431902852119237}

In this example, it is clear that executions containing activities that are not in the model have much higher throughput times, from about 26 times up to 867 times the median time of the fit executions.
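
Since each diagnostics entry is a dictionary, the deviations can be ranked by their impact on throughput time. The sketch below sorts the activities from the output above by relative_throughput in descending order. The dictionary literal is only a stand-in for the act_diagnostics object and contains a subset of the values printed above.

```python
# Values copied from the output above (a stand-in for act_diagnostics).
act_diagnostics = {
    "T16 Report reasons to hold request": {
        "n_containing": 20,
        "relative_throughput": 26.431902852119237,
    },
    "T03 Adjust confirmation of receipt": {
        "n_containing": 37,
        "relative_throughput": 867.5638940380757,
    },
    "T17 Check report Y to stop indication": {
        "n_containing": 20,
        "relative_throughput": 26.431902852119237,
    },
}

# Sort activities so that the largest slowdown comes first.
# sorted() is stable, so entries with equal values keep their original order.
ranked = sorted(
    act_diagnostics.items(),
    key=lambda item: item[1]["relative_throughput"],
    reverse=True,
)

for act, diag in ranked:
    print(f"{act}: relative_throughput={diag['relative_throughput']:.1f}, "
          f"n_containing={diag['n_containing']}")
```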

Root Cause Analysis

The output of root cause analysis in the diagnostics context is a decision tree that helps to understand the causes of a deviation. In the following examples, a separate decision tree is built and visualized for each deviation.

The following examples use the Receipt log, and the decision trees are built on the following choice of attributes (only the org:group attribute is considered):

# build decision trees
string_attributes = ["org:group"]
numeric_attributes = []

parameters = {"string_attributes": string_attributes, "numeric_attributes": numeric_attributes}
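
A decision tree needs numeric input, so a categorical attribute such as org:group has to be turned into numeric features before a tree can be trained. One common way to do this is one-hot encoding, sketched below in plain Python. The group values, the one_hot_encode helper and the feature naming scheme are all hypothetical; PM4Py's own feature extraction may differ in naming and detail.

```python
def one_hot_encode(values, attribute="org:group"):
    """Return (feature_names, rows) for a list of categorical values.

    Each distinct value becomes one 0/1 column. The "attribute@value"
    naming is an assumption made for this sketch only.
    """
    categories = sorted(set(values))
    feature_names = [attribute + "@" + category for category in categories]
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return feature_names, rows


# Hypothetical org:group values, one per case.
groups = ["Group 1", "Group 4", "Group 1", "Group 2"]
feature_names, rows = one_hot_encode(groups)

print(feature_names)
for row in rows:
    print(row)
```

Each row has exactly one column set to 1, which lets a tree split on questions such as "does this case belong to Group 1?".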


ROOT CAUSE ANALYSIS ON UNFIT EXECUTION OF TRANSITIONS

To perform root cause analysis on the transitions that are executed in an unfit way, the following code could be used:

from pm4py.algo.conformance.tokenreplay.diagnostics import root_cause_analysis
trans_root_cause = root_cause_analysis.diagnose_from_trans_fitness(log, trans_fitness, parameters=parameters)

To visualize the decision trees obtained by root cause analysis, the following code could be used:

from pm4py.visualization.decisiontree import factory as dt_vis_factory
for trans in trans_root_cause:
    clf = trans_root_cause[trans]["clf"]
    feature_names = trans_root_cause[trans]["feature_names"]
    classes = trans_root_cause[trans]["classes"]
    # visualization could be called
    gviz = dt_vis_factory.apply(clf, feature_names, classes)
    dt_vis_factory.view(gviz)

One of the resulting trees (for T02) looks like this:

ROOT CAUSE ANALYSIS ON EXECUTIONS CONTAINING ACTIVITIES THAT ARE NOT IN THE MODEL

To perform root cause analysis on activities that are executed but are not in the process model, the following code could be used:

from pm4py.algo.conformance.tokenreplay.diagnostics import root_cause_analysis
act_root_cause = root_cause_analysis.diagnose_from_notexisting_activities(log, unwanted_activities,
                                                                          parameters=parameters)

To visualize the decision trees obtained by root cause analysis, the following code could be used:

from pm4py.visualization.decisiontree import factory as dt_vis_factory
for act in act_root_cause:
    clf = act_root_cause[act]["clf"]
    feature_names = act_root_cause[act]["feature_names"]
    classes = act_root_cause[act]["classes"]
    # visualization could be called
    gviz = dt_vis_factory.apply(clf, feature_names, classes)
    dt_vis_factory.view(gviz)

One of the resulting trees (for T03) looks like this:
