Conformance Checking using Footprints

Conformance Checking using Footprints

Footprints are a very basic (but scalable) conformance checking technique to compare entities (such that event logs, DFGs, Petri nets, process trees, any other kind of model). Essentially, a relationship between any couple of activities of the log/model is inferred. This can include:

  • Directly-Follows Relationships: in the log/model, it is possible that the activity A is directly followed by B.
  • Directly-Before Relationships: in the log/model, it is possible that the activity B is directly preceded by A.
  • Parallel behavior: it is possible that A is followed by B and B is followed by A

A footprints matrix can be calculated, that describes for each couple of activities the footprint relationship. It is possible to calculate that for different types of models and for the entire event log, but also trace-by-trace (if the local behavior is important).

Let’s assume that the running-example.xes event log is loaded:

from pm4py.objects.log.importer.xes import importer as xes_importer
import os

log = xes_importer.apply(os.path.join("tests", "input_data", "running-example.xes"))

And the inductive miner is applied on such log:

from pm4py.algo.discovery.inductive import algorithm as inductive_miner

net, im, fm = inductive_miner.apply(log)

To calculate the footprints for the entire log, the following code can be used:

from pm4py.algo.discovery.footprints import algorithm as footprints_discovery

fp_log = footprints_discovery.apply(log, variant=footprints_discovery.Variants.ENTIRE_EVENT_LOG)

The footprints of the entire log are:

{‘sequence’: {(‘examine casually’, ‘decide’), (‘decide’, ‘pay compensation’), (‘register request’, ‘examine thoroughly’), (‘reinitiate request’, ‘examine casually’), (‘check ticket’, ‘decide’), (‘register request’, ‘examine casually’), (‘reinitiate request’, ‘examine thoroughly’), (‘decide’, ‘reject request’), (‘examine thoroughly’, ‘decide’), (‘reinitiate request’, ‘check ticket’), (‘register request’, ‘check ticket’), (‘decide’, ‘reinitiate request’)}, ‘parallel’: {(‘examine casually’, ‘check ticket’), (‘check ticket’, ‘examine casually’), (‘check ticket’, ‘examine thoroughly’), (‘examine thoroughly’, ‘check ticket’)}, ‘start_activities’: {‘register request’}, ‘end_activities’: {‘pay compensation’, ‘reject request’}, ‘activities’: {‘reject request’, ‘register request’, ‘check ticket’, ‘decide’, ‘pay compensation’, ‘examine thoroughly’, ‘examine casually’, ‘reinitiate request’}}

The data structure is a dictionary with, as keys, sequence (expressing directly-follows relationships) and parallel (expressing the parallel behavior that can happen in either way).

The footprints of the log, trace-by-trace, can be calculated as follows, and are a list of footprints for each trace:

from pm4py.algo.discovery.footprints import algorithm as footprints_discovery

fp_trace_by_trace = footprints_discovery.apply(log, variant=footprints_discovery.Variants.TRACE_BY_TRACE)

The footprints of the model can be calculated as follows:

fp_net = footprints_discovery.apply(net, im, fm)

And are the following:

{‘sequence’: {(‘check ticket’, ‘decide’), (‘reinitiate request’, ‘examine casually’), (‘register request’, ‘examine thoroughly’), (‘decide’, ‘reject request’), (‘register request’, ‘check ticket’), (‘register request’, ‘examine casually’), (‘decide’, ‘reinitiate request’), (‘reinitiate request’, ‘examine thoroughly’), (‘decide’, ‘pay compensation’), (‘reinitiate request’, ‘check ticket’), (‘examine casually’, ‘decide’), (‘examine thoroughly’, ‘decide’)}, ‘parallel’: {(‘check ticket’, ‘examine thoroughly’), (‘examine thoroughly’, ‘check ticket’), (‘check ticket’, ‘examine casually’), (‘examine casually’, ‘check ticket’)}, ‘activities’: {‘decide’, ‘examine casually’, ‘reinitiate request’, ‘check ticket’, ‘examine thoroughly’, ‘register request’, ‘reject request’, ‘pay compensation’}, ‘start_activities’: {‘register request’}}

The data structure is a dictionary with, as keys, sequence (expressing directly-follows relationships) and parallel (expressing the parallel behavior that can happen in either way).

It is possible to visualize a comparison between the footprints of the (entire) log and the footprints of the (entire) model.

First of all, let’s see how to visualize a single footprints table, for example the one of the model. The following code can be used:

from pm4py.visualization.footprints import visualizer as fp_visualizer

gviz = fp_visualizer.apply(fp_net, parameters={"format": "svg"})
fp_visualizer.view(gviz)

To produce as visualization the following:

To compare the two footprints tables, the following code can be used. Please note that the visualization will look the same, if no deviations are discovered. If deviations are found they are colored by red.

from pm4py.visualization.footprints import visualizer as fp_visualizer

gviz = fp_visualizer.apply(fp_log, fp_net, parameters={"format": "svg"})
fp_visualizer.view(gviz)

To actually find some deviations, let’s repeat the procedure on the receipt.xes log, applying a heavy filter on the log to discover a simpler model:

from pm4py.objects.log.importer.xes import importer as xes_importer
import os
from copy import deepcopy
from pm4py.algo.filtering.log.variants import variants_filter

log = xes_importer.apply(os.path.join("tests", "input_data", "receipt.xes"))
filtered_log = variants_filter.apply_auto_filter(deepcopy(log))

from pm4py.algo.discovery.inductive import algorithm as inductive_miner
net, im, fm = inductive_miner.apply(filtered_log)

The comparative view between the entire event log and the process model looks as the following:

With a conformance checking operation, we want instead to compare the behavior of the traces of the log against the footprints of the model.

This can be done using the following code:

from pm4py.algo.conformance.footprints import algorithm as footprints_conformance

conf_fp = footprints_conformance.apply(fp_trace_by_trace, fp_net)

And will contain, for each trace of the log, a set with the deviations. Extract of the list for some traces:

{(‘T06 Determine necessity of stop advice’, ‘T04 Determine confirmation of receipt’), (‘T02 Check confirmation of receipt’, ‘T06 Determine necessity of stop advice’)}
set()
{(‘T19 Determine report Y to stop indication’, ‘T20 Print report Y to stop indication’), (‘T10 Determine necessity to stop indication’, ‘T16 Report reasons to hold request’), (‘T16 Report reasons to hold request’, ‘T17 Check report Y to stop indication’), (‘T17 Check report Y to stop indication’, ‘T19 Determine report Y to stop indication’)}
set()
set()
{(‘T02 Check confirmation of receipt’, ‘T06 Determine necessity of stop advice’), (‘T10 Determine necessity to stop indication’, ‘T04 Determine confirmation of receipt’), (‘T04 Determine confirmation of receipt’, ‘T03 Adjust confirmation of receipt’), (‘T03 Adjust confirmation of receipt’, ‘T02 Check confirmation of receipt’)}
set()

We can see that for the first trace that contains deviations, there are two deviations, the first related to T06 Determine necessity of stop advice being executed before T04 Determine confirmation of receipt; the second related to T02 Check confirmation of receipt being followed by T06 Determine necessity of stop advice.

The traces for which the conformance returns nothing are fit (at least according to the footprints).

Footprints conformance checking is a way to identify obvious deviations, behavior of the log that is not allowed by the model.

On the log side, their scalability is wonderful! The calculation of footprints for a Petri net model may be instead more expensive.