Log Skeleton

The concept of the log skeleton was introduced in:

Verbeek, H. M. W., and R. Medeiros de Carvalho. “Log skeletons: A classification approach to process discovery.” arXiv preprint arXiv:1806.08247 (2018).

It is claimed to be the most accurate classification approach for deciding whether a trace belongs to the language of a log.

For a log, an object containing the following relations is computed:

  • Equivalence: contains the pairs of activities that always occur with the same frequency within a trace.
  • Always-after: contains the pairs of activities (A,B) such that every occurrence of A is eventually followed, later in the trace, by an occurrence of B.
  • Always-before: contains the pairs of activities (B,A) such that every occurrence of B is preceded, earlier in the trace, by an occurrence of A.
  • Never-together: contains the pairs of activities (A,B) that never occur together in the same trace.
  • Directly-follows: contains the directly-follows relations of the log.
  • Activity frequencies: for each activity, the set of possible numbers of occurrences per trace.
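
To make the definitions above concrete, here is a minimal pure-Python sketch that computes the six relations on a toy log. It follows a strict reading of the definitions given here; the function name `log_skeleton` and the toy traces are illustrative assumptions, and PM4Py's own implementation may differ in details, so it is not meant to reproduce PM4Py's output exactly.

```python
from collections import Counter
from itertools import permutations


def log_skeleton(traces):
    """Compute exact (noise-free) log skeleton relations for a list of traces.

    Illustrative sketch only: a strict reading of the textual definitions,
    not PM4Py's implementation.
    """
    activities = sorted({a for t in traces for a in t})
    counts = [Counter(t) for t in traces]
    skeleton = {
        "equivalence": set(),
        "always_after": set(),
        "always_before": set(),
        "never_together": set(),
        "directly_follows": set(),
        # For each activity, the set of occurrence counts seen in the traces.
        "activ_freq": {a: {c[a] for c in counts} for a in activities},
    }
    for a, b in permutations(activities, 2):
        # Equivalence: a and b occur equally often in every trace.
        if all(c[a] == c[b] for c in counts):
            skeleton["equivalence"].add((a, b))
        # Always-after: every occurrence of a has some b later in the trace.
        if all(b in t[i + 1:] for t in traces for i, x in enumerate(t) if x == a):
            skeleton["always_after"].add((a, b))
        # Always-before: every occurrence of a has some b earlier in the trace.
        if all(b in t[:i] for t in traces for i, x in enumerate(t) if x == a):
            skeleton["always_before"].add((a, b))
        # Never-together: no trace contains both a and b.
        if not any(a in t and b in t for t in traces):
            skeleton["never_together"].add((a, b))
    for t in traces:
        # Directly-follows: consecutive pairs observed in some trace.
        skeleton["directly_follows"].update(zip(t, t[1:]))
    return skeleton


# Hypothetical toy log with three traces.
toy_log = [("a", "b", "c"), ("a", "c", "b"), ("a", "d", "c")]
toy_skeleton = log_skeleton(toy_log)
for relation, value in toy_skeleton.items():
    print(relation, value)
```

In this toy log, for instance, `b` and `d` never occur in the same trace, and every occurrence of `d` is eventually followed by `c`.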

A noise threshold can also be provided. In that case the conditions are relaxed, so more relations are found.
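
One way to picture the relaxation is sketched below for the always-after relation: instead of requiring the pair to hold in every trace that contains A, the pair is kept when it holds in at least a fraction (1 - noise threshold) of those traces. This acceptance rule, the function name `always_after` and the toy log are assumptions made for illustration; the exact rule used by PM4Py may differ.

```python
def always_after(traces, noise_threshold=0.0):
    """Pairs (a, b) such that a is eventually followed by b in at least
    (1 - noise_threshold) of the traces containing a.

    Illustrative sketch; the acceptance rule is an assumption.
    """
    activities = {x for t in traces for x in t}
    result = set()
    for a in activities:
        with_a = [t for t in traces if a in t]
        for b in activities - {a}:
            # Every a is followed by b exactly when b occurs after the last a.
            satisfied = sum(
                1 for t in with_a if b in t[len(t) - t[::-1].index(a):]
            )
            if satisfied >= (1.0 - noise_threshold) * len(with_a):
                result.add((a, b))
    return result


# Hypothetical toy log: nine traces where b follows a, one where it does not.
noisy_log = [("a", "b")] * 9 + [("a",)]
strict = always_after(noisy_log, noise_threshold=0.0)
relaxed = always_after(noisy_log, noise_threshold=0.2)
print(strict)   # the single deviating trace rules out (a, b)
print(relaxed)  # with a 20% tolerance, (a, b) is kept
```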

Suppose we take the running-example.xes log:

from pm4py.objects.log.importer.xes import importer as xes_importer
import os
log = xes_importer.apply(os.path.join("tests", "input_data", "running-example.xes"))

Then, we can calculate the log skeleton:

from pm4py.algo.discovery.log_skeleton import algorithm as lsk_discovery
skeleton = lsk_discovery.apply(log, parameters={lsk_discovery.Variants.CLASSIC.value.Parameters.NOISE_THRESHOLD: 0.0})

Printing the skeleton gives:

{'equivalence': {('pay compensation', 'register request'), ('examine thoroughly', 'register request'), ('reject request', 'register request'), ('pay compensation', 'examine casually')}, 'always_after': {('register request', 'check ticket'), ('examine thoroughly', 'decide'), ('register request', 'decide')}, 'always_before': {('pay compensation', 'register request'), ('pay compensation', 'decide'), ('pay compensation', 'check ticket'), ('reject request', 'decide'), ('pay compensation', 'examine casually'), ('reject request', 'check ticket'), ('examine thoroughly', 'register request'), ('reject request', 'register request')}, 'never_together': {('pay compensation', 'reject request'), ('reject request', 'pay compensation')}, 'directly_follows': set(), 'activ_freq': {'register request': {1}, 'examine casually': {0, 1, 3}, 'check ticket': {1, 2, 3}, 'decide': {1, 2, 3}, 'reinitiate request': {0, 1, 2}, 'examine thoroughly': {0, 1}, 'pay compensation': {0, 1}, 'reject request': {0, 1}}}

The relations (equivalence, always_after, always_before, never_together, directly_follows, activ_freq) are the keys of the object; the values are the activities or pairs of activities that satisfy each relation.
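
Since the skeleton is a plain dictionary of sets, it can be queried with ordinary Python. The short sketch below uses a hand-copied excerpt of the printed object (not the full skeleton) to list the activities that must occur in every trace and the activities that always follow "register request". The variable names are illustrative.

```python
# Excerpt of the printed skeleton, copied by hand for illustration.
skeleton_excerpt = {
    "never_together": {
        ("pay compensation", "reject request"),
        ("reject request", "pay compensation"),
    },
    "always_after": {
        ("register request", "check ticket"),
        ("examine thoroughly", "decide"),
        ("register request", "decide"),
    },
    "activ_freq": {
        "register request": {1},
        "check ticket": {1, 2, 3},
        "pay compensation": {0, 1},
    },
}

# An activity is mandatory when 0 is not among its allowed occurrence counts.
mandatory = sorted(
    act for act, freq in skeleton_excerpt["activ_freq"].items() if 0 not in freq
)

# Activities that always eventually follow "register request".
followers = sorted(
    b for a, b in skeleton_excerpt["always_after"] if a == "register request"
)

# Membership test on a pair of activities.
exclusive = ("pay compensation", "reject request") in skeleton_excerpt["never_together"]

print(mandatory)
print(followers)
print(exclusive)
```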

To see how the log skeleton works for classification/conformance purposes, let’s switch to another log (receipt.xes) and compute a heavily filtered version of it (so that it contains less behavior):

from pm4py.objects.log.importer.xes import importer as xes_importer
import os

log = xes_importer.apply(os.path.join("tests", "input_data", "receipt.xes"))

from copy import deepcopy
from pm4py.algo.filtering.log.variants import variants_filter
filtered_log = variants_filter.apply_auto_filter(deepcopy(log))

Calculate the log skeleton on top of the filtered log, and then apply the classification as follows:

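from pm4py.algo.discovery.log_skeleton import algorithm as lsk_discovery
from pm4py.algo.conformance.log_skeleton import algorithm as lsk_conformance
# compute the skeleton on the filtered log, then classify the traces of the original log
skeleton = lsk_discovery.apply(filtered_log, parameters={lsk_discovery.Variants.CLASSIC.value.Parameters.NOISE_THRESHOLD: 0.0})
conf_result = lsk_conformance.apply(log, skeleton)
for trace_result in conf_result:
    print(trace_result)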

In this way, we obtain for each trace whether it has been classified as belonging to the filtered log or not. When deviations are found, the trace does not belong to the language of the log from which the skeleton was computed.
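
The idea behind the classification can be sketched in pure Python: a trace is checked against each relation of the skeleton, and every violated constraint is recorded as a deviation. The function `check_trace` below is a hypothetical illustration covering only three relations; it is not PM4Py's conformance algorithm, and the skeleton excerpt and traces are made up for the example.

```python
from collections import Counter


def check_trace(trace, skeleton):
    """Return the list of skeleton constraints violated by a trace.

    Hypothetical sketch covering activ_freq, never_together and always_after.
    """
    deviations = []
    counts = Counter(trace)
    # Each activity must occur an allowed number of times.
    for act, allowed in skeleton.get("activ_freq", {}).items():
        if counts[act] not in allowed:
            deviations.append(("activ_freq", act))
    # Activities in a never-together pair must not share a trace.
    for a, b in sorted(skeleton.get("never_together", ())):
        if a in trace and b in trace:
            deviations.append(("never_together", (a, b)))
    # Every occurrence of a must be followed by b, i.e. b occurs after the last a.
    for a, b in sorted(skeleton.get("always_after", ())):
        if a in trace:
            last_a = len(trace) - 1 - trace[::-1].index(a)
            if b not in trace[last_a + 1:]:
                deviations.append(("always_after", (a, b)))
    return deviations


# Illustrative skeleton excerpt and traces.
example_skeleton = {
    "activ_freq": {
        "register request": {1},
        "check ticket": {1, 2, 3},
        "pay compensation": {0, 1},
        "reject request": {0, 1},
    },
    "never_together": {
        ("pay compensation", "reject request"),
        ("reject request", "pay compensation"),
    },
    "always_after": {
        ("register request", "check ticket"),
        ("register request", "decide"),
    },
}
fitting = ["register request", "check ticket", "decide", "pay compensation"]
deviating = ["register request", "decide", "pay compensation", "reject request"]
fitting_deviations = check_trace(fitting, example_skeleton)
deviating_deviations = check_trace(deviating, example_skeleton)
print(fitting_deviations)
print(deviating_deviations)
```

A trace with an empty deviation list is classified as belonging to the language described by the skeleton; any deviation excludes it.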

We can also calculate a log skeleton on the original log, for example with a noise threshold of 0.03, and see the effect on the classification:

from pm4py.algo.discovery.log_skeleton import algorithm as lsk_discovery
skeleton = lsk_discovery.apply(log, parameters={lsk_discovery.Variants.CLASSIC.value.Parameters.NOISE_THRESHOLD: 0.03})

from pm4py.algo.conformance.log_skeleton import algorithm as lsk_conformance
conf_result = lsk_conformance.apply(log, skeleton)
for trace_result in conf_result:
    print(trace_result)

We can see that, when a noise threshold is provided, some traces are classified as incorrect even though the log skeleton was computed on the original log.
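
To compare classifications obtained with different thresholds, it can help to reduce each result to a single number. The sketch below assumes that each per-trace result is a dictionary with a boolean entry under the key "is_fit"; this key name is an assumption about the result structure and should be checked against the printed output. The helper `fit_ratio` and the sample data are hypothetical.

```python
def fit_ratio(conf_result, fit_key="is_fit"):
    """Fraction of traces classified as fitting.

    Assumes each per-trace result is a dict with a boolean under `fit_key`
    (the key name is an assumption, adjust it to the actual output).
    """
    if not conf_result:
        return 0.0
    fitting = sum(1 for result in conf_result if result.get(fit_key))
    return fitting / len(conf_result)


# Hypothetical per-trace results, standing in for a real conformance output.
sample_result = [
    {"is_fit": True},
    {"is_fit": False},
    {"is_fit": True},
    {"is_fit": True},
]
ratio = fit_ratio(sample_result)
print(ratio)
```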