Passed Time Statistics

During the initial exploration of an event log, it is often useful to get a detailed overview of the bottlenecks of the process, together with a number that summarizes the performance of an activity.

This can be done as follows.

Retrieving the list of activities of a log/dataframe

Suppose we have a log:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer

log = xes_importer.apply(os.path.join("tests", "input_data", "receipt.xes"))

Or a dataframe:

import os
from pm4py.objects.log.adapters.pandas import csv_import_adapter

df = csv_import_adapter.import_dataframe_from_path(os.path.join("tests", "input_data", "receipt.csv"))

Then the activities of the log/dataframe can be retrieved (assuming concept:name is the attribute holding the activity) with, respectively:

from pm4py.algo.filtering.log.attributes import attributes_filter as log_attributes_filter
activities = log_attributes_filter.get_attribute_values(log, "concept:name")

or

from pm4py.algo.filtering.pandas.attributes import attributes_filter as pd_attributes_filter
activities = pd_attributes_filter.get_attribute_values(df, "concept:name")

Getting Statistics about a particular activity

The following code retrieves, for a log or a dataframe, the activities that are executed directly before T02 Check confirmation of receipt and how much time passes between them and T02 (only the direct predecessor in the log is considered):

# On an event log
from pm4py.statistics.passed_time.log import factory as log_passed_time
pt_T02 = log_passed_time.apply(log, "T02 Check confirmation of receipt", variant="pre")
print(pt_T02)

# On a dataframe
from pm4py.statistics.passed_time.pandas import factory as pd_passed_time
pt_T02 = pd_passed_time.apply(df, "T02 Check confirmation of receipt", variant="pre")
print(pt_T02)

Both calls produce the same output:

{'pre': [['Confirmation of receipt', 72163.8380231696, 1079], ['T03 Adjust confirmation of receipt', 140616.38507843137, 51], ['T06 Determine necessity of stop advice', 136135.34368000002, 75], ['T10 Determine necessity to stop indication', 264807.78436774196, 155], ['T20 Print report Y to stop indication', 294770.1363333333, 3], ['T07-2 Draft intern advice aspect 2', 80059.7085, 2], ['T12 Check document X request unlicensed', 129568.049, 2], ['T07-1 Draft intern advice aspect 1', 0.681, 1]], 'pre_avg_perf': 100581.24329239766}

Here:

  • The value of the 'pre' key is a list of input activities, each associated with the average time elapsed from that activity (as direct predecessor in the log) and the overall count of the relation.
  • The value of the 'pre_avg_perf' key is the average of all the times passed from the previous activities (direct predecessors in the log), weighted by the count of each relation.
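To make the weighting concrete, the sketch below recomputes the 'pre_avg_perf' value from the 'pre' list printed above, assuming each entry is [activity, average time, count] and that the weights are the counts. The helper name weighted_average is only illustrative and is not part of PM4Py.

```python
# Entries copied from the 'pre' output above: [activity, average time, count].
pre = [
    ["Confirmation of receipt", 72163.8380231696, 1079],
    ["T03 Adjust confirmation of receipt", 140616.38507843137, 51],
    ["T06 Determine necessity of stop advice", 136135.34368000002, 75],
    ["T10 Determine necessity to stop indication", 264807.78436774196, 155],
    ["T20 Print report Y to stop indication", 294770.1363333333, 3],
    ["T07-2 Draft intern advice aspect 2", 80059.7085, 2],
    ["T12 Check document X request unlicensed", 129568.049, 2],
    ["T07-1 Draft intern advice aspect 1", 0.681, 1],
]
reported_avg = 100581.24329239766  # 'pre_avg_perf' from the same output


def weighted_average(entries):
    """Average of the per-activity times, weighted by relation count.

    Hypothetical helper written for this example (not a PM4Py function).
    """
    total_count = sum(count for _, _, count in entries)
    total_time = sum(avg * count for _, avg, count in entries)
    return total_time / total_count


recomputed = weighted_average(pre)
print(recomputed, reported_avg)
```

The recomputed value agrees with the reported one up to floating-point rounding, which is consistent with the frequent predecessor Confirmation of receipt (1079 occurrences) dominating the overall figure.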

The following code retrieves, for a log or a dataframe, the activities that are executed directly after T02 Check confirmation of receipt and how much time passes between T02 and them (only the direct follower in the log is considered):

# On an event log
from pm4py.statistics.passed_time.log import factory as log_passed_time
pt_T02 = log_passed_time.apply(log, "T02 Check confirmation of receipt", variant="post")
print(pt_T02)

# On a dataframe
from pm4py.statistics.passed_time.pandas import factory as pd_passed_time
pt_T02 = pd_passed_time.apply(df, "T02 Check confirmation of receipt", variant="post")
print(pt_T02)

Both calls produce the same output:

{'post': [['T03 Adjust confirmation of receipt', 428324.67441860464, 43], ['T04 Determine confirmation of receipt', 26182.95621090259, 1119], ['T05 Print and send confirmation of receipt', 303.0, 1], ['T06 Determine necessity of stop advice', 73048.43258426966, 178], ['T07-1 Draft intern advice aspect 1', 45.5, 2], ['T07-2 Draft intern advice aspect 2', 0.0, 1], ['T07-5 Draft intern advice aspect 5', 71745.0, 1], ['T10 Determine necessity to stop indication', 137.93333333333334, 15]], 'post_avg_perf': 44701.11617647059}

Here:

  • The value of the 'post' key is a list of output activities, each associated with the average time elapsed until that activity (as direct successor in the log) and the overall count of the relation.
  • The value of the 'post_avg_perf' key is the average of all the times passed to the next activities (direct successors in the log), weighted by the count of each relation.
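Since the aim is to spot bottlenecks, one possible way to read this output is to rank the relations by average time while ignoring those observed only a handful of times. The sketch below does this on the 'post' list printed above; the helper slowest_transitions and its min_count threshold are assumptions made for this example and are not part of PM4Py.

```python
# Entries copied from the 'post' output above: [activity, average time, count].
post = [
    ["T03 Adjust confirmation of receipt", 428324.67441860464, 43],
    ["T04 Determine confirmation of receipt", 26182.95621090259, 1119],
    ["T05 Print and send confirmation of receipt", 303.0, 1],
    ["T06 Determine necessity of stop advice", 73048.43258426966, 178],
    ["T07-1 Draft intern advice aspect 1", 45.5, 2],
    ["T07-2 Draft intern advice aspect 2", 0.0, 1],
    ["T07-5 Draft intern advice aspect 5", 71745.0, 1],
    ["T10 Determine necessity to stop indication", 137.93333333333334, 15],
]


def slowest_transitions(entries, min_count=1):
    """Sort relations by average time (slowest first).

    Hypothetical helper: relations seen fewer than `min_count` times are
    dropped, since an average over one or two cases says little.
    """
    kept = [entry for entry in entries if entry[2] >= min_count]
    return sorted(kept, key=lambda entry: entry[1], reverse=True)


# The threshold of 10 is an arbitrary choice for this illustration.
ranked = slowest_transitions(post, min_count=10)
for activity, avg_time, count in ranked:
    print(f"{activity}: average {avg_time:.1f} over {count} occurrences")
```

With this threshold, the relation towards T03 Adjust confirmation of receipt comes out as the slowest frequent one, although it is much rarer than the relation towards T04 Determine confirmation of receipt.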
