Benchmarks

Benchmarks, whether favorable or unfavorable, are important to us. This page collects some of the benchmarks that compare PM4Py with other Process Mining tools.

Importing a CSV and calculating the frequency DFG

CSV is the most widely used format for data extracted from databases. In this section, we measure the time needed to load a CSV file and to calculate the Directly Follows Graph (DFG) from it.
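To make concrete what a frequency DFG counts, here is a minimal Pandas-only sketch. It is not PM4Py's implementation: the function name frequency_dfg and the in-memory CSV are hypothetical, and only the column names follow the ones used in the benchmark script. The sketch assumes that the events of each case already appear in chronological order.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical in-memory CSV standing in for a real event log file.
CSV_TEXT = """case:concept:name,concept:name,time:timestamp
c1,A,2020-01-01 10:00:00
c1,B,2020-01-01 11:00:00
c1,C,2020-01-01 12:00:00
c2,A,2020-01-02 09:00:00
c2,B,2020-01-02 09:30:00
"""


def frequency_dfg(df, case_col="case:concept:name", act_col="concept:name"):
    """Count how often one activity is directly followed by another in a case.

    Assumes the events of each case are already in chronological order.
    """
    # Activity of the next event in the same case (missing for the last event).
    next_act = df.groupby(case_col, sort=False)[act_col].shift(-1)
    pairs = pd.DataFrame({"src": df[act_col], "dst": next_act}).dropna()
    return Counter(zip(pairs["src"], pairs["dst"]))


df = pd.read_csv(io.StringIO(CSV_TEXT))
dfg = frequency_dfg(df)
print(dfg)
```

Each key of the resulting counter is a pair (activity, next activity), and each value is the number of times that pair occurs across all cases.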

The following script has been used in PM4Py to measure the CSV import time and the DFG calculation time (tests were run on an I7-7550U with 16 GB of DDR4 RAM):

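import time

from pm4py.objects.log.adapters.pandas import csv_import_adapter
from pm4py.algo.discovery.dfg.adapters.pandas import df_statistics

# Step 1: import the CSV into a Pandas dataframe without converting the timestamp columns
aa = time.time()
df = csv_import_adapter.import_dataframe_from_path_wo_timeconversion("C:\\csv_logs\\Billing.csv")
# Uncomment the next two lines to include timestamp conversion and sorting in the import time
#df = csv_import_adapter.convert_timestamp_columns_in_df(df, timest_columns=["time:timestamp"], timest_format="%Y-%m-%d %H:%M:%S")
#df = df.sort_values(["case:concept:name", "time:timestamp"])
bb = time.time()
# Step 2: calculate the frequency DFG on the dataframe
dfg = df_statistics.get_dfg_graph(df, measure="frequency", sort_caseid_required=False)
cc = time.time()
print(bb-aa)  # seconds spent importing the CSV
print(cc-bb)  # seconds spent calculating the DFG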

 

Compared with the CSV importer included in ProM 6, the results are the following:

When the timestamp columns are converted and a sort operation is applied, the Pandas CSV importer performs better than the ProM 6 CSV importer in some cases and on par with it in others. When no sort is applied, the Pandas import is much faster.
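The extra work behind the "converted and sorted" case can be sketched with plain Pandas calls. This is an illustration under assumptions, not the PM4Py adapter itself: the in-memory CSV is hypothetical, while the column names and the timestamp format are the ones that appear in the benchmark script.

```python
import io

import pandas as pd

# Hypothetical, deliberately unsorted in-memory CSV.
CSV_TEXT = """case:concept:name,concept:name,time:timestamp
c2,B,2020-01-02 09:30:00
c1,B,2020-01-01 11:00:00
c2,A,2020-01-02 09:00:00
c1,A,2020-01-01 10:00:00
"""

# Plain import: the timestamp column is still made of strings.
raw = pd.read_csv(io.StringIO(CSV_TEXT))

# Extra step 1: parse the timestamp strings into datetime values.
converted = raw.copy()
converted["time:timestamp"] = pd.to_datetime(
    converted["time:timestamp"], format="%Y-%m-%d %H:%M:%S"
)

# Extra step 2: order the events by case and, inside each case, by time.
ordered = converted.sort_values(
    ["case:concept:name", "time:timestamp"]
).reset_index(drop=True)

print(ordered)
```

Both extra steps touch every row of the dataframe, which is consistent with the import being much faster when they are skipped.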

For the DFG calculation, the results in comparison to the analogous plug-in in ProM 6, which can be found inside the Inductive Miner, are the following (tests were run on an I7-7550U with 16 GB of DDR4 RAM):

So the Pandas approach in PM4Py seems to work very well in comparison to the ProM 6 implementation.

Importing a XES file

ProM offers a wide choice of plug-ins that can import XES files. Choosing one as a reference is difficult, but since RAM is not a constraint, the Naive importer has been chosen.

PM4Py, on the other hand, offers two different XES importers:

  • A standard, certified importer that relies on the standard LXML library and is able to handle any sort of XML file.
  • A non-standard, non-certified importer that is a “line parser” and is able to parse only pretty-printed XML files.
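The difference between the two approaches can be illustrated with a small sketch. It is not the code of either PM4Py importer: it uses the standard-library xml.etree.ElementTree instead of LXML, the XES fragment is a hypothetical simplification without namespaces, and the function names are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified pretty-printed XES fragment (one tag per line).
PRETTY_XES = """<log>
  <trace>
    <string key="concept:name" value="case_1"/>
    <event>
      <string key="concept:name" value="A"/>
    </event>
    <event>
      <string key="concept:name" value="B"/>
    </event>
  </trace>
</log>
"""

# The same content with all line breaks and indentation removed.
MINIFIED_XES = "".join(line.strip() for line in PRETTY_XES.splitlines())

NAME_ATTR = re.compile(r'<string key="concept:name" value="([^"]*)"/>')


def parse_with_xml_parser(text):
    """Full XML parsing: the result does not depend on the file layout."""
    root = ET.fromstring(text)
    return [
        [
            event.find("string[@key='concept:name']").get("value")
            for event in trace.findall("event")
        ]
        for trace in root.findall("trace")
    ]


def parse_line_by_line(text):
    """Line parsing: assumes exactly one tag per line."""
    traces, current, in_event = [], None, False
    for line in text.splitlines():
        line = line.strip()
        if line == "<trace>":
            current = []
        elif line == "</trace>":
            traces.append(current)
            current = None
        elif line == "<event>":
            in_event = True
        elif line == "</event>":
            in_event = False
        elif in_event:
            match = NAME_ATTR.fullmatch(line)
            if match:
                current.append(match.group(1))
    return traces


print(parse_with_xml_parser(PRETTY_XES), parse_line_by_line(PRETTY_XES))
print(parse_with_xml_parser(MINIFIED_XES), parse_line_by_line(MINIFIED_XES))
```

On the pretty-printed input both functions return the same activity sequence. On the minified input only the XML parser still finds the events, because the line-based function never sees a line that consists of a single tag.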

The results of the comparison are the following:

As can be seen in the table, PM4Py cannot handle XES files at the same speed as ProM. The non-standard importer performs better, but fails on both BPI Challenge 2017 logs.
