Obtaining Graphs from Dataframes

Graphs help to understand several aspects of the current log, for example the distribution of a numeric attribute, the distribution of case duration, or the distribution of events over time.

Distribution of case duration

In the following example, the distribution of case duration is shown in two different graphs: a simple plot and a semi-logarithmic plot (logarithmic on the X-axis). The semi-logarithmic plot is less sensitive to possible outliers.
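
To illustrate why a logarithmic X-axis is less sensitive to outliers, here is a minimal sketch in plain Python. It does not use pm4py; the duration values and the helper axis_positions are hypothetical and exist only for this illustration. It compares how much of the axis width the bulk of the values occupies on a linear and on a logarithmic scale.

```python
import math

# Hypothetical case durations in seconds: most cases take minutes,
# one outlier takes more than eleven days.
durations = [60, 120, 300, 600, 900, 1800, 1_000_000]

def axis_positions(values, log_scale=False):
    """Map each value to a relative position in [0, 1] along the X-axis."""
    scaled = [math.log10(v) for v in values] if log_scale else list(values)
    lo, hi = min(scaled), max(scaled)
    return [(s - lo) / (hi - lo) for s in scaled]

linear = axis_positions(durations)
semilog = axis_positions(durations, log_scale=True)

# Share of the axis width covered by all values except the outlier
linear_bulk_width = linear[-2] - linear[0]
semilog_bulk_width = semilog[-2] - semilog[0]
print(f"linear: {linear_bulk_width:.4f}, semi-log: {semilog_bulk_width:.4f}")
```

On the linear axis the single outlier stretches the range so much that all other values collapse into a tiny fraction of the width, while on the logarithmic axis they remain clearly spread out.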

First, the Receipt log is loaded:

import os
from pm4py.objects.log.importer.csv.versions import pandas_df_imp

df_path = os.path.join("tests", "input_data", "receipt.csv")
df = pandas_df_imp.import_dataframe_from_path(df_path)

Then, the distribution of case duration is obtained:

from pm4py.util import constants
from pm4py.statistics.traces.pandas import case_statistics

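# x and y are the points of the case duration distribution; the timestamp column is passed explicitly
x, y = case_statistics.get_kde_caseduration(df, parameters={constants.PARAMETER_CONSTANT_TIMESTAMP_KEY: "time:timestamp"})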
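
As a rough illustration of the quantity whose distribution is estimated here, the following sketch computes case durations by hand in plain Python. It is not the pm4py implementation. It assumes that the duration of a case is the time between its first and its last event. The sample events and the helper case_durations are hypothetical, and the case identifier column name "case:concept:name" is an assumption that does not appear in the example above.

```python
from datetime import datetime

# Hypothetical events as (case id, timestamp) pairs, as they might appear in the
# "case:concept:name" (assumed) and "time:timestamp" columns of an event dataframe.
events = [
    ("case-1", datetime(2011, 1, 1, 9, 0)),
    ("case-1", datetime(2011, 1, 1, 9, 30)),
    ("case-2", datetime(2011, 1, 2, 8, 0)),
    ("case-1", datetime(2011, 1, 1, 11, 0)),
    ("case-2", datetime(2011, 1, 4, 8, 0)),
]

def case_durations(events):
    """Return {case id: seconds between the first and the last event of the case}."""
    first, last = {}, {}
    for case_id, ts in events:
        # Track the earliest and the latest timestamp seen for each case
        first[case_id] = min(first.get(case_id, ts), ts)
        last[case_id] = max(last.get(case_id, ts), ts)
    return {case_id: (last[case_id] - first[case_id]).total_seconds() for case_id in first}

durations = case_durations(events)
print(durations)
```

The resulting list of durations, one value per case, is the kind of sample from which a density curve such as the (x, y) pair above can be estimated.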

The simple plot can then be obtained:

from pm4py.visualization.graphs import factory as graphs_vis_factory

gviz = graphs_vis_factory.apply_plot(x, y, variant="cases")
graphs_vis_factory.view(gviz)

Alternatively, the semi-logarithmic plot (logarithmic on the X-axis) can be obtained:

gviz = graphs_vis_factory.apply_semilogx(x, y, variant="cases")
graphs_vis_factory.view(gviz)

Distribution of events over time

In the following example, a graph representing the distribution of events over time is obtained. This is particularly useful for understanding in which time intervals the greatest number of events is recorded.

First, the Receipt log is loaded:

import os
from pm4py.objects.log.importer.csv.versions import pandas_df_imp

df_path = os.path.join("tests", "input_data", "receipt.csv")
df = pandas_df_imp.import_dataframe_from_path(df_path)

Then, the distribution of events over time is obtained:

from pm4py.algo.filtering.pandas.attributes import attributes_filter

# x and y are the points of the distribution of the event timestamps
x, y = attributes_filter.get_kde_date_attribute(df, attribute="time:timestamp")
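
A coarser view of the same information is a plain count of events per day. The sketch below shows that idea in plain Python. It does not use pm4py, and it produces a histogram rather than the smoothed curve returned above. The sample timestamps and the helper events_per_day are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event timestamps, as they might appear in the "time:timestamp" column
timestamps = [
    datetime(2011, 1, 1, 9, 0),
    datetime(2011, 1, 1, 9, 30),
    datetime(2011, 1, 1, 17, 45),
    datetime(2011, 1, 2, 8, 0),
    datetime(2011, 1, 4, 8, 0),
    datetime(2011, 1, 4, 8, 5),
]

def events_per_day(timestamps):
    """Count events per calendar day; returns (ISO date, count) pairs sorted by date."""
    counts = Counter(ts.date().isoformat() for ts in timestamps)
    return sorted(counts.items())

daily_counts = events_per_day(timestamps)
for day, count in daily_counts:
    print(day, count)
```

Days without any event do not appear in the result, so a plot built from these counts would need to fill such gaps with zeros.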

Finally, the graph can be obtained:

from pm4py.visualization.graphs import factory as graphs_vis_factory

gviz = graphs_vis_factory.apply_plot(x, y, variant="dates")
graphs_vis_factory.view(gviz)

Distribution of a numeric attribute

In the following example, two graphs of the distribution of a numeric attribute are obtained: a normal plot and a semi-logarithmic plot (logarithmic on the X-axis), which is less sensitive to outliers.

First, a filtered version of the Road Traffic log is loaded:

import os
from pm4py.objects.log.importer.csv.versions import pandas_df_imp

df_path = os.path.join("tests", "input_data", "roadtraffic100traces.csv")
df = pandas_df_imp.import_dataframe_from_path(df_path)

Then, the distribution of the numeric attribute amount is obtained:

from pm4py.algo.filtering.pandas.attributes import attributes_filter

# x and y are the points of the distribution of the "amount" attribute
x, y = attributes_filter.get_kde_numeric_attribute(df, "amount")
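
The function name suggests that the curve is a kernel density estimate (KDE). As a hedged sketch of that technique, and not of the pm4py implementation, the following plain Python code evaluates a Gaussian KDE on a grid of points. The sample amounts, the grid, the fixed bandwidth and the helper gaussian_kde are hypothetical; libraries usually choose the bandwidth automatically.

```python
import math

def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]

# Hypothetical values of a numeric attribute such as "amount"
amounts = [10.0, 12.0, 11.0, 35.0, 36.0, 11.5]

# Grid of evaluation points from 0 to 50 in steps of 0.5 (hypothetical choice)
step = 0.5
kde_x = [i * step for i in range(101)]
kde_y = gaussian_kde(amounts, kde_x, bandwidth=2.0)

# The density should integrate to roughly 1 over the grid
area = sum(kde_y) * step
peak_x = kde_x[kde_y.index(max(kde_y))]
print(f"area: {area:.3f}, peak at x = {peak_x}")
```

Each sample contributes one small bell curve, and the estimate is their average, so regions with many nearby samples produce the highest peaks.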

The standard graph can then be obtained:

from pm4py.visualization.graphs import factory as graphs_vis_factory

gviz = graphs_vis_factory.apply_plot(x, y, variant="attributes")
graphs_vis_factory.view(gviz)

Alternatively, the semi-logarithmic graph can be obtained:

from pm4py.visualization.graphs import factory as graphs_vis_factory

gviz = graphs_vis_factory.apply_semilogx(x, y, variant="attributes")
graphs_vis_factory.view(gviz)
