Task Mining – BPM 2019

PM4Py offers an implementation of the task mining approach presented at the AI4BPM workshop of BPM 2019.

The title of the paper is:

Process Mining from Low-level User Actions

by Aviv Yehezkel, Yaron Bialy, Ariel Smutko, Eran Roseberg

 

The approach consists of the following sequence of steps:

  • Stream grouping: events that are close in time, or linked by a strong connection, are grouped together.
  • Task generalization / entity recognition: specific entities in titles and descriptions are replaced with a generic entity.
  • Sequence mining (frequent itemsets; PrefixSpan algorithm).
  • Sequence clustering.
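
To make the first two steps more concrete, here is a minimal sketch in plain Python. It is not the PM4Py implementation: the function names (group_by_time_gap, generalize), the 5-second threshold, and the <PATH> and <NUM> placeholders are assumptions chosen for illustration. Only the "current_timestamp" field name is taken from the recorder output shown later on this page.

```python
import re

def group_by_time_gap(events, max_gap=5.0, ts_key="current_timestamp"):
    """Group events by temporal proximity (hypothetical helper).

    A new group starts whenever the gap to the previous event
    exceeds max_gap seconds (an assumed threshold).
    """
    groups = []
    for event in sorted(events, key=lambda e: e[ts_key]):
        if groups and event[ts_key] - groups[-1][-1][ts_key] <= max_gap:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups

def generalize(title):
    """Replace specific entities in a window title with generic placeholders.

    Only Windows-style paths and digit runs are handled here; a real
    entity recognizer would cover many more cases.
    """
    title = re.sub(r"[A-Za-z]:\\[^\s\]]+", "<PATH>", title)
    title = re.sub(r"\d+", "<NUM>", title)
    return title

# Toy input, invented for this sketch
events = [{"current_timestamp": t} for t in (0.0, 1.0, 2.5, 20.0, 21.0)]
groups = group_by_time_gap(events)
print([len(g) for g in groups])                # [3, 2]
print(generalize("report_2019.xlsx - Excel"))  # report_<NUM>.xlsx - Excel
```

Grouping only by time gap ignores the "strong connection" criterion mentioned above, so this sketch covers just the simplest case.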

Below is an example that applies task mining to a log recorded by the “clicks recorder”.

First, the CSV log file containing the clicks is imported (with a specific encoding, and skipping malformed lines):

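import pandas as pd
# error_bad_lines=False skips malformed rows; pandas 1.3+ deprecates it in favour of on_bad_lines="skip"
df = pd.read_csv(r"clicks.csv", sep=";", error_bad_lines=False, encoding="ISO-8859-1")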

Then, it is converted to an event stream (i.e., a list of low-level events):

from pm4py.objects.conversion.log import factory as log_conv_factory
stream = log_conv_factory.apply(df, variant=log_conv_factory.TO_EVENT_STREAM)

Then, the task mining approach is applied. The output is a list of processes recognized by the task miner, each of which contains a list of recognized sequences (each a list of steps):

from pm4py.algo.task_mining import factory as task_mining_factory
tasks = task_mining_factory.apply(stream)

The output can then be stored in a JSON file for easier inspection:

import json
# write the recognized tasks as indented JSON; the file is closed automatically
with open("dump.json", "w") as f:
    json.dump(tasks, f, indent=2)

The following snippet shows part of the output. Since the output is a list of processes, and each process is a list of recognized sequences (each a list of steps), it starts with two opening brackets ([):

[
  [
    {
      "label": [
        "pycharm64.exe SunAwtFrame (830.7, 514.5) pm4py-source [C:\\Users\\aless\\pm4py-source] - ...\\",
        "pycharm64.exe SunAwtFrame (75.0, 27.4) ...\\ - PyCharm"
      ],
      "score": 12.899999856948853,
      "no_occurrences": 3,
      "events": [
        [
          {
            "pid": 9124,
            "process_name": "pycharm64.exe",
            "process_exe": "C:\\Program Files\\JetBrains\\PyCharm Community Edition 2019.2\\bin\\pycharm64.exe",
            "classname": "SunAwtFrame",
            "window_id": 918756,
            "window_name": "pm4py-source [C:\\Users\\aless\\pm4py-source] - ...\\pm4py\\algo\\stream\\tasks\\versions\\equiv_spatial_grouping.py - PyCharm",
            "window_position": "(-7, -7)",
            "window_dimension": "(1208, 768)",
            "event_position": "(800, 521)",
            "event_position_rel": "(807, 528)",
            "username": "aless",
            "computername": "DESKTOP-14M5HJ1",
            "current_timestamp": 1567937944.7,
            "last_screenshot": "screenshots\\Screenshot_1567937942945.png",
            "@@label_index": 12,
            "@@label": "pycharm64.exe SunAwtFrame (830.7, 514.5) pm4py-source [C:\\Users\\aless\\pm4py-source] - ...\\"
          },...
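
Given the nesting shown above (processes, then sequences, each with "label", "score", "no_occurrences" and "events"), a short helper can flatten the result into one row per sequence. This is a sketch under that assumption; summarize is a hypothetical name, not part of PM4Py, and the sample input is a toy structure invented to mirror the shape of the output.

```python
def summarize(tasks):
    """Flatten the task miner output into one row per recognized sequence,
    sorted by descending score (hypothetical helper)."""
    rows = []
    for proc_idx, process in enumerate(tasks):
        for seq in process:
            rows.append({
                "process": proc_idx,
                "steps": len(seq["label"]),  # number of steps in the sequence
                "score": seq["score"],
                "no_occurrences": seq["no_occurrences"],
            })
    return sorted(rows, key=lambda r: r["score"], reverse=True)

# Toy data with the same keys as the real output (values are made up)
sample = [[
    {"label": ["step C"], "score": 4.0, "no_occurrences": 2, "events": []},
    {"label": ["step A", "step B"], "score": 12.9, "no_occurrences": 3, "events": []},
]]
rows = summarize(sample)
for row in rows:
    print(row)
```

With real data, the same function can be applied to the tasks object directly, or to the content of dump.json after loading it with json.load.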
