Prediction of the Remaining Time

BETA feature: PM4Py offers limited support (through Scikit-Learn ElasticNets) for prediction of the remaining time on the ‘bpmnIntegration2’ branch; support for Keras RNNs is offered only in an even more experimental branch, ‘predictionIntegration’.

Predicting the remaining time is important because it tells us, from the current event, how much time is left before the end of the business process instance, and hence whether SLAs are going to be violated.

Prediction through classic regression (Scikit-Learn – ElasticNets)

It is available in the bpmnIntegration2 and predictionIntegration branches.

An important premise is that this method is not competitive with state-of-the-art remaining time prediction techniques in Process Mining, but it is very fast and does not require powerful hardware. In this approach:

  • The log is transformed into a feature matrix where each prefix of a case is associated with a row containing the features of the current and of the previous events.
  • The remaining time of each prefix is stored in a vector.
  • The regression model is learned (see the sketch after this list).
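
Below is a minimal sketch of this idea on a hypothetical toy log (this is not PM4Py's internal implementation; the toy cases and the chosen features are purely illustrative): each prefix is encoded as the counts of the activities seen so far plus the elapsed time, the remaining time is the regression target, and a Scikit-Learn ElasticNet is fitted on the resulting matrix.

import numpy as np
from sklearn.linear_model import ElasticNet

# Hypothetical toy log: each case is a list of (activity, timestamp in seconds) pairs
cases = [
    [("register", 0), ("check", 3600), ("decide", 9000)],
    [("register", 0), ("check", 1800), ("decide", 5400)],
]
activities = ["register", "check", "decide"]

X, y = [], []
for case in cases:
    end_time = case[-1][1]
    for i in range(1, len(case) + 1):
        prefix = case[:i]
        # Row of features for this prefix: activity counts + elapsed time
        counts = [sum(1 for act, _ in prefix if act == a) for a in activities]
        elapsed = prefix[-1][1] - prefix[0][1]
        X.append(counts + [elapsed])
        # Target: remaining time until the end of the case
        y.append(end_time - prefix[-1][1])

regressor = ElasticNet()
regressor.fit(np.array(X), np.array(y))
print(regressor.predict(np.array(X)))  # predicted remaining time for each prefix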

Let’s now see how this is done in PM4Py. A log could be loaded:

from pm4py.objects.log.importer.xes import factory as xes_importer
import os

log = xes_importer.apply(os.path.join("tests", "input_data", "running-example.xes"))

And split into a training log and a test log:

from pm4py.objects.log.log import EventLog
test_log_size = int(len(log) * 0.34)
training_log = EventLog(log[test_log_size:len(log)])
test_log = EventLog(log[0:test_log_size])

Then, the prediction model could be learned:

from pm4py.algo.prediction import factory as prediction_factory
model = prediction_factory.train(training_log, variant="elasticnet")

At this point, the model can be evaluated on the test log, predicting the remaining time for each of its traces. Since these traces are complete, we expect the predictions to be more or less close to 0. Let’s see if that is the case.

results = prediction_factory.test(model, test_log)
print("ELASTICNET RESULTS")
print(results)

We obtain poor results with this method: although the traces are complete, the predicted remaining time is far from 0:

ELASTICNET RESULTS
[337755.57226467 262236.54330092]
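
As a rough quantification (assuming that, as is customary in PM4Py, times are expressed in seconds, and that the array above contains one prediction per test trace at its final event, where the true remaining time is 0), the average error can be converted into days:

import numpy as np

# Mean absolute prediction at the end of the traces (the true remaining time is 0),
# converted from seconds to days
print(np.mean(np.abs(results)) / 86400)  # roughly 3.5 days of error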

The model could be saved with:

prediction_factory.save(model, "model1.dump")

And retrieved with:

model = prediction_factory.load("model1.dump")

Prediction through Recurrent Neural Networks (Keras)

It is currently available in the predictionIntegration branch. This approach works much better than the classic regression.

In this approach, the log is transformed into a structure suited for recurrent neural network training. A minimal sketch of the general idea is shown below, followed by the PM4Py example.
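
The following sketch is not PM4Py's internal implementation (the toy cases, the feature encoding and the network architecture are purely illustrative): each prefix becomes a sequence of per-event feature vectors, the sequences are zero-padded to a common length, and a small Keras LSTM is trained to regress the remaining time.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Masking, LSTM, Dense

# Hypothetical toy log: each case is a list of (activity index, timestamp in seconds) pairs
cases = [
    [(0, 0), (1, 3600), (2, 9000)],
    [(0, 0), (1, 1800), (2, 5400)],
]
n_activities = 3

sequences, targets = [], []
for case in cases:
    end_time = case[-1][1]
    for i in range(1, len(case) + 1):
        prefix = case[:i]
        # One feature vector per event: one-hot activity + elapsed time
        seq = [[1.0 if j == act else 0.0 for j in range(n_activities)] + [ts - prefix[0][1]]
               for act, ts in prefix]
        sequences.append(seq)
        targets.append(end_time - prefix[-1][1])  # remaining time of the prefix

# Zero-pad the prefixes to a common length so they fit in a single tensor
max_len = max(len(s) for s in sequences)
X = np.zeros((len(sequences), max_len, n_activities + 1), dtype="float32")
for i, seq in enumerate(sequences):
    X[i, :len(seq), :] = seq
y = np.array(targets, dtype="float32")

model = Sequential([
    Input(shape=(max_len, n_activities + 1)),
    Masking(mask_value=0.0),  # ignore the zero padding
    LSTM(16),
    Dense(1),                 # regression output: predicted remaining time
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=10, verbose=0)
print(model.predict(X, verbose=0).flatten())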

A log could be loaded:

from pm4py.objects.log.importer.xes import factory as xes_importer
import os

log = xes_importer.apply(os.path.join("tests", "input_data", "running-example.xes"))

And split into a training log and a test log:

from pm4py.objects.log.log import EventLog
test_log_size = int(len(log) * 0.34)
training_log = EventLog(log[test_log_size:len(log)])
test_log = EventLog(log[0:test_log_size])

Then, the model could be trained:

from pm4py.algo.prediction import factory as prediction_factory
model = prediction_factory.train(training_log, variant="keras_rnn")

At this point, the model can be evaluated on the test log, predicting the remaining time for each of its traces. Since these traces are complete, we expect the predictions to be more or less close to 0. Let’s see if that is the case.

rnn_results = prediction_factory.test(model, test_log)
print("KERAS_RNN results")
print(rnn_results)

We obtain good results with this method (at the end of the cases, the predicted remaining time is very close to 0):

KERAS_RNN results
[11.828211773511763, 537.1129334989265]

The model could be saved with:

prediction_factory.save(model, "model2.dump")

And retrieved with:

model = prediction_factory.load("model2.dump")
