## Earth Mover Distance

The Earth Mover Distance as introduced in:

**Leemans, Sander JJ, Anja F. Syring, and Wil MP van der Aalst. “Earth movers’ stochastic conformance checking.” International Conference on Business Process Management. Springer, Cham, 2019.**

provides a way to calculate the distance between two different stochastic languages.

Generally, one language is extracted from the event log, and one language is extracted from the process model.

By language, we mean a set of traces, each weighted by its probability.

For the event log, the language is obtained trivially: take the variants of the log and divide the number of occurrences of each variant by the total number of traces.
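As a minimal sketch of this idea in plain Python (independent of pm4py), a language can be represented as a dictionary from variants to relative frequencies. The helper `language_from_traces` and the toy traces are hypothetical names introduced only for illustration:

```python
from collections import Counter


def language_from_traces(traces):
    """Map each variant (tuple of activities) to its relative frequency."""
    counts = Counter(tuple(trace) for trace in traces)
    total = sum(counts.values())
    return {variant: count / total for variant, count in counts.items()}


# Hypothetical toy log: four traces, two distinct variants.
toy_traces = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["a", "b", "c"],
]
toy_language = language_from_traces(toy_traces)
print(toy_language)
# {('a', 'b', 'c'): 0.75, ('a', 'c', 'b'): 0.25}
```

The probabilities always sum to 1, since every trace belongs to exactly one variant.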

Let us see how the language of the log is computed in practice. We import an event log and calculate its language:

```python
from pm4py.objects.log.importer.xes import importer as xes_importer
from pm4py.statistics.variants.log import get as variants_module

# Import the event log and compute its language (variant -> probability)
log = xes_importer.apply("C:/running-example.xes")
language = variants_module.get_language(log)
print(language)
```

This yields the following probability distribution:

```
{('register request', 'examine casually', 'check ticket', 'decide', 'reinitiate request', 'examine thoroughly', 'check ticket', 'decide', 'pay compensation'): 0.16666666666666666,
 ('register request', 'check ticket', 'examine casually', 'decide', 'pay compensation'): 0.16666666666666666,
 ('register request', 'examine thoroughly', 'check ticket', 'decide', 'reject request'): 0.16666666666666666,
 ('register request', 'examine casually', 'check ticket', 'decide', 'pay compensation'): 0.16666666666666666,
 ('register request', 'examine casually', 'check ticket', 'decide', 'reinitiate request', 'check ticket', 'examine casually', 'decide', 'reinitiate request', 'examine casually', 'check ticket', 'decide', 'reject request'): 0.16666666666666666,
 ('register request', 'check ticket', 'examine thoroughly', 'decide', 'reject request'): 0.16666666666666666}
```

The language of a process model cannot be obtained as directly. A scalable (but non-deterministic) approach is to play out the model in order to obtain an event log, and then compute the language of that log.
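To illustrate why a playout gives only an approximation of the model language, here is a minimal sketch in plain Python. The toy model (`toy_model_playout`, which emits `a`, then `b` with probability 0.7 or `c` otherwise, then `d`) is entirely hypothetical and is not a pm4py Petri net; it only shows the sampling idea:

```python
import random
from collections import Counter


def toy_model_playout(rng):
    """Generate one trace from a hypothetical model: a, then b or c, then d."""
    middle = "b" if rng.random() < 0.7 else "c"
    return ("a", middle, "d")


def estimate_language(num_traces, seed=0):
    """Play out the toy model num_traces times and return variant frequencies."""
    rng = random.Random(seed)
    counts = Counter(toy_model_playout(rng) for _ in range(num_traces))
    return {variant: n / num_traces for variant, n in counts.items()}


estimated_language = estimate_language(1000)
print(estimated_language)
```

The estimated probabilities depend on the random seed and on the number of simulated traces, which is why two playouts of the same model can produce slightly different languages.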

Let’s first apply the Alpha Miner.

```python
from pm4py.algo.discovery.alpha import algorithm as alpha_miner

# Discover a Petri net with its initial and final markings
net, im, fm = alpha_miner.apply(log)
```

Then, we do the playout of the Petri net. We choose the **STOCHASTIC_PLAYOUT** variant.

```python
from pm4py.simulation.playout import simulator

playout_log = simulator.apply(net, im, fm, variant=simulator.Variants.STOCHASTIC_PLAYOUT)
```

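The playout can also be given the original log through the **LOG** parameter. We can then calculate the language of the played-out log:

```python
from pm4py.simulation.playout import simulator

# The LOG parameter supplies the original event log to the stochastic playout
playout_log = simulator.apply(
    net, im, fm,
    parameters={simulator.Variants.STOCHASTIC_PLAYOUT.value.Parameters.LOG: log},
    variant=simulator.Variants.STOCHASTIC_PLAYOUT,
)

# Same function used for the original log, applied to the played-out log
model_language = variants_module.get_language(playout_log)
```

This gives the language of the model.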

The Earth Mover Distance is then calculated as follows:

- Both languages are made to contain the same words: a word that is missing from one language is added to it with probability 0.
- A common ordering (for example, alphabetical ordering) is decided among the keys of the languages.
- The distance between the different keys is calculated (using a string distance function such as the Levenshtein distance).
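The three preparation steps above can be sketched in plain Python as follows. This is an illustration, not the pm4py implementation: the helper names (`levenshtein`, `normalized_distance`, `prepare_emd_inputs`) and the toy languages are hypothetical, and normalizing the edit distance by the length of the longer trace is an assumption of this sketch. The final optimization that turns these inputs into a single distance is left to an EMD solver.

```python
def levenshtein(s, t):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, start=1):
        curr = [i]
        for j, y in enumerate(t, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(s, t):
    """Edit distance scaled to [0, 1] by the longer sequence (an assumption)."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


def prepare_emd_inputs(lang1, lang2):
    # Steps 1 and 2: shared, sorted vocabulary; missing words get probability 0
    words = sorted(set(lang1) | set(lang2))
    hist1 = [lang1.get(w, 0.0) for w in words]
    hist2 = [lang2.get(w, 0.0) for w in words]
    # Step 3: pairwise distance matrix between the words
    dist = [[normalized_distance(u, v) for v in words] for u in words]
    return words, hist1, hist2, dist


# Hypothetical toy languages
toy_log_language = {("a", "b", "c"): 0.5, ("a", "c", "b"): 0.5}
toy_model_language = {("a", "b", "c"): 1.0}

words, hist1, hist2, dist = prepare_emd_inputs(toy_log_language, toy_model_language)
print(words)
print(hist1, hist2)
print(dist)
```

In this toy case the two histograms are `[0.5, 0.5]` and `[1.0, 0.0]`, and the two words are at distance 2/3. Half of the probability mass has to be moved across that distance, so the resulting Earth Mover Distance is 0.5 × 2/3 ≈ 0.333.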

This yields a number greater than or equal to 0 that expresses the distance between the language of the log and the language of the model. It is an alternative measure of precision. To calculate the Earth Mover Distance, the Python package **pyemd** must be installed (**pip install pyemd**). The code to apply the Earth Mover Distance is the following:

```python
from pm4py.evaluation.earth_mover_distance import evaluator

emd = evaluator.apply(model_language, language)
print(emd)
```

If the running-example log is chosen along with the Alpha Miner model, a value similar or equal to **0.1733** is obtained.