On March 22, 2022, Esther María Rojas Krugger defended her master’s thesis, supervised by Prof. Dr. Sarajane Marques Peres. Her work is a comparative study of neural network-based and count-based approaches to anomaly detection in business process event logs.
Abstract: Process mining aims to use event data to obtain useful information about the processes related to these events. Its main task is process discovery, which aims to create a model that represents the behavior occurring in an organization's processes. However, anomalous behavior in processes makes process discovery challenging, because anomalies affect event logs. On the one hand, anomaly detection is important because anomalies can indicate fraud or errors in information systems, and the organization can make decisions based on these detections. On the other hand, treating or filtering anomalies is essential to improve process discovery. There are several approaches for anomaly detection in event logs, including neural network-based and count-based approaches. In the literature, some of these are state-of-the-art approaches for anomaly detection but have not been evaluated for the treatment or removal of anomalies with the aim of improving model discovery. Conversely, some state-of-the-art approaches for the treatment or removal of anomalies with the aim of improving discovery have not been evaluated on the anomaly detection task. Therefore, there is a gap between these two tasks. This research addressed that gap through a comprehensive comparative study. Its goal was to identify which approaches are suitable for detecting three types of anomalies (activity skipping, activity insertion, and activity switching), considering their ability to perform both tasks. The research was carried out through quantitative and qualitative analyses applied to thirty artificial event logs. These analyses showed the advantages, disadvantages, and limitations of the approaches in the presence of the three types of anomalies in the event log. It was found that some approaches did not handle two challenges effectively: classifying normal cases whose traces are infrequent, and classifying cases that execute loop behavior. Furthermore, this research examined which approaches best deal with these challenges. This comparative study is important for process mining because it can provide a basis for organizations to choose one approach or another according to the specific characteristics of their problem.
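To make the three anomaly types and the first challenge concrete, here is a minimal Python sketch. It is not taken from the thesis: the function names, the toy activities, and the 10% threshold are assumptions chosen for illustration, and the frequency filter is only a simplified stand-in for a count-based approach, not any of the approaches the study evaluated.

```python
from collections import Counter

# A trace is modeled as a tuple of activity names (an assumption for this sketch).

def skip_activity(trace, i):
    """Return a copy of the trace with the activity at position i removed."""
    return trace[:i] + trace[i + 1:]

def insert_activity(trace, i, activity):
    """Return a copy of the trace with an extra activity inserted at position i."""
    return trace[:i] + (activity,) + trace[i:]

def switch_activities(trace, i):
    """Return a copy of the trace with the activities at i and i + 1 swapped."""
    t = list(trace)
    t[i], t[i + 1] = t[i + 1], t[i]
    return tuple(t)

def flag_infrequent_variants(log, min_share):
    """Flag every trace variant whose share of the log is below min_share."""
    counts = Counter(log)
    total = len(log)
    return {variant for variant, n in counts.items() if n / total < min_share}

# Toy log: one frequent normal variant, one rare but normal variant,
# and one injected anomaly of each type.
normal = ("register", "check", "approve", "pay")
rare_normal = ("register", "check", "recheck", "approve", "pay")
anomalies = [
    skip_activity(normal, 1),            # "check" is skipped
    insert_activity(normal, 2, "call"),  # "call" is inserted
    switch_activities(normal, 1),        # "check" and "approve" are switched
]
log = [normal] * 16 + [rare_normal] + anomalies

flagged = flag_infrequent_variants(log, min_share=0.1)
```

In this toy log the filter flags all three injected anomalies, but it also flags `rare_normal`, because frequency alone cannot separate an infrequent normal trace from an anomalous one. That is the first of the two challenges named in the abstract.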
You can access the complete thesis here (in Portuguese).
The examination board for this work consisted of the following researchers:
- Prof. Dr. Sarajane Marques Peres (chair) – USP
- Prof. Dr. Flávia Maria Santoro – Faculdade Intelli
- Prof. Dr. Edson Emílio Scalabrin – PUC-PR