MASTER DEFENSE – trace clustering for concept drift detection and localization

On October 28, 2021, Rafael Gaspar de Sousa defended his master’s thesis. Prof. Dr. Sarajane Marques Peres supervised him, and Prof.dr.ir. Hajo A. Reijers (University of Utrecht) was the co-supervisor. Rafael worked on a new approach for detecting and localizing concept drifts in an event log.

Abstract: Business processes are constantly subject to changes over time due to the need for adaptation and flexibility in the complex environment in which they operate, such as new client demands, competition, or legislation. Process models are one of the fundamental tools for understanding a process's behavior, which is key for business success. However, these process models are usually not documented and updated to agree with eventual changes in process behavior over time, leading to misconceptions in the understanding of the actual process. Although process mining aims to provide techniques that discover, analyze, and enhance processes automatically based on event logs, most techniques assume that the process is stationary, which is often not the case. Handling the problem of processes changing over time, known as concept drift, leads to the capability of detecting drift as soon as possible and localizing the entities involved in it, providing a much better comprehension of the process behavior that can be a competitive advantage for businesses. Most of the work on dealing with concept drift in the process mining literature focuses on providing frameworks that are able to detect drifts but are generally not adequate to simultaneously localize the change inside the process behavior and exhibit information on the entities involved. Applying clustering techniques to data from event logs, known as trace clustering, supports the identification of patterns in the process behavior that enable simplification and segregation of similar behaviors, producing a model of the process behavior as clusters. However, although common in general process mining, trace clustering has not been widely explored in the context of the concept drift problem. This research presents a method to simultaneously perform concept drift detection and localization based on the same clusters obtained by online trace clustering.
The clusters are able to reflect changes in complex process behavior in a simplified manner that serves as a platform for performing effective drift detection and localization online with no additional data structures. Experiments with synthetic and real-world event logs with different types of control-flow changes have shown that, although our method has not outperformed the baseline for drift detection in all cases, our approach was able to correctly detect drifts in most cases, depending on the parameter configuration, while also providing information about the entities involved in the drift from the business process perspective.
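
To give a rough feel for the idea described in the abstract, the following is a minimal sketch and not the algorithm from the thesis. It assumes traces are encoded as counts of directly-follows activity pairs, clusters them with a simple distance-threshold ("leader") scheme, flags a drift when the share of traces per cluster shifts between consecutive windows, and localizes it by listing the transitions that differ between the gaining and losing clusters. All names and parameters (`OnlineTraceClusters`, `detect_drifts`, `radius`, `window`, `threshold`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def transitions(trace):
    """Encode a trace as counts of directly-follows activity pairs."""
    return Counter(zip(trace, trace[1:]))


def distance(x, y):
    """Manhattan distance between two sparse transition-count vectors."""
    return sum(abs(x.get(k, 0) - y.get(k, 0)) for k in set(x) | set(y))


class OnlineTraceClusters:
    """Illustrative leader clustering: join the nearest centroid within
    `radius`, otherwise open a new cluster (assumption, not the thesis method)."""

    def __init__(self, radius=1.0):
        self.radius = radius
        self.centroids = []  # one dict of transition weights per cluster
        self.sizes = []      # number of traces absorbed by each cluster

    def assign(self, trace):
        vec = transitions(trace)
        if self.centroids:
            best = min(range(len(self.centroids)),
                       key=lambda i: distance(vec, self.centroids[i]))
            if distance(vec, self.centroids[best]) <= self.radius:
                # Incremental mean update of the chosen centroid.
                self.sizes[best] += 1
                c = self.centroids[best]
                for k in set(c) | set(vec):
                    c[k] = c.get(k, 0) + (vec.get(k, 0) - c.get(k, 0)) / self.sizes[best]
                return best
        self.centroids.append(dict(vec))
        self.sizes.append(1)
        return len(self.centroids) - 1

    def behavior(self, cluster_ids):
        """Transitions with positive weight in any of the given clusters."""
        return {k for i in cluster_ids
                for k, v in self.centroids[i].items() if v > 0}


def detect_drifts(traces, window=4, threshold=0.5, radius=1.0):
    """Compare cluster shares of consecutive non-overlapping windows."""
    clusters = OnlineTraceClusters(radius)
    previous, drifts = None, []
    for start in range(0, len(traces) - window + 1, window):
        current = Counter(clusters.assign(t) for t in traces[start:start + window])
        if previous is not None:
            change = {i: (current[i] - previous[i]) / window
                      for i in set(previous) | set(current)}
            # Total variation distance between the two cluster distributions.
            if sum(abs(v) for v in change.values()) / 2 > threshold:
                gained = clusters.behavior(i for i, v in change.items() if v > 0)
                lost = clusters.behavior(i for i, v in change.items() if v < 0)
                drifts.append({"index": start,
                               "added": sorted(gained - lost),
                               "removed": sorted(lost - gained)})
        previous = current
    return drifts


# Toy log: activities b and c swap order after the eighth trace.
log = [("a", "b", "c")] * 8 + [("a", "c", "b")] * 8
print(detect_drifts(log))
```

In this toy log the same clusters serve both purposes: the shift in cluster shares signals the drift, and the clusters' transitions indicate which activities are involved. A real event log would need a more careful trace encoding, window policy, and threshold selection.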

You can access the complete thesis here (in English).

The examination board for this work consisted of the following researchers:

  • Prof. Dr. Sarajane Marques Peres (chair) – USP
  • Prof. Dr. André Carlos Ponce de Leon Ferreira de Carvalho – USP
  • Prof. Dr. Hélio Côrtes Vieira Lopes – PUC-RJ
