Event Log Data Quality: From Benchmarking to Quality-Aware Process Mining Pipelines
Abstract
Event logs are extracted from the information systems that support the execution of business processes. Like any data, they are prone to errors, and their quality can crucially influence the insights extracted from their analysis, most notably through process mining techniques. In this talk, we start by defining data quality issues in the context of event logs, giving an overview of how event log data quality has been conceptualized in the literature and of the related challenges. Then, we present two research works that we are currently developing. First, we present a language for specifying common types of errors in event logs, which is used to develop an error injection tool that generates benchmark event logs with different levels of quality. Second, we introduce our preliminary work on data quality-aware event log cleaning pipelines for process mining tasks.
About the speaker
Marco Comuzzi is Associate Professor and Director of the Blockchain Research Center at the Department of Industrial Engineering, Ulsan National Institute of Science and Technology (UNIST), in Ulsan, South Korea. He has also held full-time academic positions at the Eindhoven University of Technology, The Netherlands, and City, University of London, United Kingdom. His research is in the broad area of information systems design and data science, with a specific focus on process mining and blockchain technology.