Data analysis and pre-processing for digital twin development, predictive modeling of missing variables at Viikinmäki wastewater treatment plant
School of Electrical Engineering | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
90
Abstract
In light of the upcoming EU Urban Wastewater Treatment updates requiring energy neutrality for medium and large wastewater treatment plants by 2024, our team at DIGICARBA is designing a digital replica of the Viikinmäki WWTP. While soft sensors are being developed and assessed in a simulation environment, data from online physical sensors is often incomplete or of low quality. This thesis introduces a data pre-processing tool that identifies and corrects errors in the dataset, and a predictive tool that fills gaps in critical effluent variables. Together, these tools improve data quality and availability, supporting an improved carbon balance by tracking greenhouse gas emissions and promoting sustainable resource use in wastewater treatment technologies.
The motivation for this research was the unreliability of online data, which is often unclean and inconsistent and therefore degrades value forecasting in the simulation environment. The study aimed to substitute online data with lab data wherever a strong correlation between the two was found, so that the more reliable lab data could be used in the simulation environment. To this end, the laboratory data was analysed for correlation with the online data and for its potential use in the digital twin model.
The data was provided by HSY and comprised two main types of datasets. Online data was collected from physical sensors mounted at various stages of the wastewater treatment plant. Lab data was collected from the same plant under the supervision of industry experts in the laboratory. In the data pre-processing pipeline, time series analysis was conducted for both online and lab data. The data was visualized and examined for missing (NaN) values. Visualization also helped detect outliers, which were identified using the Interquartile Range (IQR) method and Principal Component Analysis (PCA). Once the outliers were removed, missing values were imputed using the PCA method.
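The outlier-removal and imputation steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `iqr_outlier_mask` and `pca_impute`, the conventional fence factor of 1.5, the single principal component, and the synthetic two-column data are all assumptions made for illustration. The imputation shown is a simple iterative variant, in which missing entries are repeatedly replaced by their low-rank PCA reconstruction.

```python
import numpy as np

def iqr_outlier_mask(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; NaNs are never flagged."""
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def pca_impute(X, n_components=1, n_iter=500, tol=1e-8):
    """Iteratively fill NaNs with a rank-n_components PCA reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means, then refine the missing entries only.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        updated = np.where(missing, recon, X)
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled

# Synthetic stand-in for two correlated signals (hypothetical data).
x = np.arange(20, dtype=float)
X = np.column_stack([x, 2.0 * x + 1.0])
X[5, 1] = 1000.0                      # inject one gross outlier

mask = iqr_outlier_mask(X[:, 1])      # detect it with the IQR fences
X[mask, 1] = np.nan                   # treat the outlier as missing
filled = pca_impute(X, n_components=1)
```

In this toy case the second column is an exact linear function of the first, so the imputed value moves toward the value the removed point would have had on that line. Real sensor data would normally be standardized first, and the number of components would have to be chosen from the data.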
In the second part, a predictive model for NH4N and COD SS was designed. Ordinary least squares (OLS) regression was implemented first, as a baseline against which other machine learning models could be compared. This method did not capture the effect of varying certain process parameters. Hence, an autoregressive model with exogenous variables (ARX) was implemented, which not only improved the RMSE but also captured the impact of varying essential process variables.
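The difference between a static OLS baseline and an ARX model can be sketched as follows. This is an illustrative example under stated assumptions, not the thesis model: the model orders (`na=1`, `nb=1`), the single exogenous input `u`, the helper names `fit_arx` and `rmse`, and the synthetic data are all hypothetical. The ARX model here has the form y[t] = a1*y[t-1] + ... + b1*u[t-1] + ... + c and is fitted by least squares on lagged regressors.

```python
import numpy as np

def fit_arx(y, u, na=1, nb=1):
    """Least-squares ARX fit; returns (theta, Phi, target).

    theta is ordered as [a_1..a_na, b_1..b_nb, c].
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    p = max(na, nb)
    rows = []
    for t in range(p, len(y)):
        past_y = y[t - na:t][::-1]    # y[t-1], ..., y[t-na]
        past_u = u[t - nb:t][::-1]    # u[t-1], ..., u[t-nb]
        rows.append(np.concatenate([past_y, past_u, [1.0]]))
    Phi = np.array(rows)
    target = y[p:]
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return theta, Phi, target

def rmse(pred, true):
    """Root mean squared error between two equally long sequences."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(true)) ** 2)))

# Synthetic process with memory (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 300
u = rng.normal(size=n)
noise = 0.05 * rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1] + 0.1 + noise[t]

# Static OLS baseline: regress y[t] on u[t-1] and a constant only.
X_ols = np.column_stack([u[:-1], np.ones(n - 1)])
beta, *_ = np.linalg.lstsq(X_ols, y[1:], rcond=None)
rmse_ols = rmse(X_ols @ beta, y[1:])

# ARX: the same input plus the lagged output as a regressor.
theta, Phi, target = fit_arx(y, u, na=1, nb=1)
rmse_arx = rmse(Phi @ theta, target)
```

Because the synthetic process depends on its own past, the static regression leaves the autoregressive part unexplained, while the ARX fit recovers it. The errors compared here are one-step-ahead prediction errors on the training data; a fair model comparison on plant data would use a held-out period.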
Description
Supervisor
Elvander, Filip
Thesis advisor
Larsson, Timo
Haimi, Henri