
Data analysis and pre-processing for digital twin development, predictive modeling of missing variables at Viikinmäki wastewater treatment plant

School of Electrical Engineering | Master's thesis

Language

en

Pages

90

Abstract

In light of the upcoming updates to the EU Urban Wastewater Treatment Directive, which require medium and large wastewater treatment plants to become energy neutral by 2024, our team at DIGICARBA is designing a digital replica (digital twin) of the Viikinmäki WWTP. While soft sensors are being developed and assessed in a simulation environment, the data from the plant's online physical sensors is often incomplete or of low quality. This thesis introduces a data pre-processing tool that identifies and corrects errors in the dataset, and a predictive tool that fills gaps in critical effluent variables. Together, these tools improve data quality and availability, supporting an improved carbon balance through the tracking of greenhouse gas emissions and promoting sustainable resource use in wastewater treatment technologies. The motivation for this research was the unreliability of online data, which often contains errors and inconsistencies that degrade value forecasting in the simulation environment. This study aimed to replace online data with lab data wherever a strong correlation was found, so that the more reliable lab data could be used in the simulation environment. Laboratory data was therefore analysed for correlation and for its potential use in the digital twin model. The data was provided by HSY (Helsinki Region Environmental Services). Two main types of dataset were used: online data, collected by physical sensors mounted at various stages of the wastewater treatment plant, and lab data, collected from the same plant under the supervision of industry experts in the laboratory. In the data pre-processing pipeline, time series analysis was performed on both the online and the lab data. The data was visualized and examined for missing (NaN) values, which were then imputed. Visualization also helped to detect outliers, which were identified using the interquartile range (IQR) method and principal component analysis (PCA).
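
The IQR rule and PCA mentioned above can be illustrated with a short NumPy sketch. It is not the thesis's implementation: the function names (`iqr_outlier_mask`, `pca_spe`, `pca_impute`), the 1.5 IQR fence, the number of retained components, the use of the squared prediction error (SPE) as the PCA outlier score and the iterative truncated-SVD imputation scheme are all assumptions chosen for illustration. `iqr_outlier_mask` flags values outside each column's Tukey fences, `pca_spe` scores each complete row by how badly a rank-k PCA model reconstructs it, and `pca_impute` fills NaN gaps by repeatedly replacing them with a rank-k PCA reconstruction while leaving observed values untouched.

```python
import numpy as np


def iqr_outlier_mask(X, k=1.5):
    """Flag entries outside [Q1 - k*IQR, Q3 + k*IQR], column by column.

    NaNs are ignored when computing the quartiles and are never flagged.
    """
    X = np.asarray(X, dtype=float)
    q1 = np.nanpercentile(X, 25, axis=0)
    q3 = np.nanpercentile(X, 75, axis=0)
    iqr = q3 - q1
    # Comparisons involving NaN evaluate to False, so gaps stay unflagged.
    return (X < q1 - k * iqr) | (X > q3 + k * iqr)


def pca_spe(X, n_components=2):
    """Squared prediction error (SPE / Q statistic) of each row under a rank-k PCA model.

    Expects complete data; columns are standardised before the SVD.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T          # loadings of the retained components
    residual = Z - Z @ P @ P.T       # part of each row the PCA model cannot explain
    return np.sum(residual**2, axis=1)


def pca_impute(X, n_components=2, n_iter=200, tol=1e-8):
    """Fill NaNs by iterating a rank-k PCA (truncated SVD) reconstruction.

    Columns are only centred here; variables in different units would
    usually be scaled first.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        updated = np.where(missing, recon, X)  # observed values are never changed
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled
```

One possible way to chain the steps is to set flagged entries to NaN with `np.where(iqr_outlier_mask(X), np.nan, X)` and pass the result to `pca_impute`, so that removed outliers and genuine gaps are filled by the same low-rank model.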
Once outliers were removed, missing values were imputed using PCA. In the second part, a predictive model for NH4-N and COD SS was designed. Ordinary least squares (OLS) regression was implemented first as a baseline for the other machine learning models, but it did not capture the effect of varying certain process parameters. Hence, an autoregressive model with exogenous variables (ARX) was implemented, which not only reduced the root-mean-square error (RMSE) but also captured the impact of varying these key parameters.
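
To make the OLS-versus-ARX comparison concrete, the NumPy sketch below fits both models by least squares on the same samples and compares their hold-out RMSE. It is illustrative only: the function names, the lag orders `na` and `nb`, the chronological 80/20 split, one-step-ahead prediction from measured past values of the target, and the synthetic data in the demo are all assumptions, not the thesis's configuration or data. The ARX model regresses y[t] on an intercept, its own past values y[t-1], ..., y[t-na], and the current and past exogenous inputs u[t], ..., u[t-nb+1]; the static OLS baseline uses only an intercept and the current inputs u[t].

```python
import numpy as np


def lagged_design(y, u, na=1, nb=1):
    """ARX regressors [y[t-1..t-na], u[t..t-nb+1]] and targets y[t].

    Inputs enter without delay (the current value is included); u may be 1-D or (n, m).
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float).reshape(len(y), -1)
    start = max(na, nb - 1)  # first t for which every lag exists
    rows = [
        np.concatenate([y[t - na:t][::-1], u[t - nb + 1:t + 1][::-1].ravel()])
        for t in range(start, len(y))
    ]
    return np.array(rows), y[start:]


def fit_ls(X, y):
    """Least-squares coefficients [intercept, *slopes]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    theta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return theta


def predict(theta, X):
    return theta[0] + X @ theta[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def compare_ols_arx(y, u, na=1, nb=1, train_frac=0.8):
    """Hold-out RMSE of a static OLS baseline vs a one-step-ahead ARX model."""
    X_arx, target = lagged_design(y, u, na, nb)
    start = len(y) - len(target)
    # Static baseline: current inputs only, aligned to the same target samples.
    X_ols = np.asarray(u, dtype=float).reshape(len(y), -1)[start:]
    split = int(train_frac * len(target))  # chronological split, no shuffling
    scores = {}
    for name, X in (("OLS", X_ols), ("ARX", X_arx)):
        theta = fit_ls(X[:split], target[:split])
        scores[name] = rmse(target[split:], predict(theta, X[split:]))
    return scores


if __name__ == "__main__":
    # Synthetic data from a known ARX process (illustration only, not plant data).
    rng = np.random.default_rng(0)
    n = 500
    u = rng.normal(size=(n, 2))
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.8 * y[t - 1] + 0.5 * u[t, 0] - 0.3 * u[t, 1] + 0.1 * rng.normal()
    print(compare_ols_arx(y, u, na=1, nb=1))
```

On data with strong carry-over from one time step to the next, the static baseline cannot explain the y[t-1] term, so its hold-out RMSE stays well above the ARX model's. With real plant data, the lag orders would typically be chosen on a validation period rather than fixed in advance.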

Supervisor

Elvander, Filip

Thesis advisor

Larsson, Timo
Haimi, Henri
