Data analysis and pre-processing for digital twin development, predictive modeling of missing variables at Viikinmäki wastewater treatment plant

Kiran, Anmol

Files

master_Kiran_Anmol_2025.pdf (3.77 MB)

School of Electrical Engineering | Master's thesis

Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.

Authors

Kiran, Anmol

Date

2025-02-22

Major/Subject

Signal Processing and Data Science

Degree programme

Master's Programme in Computer, Communication and Information Sciences

Language

en

Pages

90

Abstract

In light of the upcoming updates to the EU Urban Wastewater Treatment Directive, which require medium and large wastewater treatment plants to become energy neutral by 2045, our team at DIGICARBA is designing a digital replica (digital twin) of the Viikinmäki WWTP. While soft sensors are being developed and assessed in a simulation environment, the data from the plant's online physical sensors are often incomplete or of low quality. This thesis introduces a data pre-processing tool that identifies and corrects errors in the dataset, as well as a predictive tool that fills gaps in critical effluent variables. Together, these tools improve data quality and availability, which supports a more accurate carbon balance through the tracking of greenhouse gas emissions and promotes sustainable resource use in wastewater treatment. The motivation for this research was the unreliability of the online data, which often contain errors and inconsistencies that degrade forecasts in the simulation environment. The study aimed to substitute online data with laboratory data wherever a strong correlation was found, so that the more reliable laboratory data could be used in the simulation environment. Laboratory data were therefore analysed for correlation with the online data and for their potential use in the digital twin model. The data were provided by HSY. Two main types of dataset were used: online data, collected from physical sensors mounted at various stages of the treatment plant, and laboratory data, collected from the same plant and analysed in the laboratory under the supervision of industry experts. In the pre-processing pipeline, time series analysis was carried out on both the online and the laboratory data. The data were visualized and examined for missing (NaN) values, which were then imputed. Visualization also helped to detect outliers, which were identified using the interquartile range (IQR) method and principal component analysis (PCA).
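The outlier and imputation steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis's own code: the 1.5 × IQR fence, the squared prediction error (SPE, or Q statistic) used as the PCA outlier score, the iterative PCA imputation scheme, and the function names `iqr_outlier_mask`, `pca_spe` and `pca_impute` are hypothetical choices, and the three correlated channels are synthetic rather than HSY measurements.

```python
import numpy as np


def iqr_outlier_mask(x, k=1.5):
    """Flag entries outside [Q1 - k*IQR, Q3 + k*IQR], column by column; NaNs are never flagged."""
    q1, q3 = np.nanpercentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    with np.errstate(invalid="ignore"):
        return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def pca_fit(x, n_components):
    """Standardize each column, then return (mean, std, leading principal directions)."""
    mean, std = x.mean(axis=0), x.std(axis=0)
    _, _, vt = np.linalg.svd((x - mean) / std, full_matrices=False)
    return mean, std, vt[:n_components]


def pca_spe(x, n_components):
    """Squared prediction error per row; large values break the learned correlation structure."""
    mean, std, comps = pca_fit(x, n_components)
    z = (x - mean) / std
    residual = z - z @ comps.T @ comps
    return np.sum(residual ** 2, axis=1)


def pca_impute(x, n_components, n_iter=200, tol=1e-9):
    """Iterative PCA imputation: fill gaps with column means, then refit and refill until stable."""
    x = x.copy()
    missing = np.isnan(x)
    x[missing] = np.take(np.nanmean(x, axis=0), np.nonzero(missing)[1])
    for _ in range(n_iter):
        mean, std, comps = pca_fit(x, n_components)
        z = (x - mean) / std
        recon = (z @ comps.T @ comps) * std + mean
        change = np.max(np.abs(recon[missing] - x[missing]), initial=0.0)
        x[missing] = recon[missing]  # observed entries are never overwritten
        if change < tol:
            break
    return x


# Synthetic stand-in for three correlated sensor channels (hypothetical, not HSY data).
n = 200
t = np.linspace(-1.0, 1.0, n)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * np.cos(np.outer(np.arange(n), [1.0, 2.0, 3.0]))
X[5, 0] = 50.0                       # injected spike on channel 0

spe = pca_spe(X, n_components=1)     # the spike row has by far the largest residual
X[10, 1] = np.nan                    # simulated missing reading on channel 1

mask = iqr_outlier_mask(X)
X_clean = np.where(mask, np.nan, X)  # remove flagged outliers, then impute all gaps
X_filled = pca_impute(X_clean, n_components=1)
```

Standardizing before the SVD keeps channels with large units from dominating the components. In practice the number of components and an SPE threshold for flagging rows would have to be tuned on the plant data, and fitting the PCA on a clean reference period (rather than on data that still contain spikes) is usually safer.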
Once the outliers had been removed, missing values were imputed using PCA. In the second part of the work, predictive models for NH4-N and COD SS were designed. Ordinary least squares (OLS) regression was implemented first, as a baseline against which other machine learning models were compared. OLS did not, however, capture the effect of changes in certain process parameters. An autoregressive model with exogenous inputs (ARX) was therefore implemented; it not only reduced the root-mean-square error (RMSE) but also captured the impact of changes in key process variables.
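To make the OLS-versus-ARX comparison concrete, the sketch below fits both models by least squares on synthetic data. The assumptions are deliberate and hypothetical: the static baseline regresses the target on the same-time exogenous input only, the ARX model uses one lag of the target and one lag of the input (`na = 1`, `nb = 1`), and the data-generating process, the helper names `lagged_design`, `fit` and `rmse`, and the 400-sample training split are illustrative choices, not the models or data used in the thesis.

```python
import numpy as np


def lagged_design(y, u, na, nb):
    """ARX regressors [y(t-1..t-na), u(t-1..t-nb), 1] and the aligned target y(t)."""
    start = max(na, nb)
    rows = []
    for t in range(start, len(y)):
        past_y = y[t - na:t][::-1]             # y(t-1), ..., y(t-na)
        past_u = u[t - nb:t][::-1].ravel()     # u(t-1), ..., u(t-nb) for every input column
        rows.append(np.concatenate([past_y, past_u, [1.0]]))
    return np.array(rows), y[start:]


def fit(X, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical process: y responds to input u with a one-step lag and has its own memory.
rng = np.random.default_rng(1)
N = 600
u = rng.normal(size=(N, 1))
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1, 0] + 0.05 * rng.normal()

X_arx, target = lagged_design(y, u, na=1, nb=1)
X_ols = np.column_stack([u[1:, 0], np.ones(N - 1)])   # static baseline: y(t) ~ u(t) + const

split = 400
coef_arx = fit(X_arx[:split], target[:split])
coef_ols = fit(X_ols[:split], target[:split])
rmse_arx = rmse(target[split:], X_arx[split:] @ coef_arx)
rmse_ols = rmse(target[split:], X_ols[split:] @ coef_ols)
print(f"OLS RMSE {rmse_ols:.3f}, ARX RMSE {rmse_arx:.3f}")
```

Because the ARX regressors contain past values of the target, this is one-step-ahead prediction: it needs recent measurements (for example, laboratory values) of the variable being predicted. When those are unavailable, the model has to be run in simulation mode, feeding its own predictions back as the lagged target, which usually increases the error.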

Supervisor

Elvander, Filip

Thesis advisor

Larsson, Timo
Haimi, Henri

Keywords

digital twin, data pre-processing, data-driven predictive models, wastewater treatment plant, hydraulic analysis, soft sensors

Permanent link to this item

https://urn.fi/URN:NBN:fi:aalto-202503172812

Collections

[dipl] Sähkötekniikan korkeakoulu / ELEC
