Reconstructing Missing NOx Emissions in Heavy-Duty Diesel Vehicle OBD Data: A Machine Learning Approach

Published in Journal of Hazardous Materials, 2025

On-Board Diagnostic (OBD) systems enable real-time monitoring of heavy-duty diesel vehicle operations and NOx emissions. However, existing OBD systems have inherent limitations, including systematically missing values. To address this issue, this study develops a data-driven approach that uses OBD-recorded upstream and downstream NOx emissions data to build machine learning models for reconstructing real-world OBD data. The models perform well in predicting upstream emissions, achieving an R² above 0.9 on the test set. Their performance in predicting downstream emissions, however, was highly variable, with R² ranging from 0.05 to 0.98, and showed a positive correlation with fuel-based emission factors. In a case study on a selected vehicle, the total NOx emissions associated with missing data were estimated at 15,741.3 g, whereas the emissions recorded in the available data were 6,157.3 g. Missing data were then imputed for an additional 31 vehicles, revealing that normal emitters showed significantly higher emissions associated with missing data. The proposed approach is highly compatible with existing big data platforms and can be easily extended to other vehicles. This will improve the platform’s representation of real-world emissions, enabling policymakers to implement more targeted pollution mitigation strategies.

Recommended citation: Cao, Z. et al. Reconstructing missing NOx emissions in heavy-duty diesel vehicle OBD data: A machine learning approach. Journal of Hazardous Materials 494, 138619, doi:https://doi.org/10.1016/j.jhazmat.2025.138619 (2025).
Download Paper