Machine learning-based automated classification of worker-reported safety reports in construction

Nikhil Bugalia; Vurukuti Tarani; Jai Kedia; Hrishikesh Gadekar

Journal of Information Technology in Construction

ISSN: 1874-4753

Editor-in-chief:

Robert Amor

Web Of Science:

IF (2025): 3.4, Q2

Acknowledgement

Journal is partially sponsored by:

Slovenian Research and Innovation Agency

ITcon Vol. 27, pg. 926-950, http://www.itcon.org/2022/45

DOI:	10.36680/j.itcon.2022.045
submitted:	December 2021
revised:	September 2022
published:	November 2022
editor(s):	Obonyo E
authors:	Nikhil Bugalia, Assistant Professor, Department of Civil Engineering, Indian Institute of Technology Madras, India. E-mail: nikhilbugalia@gmail.com
	Vurukuti Tarani, Graduate Student, Department of Civil Engineering, Indian Institute of Technology Madras, India. E-mail: ce19m012@smail.iitm.ac.in
	Jai Kedia, Undergraduate Student, Department of Civil Engineering, Indian Institute of Technology Madras, India. E-mail: ce17b037@smail.iitm.ac.in
	Hrishikesh Gadekar, Graduate Student, Department of Civil Engineering, Indian Institute of Technology Madras, India. E-mail: ce17b116@smail.iitm.ac.in
summary:	Limited academic attention has been paid to the applicability of Machine Learning (ML) approaches for analyzing worker-reported near-miss safety reports, as opposed to injury reports, at construction sites, even though resource-efficient ML analysis of large volumes of such data can help guide practitioners in decision-making to prevent injuries. The current study addresses this research gap by evaluating, through quantitative and qualitative methods, the relevance of ML approaches for scaling efficient near-miss reporting programs at construction sites. The study uses an extensive experimentation strategy consisting of input data processing, n-gram modeling, and sensitivity analysis. It first tests the proposition that, despite data-quality challenges, different ML algorithms can achieve high performance in automatically classifying textual near-miss observations. The study relies on worker-reported near-miss data collected from a real construction site in Kuwait. The classification performance of various ML approaches is evaluated using F1 scores for three category labels that are commonly used at sites but novel in the academic literature: "Unsafe Act (UA)," "Unsafe Condition (UC)," and "Good Observation (GO)." In addition, practitioner input was used to assess the practical applicability of ML classifiers for construction sites. Conventional Logistic Regression (LR) classifiers achieved a comparatively high F1 score of 0.79. However, ML classifiers faced challenges in distinguishing between UA and UC. Further, the analysis reveals that the optimal ML classifiers may be less acceptable to human decision-makers. Overall, despite the promising performance of ML tools on the near-miss data, sites with low-maturity reporting systems may find themselves unable to leverage ML to scale those systems. A simplified experimentation strategy like the one in the current study could help practitioners identify data-specific optimal ML approaches in future applications.
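As a rough illustration of two ingredients named in the summary, word n-gram features and F1-based evaluation over the UA/UC/GO labels, the following minimal sketch uses only the Python standard library. It is not the authors' pipeline: the function names, the example sentence, and the toy label lists are hypothetical, and macro averaging is an assumption, since the summary does not say how the reported F1 score was averaged.

```python
from collections import Counter

# Category labels named in the summary.
LABELS = ("UA", "UC", "GO")


def word_ngrams(text, n_max=2):
    """Count word n-grams of length 1..n_max (a simple bag-of-n-grams)."""
    tokens = text.lower().split()
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def f1_per_class(y_true, y_pred, labels=LABELS):
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 when the class never occurs."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (assumed averaging, not from the paper)."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical toy data: one UA report is misclassified as UC.
toy_true = ["UA", "UA", "UC", "UC", "GO", "GO"]
toy_pred = ["UA", "UC", "UC", "UC", "GO", "GO"]

features = word_ngrams("worker without helmet")
per_class = f1_per_class(toy_true, toy_pred)
overall = macro_f1(toy_true, toy_pred)
```

In the toy data, the single UA/UC confusion lowers the F1 of both classes while GO is unaffected, which mirrors the kind of UA versus UC difficulty the summary describes.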
keywords:	Automated construction; Machine Learning; Construction; Safety; Near-Miss Reporting; Neural Networks; n-grams
full text:	(PDF file, 0.799 MB)
citation:	Bugalia N, Tarani V, Kedia J, Gadekar H (2022). Machine learning-based automated classification of worker-reported safety reports in construction. Journal of Information Technology in Construction (ITcon), 27, 926-950. https://doi.org/10.36680/j.itcon.2022.045