ITcon Vol. 31, pg. 632-650, http://www.itcon.org/2026/28

Training data generation for construction instance segmentation from target domains

DOI:10.36680/j.itcon.2026.028
submitted:December 2025
published:May 2026
editor(s):Bosché F
authors:Xuzhong Yan, Lecturer
Department of Construction Management, School of Management, Zhejiang University of Technology, Hangzhou 310023, China
https://orcid.org/0000-0002-1675-2368
xzyan@zjut.edu.cn

Zeli Wang, Lecturer
School of Business, East China University of Science and Technology, Shanghai 200237, China
https://orcid.org/0000-0002-5613-6148
wang.zeli@ecust.edu.cn
summary:This study addresses a persistent challenge in computer vision for construction monitoring: deep learning models trained on source-domain data often perform poorly when deployed in new target domains due to distribution shifts and limited annotations. To mitigate these issues, the research introduces TDG-CIS, a clustering-initialized semi-supervised framework designed to generate high-quality instance segmentation training data directly from unlabeled target-domain images. TDG-CIS operates in two stages. First, it employs a clustering-based mask generation strategy that uses a transformer feature backbone to extract patch-level representations and derive initial instance masks without human supervision. These masks serve as a reliable starting point for semi-supervised learning. Second, a semi-supervised instance segmentation model iteratively refines these masks and converts raw images into usable training samples. This iterative pipeline allows the model to progressively improve segmentation quality while adapting to the visual characteristics of diverse construction environments. The framework was validated on a large dataset of 50,000 images spanning more than 70 construction-related domains. Experimental results show that TDG-CIS achieves a 77.9% data utilization rate, along with 87.5% mAP and 81.1% mAR in segmentation quality. When used to scale training data for downstream instance segmentation models, TDG-CIS yields substantial performance gains: baseline models trained on automatically generated data outperform those trained on manually labeled datasets, improving mAP from 92.9% to 94.3% and mAR from 86.7% to 88.6%. Ablation studies further demonstrate that the semi-supervised refinement mechanism is key to boosting both data utilization and segmentation accuracy. 
Overall, the study offers a novel approach that eliminates dependence on source-domain supervision and provides a scalable pathway for producing target-domain training datasets for instance segmentation in intelligent construction applications.
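As a rough illustration of the first stage described in the summary (clustering patch-level features into initial masks), the following minimal sketch clusters a grid of patch features and turns each cluster into a binary mask. It is not the authors' implementation: the synthetic features stand in for the output of a transformer backbone, and the plain k-means routine, the function names `kmeans` and `patch_masks`, the grid size, and the choice of k = 2 are all assumptions made for the example. Clustering alone yields region masks rather than separated instances, and the semi-supervised refinement stage is not shown.

```python
import numpy as np

def kmeans(features, k, n_iter=20):
    """Plain Lloyd's k-means on (n, d) features; returns (n,) integer labels."""
    # Deterministic farthest-point initialisation (an assumption for this sketch).
    centers = [features[0]]
    for _ in range(1, k):
        d = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.stack(centers).astype(float)

    for _ in range(n_iter):
        # Squared distance from every patch to every center: shape (n, k).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels

def patch_masks(patch_features, k=2):
    """Cluster an (H, W, D) grid of patch features into k boolean (H, W) masks."""
    h, w, d = patch_features.shape
    labels = kmeans(patch_features.reshape(h * w, d), k).reshape(h, w)
    return [labels == j for j in range(k)]

# Synthetic stand-in for backbone features: an 8x8 patch grid with 16-dim
# features, where a rectangular "object" region has a shifted feature mean.
rng = np.random.default_rng(0)
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 3:7] = True
feats = rng.normal(0.0, 0.1, size=(8, 8, 16))
feats[truth] += 5.0

masks = patch_masks(feats, k=2)
for j, m in enumerate(masks):
    print(f"cluster {j}: {int(m.sum())} patches")
```

In a real pipeline the masks produced this way would only be a starting point; per the summary, a semi-supervised instance segmentation model then refines them iteratively.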
keywords:training data generation, clustering, semi-supervised, construction instance segmentation
citation:Yan, X., & Wang, Z. (2026). Training data generation for construction instance segmentation from target domains. Journal of Information Technology in Construction (ITcon), 31, 632-650. https://doi.org/10.36680/j.itcon.2026.028