TY - GEN
T1 - Automated bridge inspection image interpretation based on vision-language pre-training
AU - Wang, Shengyi
AU - El-Gohary, Nora
N1 - The authors would like to thank the National Science Foundation (NSF). This material is based on work supported by the NSF under Grant No. 1937115. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
PY - 2024
Y1 - 2024
AB - Bridge inspection images capture a wealth of information and details about bridge conditions. This study proposes a method to interpret on-site bridge inspection images by generating human-readable descriptive sentences. The resulting text can be formed into a bridge inspection report to aid and expedite the bridge inspection process for bridge engineers; and the extracted information can be further exploited to support bridge deterioration prediction and maintenance decision making. This is, however, a challenging task that combines computer vision and natural language processing. First, it not only requires object detection/segmentation from the bridge inspection images but also demands a grasp of the relationships between the recognized objects. Second, human-readable sentences need to be generated based on the extracted information from the images. Third, the available bridge image-text data pairs, which can be used for training, are quite limited and highly noisy. To address these gaps, this paper proposes a deep learning-based model for generating free-form human-readable descriptive sentences of the bridge conditions, which leverages bootstrapping language-image pre-training (BLIP) and its vision-language pre-training data from the web. This paper discusses the proposed model and its performance results.
UR - http://www.scopus.com/inward/record.url?scp=85184285384&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184285384&partnerID=8YFLogxK
U2 - 10.1061/9780784485224.001
DO - 10.1061/9780784485224.001
M3 - Conference contribution
AN - SCOPUS:85184285384
T3 - Computing in Civil Engineering 2023: Data, Sensing, and Analytics - Selected Papers from the ASCE International Conference on Computing in Civil Engineering 2023
SP - 1
EP - 8
BT - Computing in Civil Engineering 2023
A2 - Turkan, Yelda
A2 - Louis, Joseph
A2 - Leite, Fernanda
A2 - Ergan, Semiha
PB - American Society of Civil Engineers
T2 - ASCE International Conference on Computing in Civil Engineering 2023: Data, Sensing, and Analytics, i3CE 2023
Y2 - 25 June 2023 through 28 June 2023
ER -