25–29 Apr 2026
Kechuang Building
Asia/Shanghai timezone

Vision-Language Model Performance and Scaling Law on LHAASO-WCDA Data

27 Apr 2026, 16:00
5 min
A102 (Kechuang Building)

NO.1520 Taihu Blvd, Suzhou, Jiangsu, China
Poster report (print size: 0.6 m wide × 0.9 m high), AI and Others session

Speaker

Zijie Huang

Description

Vision-Language Models (VLMs) use visual encoders to map images into the model's representation space, which overcomes the information bottleneck of text-only token inputs. Building on the structural layout of the LHAASO (Large High Altitude Air Shower Observatory) array detectors, this study explores the feasibility of using array trigger images as a new input modality, with the aim of improving particle identification through a more complete representation of the detector signal. We built and evaluated a VLM based on the GLM-4.1V-9B architecture using simulated data. The results show clear scaling-law behavior and a substantial improvement in the model's understanding of detector information compared with text-only models. This work demonstrates the feasibility of multimodal approaches for analyzing data from WCDA (the Water Cherenkov Detector Array) and offers a new route to improving WCDA particle identification and reconstruction.

Primary author

Zijie Huang