Bibliographic record
DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
- Authors
- Ziang Zhang, Zehan Wang, Guanghao Zhang, Weilong Dai, Yan Xia, Ziang Yan, Minjie Hong, Zhou Zhao
- Publication year
- 2025
- OA status
- oa_green
Print
Need access?
Ask circulation staff for physical copies or request digital delivery via Ask a Librarian.
Digital copy
Unavailable in your region (PD status unclear).
Abstract
Reasoning about dynamic spatial relationships is essential, as both observers
and objects often move simultaneously. Although vision-language models (VLMs)
and visual expertise models excel in 2D tasks and static scenarios, their
ability to fully understand dynamic 3D scenarios remains limited. We introduce
Dynamic Spatial Intelligence and propose DSI-Bench, a benchmark with nearly
1,000 dynamic videos and over 1,700 manually annotated questions covering nine
decoupled motion patterns of observers and objects. Spatially and temporally
symmetric designs reduce biases and enable systematic evaluation of models'
reasoning about self-motion and object motion. Our evaluation of 14 VLMs and
expert models reveals key limitations: models often conflate observer and
object motion, exhibit semantic biases, and fail to accurately infer relative
relationships in dynamic scenarios. Our DSI-Bench provides valuable findings
and insights about the future development of general and expertise models with
dynamic spatial intelligence.
and objects often move simultaneously. Although vision-language models (VLMs)
and visual expertise models excel in 2D tasks and static scenarios, their
ability to fully understand dynamic 3D scenarios remains limited. We introduce
Dynamic Spatial Intelligence and propose DSI-Bench, a benchmark with nearly
1,000 dynamic videos and over 1,700 manually annotated questions covering nine
decoupled motion patterns of observers and objects. Spatially and temporally
symmetric designs reduce biases and enable systematic evaluation of models'
reasoning about self-motion and object motion. Our evaluation of 14 VLMs and
expert models reveals key limitations: models often conflate observer and
object motion, exhibit semantic biases, and fail to accurately infer relative
relationships in dynamic scenarios. Our DSI-Bench provides valuable findings
and insights about the future development of general and expertise models with
dynamic spatial intelligence.
Copies & availability
Realtime status across circulation, reserve, and Filipiniana sections.
Self-checkout (no login required)
- Enter your student ID, system ID, or full name directly in the table.
- Provide your identifier so we can match your patron record.
- Choose Self-checkout to send the request; circulation staff are notified instantly.
| Barcode | Location | Material type | Status | Action |
|---|---|---|---|---|
| No holdings recorded. | ||||
Digital files
Preview digitized copies when embargo permits.
-
View digital file
original
APPLICATION/PDF · 20.93 MB
Links & eResources
Access licensed or open resources connected to this record.
- oa Direct