Dataset Overview
We present PhysInOne, the largest dataset to date addressing the critical scarcity of physically grounded training data for AI systems.
Scale and Diversity
- 2 million videos generated from 153,810 dynamic 3D scenes
- Covers 71 fundamental physical phenomena in everyday environments, spanning four major domains: Mechanics, Optics, Fluid Dynamics, and Magnetism
- Includes 2,231 common objects tailored to daily physical interactions
- Enriched with 623 materials across five categories: plastic, metal, wood, stone, and fabric
- Features 528 diverse 3D backgrounds to ensure realism and environmental variety
Scene Characteristics
- Each scene involves one to three physical phenomena, mirroring real-world activities with single-, double-, and triple-physics interactions
- Supports complex multi-object interactions, with scene complexity growing alongside the number of phenomena:
  - Average objects per scene: 3.9 (single-physics), 6.3 (double-physics), 7.8 (triple-physics)
- Each scene is captured from 13 viewpoints: 12 static cameras and 1 moving camera
Rich Annotations
- 3D geometry
- Semantic labels
- Object motion and dynamics
- Physical properties
- Natural-language scene descriptions
Supported Applications
- Physics-aware video generation
- Short- and long-term future frame prediction
- Physical property estimation
- Motion transfer
- ⋮
Benchmark Results
Quantitative evaluation results across four physics-related tasks on the PhysInOne dataset.
Physics-aware Video Generation
Evaluation of video generation models with and without fine-tuning on PhysInOne.
| Model | PMF ↑ | FVD ↓ | Human Rating ↑ |
|---|---|---|---|
| SVD | 2.753 | 203 | 6.09 |
| SVD-LoRA | 2.446 | 150 | 5.82 |
| SVD-SFT | 3.147 | 143 | 6.08 |
| SVD-flt | 2.464 | 147 | 5.45 |
| CogVideoX | 2.877 | 165 | 2.98 |
| CogVideoX-LoRA | 2.869 | 149 | 2.95 |
| Wan2.2-5B | 2.041 | 258 | 2.26 |
| Wan2.2-5B-LoRA | 2.785 | 178 | 4.80 |
| Wan2.2-5B-SFT | 2.978 | 190 | 5.95 |
| Wan2.2-5B-flt | 2.227 | 341 | 2.61 |
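FVD in the table above compares statistics of deep video features (conventionally from a pretrained I3D network) via the Fréchet distance. As a minimal illustration of the distance itself, assuming feature means and covariances have already been extracted (the extraction step and function name here are our own, not part of the released evaluation code):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard numerical imaginary noise
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical feature distributions give a distance of zero; larger values indicate a bigger gap between generated and real video statistics.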
Future Frame Prediction
Long-term Prediction (Seen / Novel Viewpoints)
Models predict ~78 future frames (~2.6 seconds ahead) from the first half of video clips.
| Model | PMF ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|
| TiNeuVox | 3.710 / 2.885 | 21.49 / 15.20 | 0.633 / 0.452 | 0.517 / 0.665 |
| DefGS | 3.980 / 3.347 | 22.85 / 17.95 | 0.833 / 0.598 | 0.192 / 0.348 |
| TRACE | 3.869 / 3.242 | 22.42 / 17.44 | 0.756 / 0.599 | 0.295 / 0.422 |
| FreeGave | 3.897 / 3.265 | 22.57 / 17.75 | 0.818 / 0.619 | 0.219 / 0.355 |
| ExtDM | 3.363 / - | 19.55 / - | 0.657 / - | 0.771 / - |
| MAGI-1 | 4.086 / - | 23.14 / - | 0.788 / - | 0.364 / - |
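PSNR, reported above, is a log-scaled mean-squared-error metric between predicted and ground-truth frames; a minimal NumPy sketch (assuming 8-bit frames by default):

```python
import numpy as np

def psnr(pred, gt, max_val=255.0):
    """Peak signal-to-noise ratio between two frames, in dB."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For frames normalized to [0, 1], pass `max_val=1.0`.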
Short-term Prediction (Seen / Novel Viewpoints)
Models continuously predict the next 10 frames in real time from streaming input.
| Model | PMF ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|
| DefGS | 4.536 / 3.728 | 26.02 / 20.92 | 0.861 / 0.739 | 0.206 / 0.322 |
| FreeGave | 4.742 / 3.706 | 27.09 / 20.80 | 0.876 / 0.715 | 0.199 / 0.336 |
| ExtDM | 3.774 / - | 22.14 / - | 0.717 / - | 0.715 / - |
| MAGI-1 | 4.696 / - | 26.75 / - | 0.886 / - | 0.116 / - |
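SSIM, also reported in these tables, compares luminance, contrast, and structure statistics between frames. Below is a deliberately simplified single-window sketch; the reported numbers presumably use the standard sliding-window implementation (e.g. scikit-image's `structural_similarity`), so this is for intuition only:

```python
import numpy as np

def ssim_global(x, y, max_val=1.0):
    """Simplified SSIM computed over the whole image as one window."""
    c1 = (0.01 * max_val) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * max_val) ** 2  # stabilizer for the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images score exactly 1.0; dissimilar images score lower.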
Physical Properties Estimation
Resimulation with Estimated Properties
Quantitative comparison of resimulated videos using estimated physical properties.
| Method | PMF ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|
| PAC-NeRF | 5.617 | 24.12 | 0.942 | 0.086 |
| GIC | 5.938 | 26.90 | 0.950 | 0.074 |
Property Estimation Error by Material Type
Percentage error (%) of estimated physical parameters; lower is better. E denotes Young's modulus, ν Poisson's ratio, τY yield stress, μ viscosity, κ bulk modulus, η plastic viscosity, θfric friction angle, and v initial velocity.
Elastic Solids
| Method | log₁₀(E) | ν | v |
|---|---|---|---|
| PAC-NeRF | 117.18 | 14.26 | 4.04 |
| GIC | 49.76 | 16.35 | 3.32 |
Plasticine
| Method | log₁₀(E) | ν | log₁₀(τY) | v |
|---|---|---|---|---|
| PAC-NeRF | 68.38 | 15.79 | 25.51 | 3.25 |
| GIC | 178.36 | 42.72 | 17.11 | 3.39 |
Newtonian Fluids
| Method | log₁₀(μ) | log₁₀(κ) | v |
|---|---|---|---|
| PAC-NeRF | 42.64 | 287.56 | 3.11 |
| GIC | 8.78 | 70.07 | 3.28 |
Granular Substances
| Method | θfric | v |
|---|---|---|
| PAC-NeRF | 16.87 | 3.29 |
| GIC | 18.85 | 3.57 |
Non-Newtonian Fluids
| Method | log₁₀(μ) | log₁₀(κ) | log₁₀(τY) | log₁₀(η) | v |
|---|---|---|---|---|---|
| PAC-NeRF | 309.42 | 552.89 | 339.20 | 65.60 | 2.95 |
| GIC | 124.26 | 181.87 | 28.78 | 24.97 | 3.73 |
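The log₁₀ columns above reflect the common convention of comparing parameters that span orders of magnitude (stiffness, viscosity, yield stress) in log space. A minimal sketch of this style of percentage-error metric, under the assumption that the paper follows the usual relative-error definition (its exact protocol may differ):

```python
import numpy as np

def pct_error(est, gt, log_scale=False):
    """Percentage error of one estimated physical parameter.

    log_scale=True compares values in log10 space, as is common for
    parameters like Young's modulus or viscosity that span many
    orders of magnitude.
    """
    if log_scale:
        est, gt = np.log10(est), np.log10(gt)
    return float(abs(est - gt) / abs(gt) * 100.0)
```

For example, estimating E = 10⁶ Pa when the ground truth is 10⁵ Pa yields a 20% error in log₁₀ space, even though the raw values differ by 10×.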
Motion Transfer
Evaluation of transferring physical motion dynamics from source videos to target images.
| Method | PMF ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|
| GoWithTheFlow | 3.309 | 18.98 | 0.691 | 0.410 |
| MotionPro | 3.484 | 20.28 | 0.775 | 0.467 |
Acknowledgements
We would like to express our sincere gratitude to (in alphabetical order) Geer Chen, Jinhe Chen, Zhiyuan Chen, Yuanhaonan Deng, Shuo Feng, Wenxuan Guo, Junpeng Hu, Ruitao Hu, Ying Ji, Yixuan Jiang, Jiani Liu, Xinjie Liu, Xinsheng Liu, Jiyuan Ma, Qiyue Ma, Chenyang Mao, Yukun Miao, Ye Peng, Yuanyue Qiao, Dacheng Qin, Xiangnuo Ren, Xiaowen Song, Jingqi Tian, Hong Wang, Huixuechun Wang, Zheng Wang, Weipeng Wu, Zhaowei Wu, Kai Xing, Ran Yan, Leize Yang, Ruizhe Yang, Ao Yu, and Minhao Zhu for their essential contributions and dedicated efforts in conducting human evaluations.
BibTeX
@misc{zhou2026physinonevisualphysicslearning,
  title={PhysInOne: Visual Physics Learning and Reasoning in One Suite},
  author={Siyuan Zhou and Hejun Wang and Hu Cheng and Jinxi Li and Dongsheng Wang and Junwei Jiang and Yixiao Jin and Jiayue Huang and Shiwei Mao and Shangjia Liu and Yafei Yang and Hongkang Song and Shenxing Wei and Zihui Zhang and Peng Huang and Shijie Liu and Zhengli Hao and Hao Li and Yitian Li and Wenqi Zhou and Zhihan Zhao and Zongqi He and Hongtao Wen and Shouwang Huang and Peng Yun and Bowen Cheng and Pok Kazaf Fu and Wai Kit Lai and Jiahao Chen and Kaiyuan Wang and Zhixuan Sun and Ziqi Li and Haochen Hu and Di Zhang and Chun Ho Yuen and Bing Wang and Zhihua Wang and Chuhang Zou and Bo Yang},
  year={2026},
  eprint={2604.09415},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.09415},
}