Jianshu She (佘建树) — MBZUAI PhD Student

About

I am Jianshu She (佘建树), a third-year PhD student at MBZUAI advised by Prof. Qirong Ho and Prof. Steve Liu. My research focuses on efficient LLM reasoning, efficient post-training, LLM serving systems, and optimizing cloud–edge collaborative reasoning. I previously worked as an integrated circuit design engineer, responsible for chip front-end design and verification.

🔍 I am actively looking for research internship opportunities (2026). Feel free to reach out!

Publications

LAPS: A Length-Aware-Prefill LLM Serving System

Jianshu She, Zonghang Li, Hongchao Du, Shangyu Wu, Wenhao Zheng, Eric Xing, Zhengzhong Liu, Huaxiu Yao, Jason Xue, Qirong Ho

MLSys 2026 (Oral) · First author

TL;DR: An LLM serving system that schedules prefill by predicted output length, cutting latency and improving throughput under mixed workloads.

[PDF]

Hawkeye: Model Collaboration for Efficient Reasoning

Jianshu She, Zhuohao Li, Zhemin Huang, Qi Li, Peiran Xu, Haonan Li, Qirong Ho

COLM 2025 · First author

TL;DR: A large model emits compact reasoning that a small model expands, so most tokens are produced cheaply — speeding up chain-of-thought without losing accuracy.

[PDF]

Linear Steerability in Language Models: When It Emerges and How It Evolves

Jianshu She, Xinyue Li, Eric Xing, Zhengzhong Liu, Qirong Ho

EMNLP 2025 · First author

TL;DR: Studies when linearly-steerable directions emerge in language models and how they evolve over the course of training.

[ArXiv]

Token-Level Routing Inference System for Edge Devices

Jianshu She, Wenhao Zheng, Zhengzhong Liu, Hongyi Wang, Eric Xing, Huaxiu Yao, Qirong Ho

ACL Demo 2025 · First author

TL;DR: An edge inference system that routes individual tokens between a small on-device model and a large cloud model to balance quality against cost and latency.

[PDF]

Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters

Zonghang Li, Tao Li, Wenjiao Feng, Rongxing Xiao, Jianshu She, Hong Huang, Mohsen Guizani, Hongfang Yu, Qirong Ho, Wei Xiang, Steve Liu

ICLR 2026

TL;DR: Runs 30–70B LLMs on everyday home devices by distributing inference across heterogeneous, low-resource CPUs/GPUs.

[PDF]

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W Killian, Mikhail Yurochkin, Zhengzhong Liu, Eric P Xing, Zhiting Hu

NeurIPS Datasets & Benchmarks 2025

TL;DR: A large-scale, cross-domain study and benchmark of RL for LLM reasoning, revealing how reasoning skills transfer (and fail to transfer) across domains.

[PDF]

AgentRM: An OS-Inspired Resource Manager for LLM Agent Systems

Jianshu She

arXiv 2026 · First author

TL;DR: An OS-inspired resource manager for LLM agents — MLFQ scheduling plus a context lifecycle manager — to prevent scheduling failures and context degradation.

[ArXiv]

SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration

Jianshu She

arXiv 2026 · First author

TL;DR: Splits agents into an enterprise-side privacy agent and a cloud-side reasoning agent, using context-aware dynamic sanitization to keep sensitive data on-prem.

[ArXiv]

K2-Think: A Parameter-Efficient Reasoning System

Zhoujun Cheng, Richard Fan, Shibo Hao, Taylor W Killian, Haonan Li, Suqi Sun, Hector Ren, Alexander Moreno, Daqian Zhang, Tianjun Zhong, Yuxin Xiong, Yuanzhe Hu, Yutao Xie, Xudong Han, Yuqi Wang, Varad Pimpalkhute, Yonghao Zhuang, Aaryamonvikram Singh, Xuezhi Liang, Anze Xie, Jianshu She, Desai Fan, Chengqian Gao, Liqun Ma, Mikhail Yurochkin, John Maggs, Xuezhe Ma, Guowei He, Zhiting Hu, Zhengzhong Liu, Eric P Xing

arXiv 2025

TL;DR: A parameter-efficient reasoning system that delivers strong reasoning performance with a much smaller model footprint.

[PDF]

Concise Reasoning in the Lens of Lagrangian Optimization

Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, Zhiqiang Xu

arXiv 2025

TL;DR: Frames concise reasoning as constrained optimization — minimize response length subject to a performance constraint — solved via Lagrangian relaxation (PALU).

[ArXiv]

COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training

Akhmed Sakip, Erland Hilman Fuadi, Omar Sayedelahl, Zonghang Li, Jianshu She, Alham Fikri Aji, Steve Liu, Eric Xing, Qirong Ho

arXiv 2026

TL;DR: Jointly adapts global batch size, parallelism strategy, and micro-batch size during training, guided by Goodput, to maximize convergence per unit of wall-clock time.

[ArXiv]
