-
Jianshu She, Zonghang Li, Hongchao Du, Shangyu Wu, Wenhao Zheng, Eric Xing, Zhengzhong Liu, Huaxiu Yao, Jason Xue, Qirong Ho
MLSys 2026 (Oral), First author
TL;DR: An LLM serving system that schedules prefill by predicted output length, cutting latency and improving throughput under mixed workloads.
-
Jianshu She, Zhuohao Li, Zhemin Huang, Qi Li, Peiran Xu, Haonan Li, Qirong Ho
COLM 2025, First author
TL;DR: A large model emits compact reasoning that a small model expands, so most tokens are produced cheaply, speeding up chain-of-thought reasoning without losing accuracy.
-
Jianshu She, Xinyue Li, Eric Xing, Zhengzhong Liu, Qirong Ho
EMNLP 2025, First author
TL;DR: Studies when linearly-steerable directions emerge in language models and how they evolve over the course of training.
-
Jianshu She, Wenhao Zheng, Zhengzhong Liu, Hongyi Wang, Eric Xing, Huaxiu Yao, Qirong Ho
ACL Demo 2025, First author
TL;DR: An edge inference system that routes individual tokens between a small on-device model and a large cloud model to balance quality against cost and latency.
-
Zonghang Li, Tao Li, Wenjiao Feng, Rongxing Xiao, Jianshu She, Hong Huang, Mohsen Guizani, Hongfang Yu, Qirong Ho, Wei Xiang, Steve Liu
ICLR 2026
TL;DR: Runs 30–70B LLMs on everyday home devices by distributing inference across heterogeneous, low-resource CPUs/GPUs.
-
Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W Killian, Mikhail Yurochkin, Zhengzhong Liu, Eric P Xing, Zhiting Hu
NeurIPS Datasets & Benchmarks 2025
TL;DR: A large-scale, cross-domain study and benchmark of RL for LLM reasoning, revealing how reasoning skills transfer (and fail to transfer) across domains.
-
Jianshu She
arXiv 2026, First author
TL;DR: An OS-inspired resource manager for LLM agents that combines multi-level feedback queue (MLFQ) scheduling with a context lifecycle manager to prevent scheduling failures and context degradation.
-
Jianshu She
arXiv 2026, First author
TL;DR: Splits agents into an enterprise-side privacy agent and a cloud-side reasoning agent, using context-aware dynamic sanitization to keep sensitive data on-prem.
-
Zhoujun Cheng, Richard Fan, Shibo Hao, Taylor W Killian, Haonan Li, Suqi Sun, Hector Ren, Alexander Moreno, Daqian Zhang, Tianjun Zhong, Yuxin Xiong, Yuanzhe Hu, Yutao Xie, Xudong Han, Yuqi Wang, Varad Pimpalkhute, Yonghao Zhuang, Aaryamonvikram Singh, Xuezhi Liang, Anze Xie, Jianshu She, Desai Fan, Chengqian Gao, Liqun Ma, Mikhail Yurochkin, John Maggs, Xuezhe Ma, Guowei He, Zhiting Hu, Zhengzhong Liu, Eric P Xing
arXiv 2025
TL;DR: A parameter-efficient reasoning system that delivers strong reasoning performance with a much smaller model footprint.
-
Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, Zhiqiang Xu
arXiv 2025
TL;DR: Frames concise reasoning as constrained optimization (minimize response length subject to a performance constraint), solved via Lagrangian relaxation (PALU).
-
Akhmed Sakip, Erland Hilman Fuadi, Omar Sayedelahl, Zonghang Li, Jianshu She, Alham Fikri Aji, Steve Liu, Eric Xing, Qirong Ho
arXiv 2026
TL;DR: Jointly adapts global batch size, parallelism strategy, and micro-batch size during training, guided by Goodput, to maximize convergence per unit of wall-clock time.
Feel free to reach out with questions or collaboration opportunities.