DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed Caching Z Liu, Z Bai, Z Liu, X Li, C Kim, V Braverman, X Jin, I Stoica 17th USENIX Conference on File and Storage Technologies (FAST 19), 143-157, 2019 | 188 | 2019 |
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications Z Bai, Z Zhang, Y Zhu, X Jin 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2020 | 140 | 2020 |
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs Z Jiang, H Lin, Y Zhong, Q Huang, Y Chen, Z Zhang, Y Peng, X Li, C Xie, ... arXiv preprint arXiv:2402.15627, 2024 | 94 | 2024 |
Harmonia: Near-linear scalability for replicated storage with in-network conflict detection H Zhu, Z Bai, J Li, E Michael, D Ports, I Stoica, X Jin arXiv preprint arXiv:1904.08964, 2019 | 80 | 2019 |
Transparent GPU Sharing in Container Clouds for Deep Learning Workloads B Wu, Z Zhang, Z Bai, X Liu, X Jin 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023 | 43 | 2023 |
Runtime recovery of web applications under zero-day ReDoS attacks Z Bai, K Wang, H Zhu, Y Cao, X Jin 2021 IEEE Symposium on Security and Privacy (SP), 1575-1588, 2021 | 22 | 2021 |
: Efficient Resource Disaggregation for Deep Learning Workloads X Jin, Z Bai, Z Zhang, Y Zhu, Y Zhong, X Liu IEEE/ACM Transactions on Networking, 2024 | 5 | 2024 |
Runtime scheduling and updating for deep learning applications Z Bai Johns Hopkins University, 2022 | | 2022 |