According to trend monitoring by Beating, Hanhua Ji announced on the day of the V4 release that it had completed adaptation of two models, the 285B DeepSeek-V4-Flash and the 1.6T DeepSeek-V4-Pro, on the vLLM inference framework, and that the adaptation code has been open-sourced on GitHub.
The speed of the adaptation rests on two premises: first, Hanhua Ji's self-developed NeuWare software stack natively supports mainstream frameworks such as PyTorch and vLLM, enabling rapid model migration; second, its chips natively support mainstream low-precision data formats, so accuracy can be validated without additional format conversion. For V4's new architecture, Hanhua Ji built a proprietary fused-operator library, Torch-MLU-Ops, which provides dedicated acceleration for modules such as Compressor and mHC, and wrote hot-path kernels such as sparse/compressed attention and GroupGemm in BangC (a sketch of how such a library typically plugs into PyTorch follows below).
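The source does not show Torch-MLU-Ops' actual API. As a hedged illustration only, the sketch below uses PyTorch's public torch.library mechanism to show how an out-of-tree vendor operator library can register a fused op with the framework; the `mlu_ops` namespace, the op name, and the choice of a fused RMSNorm are all assumptions, and a real library would dispatch to a hand-written device kernel rather than the eager reference shown here.

```python
import torch

# Hypothetical example: how an out-of-tree operator library (such as a
# vendor's Torch-MLU-Ops) can expose a fused kernel to PyTorch.
# The "mlu_ops" namespace and this op are illustrative assumptions.
@torch.library.custom_op("mlu_ops::fused_rmsnorm", mutates_args=())
def fused_rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float) -> torch.Tensor:
    # Reference (eager) implementation used for correctness checks; a real
    # library would route this to a hand-written kernel (e.g. one authored
    # in BangC) for its own device backend.
    variance = x.float().pow(2).mean(-1, keepdim=True)
    normed = x.float() * torch.rsqrt(variance + eps)
    return (normed * weight.float()).to(x.dtype)

# A "fake" (meta) implementation so the op composes with torch.compile
# and shape propagation without executing the kernel.
@fused_rmsnorm.register_fake
def _(x, weight, eps):
    return torch.empty_like(x)

if __name__ == "__main__":
    x = torch.randn(2, 8, dtype=torch.bfloat16)
    w = torch.ones(8, dtype=torch.bfloat16)
    y = torch.ops.mlu_ops.fused_rmsnorm(x, w, 1e-6)
    print(y.shape)  # torch.Size([2, 8])
```

Once registered this way, the op is visible under `torch.ops`, so higher layers (including vLLM model code) can call it without knowing which backend kernel ultimately runs.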
At the inference-framework level, Hanhua Ji supports five-dimensional mixed parallelism (TP/PP/SP/DP/EP), communication-computation overlap, low-precision quantization, and prefill-decode (PD) disaggregated deployment in vLLM (an illustrative launch sketch appears at the end of this piece). Notably, the V4 technical report mentioned validation only on NVIDIA GPUs and Huawei Ascend NPUs, with no mention of the Hanhua Ji platform; this adaptation was carried out by Hanhua Ji on its own initiative. Buoyed by the V4 release news, the A-share domestic chip sector strengthened, and Hanhua Ji's stock spiked sharply in intraday trading.
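To make the parallelism and quantization features above concrete, here is a minimal sketch using vLLM's public Python API with tensor, pipeline, and expert parallelism plus FP8 quantization. The model id is hypothetical, and whether these exact options are exercised on Hanhua Ji hardware is an assumption; the SP/DP dimensions and PD-disaggregated serving involve additional deployment-level configuration not shown here.

```python
from vllm import LLM, SamplingParams

# Illustrative only: the model id is hypothetical, and running this on
# non-NVIDIA hardware depends on the vendor's vLLM backend support.
llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical model id
    tensor_parallel_size=8,       # TP: shard weights across 8 devices
    pipeline_parallel_size=2,     # PP: split layers into 2 stages
    enable_expert_parallel=True,  # EP: distribute MoE experts
    quantization="fp8",           # low-precision inference
)

outputs = llm.generate(
    ["Explain mixture-of-experts routing in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```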
