openPangu-R-7B-Diffusion
1. Introduction
openPangu-R-7B-Diffusion is a context-causal block diffusion language model with a block length of 32 and 7 billion parameters (excluding vocabulary embeddings). It is built upon the pre-trained openPangu-Embedded-7B model, which was further pre-trained on 700 billion tokens at an 8k context length, then fine-tuned on 100 billion tokens of 32k-long annealing data, and finally trained for 10 epochs on 10 billion tokens of slow-thinking SFT data. All model training and inference runs on Ascend NPUs.
- openPangu-7B-Diffusion-Base: A pre-trained model with 8k context length.
- openPangu-R-7B-Diffusion: An instruction-tuned model capable of slow thinking with 32k context length.
Key Features:
Inference
openPangu-R-7B-Diffusion employs causal block diffusion decoding, performing diffusion decoding block by block. During the decoding process, full attention is applied within each block, while causal attention is used for the preceding context. When all tokens within a block are decoded, the entire block of tokens is stored in the context KV cache, which uses a causal attention mask. Simultaneously, the first token of the next block is decoded.
- Supports variable-length inference and KV-Cache.
- Flexible context length not restricted by block length.
- Confidence-threshold sampling can increase throughput by up to 2.5x compared to standard autoregressive decoding.
- Similar to Fast-dLLMv2, each block is divided into small blocks to trade off throughput against quality; a small-block length of 4 or 8 typically performs best.
- Supports both AR and BlockDiffusion decoding.
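The confidence-threshold rule above can be illustrated with a toy sketch of a single diffusion step. `unmask_step`, the candidate tokens, and the confidence values below are our own illustration, not the model's actual decoding API:

```python
# Toy sketch of one block-diffusion step with confidence-threshold
# sampling. `unmask_step` and the numbers below are illustrative only;
# the real model derives candidates and confidences from its denoiser.
MASK = None  # stands in for the mask token

def unmask_step(tokens, candidates, confidences, threshold=0.9):
    """Finalize every masked position whose confidence clears `threshold`.

    The most confident masked position is always finalized, so each step
    decodes at least one token; decoding several tokens per step is where
    the throughput gain over autoregressive decoding comes from.
    """
    masked = [i for i, t in enumerate(tokens) if t is MASK]
    if not masked:
        return tokens
    accept = {i for i in masked if confidences[i] >= threshold}
    if not accept:  # fall back to the single best position
        accept = {max(masked, key=lambda i: confidences[i])}
    return [candidates[i] if i in accept else t for i, t in enumerate(tokens)]

# One block of length 4: positions 0 and 2 clear the 0.9 threshold,
# so this step decodes two tokens in parallel.
tokens = unmask_step([MASK, MASK, MASK, MASK],
                     candidates=[11, 22, 33, 44],
                     confidences=[0.95, 0.40, 0.92, 0.10])
print(tokens)  # [11, None, 33, None]
```

Once every position in a block is finalized this way, the block moves into the causal KV cache and decoding advances to the next block.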
Training
During the training of openPangu-R-7B-Diffusion, masked corpus blocks are concatenated with unmasked context. The model predicts the masked tokens for the masked corpus blocks and performs autoregressive training on the unmasked context.
- Retains the same causal-attention-mask shape as the autoregressive model, enabling quick adaptation from an AR model to a BlockDiffusion model.
- Plain block diffusion training computes loss only on the masked text blocks, so the unmasked context tokens are wasted. With the context-causal attention mask, openPangu-R-7B-Diffusion additionally trains on the context with a next-token-prediction loss, improving training efficiency.
- Compared to full-attention diffusion models, the number of tokens involved in training per batch is more stable, ensuring smooth training on long sequences.
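The attention pattern underlying this scheme, full attention inside a block and causal attention over the preceding context, can be sketched as a toy mask builder. The helper below is our own illustration, not the actual training code:

```python
def context_causal_mask(seq_len, block_len):
    """Attention mask for context-causal block diffusion.

    mask[i][j] == 1 means query position i may attend to key position j:
    full (bidirectional) attention inside a position's own block, and
    causal attention to every earlier block. This block-wise causal shape
    is what lets finished blocks live in an ordinary causal KV cache.
    """
    block = lambda pos: pos // block_len  # block index of a position
    return [[1 if block(j) <= block(i) else 0 for j in range(seq_len)]
            for i in range(seq_len)]

# Tiny example: 8 positions, block length 4. The first block attends
# only within itself; the second block sees both blocks in full.
for row in context_causal_mask(seq_len=8, block_len=4):
    print(row)
```

With a block length of 1 this degenerates to the usual token-level causal mask, which is why the AR-to-BlockDiffusion adaptation is cheap.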
2. Model Architecture
| openPangu-R-7B-Diffusion | |
|---|---|
| Architecture | Dense |
| Parameters (Non-Embedding) | 7B |
| Number of Layers | 34 |
| Hidden Dimension | 12800 |
| Attention Mechanism | GQA |
| Number of Attention Heads | 32 for Q, 8 for KV |
| Vocabulary Size | 153k |
| Context Length (Natively) | 32k |
| Continued Training Tokens | 800B |
3. Results
| Benchmark | Metric | Dream-v0-Instruct-7B | Fast-dLLMv2 | LLaDA2.0-mini-preview (16BA1B) | SDAR-8B | openPangu-R-7B-Diffusion |
|---|---|---|---|---|---|---|
| General | ||||||
| MMLU | Acc | 67.00 | 66.60 | 72.49 | 78.60 | 81.66 |
| MMLU-Pro | Acc | 43.30 | 44.42* | 49.22 | 56.90 | 71.26 |
| CMMLU | Acc | 58.82 | 59.67* | 67.53 | 75.70 | 76.43 |
| CEval | Acc | 57.98 | 66.76* | 66.54 | 72.72* | 70.81 |
| IFEval | Prompt Strict | 62.50 | 61.40 | 62.50 | 61.40 | 60.81 |
| Math | ||||||
| GSM8K | Acc | 81.00 | 83.70 | 89.01 | 91.30 | 91.89 |
| MATH | Acc | 39.20 | 61.60 | 73.50 | 78.60 | 84.26 |
| Coding | ||||||
| MBPP | Pass@1 | 58.80 | 57.10 | 77.75 | 72.00 | 84.05 |
| HumanEval | Pass@1 | 55.50 | 63.40 | 80.49 | 78.70 | 87.80 |
| Avg | | 58.22 | 62.74 | 70.95 | 73.99 | 78.77 |
Note: For the coding benchmarks MBPP and HumanEval, we use the sampling setting `alg="entropy", num_small_blocks=32, top_p=0.8, temperature=1`. For the other benchmarks, we use `alg="entropy", num_small_blocks=8, top_p=1, temperature=0`. All evaluations are performed with a sequence length of 28k tokens. Entries marked with * were not officially reported; we computed them ourselves using the official code.
4. Deployment
4.1 Environment
Hardware Requirements
Atlas 800T A2 (64GB). Please refer to [Atlas 800T A2] for the driver and firmware installation packages.
System Requirements & Dependencies
- System: Linux (OpenEuler ≥ 24.03 recommended)
- CANN==8.1.RC1: [CANN Install]
- python==3.10
- torch==2.6.0
- torch-npu==2.6.0
- transformers==4.53.2
The software environment above has been verified; newer versions should also work in principle. For any questions, please submit an issue.
4.2 Inference Examples
The following provides a simple inference example of openPangu-R-7B-Diffusion based on the transformers framework:
Set the model path in `generate.py` before running:
```shell
cd inference
python generate.py
```
Unlike in benchmarking, to achieve optimal throughput, set the sampling parameters to `alg="confidence_threshold"`, `threshold=0.9`, `num_small_blocks=1`, and choose an appropriate batch size for the device.
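For convenience, the sampling settings quoted in this README can be collected in one place. Note that these keyword names (`alg`, `num_small_blocks`, `threshold`, ...) are arguments understood by the repo's `generate.py`, not standard `transformers` generation kwargs, so treat this as a sketch rather than an official API:

```python
# Sampling presets quoted in this README. The keys are consumed by the
# repo's generate.py (not by transformers' generate()), so this dict is
# a convenience sketch, not an official configuration format.
SAMPLING_PRESETS = {
    # benchmark setting for MBPP / HumanEval (Section 3)
    "benchmark_coding": dict(alg="entropy", num_small_blocks=32, top_p=0.8, temperature=1),
    # benchmark setting for the other evaluations (Section 3)
    "benchmark_default": dict(alg="entropy", num_small_blocks=8, top_p=1, temperature=0),
    # maximum-throughput setting for deployment (Section 4.2)
    "throughput": dict(alg="confidence_threshold", threshold=0.9, num_small_blocks=1),
}

print(SAMPLING_PRESETS["throughput"])
```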
5. Model License
Unless otherwise noted, openPangu-R-7B-Diffusion model is licensed under the terms and conditions of OPENPANGU MODEL LICENSE AGREEMENT VERSION 1.0, which is intended to be used permissively and enable the further development of artificial intelligence technologies. Please refer to the LICENSE file located in the root directory of the model repository for details.
6. Disclaimer
Due to the technical limitations inherent in the technology on which the openPangu-R-7B-Diffusion (“Model”) relies and the fact that the artificial intelligence generated content is automatically produced by Model, Huawei cannot make any guarantees regarding the following matters:
- The output of this Model is automatically generated by AI algorithms; some of the information may be flawed, unreasonable, or cause discomfort, and the generated content does not represent Huawei's attitude or standpoint;
- There is no guarantee that this Model is 100% accurate, reliable, functional, timely, secure, safe, error-free, uninterrupted, continuously stable, or free of any faults;
- The output of this Model does not constitute any advice or decision for you, and there is no guarantee of the authenticity, completeness, accuracy, timeliness, legality, functionality, or practicality of the generated content. The generated content cannot replace professionals in medical, legal, and other fields in answering your questions. It is for your reference only and does not represent any attitude, standpoint, or position of Huawei. You need to make independent judgments based on your actual situation, and Huawei does not assume any responsibility.
7. Contact Us
If you have any comments or suggestions, please submit an issue or contact openPangu@huawei.com.