---
license: mit
base_model: deepseek-ai/DeepSeek-OCR
tags:
- quantization
- int8
- uniform-quantization
- model-compression
---

# Uniform INT8 Quantized DeepSeek-OCR

This model is a uniformly quantized INT8 version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR): every quantized layer is stored at 8 bits, regardless of modality.

## Quantization Details

- **Method**: Uniform INT8 quantization
- **Quantized Layers**: 2342 in total
- **Vision Layers**: 96 @ 8-bit
- **Language Layers**: 2197 @ 8-bit
- **Other Layers**: 49 @ 8-bit (inferred as the remainder: 2342 - 96 - 2197; see `layer_analysis.json`)
- **Average Bit-width**: 8.00
- **Original Size**: 6363.12 MB
- **Compressed Size**: 3351.56 MB
- **Compression Ratio**: 1.90x
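
As a quick sanity check, the figures listed above are consistent with one another. The snippet below only redoes the arithmetic from those numbers; the variable names are illustrative and nothing here reads the actual checkpoint.

```python
# Figures copied from the Quantization Details list.
original_mb = 6363.12
compressed_mb = 3351.56
quantized_layers = 2342
vision_layers = 96
language_layers = 2197

# Compression ratio and relative size reduction.
ratio = original_mb / compressed_mb
saved_fraction = 1 - compressed_mb / original_mb

# Layers that are neither vision nor language (the "other" modality bucket).
other_layers = quantized_layers - vision_layers - language_layers

print(f"compression ratio: {ratio:.2f}x")
print(f"size reduction: {saved_fraction:.1%}")
print(f"other layers: {other_layers}")
```

The ratio rounds to the reported 1.90x, which corresponds to a size reduction of a little under half.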

## Model Files

- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)
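
The JSON files can be inspected with the standard library alone. The sketch below assumes the files have already been downloaded into a local directory. Because this card does not document their schemas, it reports only the top-level structure of each file. The helper names `load_metadata` and `summarize` are hypothetical.

```python
import json
from pathlib import Path

# Metadata file names as listed in this model card.
METADATA_FILES = [
    "quantization_info.json",
    "layer_configs.json",
    "compression_stats.json",
    "layer_analysis.json",
]


def load_metadata(model_dir):
    """Return {filename: parsed JSON} for each metadata file found in model_dir."""
    model_dir = Path(model_dir)
    metadata = {}
    for name in METADATA_FILES:
        path = model_dir / name
        if path.is_file():  # skip files that were not downloaded
            with path.open(encoding="utf-8") as f:
                metadata[name] = json.load(f)
    return metadata


def summarize(metadata):
    """Print the top-level type and entry count of each loaded file."""
    for name, content in metadata.items():
        kind = type(content).__name__
        size = len(content) if isinstance(content, (dict, list)) else 1
        print(f"{name}: {kind} with {size} top-level entries")


if __name__ == "__main__":
    # Assumes the metadata files sit in the current working directory.
    summarize(load_metadata("."))
```

Files that are missing from the directory are skipped rather than raising an error, so the helper also works on a partial download.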

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "SamMikaelson/deepseek-ocr-int8-quantized"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

# Download the quantized weights from the Hub, then load them on CPU
weights_path = hf_hub_download(repo_id=repo_id, filename="quantized_weights.pt")
state_dict = torch.load(weights_path, map_location="cpu")

# Note: the state dict alone is not a runnable model; you'll need the
# QuantizedLinear class to properly load and use it
```

## Baseline Characteristics

This uniform quantization approach:
- Applies the **same 8-bit** quantization to ALL layers
- **Does not distinguish** between vision and language modalities
- Serves as a **baseline** for comparison with modality-aware methods
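
To make these points concrete, here is a minimal pure-Python sketch of uniform quantization. It assumes symmetric per-tensor scaling, which this card does not confirm as the scheme actually used. The functions `quantize_int8` and `dequantize_int8`, the layer names, and the toy weights are all hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (an assumed scheme, not confirmed by this card)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude onto 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]


# Hypothetical toy weights standing in for one vision and one language layer.
layers = {
    "vision.block0.proj": [0.02, -0.75, 0.31, 0.0],
    "language.layer0.mlp": [1.5, -0.4, 0.9, -2.1],
}

# Uniform baseline: every layer gets the same bit-width and the same routine,
# with no modality-specific treatment.
for name, weights in layers.items():
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"{name}: scale={scale:.6f}, max abs error={max_err:.6f}")
```

A modality-aware method would replace the single loop body with a per-modality choice of bit-width or scheme, which is the comparison this baseline is meant to support.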

## Citation

If you use this model, please cite the original [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) model and state that the weights were uniformly quantized to INT8.