---
language:
- en
license: mit
library_name: mlx
pipeline_tag: image-text-to-text
base_model: showlab/ShowUI-2B
tags:
- mlx
- mlx-vlm
- safetensors
- apple-silicon
- conversational
- gui
- vision-language-model
- qwen2_vl
- showui
- gui-agents
- vision-language-action
- computer-use
- grounding
- 6-bit
- quantized
---

# ShowUI-2B 6bit

This is a 6-bit quantized MLX conversion of [showlab/ShowUI-2B](https://huggingface.co/showlab/ShowUI-2B), optimized for Apple Silicon.

ShowUI is a lightweight 2B-parameter vision-language-action model designed for GUI agents. The upstream model targets GUI grounding and UI navigation, using point-style localization and atomic action dictionaries over screenshots.

This artifact was derived from the validated local MLX `bf16` reference conversion and then quantized with `mlx-vlm`. It was validated locally with both `mlx_vlm` prompt-packet checks and `vllm-mlx` OpenAI-compatible serve checks.

## Conversion Details

| Field | Value |
|---|---|
| Upstream model | `showlab/ShowUI-2B` |
| Artifact type | `6bit quantized MLX conversion` |
| Source artifact | local validated `bf16` MLX artifact |
| Repo action | `update existing mlx-community repo` |
| Conversion tool | `mlx_vlm.convert` via `mlx-vlm 0.3.12` |
| Python | `3.11.14` |
| MLX | `0.31.0` |
| Transformers | `5.2.0` |
| Validation backend | `vllm-mlx (phase/p1 @ 8a5d41b)` |
| Quantization | `6bit` |
| Group size | `64` |
| Quantization mode | `affine` |
| Converter dtype note | `bfloat16` |
| Reported effective bits per weight | `9.088` |
| Artifact size | `2.60G` |
| Template repair | `tokenizer_config.json["chat_template"]` was re-injected after quantization |

Additional notes:

- This quantized artifact inherits the fresh-source provenance of the validated local `bf16` base artifact.
- `chat_template.json`, `chat_template.jinja`, and `tokenizer_config.json["chat_template"]` were kept aligned after quantization.
- This family was validated on the Track B packet revision, which is aligned to ShowUI's native point/action contract.

## Validation

This artifact passed local validation in this workspace:

- `mlx_vlm` prompt-packet validation: `PASS`
- `vllm-mlx` OpenAI-compatible serve validation: `PASS`

Local validation notes:

- All four Track B packet prompts produced outputs identical to the local `bf16` outputs.
- The coordinate drift between non-streamed and streamed serve outputs that was present before quantization is still present.
- Quantization introduced no new regression in packet shape, multimodal detection, or the serve path.

## Performance

- Artifact size on disk: `2.60G`
- Local fixed-packet `mlx_vlm` validation peaked at about `4.35 GB` of memory.
- Local `vllm-mlx` serve validation completed in about `20.15s` non-streamed and `21.13s` streamed.

These are local validation measurements, not a full benchmark suite.

## Usage

### Install

```bash
pip install -U mlx-vlm
```

### CLI

```bash
python -m mlx_vlm.generate \
  --model mlx-community/ShowUI-2B-6bit-v2 \
  --image path/to/image.png \
  --prompt "Based on the screenshot, return the clickable location for the API Host field as [x, y] on a 0-1 scale." \
  --max-tokens 128 \
  --temperature 0.0
```

### Python

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/ShowUI-2B-6bit-v2"
model, processor = load(model_path)
config = load_config(model_path)

prompt = "Based on the screenshot, return the clickable location for the API Host field as [x, y] on a 0-1 scale."
image = ["path/to/image.png"]

# Wrap the prompt in the model's chat template so the image placeholder tokens are inserted.
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(image))

result = generate(
    model,
    processor,
    formatted_prompt,
    image,
    max_tokens=128,
    temperature=0.0,  # greedy decoding
)
print(result.text)
```

### vllm-mlx Serve

```bash
python -m vllm_mlx.cli serve mlx-community/ShowUI-2B-6bit-v2 --mllm --localhost --port 8000
```

## Links

- Upstream model: [showlab/ShowUI-2B](https://huggingface.co/showlab/ShowUI-2B)
- Paper: [ShowUI: One Vision-Language-Action Model for GUI Visual Agent](https://arxiv.org/abs/2411.17465)
- GitHub: [showlab/ShowUI](https://github.com/showlab/ShowUI/tree/main)
- Demo Space: [showlab/ShowUI Space](https://huggingface.co/spaces/showlab/ShowUI)
- Dataset: [showlab/ShowUI-desktop-8K](https://huggingface.co/datasets/showlab/ShowUI-desktop-8K)
- Base model lineage: [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
- MLX framework: [ml-explore/mlx](https://github.com/ml-explore/mlx)
- mlx-vlm: [Blaizzy/mlx-vlm](https://github.com/Blaizzy/mlx-vlm)

## Other Quantizations

Planned sibling repos in this release wave:

- [`mlx-community/ShowUI-2B-bf16-v2`](https://huggingface.co/mlx-community/ShowUI-2B-bf16-v2)
- [`mlx-community/ShowUI-2B-6bit-v2`](https://huggingface.co/mlx-community/ShowUI-2B-6bit-v2) (this model)

## Notes and Limitations

- This card reports local MLX conversion and validation results only.
- Upstream benchmark claims belong to the original ShowUI model family and were not re-run here unless explicitly stated.
- This family remains tied to the Track B point/action packet rather than the Track A bounding-box packet.
- The original `mlx-community/ShowUI-2B-bf16-6bit` repo already existed, so this refreshed artifact is published under the `-v2` repo id.
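
Because this family follows ShowUI's point-style output rather than bounding boxes, a client usually has to turn the normalized `[x, y]` answer into screenshot pixels before acting on it. The sketch below shows one way to do that, assuming the model replies with a bracketed pair of numbers on a 0-1 scale, as the usage prompts request. `parse_point` and `to_pixels` are hypothetical helpers written for illustration; they are not part of `mlx-vlm` or `vllm-mlx`.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: the first "[x, y]" pair of (optionally signed) decimal numbers.
_POINT_RE = re.compile(r"\[\s*(-?\d*\.?\d+)\s*,\s*(-?\d*\.?\d+)\s*\]")


def parse_point(text: str) -> Optional[Tuple[float, float]]:
    """Return the first normalized (x, y) point found in the model output, or None."""
    match = _POINT_RE.search(text)
    if match is None:
        return None
    x, y = float(match.group(1)), float(match.group(2))
    # Clamp into [0, 1] in case the model drifts slightly outside the requested range.
    return min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0)


def to_pixels(point: Tuple[float, float], width: int, height: int) -> Tuple[int, int]:
    """Scale a normalized (x, y) point to integer pixel coordinates on a width x height image."""
    x, y = point
    # Round to the nearest pixel, then clamp so x == 1.0 maps to the last column/row.
    px = min(round(x * width), width - 1)
    py = min(round(y * height), height - 1)
    return px, py


if __name__ == "__main__":
    sample_output = "[0.42, 0.17]"  # illustrative string, not a recorded model output
    point = parse_point(sample_output)
    if point is not None:
        print(to_pixels(point, width=1920, height=1080))
```

In practice you would pass `result.text` from the Python example to `parse_point` and use the screenshot's real dimensions. Clamping keeps a slightly out-of-range coordinate on screen; drop it if you would rather reject such outputs.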
## Citation

If you use this MLX conversion, please also cite the original ShowUI work:

```bibtex
@misc{lin2024showui,
      title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent},
      author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
      year={2024},
      eprint={2411.17465},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17465},
}
```

## License

This repo follows the upstream model license: MIT. See the upstream model card for the authoritative license details: [showlab/ShowUI-2B](https://huggingface.co/showlab/ShowUI-2B).