need-for-speed (Need4Speed)

posted an update 8 days ago

🚀 We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](Intel/low_bit_open_llm_leaderboard), currently supporting Pure RTN mode powered by AutoRound.

⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!

Haihao

authored a paper 6 months ago

SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

Paper • 2512.04746 • Published Dec 4, 2025 • 14

wenhuach

authored a paper 6 months ago

SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

Paper • 2512.04746 • Published Dec 4, 2025 • 14

wenhuach

posted an update 6 months ago

🚀 SignRoundV2 for LLM quantization: PTQ-level cost, QAT-level accuracy — yes, even at 2 bits.

SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs (2512.04746)

wenhuach

posted an update 7 months ago

🚀 [AutoRound](https://github.com/intel/auto-round) is now supported by SGLang!

After integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.

💡 We’ve also enhanced the RTN mode (--iters 0), cutting quantization costs significantly for low-resource users.
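For readers new to the term, RTN (round-to-nearest) quantization skips the tuning iterations and simply rounds each weight to the nearest point on a low-bit grid. The snippet below is a minimal textbook sketch of group-wise symmetric RTN in plain Python. It is not AutoRound's implementation; the function name `rtn_quantize`, the group size, and the sample weights are all assumptions made for illustration.

```python
def rtn_quantize(weights, bits=4, group_size=4):
    """Group-wise symmetric round-to-nearest quantization (illustrative only).

    Returns the integer codes and the dequantized (reconstructed) weights.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed integers
    quantized, dequantized = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group, chosen so the largest magnitude maps to qmax.
        max_abs = max(abs(w) for w in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        # Round to the nearest integer level and clamp to the signed range.
        codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in group]
        quantized.extend(codes)
        dequantized.extend(c * scale for c in codes)
    return quantized, dequantized


if __name__ == "__main__":
    # Hypothetical weights, two groups of four.
    sample = [0.12, -0.53, 0.91, -0.07, 1.50, -2.25, 0.33, 0.80]
    codes, recon = rtn_quantize(sample, bits=4, group_size=4)
    print("codes:", codes)
    print("max abs error:", max(abs(w - r) for w, r in zip(sample, recon)))
```

Because no calibration data or optimization loop is involved, the cost is a single pass over the weights, which is why an RTN-style mode is attractive on limited hardware.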

⭐ Star our repo and stay tuned for more exciting updates!

wenhuach

posted an update 8 months ago

AutoRound keeps evolving its LLM quantization algorithm! 🚀
After enhancing W2A16 quantization, we now offer a fast algorithm that generates mixed-bit and mixed-data-type schemes (about 2 minutes for 8B models), which works well for MXFP4 and W2A16.
Learn more: https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme
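To make "mixed bits" concrete: a mixed scheme assigns different bit-widths to different layers, and what matters for model size is the parameter-weighted average. The sketch below only computes that average for a hand-written assignment. It does not reproduce the scheme-generation algorithm itself, the layer sizes are hypothetical, and it ignores the overhead of scales and zero points.

```python
def average_bits(layers):
    """Parameter-weighted average bit-width.

    `layers` is a list of (num_params, bits) pairs. Overhead from scales
    and zero points is deliberately ignored in this illustration.
    """
    total_params = sum(n for n, _ in layers)
    total_bits = sum(n * b for n, b in layers)
    return total_bits / total_params


if __name__ == "__main__":
    # Hypothetical assignment: a smaller block kept at 4 bits,
    # a larger block pushed down to 2 bits.
    scheme = [(4_000_000, 4), (12_000_000, 2)]
    print(f"average bits per weight: {average_bits(scheme):.2f}")
```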

lvwerra

authored a paper 8 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 40

wenhuach

posted an update 9 months ago

AutoRound v0.7 is out! 🚀
This release includes enhanced algorithms for W2A16, NVFP4, and MXFP4, along with support for FP8 models as input.
👉 Check out the full details here: https://github.com/intel/auto-round/releases/tag/v0.7.0

wenhuach

posted an update 10 months ago

🚀 [AutoRound](https://github.com/intel/auto-round) Now Supports GGUF Export & Custom Bit Settings!

We're excited to announce that AutoRound now supports:
✅ GGUF format export – for seamless compatibility with popular inference engines.
✅ Custom bit settings – tailor quantization to your needs for optimal performance.

Check out these newly released models:
🔹Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound
🔹Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
🔹Intel/Kimi-K2-Instruct-gguf-q2ks-mixed-AutoRound

Stay tuned! An even more advanced algorithm for some configurations is coming soon.

lvwerra

authored a paper 11 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 78

loubnabnl

authored a paper 12 months ago

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Paper • 2506.05209 • Published Jun 5, 2025 • 61

zhentaoyu

authored a paper about 1 year ago

HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation

Paper • 2503.18860 • Published Mar 24, 2025 • 6

wenhuach

posted an update about 1 year ago

[AutoRound](https://github.com/intel/auto-round) has been integrated into vLLM, allowing you to run AutoRound-formatted models directly in the upcoming release.

Besides, we strongly recommend using AutoRound to generate AWQ INT4 models, as AutoAWQ is no longer maintained and manually configuring new models is not trivial due to the need for custom layer mappings.

loubnabnl

posted an update about 1 year ago

SmolVLM is now available on PocketPal — you can run it offline on your smartphone to interpret the world around you. 🌍📱

And check out this real-time camera demo by @ngxson, powered by llama.cpp:
https://github.com/ngxson/smolvlm-realtime-webcam
https://x.com/pocketpal_ai

zhentaoyu

authored a paper about 1 year ago

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Paper • 2505.04512 • Published May 7, 2025 • 36

wenhuach

posted an update about 1 year ago

[AutoRound](https://github.com/intel/auto-round) has been integrated into Transformers, allowing you to run AutoRound-formatted models directly in the upcoming release. Additionally, we are actively working on supporting the GGUF double-quant format, such as q4_k_s. Stay tuned!

https://huggingface.co/blog/autoround

lvwerra

authored a paper about 1 year ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 208

loubnabnl

authored a paper about 1 year ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 208

Haihao

authored a paper about 1 year ago

Faster Inference of LLMs using FP8 on the Intel Gaudi

Paper • 2503.09975 • Published Mar 13, 2025 • 1

wenhuach

posted an update over 1 year ago

Check out the [DeepSeek-R1 INT2 model](OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200GB DeepSeek-R1 model shows only about a 2% drop in MMLU, though it's quite slow due to a kernel issue.

| Task | BF16 | INT2-mixed |
| ------------- | ------ | ---------- |
| mmlu | 0.8514 | 0.8302 |
| hellaswag | 0.6935 | 0.6657 |
| winogrande | 0.7932 | 0.7940 |
| arc_challenge | 0.6212 | 0.6084 |
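
As a quick check of the "about 2%" figure, the drop can be read either as an absolute difference in score or as a change relative to the BF16 baseline. The short script below computes both from the numbers in the table above; the helper name `drops` is introduced here for illustration only.

```python
# Scores copied from the table above: task -> (BF16, INT2-mixed).
results = {
    "mmlu": (0.8514, 0.8302),
    "hellaswag": (0.6935, 0.6657),
    "winogrande": (0.7932, 0.7940),
    "arc_challenge": (0.6212, 0.6084),
}


def drops(bf16, int2):
    """Return (absolute drop, drop relative to the BF16 score)."""
    absolute = bf16 - int2
    return absolute, absolute / bf16


if __name__ == "__main__":
    for task, (bf16, int2) in results.items():
        absolute, relative = drops(bf16, int2)
        # Positive values mean INT2-mixed scores lower than BF16.
        print(f"{task:14s} abs={absolute * 100:+.2f} pts  rel={relative * 100:+.2f}%")
```

A negative value for a task means the INT2-mixed model scored slightly higher than BF16 on that benchmark.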

Team members 18