Need4Speed

company
Activity Feed

AI & ML interests

None defined yet.

wenhuach
posted an update 8 days ago
🚀 We provide **free** hardware for quantizing models at the [Intel Low Bit Open LLM Leaderboard](Intel/low_bit_open_llm_leaderboard), which currently supports pure RTN mode, powered by AutoRound.

⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!
wenhuach
posted an update 6 months ago
wenhuach
posted an update 7 months ago
🚀 [AutoRound](https://github.com/intel/auto-round) is now supported by SGLang!

Following integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.

💡 We've also enhanced the RTN mode (`--iters 0`), significantly cutting quantization costs for low-resource users.
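
For readers unfamiliar with RTN (round-to-nearest), the idea can be sketched in a few lines of plain Python. This is a generic, textbook-style illustration of asymmetric RTN on a single group of weights, not AutoRound's actual implementation; the function names, the 4-bit setting, and the sample weights are all hypothetical. Each weight is mapped independently to the nearest point on an integer grid, with no tuning iterations, which is why this mode is cheap.

```python
def rtn_quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights.

    Hypothetical helper for illustration only (not an AutoRound API).
    Returns the integer codes plus the scale and zero point needed to decode.
    """
    qmax = (1 << bits) - 1                  # largest code, e.g. 15 for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax
    if scale == 0.0:                        # constant group: avoid dividing by zero
        scale = 1.0
    zero_point = round(-w_min / scale)      # integer code that represents 0.0
    codes = []
    for w in weights:
        q = round(w / scale) + zero_point   # nearest grid point
        codes.append(min(qmax, max(0, q)))  # clamp into the representable range
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate floating-point weights."""
    return [(q - zero_point) * scale for q in codes]


# Hypothetical sample weights, chosen only to exercise the functions.
weights = [-0.9, -0.3, 0.0, 0.4, 1.2]
codes, scale, zero_point = rtn_quantize_group(weights, bits=4)
restored = dequantize_group(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, round(max_error, 4))
```

In this sketch the reconstruction error of each weight is bounded by roughly one quantization step (`scale`); tuned methods aim to reduce the resulting model-level error, at the price of extra compute.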

⭐ Star our repo and stay tuned for more exciting updates!
wenhuach
posted an update 8 months ago
wenhuach
posted an update 9 months ago
wenhuach
posted an update 10 months ago
🚀 [AutoRound](https://github.com/intel/auto-round) now supports GGUF export and custom bit settings!

We're excited to announce that AutoRound now supports:
✅ GGUF format export: for seamless compatibility with popular inference engines.
✅ Custom bit settings: tailor quantization to your needs for optimal performance.

Check out these newly released models:
🔹 Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound
🔹 Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
🔹 Intel/Kimi-K2-Instruct-gguf-q2ks-mixed-AutoRound

Stay tuned! An even more advanced algorithm for some configurations is coming soon.
wenhuach
posted an update about 1 year ago
[AutoRound](https://github.com/intel/auto-round) has been integrated into vLLM, allowing you to run AutoRound-formatted models directly in the upcoming release.

In addition, we strongly recommend using AutoRound to generate AWQ INT4 models: AutoAWQ is no longer maintained, and manually configuring new models is nontrivial because it requires custom layer mappings.
loubnabnl
posted an update about 1 year ago
wenhuach
posted an update about 1 year ago
wenhuach
posted an update over 1 year ago
Check out the [DeepSeek-R1 INT2 model](OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200GB DeepSeek-R1 model shows a drop of only about 2 percentage points in MMLU, though it is quite slow due to a kernel issue.

| Task          | BF16   | INT2-mixed |
| ------------- | ------ | ---------- |
| mmlu          | 0.8514 | 0.8302     |
| hellaswag     | 0.6935 | 0.6657     |
| winogrande    | 0.7932 | 0.7940     |
| arc_challenge | 0.6212 | 0.6084     |
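
To make the comparison concrete, the short script below recomputes the change per task from the scores in the table. It is only an arithmetic sketch; the `accuracy_change` helper is a hypothetical name introduced here. For MMLU, the absolute change is 0.8302 - 0.8514 = -0.0212 (about 2.1 points), which is roughly 2.5% relative to the BF16 score, while winogrande is marginally higher for the INT2-mixed model.

```python
# Scores copied from the table above: task -> (BF16, INT2-mixed).
scores = {
    "mmlu": (0.8514, 0.8302),
    "hellaswag": (0.6935, 0.6657),
    "winogrande": (0.7932, 0.7940),
    "arc_challenge": (0.6212, 0.6084),
}


def accuracy_change(bf16, int2):
    """Return (absolute, relative) change of the INT2 score versus BF16.

    Hypothetical helper for illustration; negative values mean a drop.
    """
    absolute = int2 - bf16
    relative = absolute / bf16
    return absolute, relative


for task, (bf16, int2) in scores.items():
    absolute, relative = accuracy_change(bf16, int2)
    print(f"{task:<14} {absolute:+.4f} absolute, {relative:+.2%} relative")
```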