Update README.md
README.md CHANGED
@@ -1,15 +1,18 @@
---
license: apache-2.0
- base_model:
language:
- en
---

## Model Details
<img alt="OLMo Logo" src="https://cdn-uploads.huggingface.co/production/uploads/65316953791d5a2611426c20/nC44-uxMD6J6H3OHxRtVU.png" width="242px" style="margin-left:'auto' margin-right:'auto' display:'block'">


- # Model Card for Olmo-3-32B-Instruct-SFT

We introduce Olmo 3, a new family of 7B and 32B models in both Instruct and Think variants. Long chain-of-thought reasoning improves performance on tasks like math and coding.

@@ -20,12 +23,12 @@ These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolc

The core models released in this batch include the following:

- | **Stage** | **
-
- | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | | |
- | **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-32B-Instruct-SFT) |
- | **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-32B-Instruct-DPO) |
- | **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3-32B-Instruct](https://huggingface.co/allenai/Olmo-3-32B-Instruct) |


## Installation

@@ -40,8 +43,8 @@ pip install transformers>=4.57.0
You can use OLMo with the standard HuggingFace transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
- olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-32B-Instruct-SFT")
- tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3-32B-Instruct-SFT")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
# optional verifying cuda
@@ -54,7 +57,7 @@ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

For faster performance, you can quantize the model using the following method:
```python
- AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-32B-Instruct-SFT",
    torch_dtype=torch.float16,
    load_in_8bit=True) # Requires bitsandbytes
```
@@ -68,13 +71,13 @@ We have released checkpoints for these models. For post-training, the naming con

To load a specific model revision with HuggingFace, simply add the argument `revision`:
```python
- olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-32B-Instruct-SFT", revision="step_1375")
```

Or, you can access all the revisions for the models via the following code snippet:
```python
from huggingface_hub import list_repo_refs
- out = list_repo_refs("allenai/Olmo-3-32B-Instruct-SFT")
branches = [b.name for b in out.branches]
```

@@ -98,7 +101,7 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo-
- **Language(s) (NLP):** English
- **License:** This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
- **Contact:** Technical inquiries: `[email protected]`. Press: `[email protected]`
- - **Date cutoff:** Dec.


### Model Sources

@@ -108,37 +111,51 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo-
- Open-Instruct for DPO and RLVR: https://github.com/allenai/open-instruct
- OLMo-Core for pre-training and SFT: https://github.com/allenai/OLMo-core
- OLMo-Eval for evaluation: https://github.com/allenai/OLMo-Eval
- - **Paper
- <!-- - **Technical blog post:** (URL) -->
- <!-- - **W&B Logs:** [SFT](()), [DPO](()), [RLVR](()) -->


## Evaluation

- | **

## Model Details

#### Stage 1: SFT
- Supervised fine-tuning on the Dolci-Think-SFT-7B dataset. This dataset consists of math, code, chat, and general knowledge queries.
- - Datasets: [Dolci-Think-SFT-7B](https://huggingface.co/datasets/allenai/dolci-thinking-sft), [Dolci-Instruct-SFT

#### Stage 2: DPO
- Direct preference optimization on the Dolci-Think-DPO-7B dataset. This dataset consists of math, code, chat, and general knowledge queries.
- - Datasets: [Dolci-Think-DPO-7B](https://huggingface.co/datasets/allenai/dolci-thinking-dpo), [Dolci-Instruct-DPO

#### Stage 3: RLVR
- Reinforcement learning from verifiable rewards on the Dolci-Think-RL-7B dataset. This dataset consists of math, code, instruction-following, and general chat queries.
- - Datasets: [Dolci-Think-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B), [Dolci-Instruct-RL


## Bias, Risks, and Limitations

---
license: apache-2.0
+ base_model:
+ - allenai/Olmo-3-1125-32B
language:
- en
+ datasets:
+ - allenai/Dolci-Instruct-SFT
---

## Model Details
<img alt="OLMo Logo" src="https://cdn-uploads.huggingface.co/production/uploads/65316953791d5a2611426c20/nC44-uxMD6J6H3OHxRtVU.png" width="242px" style="margin-left:'auto' margin-right:'auto' display:'block'">


+ # Model Card for Olmo-3.1-32B-Instruct-SFT

We introduce Olmo 3, a new family of 7B and 32B models in both Instruct and Think variants. Long chain-of-thought reasoning improves performance on tasks like math and coding.


The core models released in this batch include the following:

+ | **Stage** | **Olmo 3 7B Think** | **Olmo (3/3.1) 32B Think** | **Olmo 3 7B Instruct** | **Olmo 3.1 32B Instruct** |
+ |--------------------------|-----------------------|------------------------|---------------------------|----------------------------|
+ | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) |
+ | **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3.1-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-SFT) |
+ | **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3.1-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-DPO) |
+ | **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think)<br>[Olmo-3.1-32B-Think](https://huggingface.co/allenai/Olmo-3.1-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct) |


## Installation

You can use OLMo with the standard HuggingFace transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT")
+ tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
# optional verifying cuda

For faster performance, you can quantize the model using the following method:
```python
+ AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT",
    torch_dtype=torch.float16,
    load_in_8bit=True) # Requires bitsandbytes
```
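
On recent transformers releases, 8-bit loading is usually expressed through an explicit `BitsAndBytesConfig` rather than the bare `load_in_8bit` flag. A minimal sketch of an equivalent call (an illustration only, assuming `bitsandbytes` and `accelerate` are installed and a CUDA GPU is available):
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: 8-bit quantized load via an explicit quantization config
# (same intent as load_in_8bit=True above; requires bitsandbytes).
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/Olmo-3.1-32B-Instruct-SFT",
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
    device_map="auto",  # requires accelerate; places layers on available devices
)
```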

To load a specific model revision with HuggingFace, simply add the argument `revision`:
```python
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT", revision="step_1375")
```

Or, you can access all the revisions for the models via the following code snippet:
```python
from huggingface_hub import list_repo_refs
+ out = list_repo_refs("allenai/Olmo-3.1-32B-Instruct-SFT")
branches = [b.name for b in out.branches]
```
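
Putting the two snippets together, a small sketch (illustrative only) that lists the available revision branches and then loads one of them:
```python
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

repo_id = "allenai/Olmo-3.1-32B-Instruct-SFT"

# Collect the revision branches exposed by the repository
# (intermediate checkpoints such as "step_1375" appear here alongside "main").
branches = [b.name for b in list_repo_refs(repo_id).branches]
print(branches)

# Load one of the listed revisions by name.
olmo = AutoModelForCausalLM.from_pretrained(repo_id, revision=branches[0])
```
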
- **Language(s) (NLP):** English
- **License:** This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
- **Contact:** Technical inquiries: `[email protected]`. Press: `[email protected]`
+ - **Date cutoff:** Dec. 2024.


### Model Sources

- Open-Instruct for DPO and RLVR: https://github.com/allenai/open-instruct
- OLMo-Core for pre-training and SFT: https://github.com/allenai/OLMo-core
- OLMo-Eval for evaluation: https://github.com/allenai/OLMo-Eval
+ - **Paper:** https://allenai.org/papers/olmo3


## Evaluation

+ | Metric | **Olmo 3.1 32B Instruct SFT** | **Olmo 3.1 32B Instruct DPO** | **Olmo 3.1 32B Instruct** | Apertus 70B | Qwen 3 32B (No Think) | Qwen 3 VL 32B Instruct | Qwen 2.5 32B | Gemma 3 27B | Gemma 2 27B | OLMo 2 32B |
+ | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+ | **Math** | | | | | | | | | | |
+ | MATH | 74.4 | 86.6 | 93.4 | 36.2 | 84.3 | 95.1 | 80.2 | 87.4 | 51.5 | 49.2 |
+ | AIME 2024 | 12.7 | 35.2 | 67.8 | 0.31 | 27.9 | 75.4 | 15.7 | 28.9 | 4.7 | 4.6 |
+ | AIME 2025 | 8.2 | 23.3 | 57.9 | 0.1 | 21.3 | 64.2 | 13.4 | 22.9 | 0.9 | 0.9 |
+ | OMEGA | 15.5 | 33.3 | 42.2 | 5.6 | 23.4 | 44.0 | 19.2 | 24.0 | 9.1 | 9.8 |
+ | **Reasoning** | | | | | | | | | | |
+ | BigBenchHard | 69.0 | 82.1 | 84.0 | 57.0 | 80.4 | 89.0 | 80.9 | 82.4 | 66.0 | 65.6 |
+ | ZebraLogic | 30.6 | 51.1 | 61.7 | 9.0 | 28.4 | 86.7 | 24.1 | 24.8 | 17.2 | 13.3 |
+ | AGI Eval English | 71.7 | 79.4 | 79.5 | 61.6 | 82.4 | 89.4 | 78.9 | 76.9 | 70.9 | 68.4 |
+ | **Coding** | | | | | | | | | | |
+ | HumanEvalPlus | 80.8 | 85.7 | 86.7 | 42.9 | 83.9 | 89.3 | 82.6 | 79.2 | 67.5 | 44.4 |
+ | MBPP+ | 61.5 | 63.6 | 65.1 | 45.8 | 67.9 | 69.0 | 66.6 | 65.7 | 61.2 | 49.0 |
+ | LiveCodeBench v3 | 35.4 | 49.6 | 54.7 | 9.7 | 57.5 | 70.2 | 49.9 | 39.0 | 28.7 | 10.6 |
+ | **IF** | | | | | | | | | | |
+ | IFEval | 87.7 | 87.3 | 88.8 | 70.4 | 87.5 | 88.1 | 81.9 | 85.4 | 62.1 | 85.8 |
+ | IFBench | 29.7 | 36.3 | 39.7 | 26.0 | 31.3 | 37.2 | 36.7 | 31.3 | 27.8 | 36.4 |
+ | **Knowledge & QA** | | | | | | | | | | |
+ | MMLU | 79.0 | 81.9 | 80.9 | 70.2 | 85.8 | 88.7 | 84.6 | 74.6 | 76.1 | 77.1 |
+ | PopQA | 23.7 | 28.5 | 25.0 | 33.5 | 25.9 | 25.7 | 28.0 | 30.2 | 30.4 | 37.2 |
+ | GPQA | 41.3 | 47.9 | 48.6 | 27.9 | 54.4 | 61.4 | 44.6 | 45.0 | 39.9 | 36.4 |
+ | **Chat** | | | | | | | | | | |
+ | AlpacaEval 2 LC | 42.2 | 69.7 | 59.8 | 19.9 | 67.9 | 84.3 | 81.9 | 65.5 | 39.8 | 38.0 |
+ | **Safety** | 92.1 | 88.9 | 89.5 | 77.1 | 81.6 | 85.8 | 82.2 | 68.8 | 74.4 | 84.2 |
+

## Model Details

#### Stage 1: SFT
- Supervised fine-tuning on the Dolci-Think-SFT-7B dataset. This dataset consists of math, code, chat, and general knowledge queries.
+ - Datasets: [Dolci-Think-SFT-7B](https://huggingface.co/datasets/allenai/dolci-thinking-sft), [Dolci-Instruct-SFT](https://huggingface.co/datasets/allenai/dolci-instruct-sft)

#### Stage 2: DPO
- Direct preference optimization on the Dolci-Think-DPO-7B dataset. This dataset consists of math, code, chat, and general knowledge queries.
+ - Datasets: [Dolci-Think-DPO-7B](https://huggingface.co/datasets/allenai/dolci-thinking-dpo), [Dolci-Instruct-DPO](https://huggingface.co/datasets/allenai/dolci-3-instruct-dpo-with-metadata)

#### Stage 3: RLVR
- Reinforcement learning from verifiable rewards on the Dolci-Think-RL-7B dataset. This dataset consists of math, code, instruction-following, and general chat queries.
+ - Datasets: [Dolci-Think-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B), [Dolci-Instruct-RL](https://huggingface.co/datasets/allenai/Dolci-Instruct-RL-7B)


## Bias, Risks, and Limitations