natolambert committed on
Commit 6066108 · verified · 1 Parent(s): 2ce5fa6

Update README.md

Files changed (1):
  1. README.md +47 -30
README.md CHANGED
@@ -1,15 +1,18 @@
  ---
  license: apache-2.0
- base_model: allenai/Olmo-3-7B-Think-SFT
  language:
  - en
  ---

  ## Model Details
  <img alt="OLMo Logo" src="https://cdn-uploads.huggingface.co/production/uploads/65316953791d5a2611426c20/nC44-uxMD6J6H3OHxRtVU.png" width="242px" style="margin-left:'auto' margin-right:'auto' display:'block'">


- # Model Card for Olmo-3-32B-Instruct-SFT

  We introduce Olmo 3, a new family of 7B and 32B models, available in both Instruct and Think variants. Long chain-of-thought thinking improves reasoning tasks like math and coding.

@@ -20,12 +23,12 @@ These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolc
 
  The core models released in this batch include the following:

- | **Stage** | **[Olmo 3 7B Think]** | **[Olmo 3 32B Think]** | **[Olmo 3 7B Instruct]** | **[Olmo 3 32B Instruct]** |
- |--------------------------|---------------|---------------|---------------|---------------|
- | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | | |
- | **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-32B-Instruct-SFT) |
- | **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-32B-Instruct-DPO) |
- | **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3-32B-Instruct](https://huggingface.co/allenai/Olmo-3-32B-Instruct) |

  ## Installation
@@ -40,8 +43,8 @@ pip install transformers>=4.57.0
  You can use OLMo with the standard HuggingFace transformers library:
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
- olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-32B-Instruct-SFT")
- tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3-32B-Instruct-SFT")
  message = ["Language modeling is "]
  inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
  # optional verifying cuda
@@ -54,7 +57,7 @@ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

  For faster performance, you can quantize the model using the following method:
  ```python
- AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-32B-Instruct-SFT",
      torch_dtype=torch.float16,
      load_in_8bit=True)  # Requires bitsandbytes
  ```
@@ -68,13 +71,13 @@ We have released checkpoints for these models. For post-training, the naming con

  To load a specific model revision with HuggingFace, simply add the argument `revision`:
  ```python
- olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-32B-Instruct-SFT", revision="step_1375")
  ```

  Or, you can access all the revisions for the models via the following code snippet:
  ```python
  from huggingface_hub import list_repo_refs
- out = list_repo_refs("allenai/Olmo-3-32B-Instruct-SFT")
  branches = [b.name for b in out.branches]
  ```

@@ -98,7 +101,7 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo-
  - **Language(s) (NLP):** English
  - **License:** This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
  - **Contact:** Technical inquiries: `[email protected]`. Press: `[email protected]`
- - **Date cutoff:** Dec. 2023.


  ### Model Sources
@@ -108,37 +111,51 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo-
 
  - Open-Instruct for DPO and RLVR: https://github.com/allenai/open-instruct
  - OLMo-Core for pre-training and SFT: https://github.com/allenai/OLMo-core
  - OLMo-Eval for evaluation: https://github.com/allenai/OLMo-Eval
- - **Paper:** [TBD]
- <!-- - **Technical blog post:** (URL) -->
- <!-- - **W&B Logs:** [SFT](()), [DPO](()), [RLVR](()) -->


  ## Evaluation

- | **Model** | **Math** | | | **Reasoning** | | | **Coding** | | | **IF** | | **QA** | | |
- |----------|----------|----------|----------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
- | | AIME '24 | AIME '25 | OMEGA | BBH | Zebra Logic | AGI Eval | Human Eval+ | MBPP+ | LCB v3 | IFEval | IFBench | MMLU | PopQA | GPQA |
- | **Nemotron–Nano–9B–v2** | 72.1 | 58.9 | 42.4 | 86.2 | 60.8 | 83.1 | 89.7 | 66.1 | 83.4 | 86.0 | 34.6 | 84.3 | 17.9 | 56.2 |
- | **OpenThinker3–7B** | 67.7 | 57.2 | 38.4 | 77.1 | 34.9 | 78.6 | 87.4 | 61.4 | 68.2 | 51.7 | 23.0 | 77.4 | 18.0 | 48.0 |
- | **DeepSeek–R1–Qwen–7B** | 54.9 | 40.2 | 28.5 | 73.5 | 26.1 | 69.5 | 83.0 | 63.5 | 58.8 | 59.6 | 16.7 | 67.9 | 12.8 | 53.2 |
- | **Qwen 3 8B (w/ reasoning)** | **74.0** | **67.8** | 43.4 | 84.4 | 85.2 | 87.0 | 80.2 | **69.1** | **86.2** | **87.4** | 37.1 | 85.4 | 24.3 | 57.7 |
- | **Qwen 3 VL 8B Thinker** | 70.9 | 61.5 | 37.9 | **86.8** | **91.2** | **90.1** | 83.7 | 63.0 | 85.5 | 85.5 | 40.4 | **86.5** | **29.3** | **62.4** |
- | **OpenReasoning Nemo 7B** | 77.0 | 73.1 | 43.2 | 81.3 | 22.4 | 81.4 | 89.7 | 61.2 | 82.3 | 42.5 | — | 80.7 | 14.5 | 60.8 |
- | **Olmo-3-32B-Instruct-SFT** | | | | | | | | | | | | | | |

  ## Model Details

  #### Stage 1: SFT
  - Supervised fine-tuning on the Dolci SFT mix, which consists of math, code, chat, and general knowledge queries.
- - Datasets: [Dolci-Think-SFT-7B](https://huggingface.co/datasets/allenai/dolci-thinking-sft), [Dolci-Instruct-SFT-7B](https://huggingface.co/datasets/allenai/dolci-instruct-sft)

  #### Stage 2: DPO
  - Direct preference optimization on the Dolci DPO mix, which consists of math, code, chat, and general knowledge queries.
- - Datasets: [Dolci-Think-DPO-7B](https://huggingface.co/datasets/allenai/dolci-thinking-dpo), [Dolci-Instruct-DPO-7B](https://huggingface.co/datasets/allenai/dolci-3-instruct-dpo-with-metadata)

  #### Stage 3: RLVR
  - Reinforcement learning from verifiable rewards on the Dolci RL mix, which consists of math, code, instruction-following, and general chat queries.
- - Datasets: [Dolci-Think-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B), [Dolci-Instruct-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Instruct-RL-7B)


  ## Bias, Risks, and Limitations
 
  ---
  license: apache-2.0
+ base_model:
+ - allenai/Olmo-3-1125-32B
  language:
  - en
+ datasets:
+ - allenai/Dolci-Instruct-SFT
  ---

  ## Model Details
  <img alt="OLMo Logo" src="https://cdn-uploads.huggingface.co/production/uploads/65316953791d5a2611426c20/nC44-uxMD6J6H3OHxRtVU.png" width="242px" style="margin-left:'auto' margin-right:'auto' display:'block'">


+ # Model Card for Olmo-3.1-32B-Instruct-SFT

  We introduce Olmo 3, a new family of 7B and 32B models, available in both Instruct and Think variants. Long chain-of-thought thinking improves reasoning tasks like math and coding.

 
  The core models released in this batch include the following:

+ | **Stage** | **Olmo 3 7B Think** | **Olmo (3/3.1) 32B Think** | **Olmo 3 7B Instruct** | **Olmo 3.1 32B Instruct** |
+ |--------------------------|-----------------------|------------------------|---------------------------|----------------------------|
+ | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) |
+ | **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3.1-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-SFT) |
+ | **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3.1-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-DPO) |
+ | **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think)<br>[Olmo-3.1-32B-Think](https://huggingface.co/allenai/Olmo-3.1-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct) |


  ## Installation
 
  You can use OLMo with the standard HuggingFace transformers library:
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT")
+ tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT")
  message = ["Language modeling is "]
  inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
  # optional verifying cuda
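  # Note: since this is an instruct model, chat-formatted prompts generally
  # behave better than raw strings; a hedged sketch using the standard
  # chat-template API (prompt text and generation length are illustrative):
  chat = [{"role": "user", "content": "What is language modeling?"}]
  chat_inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors='pt')
  chat_response = olmo.generate(chat_inputs, max_new_tokens=128)
  print(tokenizer.batch_decode(chat_response, skip_special_tokens=True)[0])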
 

  For faster performance, you can quantize the model using the following method:
  ```python
+ AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT",
      torch_dtype=torch.float16,
      load_in_8bit=True)  # Requires bitsandbytes
  ```
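
  On recent transformers versions, the `load_in_8bit` argument is deprecated in favor of an explicit quantization config; a minimal sketch of the equivalent call (still requires `bitsandbytes`, plus `accelerate` for `device_map`):

  ```python
  import torch
  from transformers import AutoModelForCausalLM, BitsAndBytesConfig

  # 8-bit weights via bitsandbytes; half precision for the remaining modules.
  olmo_8bit = AutoModelForCausalLM.from_pretrained(
      "allenai/Olmo-3.1-32B-Instruct-SFT",
      quantization_config=BitsAndBytesConfig(load_in_8bit=True),
      torch_dtype=torch.float16,
      device_map="auto",
  )
  ```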
 

  To load a specific model revision with HuggingFace, simply add the argument `revision`:
  ```python
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT", revision="step_1375")
  ```
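
  The `revision` argument also applies to the tokenizer, so an intermediate checkpoint can be loaded as a matched pair; a minimal sketch reusing the step name from the example above:

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  rev = "step_1375"  # revision name from the example above
  olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT", revision=rev)
  tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3.1-32B-Instruct-SFT", revision=rev)
  ```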
76
 
77
  Or, you can access all the revisions for the models via the following code snippet:
78
  ```python
79
  from huggingface_hub import list_repo_refs
80
+ out = list_repo_refs("allenai/Olmo-3.1-32B-Instruct-SFT")
81
  branches = [b.name for b in out.branches]
82
  ```
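
  Checkpoint branches follow the `step_XXXX` naming shown above, so they can be filtered out of `branches` and ordered numerically; a small sketch building on the snippet above:

  ```python
  # Keep only the intermediate-checkpoint branches and sort by step number.
  steps = sorted(
      (b for b in branches if b.startswith("step_")),
      key=lambda name: int(name.split("_")[1]),
  )
  print(steps)
  ```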


  - **Language(s) (NLP):** English
  - **License:** This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
  - **Contact:** Technical inquiries: `[email protected]`. Press: `[email protected]`
+ - **Date cutoff:** Dec. 2024.


  ### Model Sources
 

  - Open-Instruct for DPO and RLVR: https://github.com/allenai/open-instruct
  - OLMo-Core for pre-training and SFT: https://github.com/allenai/OLMo-core
  - OLMo-Eval for evaluation: https://github.com/allenai/OLMo-Eval
+ - **Paper:** https://allenai.org/papers/olmo3


  ## Evaluation

+ | Metric | **Olmo 3.1 32B Instruct SFT** | **Olmo 3.1 32B Instruct DPO** | **Olmo 3.1 32B Instruct** | Apertus 70B | Qwen 3 32B (No Think) | Qwen 3 VL 32B Instruct | Qwen 2.5 32B | Gemma 3 27B | Gemma 2 27B | OLMo 2 32B |
+ | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+ | **Math** | | | | | | | | | | |
+ | MATH | 74.4 | 86.6 | 93.4 | 36.2 | 84.3 | 95.1 | 80.2 | 87.4 | 51.5 | 49.2 |
+ | AIME 2024 | 12.7 | 35.2 | 67.8 | 0.31 | 27.9 | 75.4 | 15.7 | 28.9 | 4.7 | 4.6 |
+ | AIME 2025 | 8.2 | 23.3 | 57.9 | 0.1 | 21.3 | 64.2 | 13.4 | 22.9 | 0.9 | 0.9 |
+ | OMEGA | 15.5 | 33.3 | 42.2 | 5.6 | 23.4 | 44.0 | 19.2 | 24.0 | 9.1 | 9.8 |
+ | **Reasoning** | | | | | | | | | | |
+ | BigBenchHard | 69.0 | 82.1 | 84.0 | 57.0 | 80.4 | 89.0 | 80.9 | 82.4 | 66.0 | 65.6 |
+ | ZebraLogic | 30.6 | 51.1 | 61.7 | 9.0 | 28.4 | 86.7 | 24.1 | 24.8 | 17.2 | 13.3 |
+ | AGI Eval English | 71.7 | 79.4 | 79.5 | 61.6 | 82.4 | 89.4 | 78.9 | 76.9 | 70.9 | 68.4 |
+ | **Coding** | | | | | | | | | | |
+ | HumanEvalPlus | 80.8 | 85.7 | 86.7 | 42.9 | 83.9 | 89.3 | 82.6 | 79.2 | 67.5 | 44.4 |
+ | MBPP+ | 61.5 | 63.6 | 65.1 | 45.8 | 67.9 | 69.0 | 66.6 | 65.7 | 61.2 | 49.0 |
+ | LiveCodeBench v3 | 35.4 | 49.6 | 54.7 | 9.7 | 57.5 | 70.2 | 49.9 | 39.0 | 28.7 | 10.6 |
+ | **IF** | | | | | | | | | | |
+ | IFEval | 87.7 | 87.3 | 88.8 | 70.4 | 87.5 | 88.1 | 81.9 | 85.4 | 62.1 | 85.8 |
+ | IFBench | 29.7 | 36.3 | 39.7 | 26.0 | 31.3 | 37.2 | 36.7 | 31.3 | 27.8 | 36.4 |
+ | **Knowledge & QA** | | | | | | | | | | |
+ | MMLU | 79.0 | 81.9 | 80.9 | 70.2 | 85.8 | 88.7 | 84.6 | 74.6 | 76.1 | 77.1 |
+ | PopQA | 23.7 | 28.5 | 25.0 | 33.5 | 25.9 | 25.7 | 28.0 | 30.2 | 30.4 | 37.2 |
+ | GPQA | 41.3 | 47.9 | 48.6 | 27.9 | 54.4 | 61.4 | 44.6 | 45.0 | 39.9 | 36.4 |
+ | **Chat** | | | | | | | | | | |
+ | AlpacaEval 2 LC | 42.2 | 69.7 | 59.8 | 19.9 | 67.9 | 84.3 | 81.9 | 65.5 | 39.8 | 38.0 |
+ | **Safety** | 92.1 | 88.9 | 89.5 | 77.1 | 81.6 | 85.8 | 82.2 | 68.8 | 74.4 | 84.2 |

 
  ## Model Details

  #### Stage 1: SFT
  - Supervised fine-tuning on the Dolci SFT mix, which consists of math, code, chat, and general knowledge queries.
+ - Datasets: [Dolci-Think-SFT-7B](https://huggingface.co/datasets/allenai/dolci-thinking-sft), [Dolci-Instruct-SFT](https://huggingface.co/datasets/allenai/dolci-instruct-sft)

  #### Stage 2: DPO
  - Direct preference optimization on the Dolci DPO mix, which consists of math, code, chat, and general knowledge queries.
+ - Datasets: [Dolci-Think-DPO-7B](https://huggingface.co/datasets/allenai/dolci-thinking-dpo), [Dolci-Instruct-DPO](https://huggingface.co/datasets/allenai/dolci-3-instruct-dpo-with-metadata)

  #### Stage 3: RLVR
  - Reinforcement learning from verifiable rewards on the Dolci RL mix, which consists of math, code, instruction-following, and general chat queries.
+ - Datasets: [Dolci-Think-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B), [Dolci-Instruct-RL](https://huggingface.co/datasets/allenai/Dolci-Instruct-RL-7B)
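
  To peek at the post-training data, the Dolci mixes linked above can be streamed with the `datasets` library; a minimal sketch using the Instruct SFT repo id from the links above (the `train` split and record fields are assumptions, not verified against the repo):

  ```python
  from datasets import load_dataset

  # Stream instead of downloading the full mix up front.
  ds = load_dataset("allenai/dolci-instruct-sft", split="train", streaming=True)
  first = next(iter(ds))
  print(first.keys())  # inspect which fields each record carries
  ```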
  ## Bias, Risks, and Limitations