payelb committed on
Commit c1ee3b2 · verified · 1 Parent(s): 856bdf2

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +23 -3
README.md CHANGED
@@ -1,3 +1,23 @@
- ---
- license: mit
- ---
+ ---
+ license: apache-2.0
+ tags:
+ - trl
+ - ppo
+ - lora
+ - alignment
+ - reward-modeling
+ - ultrafeedback
+ base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+ ---
+
+ # Aligned TinyLlama on UltraFeedback (fixed-1k prompt pool)
+
+ This model was aligned with **TRL PPO** using a reward model:
+ - **payelb/UltraFeedback_openbmb_deberta_1k_fixed_WoN** (tag: `won`)
+
+ Key settings:
+ - Prompt pool: restricted to the same fixed/selected 1k subset used for RM training (loaded from CSV)
+ - PPO updates: 200
+ - batch size: 4
+ - lr: 1e-05
+ - LoRA: r=16, alpha=32, dropout=0.05
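The key settings in the new README can be read as a small run budget. The sketch below is a minimal illustration under stated assumptions, not the training script behind this commit. It assumes that each of the 200 PPO updates consumes exactly one batch of 4 prompts, that the 1k CSV has a single `prompt` column (a hypothetical layout), and that prompts are drawn without replacement and reshuffled only when the pool runs out (a hypothetical sampling policy). `PPOSettings`, `load_prompt_pool`, and `prompt_batches` are illustrative names, not TRL APIs. Under those assumptions the run touches 200 × 4 = 800 prompts, so no prompt in a 1,000-row pool is reused, and the LoRA update is scaled by the standard factor alpha / r = 32 / 16 = 2.0 (PEFT's default, non-rsLoRA scaling).

```python
import csv
import io
import random
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class PPOSettings:
    # Values copied from the README's "Key settings" list.
    ppo_updates: int = 200
    batch_size: int = 4
    learning_rate: float = 1e-5
    lora_r: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05

    @property
    def total_rollouts(self) -> int:
        # ASSUMPTION: each PPO update consumes exactly one batch of prompts.
        return self.ppo_updates * self.batch_size

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update (B @ A) by alpha / r.
        return self.lora_alpha / self.lora_r


def load_prompt_pool(csv_text: str, column: str = "prompt") -> List[str]:
    # HYPOTHETICAL layout: a header row, then one prompt per row in `column`.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def prompt_batches(pool: List[str], batch_size: int, num_updates: int,
                   seed: int = 0) -> Iterator[List[str]]:
    # HYPOTHETICAL policy: draw without replacement, reshuffle when exhausted.
    if not pool:
        raise ValueError("prompt pool is empty")
    rng = random.Random(seed)
    order: List[str] = []
    for _ in range(num_updates):
        batch: List[str] = []
        while len(batch) < batch_size:
            if not order:
                order = list(pool)
                rng.shuffle(order)
            batch.append(order.pop())
        yield batch


if __name__ == "__main__":
    settings = PPOSettings()
    print(settings.total_rollouts)  # 800
    print(settings.lora_scaling)    # 2.0

    # Synthetic stand-in for the fixed 1k CSV (not the real prompt data).
    csv_text = "prompt\n" + "\n".join(f"prompt {i}" for i in range(1000)) + "\n"
    pool = load_prompt_pool(csv_text)
    drawn = [p for b in prompt_batches(pool, settings.batch_size,
                                       settings.ppo_updates) for p in b]
    print(len(drawn), len(set(drawn)))  # 800 800
```

Swapping in a different sampling policy (for example, sampling with replacement) would change how many distinct prompts are seen, but not the 800-rollout total implied by the one-batch-per-update assumption.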