Tamper-Resistant Safeguards for Open-Weight LLMs
Collection
Models & datasets from the paper "Tamper-Resistant Safeguards for Open-Weight LLMs" (https://arxiv.org/pdf/2408.00761), 9 items.
Llama-3-8B-Instruct with a tamper-resistant safeguard applied via the TAR method.
arXiv: https://arxiv.org/abs/2408.00761
Project Website: https://www.tamper-resistant-safeguards.com/
Base model
meta-llama/Meta-Llama-3-8B-Instruct