My LoRA Fine-Tuned AI-generated Detector

This is an e5-small model fine-tuned with LoRA for sequence classification. It classifies text as either AI-generated or human-written; the fine-tuned model reaches 89.0% accuracy in the evaluation reported under Training Details.

Label_0: Represents human-written content.
Label_1: Represents AI-generated content.
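
As a rough illustration of how the two labels might be consumed, the sketch below maps a pair of output logits to a label name and a probability. The helper names `interpret_logits` and `classify` are hypothetical, and the loading path in `classify` assumes the published checkpoint loads directly with the standard `transformers` auto classes, which this card does not confirm.

```python
import math

# Label mapping taken from the model card: index 0 = human, index 1 = AI.
ID2LABEL = {0: "human-written", 1: "AI-generated"}


def interpret_logits(logits):
    """Turn raw class logits into (label, probability) with a softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def classify(text, model_id="MayZhou/e5-small-lora-ai-generated-detector"):
    """Score one text. Assumes the checkpoint loads with the auto classes."""
    # Imported lazily so interpret_logits works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return interpret_logits(logits)
```

Calling `classify("some text")` downloads the model on first use and returns a tuple such as `("AI-generated", p)`, where `p` is the softmax probability of the predicted class.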

Model Details

Base Model: intfloat/e5-small
Fine-Tuning Technique: LoRA (Low-Rank Adaptation)
Task: Sequence Classification
Use Cases: Detecting AI-generated text.
Hyperparameters:
- Learning rate: 5e-5
- Epochs: 3
- LoRA rank: 8
- LoRA alpha: 16
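
To make the LoRA settings above concrete, here is a minimal sketch of what they imply. The scaling factor and the per-adapter parameter count follow from the LoRA definition; the hidden size of 384 is e5-small's embedding dimension. The `build_lora_config` helper is hypothetical, and its `target_modules` value is an assumption, since the card does not say which layers were adapted.

```python
# Hyperparameters as listed in the model card.
HYPERPARAMS = {"learning_rate": 5e-5, "epochs": 3, "lora_rank": 8, "lora_alpha": 16}


def lora_scaling(rank, alpha):
    """LoRA multiplies the low-rank update B @ A by alpha / rank."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Weights added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def build_lora_config():
    """Assumed peft setup; target_modules is a guess, not stated in the card."""
    from peft import LoraConfig, TaskType  # lazy import: only needed for training

    return LoraConfig(
        task_type=TaskType.SEQ_CLS,
        r=HYPERPARAMS["lora_rank"],
        lora_alpha=HYPERPARAMS["lora_alpha"],
        target_modules=["query", "value"],  # assumption: BERT attention projections
    )


scaling = lora_scaling(HYPERPARAMS["lora_rank"], HYPERPARAMS["lora_alpha"])
per_adapter = lora_param_count(384, 384, HYPERPARAMS["lora_rank"])
print(scaling, per_adapter)
```

With rank 8 and alpha 16 the update is scaled by 2, and each adapted 384 x 384 projection gains 6,144 trainable weights.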

Training Details

Dataset:
- 10,000 tweets and 10,000 tweets rewritten with GPT-4o-mini.
- 80,000 human-written texts from RAID-train.
- 128,000 AI-generated texts from RAID-train.
Hardware: Fine-tuned on a single NVIDIA A100 GPU.
Training Time: Approximately 2 hours.
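
The dataset composition listed above can be tallied per class with a few lines of arithmetic. This sketch assumes the original tweets were labeled human-written and the GPT-4o-mini rewrites AI-generated, which the card implies but does not state; `SOURCES` and `class_counts` are names made up for the example.

```python
# (source, label, count); label 0 = human-written, 1 = AI-generated.
SOURCES = [
    ("tweets", 0, 10_000),                             # assumed human-written
    ("tweets rewritten with GPT-4o-mini", 1, 10_000),  # assumed AI-generated
    ("RAID-train human-written", 0, 80_000),
    ("RAID-train AI-generated", 1, 128_000),
]


def class_counts(sources):
    """Sum the number of training examples per label."""
    counts = {0: 0, 1: 0}
    for _name, label, n in sources:
        counts[label] += n
    return counts


counts = class_counts(SOURCES)
total = sum(counts.values())
ai_share = counts[1] / total
print(counts, total, round(ai_share, 3))
```

Under that labeling assumption the training mix holds 90,000 human-written and 138,000 AI-generated examples, so roughly 60.5% of the 228,000 texts carry Label_1.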
Evaluation Metrics:

Metric	Raw e5-small	Fine-tuned
Accuracy	65.2%	89.0%
F1 Score	0.653	0.887
AUC	0.697	0.976
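
For readers who want to reproduce metrics of this kind on their own data, the sketch below computes accuracy, F1, and AUC in plain Python. It is not the authors' evaluation script: the card does not state the decision threshold or the F1 averaging mode, so the example assumes a 0.5 threshold and binary F1 with AI-generated (label 1) as the positive class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: true labels and the predicted probability of "AI-generated".
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of examples, which is fine for a small check; a library routine is a better fit for a full test set.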

Collaborators

Menglin Zhou
Jiaping Liu
Xiaotian Zhan

Citation

If you use this model, please cite the RAID dataset as follows:

@inproceedings{dugan-etal-2024-raid,
    title = "{RAID}: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors",
    author = "Dugan, Liam  and
      Hwang, Alyssa  and
      Trhl{\'\i}k, Filip  and
      Zhu, Andrew  and
      Ludan, Josh Magnus  and
      Xu, Hainiu  and
      Ippolito, Daphne  and
      Callison-Burch, Chris",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.674",
    pages = "12463--12492",
}

Model size: 33.4M params (Safetensors)
Tensor type: F32

Model ID: MayZhou/e5-small-lora-ai-generated-detector

Evaluation results

Accuracy on RAID-test (RAID Benchmark Leaderboard): 0.939