Hi,
How can I pre-train BART with our own dataset? It seems that the script examples/language-modeling/run_language_modeling.py doesn’t support it yet. Thanks.
Thanks for the info and the link to the denoising dataset. Maybe @sshleifer can also share his experience with BART and transformers?
And what about T5? I see that you already have several T5 models. Can we pre-train T5 with our own dataset using transformers?
Adding both of these tasks (T5 and BART pre-training) is on my todo list. It might take some time, though.
If you are able to write the span-masking code for T5, then you can easily pre-train T5 with Transformers. A sketch of what that could look like is below.
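For reference, here is a minimal sketch of such span masking (the T5 "span corruption" objective: random contiguous spans are replaced by sentinel tokens in the input, and the target lists each sentinel followed by the tokens it hid). Everything here is illustrative, not an official transformers API: the `span_corrupt` helper is a made-up name, the fixed span length is a simplification of the paper's sampled span lengths, and 15% noise density with a mean span length of 3 are the paper's defaults.

```python
import random

from transformers import T5TokenizerFast


def span_corrupt(text, tokenizer, noise_density=0.15, mean_span_length=3):
    """Mask random contiguous spans and build the (input, target) pair
    used by the T5 span-corruption objective. Illustrative helper only."""
    tokens = tokenizer.tokenize(text)
    n = len(tokens)
    num_to_mask = max(1, round(n * noise_density))
    num_spans = max(1, round(num_to_mask / mean_span_length))

    # Sample non-overlapping spans of fixed length (a simplification:
    # the T5 paper samples span lengths around the mean instead).
    masked = [False] * n
    spans = []
    starts = list(range(n))
    random.shuffle(starts)
    for start in starts:
        if len(spans) == num_spans:
            break
        end = min(start + mean_span_length, n)
        if any(masked[start:end]):
            continue  # would overlap an already-chosen span
        for i in range(start, end):
            masked[i] = True
        spans.append((start, end))
    spans.sort()

    # Replace each span with a unique sentinel in the input; the target
    # lists every sentinel followed by the tokens it hides. T5 provides
    # 100 sentinels (<extra_id_0> ... <extra_id_99>).
    input_tokens, target_tokens = [], []
    pos = 0
    for sentinel, (start, end) in enumerate(spans):
        input_tokens.extend(tokens[pos:start])
        input_tokens.append(f"<extra_id_{sentinel}>")
        target_tokens.append(f"<extra_id_{sentinel}>")
        target_tokens.extend(tokens[start:end])
        pos = end
    input_tokens.extend(tokens[pos:])
    target_tokens.append(f"<extra_id_{len(spans)}>")  # closing sentinel

    return (tokenizer.convert_tokens_to_string(input_tokens),
            tokenizer.convert_tokens_to_string(target_tokens))


tokenizer = T5TokenizerFast.from_pretrained("t5-small")
src, tgt = span_corrupt("Thank you for inviting me to your party last week.", tokenizer)
print(src)  # e.g. "Thank you<extra_id_0> me to your party<extra_id_1> week."
print(tgt)  # e.g. "<extra_id_0> for inviting<extra_id_1> last<extra_id_2>"
```

The (input, target) pairs this produces could then be fed to T5ForConditionalGeneration as ordinary seq2seq training examples.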
Great, I’ll try it.
Any update on this?