MerlinLi's Collections

synthetic-data

updated May 11, 2025

  • HuggingFaceTB/cosmopedia

    Viewer • Updated Aug 12, 2024 • 31.1M rows • 43.1k downloads • 652 likes

  • HuggingFaceTB/cosmopedia-20k

    Viewer • Updated Feb 23, 2024 • 20k rows • 31 downloads • 1 like

  • Open-Orca/SlimOrca-Dedup

    Viewer • Updated May 19, 2025 • 363k rows • 15.2k downloads • 90 likes

  • abacusai/SystemChat

    Viewer • Updated Mar 4, 2024 • 7.02k rows • 45 downloads • 134 likes

  • allenai/WildChat-nontoxic

    Viewer • Updated May 6, 2024 • 530k rows • 52 downloads • 26 likes

  • instruction-pretrain/instruction-synthesizer

    Text Generation • 7B parameters • Updated Mar 1, 2025 • 28 downloads • 79 likes

  • argilla/FinePersonas-v0.1

    Viewer • Updated Dec 11, 2024 • 42.1M rows • 9.45k downloads • 408 likes

  • opencsg/chinese-cosmopedia

    Preview • Updated Jan 15, 2025 • 1.4k downloads • 74 likes

  • TxT360: Trillion Extracted Text (Space)

    Running • 📖 Explore and analyze the TxT360 dataset for LLM pre-training • 132 likes

  • open-r1/OpenR1-Math-220k

    Viewer • Updated Feb 18, 2025 • 450k rows • 12.2k downloads • 693 likes

  • opencsg/chinese-fineweb-edu

    Viewer • Updated 30 days ago • 84.6M rows • 13.2k downloads • 109 likes

  • BAAI/CCI2-Data

    Viewer • Updated Dec 17, 2024 • 179M rows • 324 downloads • 54 likes