Article Building Harvey-style tabular review from scratch, but better isaacus • Apr 9 • 8
Article Introducing Kanon 2 Enricher — the world’s first hierarchical graphitization model isaacus • Mar 3 • 7
Open Legal Data Collection A collection of our favorite open-source legal datasets on Hugging Face. • 15 items • Updated Mar 14 • 7
Article Australian-made LLM beats OpenAI and Google at legal retrieval isaacus • Oct 23, 2025 • 27
Article How I Built Lightning-Fast Vector Search for Legal Documents adlumal • Oct 20, 2025 • 14
Article Introducing the Massive Legal Embedding Benchmark (MLEB) isaacus • Oct 17, 2025 • 24
Seq vs Seq: An Open Suite of Paired Encoders and Decoders Paper • 2507.11412 • Published Jul 15, 2025 • 32
Article Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
Should We Still Pretrain Encoders with Masked Language Modeling? Paper • 2507.00994 • Published Jul 1, 2025 • 81
Article Training and Finetuning Sparse Embedding Models with Sentence Transformers tomaarsen, arthurbresnu • Jul 1, 2025 • 138
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 12 items • Updated Jan 6, 2025 • 151
Article Multi-Label Classification Model From Scratch: Step-by-Step Tutorial Valerii-Knowledgator • Jan 8, 2024 • 51
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28, 2024 • 66
Tajik Datasets Collection Datasets that have a Tajik subset or are entirely Tajik • 13 items • Updated Feb 20, 2025 • 4
Open Australian Legal Models Collection A collection of open-source Australian legal language models • 6 items • Updated Jun 15, 2024 • 1