In a Training Loop 🔄

Stefan Schweter PRO

stefan-it · https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨, xLSTM

Recent Activity

liked a model 3 days ago

flwrlabs/Lizzy-7B

upvoted a collection 3 days ago

liked a model 10 days ago

NX-AI/xlstm_scaling_laws

Organizations

upvoted a collection 3 days ago

GlotSuite

GlotSuite: Paving the Way for Bringing Generative AI to Underserved Communities • 17 items • Updated 3 days ago • 3

upvoted an article 11 days ago

How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs

Article • 11 days ago • 55

upvoted a collection 16 days ago

Gemma 4

8 items • Updated 16 days ago • 640

upvoted a collection 24 days ago

fiNERweb

A multilingual dataset for NER covering 91 languages and 25 scripts • 3 items • Updated Dec 16, 2025 • 3

upvoted a paper 26 days ago

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

Paper • 2603.19223 • Published about 1 month ago • 31

upvoted 2 collections 26 days ago

Nemotron-Post-Training-v3

Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 28 items • Updated 4 days ago • 123

Nemotron-Cascade 2

Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation • 4 items • Updated 4 days ago • 49

upvoted a changelog 29 days ago

Protected Spaces with Public URLs

Hugging Face Changelog • 29 days ago • 122

upvoted a collection about 1 month ago

Olmo Hybrid

6 items • Updated Mar 5 • 25

upvoted a paper about 1 month ago

Omnilingual MT: Machine Translation for 1,600 Languages

Paper • 2603.16309 • Published Mar 17 • 21

upvoted 2 articles about 1 month ago

State of Open Source on Hugging Face: Spring 2026

Article • Mar 17 • 79

Efficient LLM Pretraining: Packed Sequences and Masked Attention

Article • Oct 7, 2024 • 70

upvoted 3 papers about 1 month ago

Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA

Paper • 2603.14782 • Published Mar 16 • 1

Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike

Paper • 2603.15130 • Published Mar 16 • 1

Effective Distillation to Hybrid xLSTM Architectures

Paper • 2603.15590 • Published Mar 16 • 33

upvoted 2 articles about 1 month ago

Ulysses Sequence Parallelism: Training with Million-Token Contexts

Article • Mar 9 • 26

FlashHead: Accelerating Language Model Inference ~ Efficient drop-in replacement for the classification head

Article • Mar 11 • 2

upvoted a paper about 1 month ago

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published Mar 10 • 82

upvoted a collection about 1 month ago

Nemotron-Pre-Training-Datasets

Large scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 4 days ago • 139

upvoted a paper about 1 month ago

Lost in Backpropagation: The LM Head is a Gradient Bottleneck

Paper • 2603.10145 • Published Mar 10 • 13