stereoplegic's Collections: Knowledge distillation
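Every paper in this collection builds on some variant of knowledge distillation, so a brief orientation may help before the list. Below is a minimal sketch of the classic softened-logit objective of Hinton et al. (2015): a temperature-scaled KL term against the teacher blended with ordinary cross-entropy. It is illustrative only; the function name `kd_loss` and the `temperature`/`alpha` defaults are placeholder choices, not taken from any paper listed here. A second sketch, of the teacher-generated pseudo-labelling recipe several of the LLM entries use, follows the list.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Classic knowledge distillation: blend hard-label cross-entropy
    with a temperature-softened KL term against the teacher."""
    # Soft targets: both distributions are flattened by the temperature T.
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 to keep gradient magnitudes
    # comparable as the temperature changes (Hinton et al., 2015).
    distill = F.kl_div(soft_student, soft_teacher,
                       log_target=True, reduction="batchmean") * temperature ** 2
    # Ordinary supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * ce + (1.0 - alpha) * distill
```

Higher temperatures expose more of the teacher's "dark knowledge" in the non-argmax classes; the T² factor keeps the two loss terms on a comparable scale as T varies.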
Democratizing Reasoning Ability: Tailored Learning from Large Language Model • arXiv:2310.13332 • 16 upvotes
Teaching Language Models to Self-Improve through Interactive Demonstrations • arXiv:2310.13522 • 12 upvotes
Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection • arXiv:2310.05035 • 1 upvote
Tuna: Instruction Tuning using Feedback from Large Language Models • arXiv:2310.13385 • 10 upvotes
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning • arXiv:2310.11716 • 6 upvotes
SILC: Improving Vision Language Pretraining with Self-Distillation • arXiv:2310.13355 • 9 upvotes
Conditional Diffusion Distillation • arXiv:2310.01407 • 20 upvotes
AutoMix: Automatically Mixing Language Models • arXiv:2310.12963 • 14 upvotes
An Emulator for Fine-Tuning Large Language Models using Small Language Models • arXiv:2310.12962 • 13 upvotes
Effective Distillation of Table-based Reasoning Ability from LLMs • arXiv:2309.13182 • 1 upvote
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA • arXiv:2308.04679 • 1 upvote
The Consensus Game: Language Model Generation via Equilibrium Search • arXiv:2310.09139 • 14 upvotes
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization • arXiv:2310.10134 • 1 upvote
DistillSpec: Improving Speculative Decoding via Knowledge Distillation • arXiv:2310.08461 • 1 upvote
Large Language Models Are Also Good Prototypical Commonsense Reasoners • arXiv:2309.13165 • 1 upvote
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models • arXiv:2310.05074 • 1 upvote
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model • arXiv:2310.17653 • 2 upvotes
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt • arXiv:2305.11186 • 1 upvote
Self-slimmed Vision Transformer • arXiv:2111.12624 • 1 upvote
Commonsense Knowledge Transfer for Pre-trained Language Models • arXiv:2306.02388 • 1 upvote
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models • arXiv:2110.07178 • 1 upvote
Snowman: A Million-scale Chinese Commonsense Knowledge Graph Distilled from Foundation Model • arXiv:2306.10241 • 1 upvote
Distilling Efficient Language-Specific Models for Cross-Lingual Transfer • arXiv:2306.01709 • 1 upvote
Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval • arXiv:2204.02292 • 1 upvote
Composable Sparse Fine-Tuning for Cross-Lingual Transfer • arXiv:2110.07560 • 2 upvotes
HARD: Hard Augmentations for Robust Distillation • arXiv:2305.14890 • 1 upvote
Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese • arXiv:2304.08823 • 1 upvote
Massively Multilingual Lexical Specialization of Multilingual Transformers • arXiv:2208.01018 • 1 upvote
Robust Active Distillation • arXiv:2210.01213 • 1 upvote
LTD: Low Temperature Distillation for Robust Adversarial Training • arXiv:2111.02331 • 1 upvote
Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation • arXiv:2306.16170 • 1 upvote
Mutual Adversarial Training: Learning together is better than going alone • arXiv:2112.05005 • 1 upvote
Weight Averaging Improves Knowledge Distillation under Domain Shift • arXiv:2309.11446 • 1 upvote
Cross-Architecture Knowledge Distillation • arXiv:2207.05273 • 1 upvote
Cross-Domain Ensemble Distillation for Domain Generalization • arXiv:2211.14058 • 1 upvote
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation • arXiv:2202.13393 • 1 upvote
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes • arXiv:2305.02301 • 5 upvotes
Zephyr: Direct Distillation of LM Alignment • arXiv:2310.16944 • 123 upvotes
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation • arXiv:2310.18628 • 8 upvotes
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise • arXiv:2310.19019 • 9 upvotes
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources • arXiv:2306.04751 • 5 upvotes
Small Language Models Improve Giants by Rewriting Their Outputs • arXiv:2305.13514 • 2 upvotes
ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer • arXiv:2309.08583 • 1 upvote
A Survey on Model Compression for Large Language Models • arXiv:2308.07633 • 3 upvotes
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling • arXiv:2311.00430 • 56 upvotes
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models • arXiv:2308.06744 • 1 upvote
Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders • arXiv:2211.11014 • 1 upvote
Model compression via distillation and quantization • arXiv:1802.05668 • 1 upvote
Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data • arXiv:2302.10899 • 1 upvote
Improving Differentiable Architecture Search via Self-Distillation • arXiv:2302.05629 • 1 upvote
Co-training and Co-distillation for Quality Improvement and Compression of Language Models • arXiv:2311.02849 • 8 upvotes
Tailoring Self-Rationalizers with Multi-Reward Distillation • arXiv:2311.02805 • 6 upvotes
Can a student Large Language Model perform as well as it's teacher? • arXiv:2310.02421 • 1 upvote
NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation • arXiv:2310.19820 • 1 upvote
Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication • arXiv:2310.03188 • 1 upvote
A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models • arXiv:2310.08797 • 1 upvote
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers • arXiv:2012.15828 • 1 upvote
Self-Distillation for Further Pre-training of Transformers • arXiv:2210.02871 • 1 upvote
Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems • arXiv:2111.03842 • 1 upvote
Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning • arXiv:2304.06461 • 1 upvote
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression • arXiv:2310.15594 • 1 upvote
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers • arXiv:2002.10957 • 2 upvotes
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation • arXiv:2303.05668 • 1 upvote
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification • arXiv:2305.17394 • 1 upvote
BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation • arXiv:2306.08075 • 1 upvote
Prototype-guided Cross-task Knowledge Distillation for Large-scale Models • arXiv:2212.13180 • 1 upvote
ProKD: An Unsupervised Prototypical Knowledge Distillation Network for Zero-Resource Cross-Lingual Named Entity Recognition • arXiv:2301.08855 • 1 upvote
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models • arXiv:2305.17651 • 1 upvote
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation • arXiv:2305.11685 • 2 upvotes
Large Language Model Distillation Doesn't Need a Teacher • arXiv:2305.14864 • 3 upvotes
One Student Knows All Experts Know: From Sparse to Dense • arXiv:2201.10890 • 1 upvote
BD-KD: Balancing the Divergences for Online Knowledge Distillation • arXiv:2212.12965 • 1 upvote
Rethinking Momentum Knowledge Distillation in Online Continual Learning • arXiv:2309.02870 • 1 upvote
Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer • arXiv:2211.00789 • 1 upvote
Preserving Linear Separability in Continual Learning by Backward Feature Projection • arXiv:2303.14595 • 2 upvotes
Big-model Driven Few-shot Continual Learning • arXiv:2309.00862 • 1 upvote
Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation • arXiv:2210.11768 • 1 upvote
Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study • arXiv:2211.03946 • 1 upvote
What Makes a "Good" Data Augmentation in Knowledge Distillation -- A Statistical Perspective • arXiv:2012.02909 • 1 upvote
Group channel pruning and spatial attention distilling for object detection • arXiv:2306.01526 • 1 upvote
Structured Pruning Learns Compact and Accurate Models • arXiv:2204.00408 • 1 upvote
MPCFormer: fast, performant and private Transformer inference with MPC • arXiv:2211.01452 • 1 upvote
Towards Teachable Conversational Agents • arXiv:2102.10387 • 1 upvote
OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking • arXiv:2311.09758 • 1 upvote
Task-Specific Expert Pruning for Sparse Mixture-of-Experts • arXiv:2206.00277 • 1 upvote
Augmented Large Language Models with Parametric Knowledge Guiding • arXiv:2305.04757 • 2 upvotes
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework • arXiv:2111.04130 • 1 upvote
Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval • arXiv:2308.04711 • 1 upvote