DocAI
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Paper
• 2410.21169
• Published
• 30
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Paper
• 2409.02889
• Published
• 54
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper
• 2411.04952
• Published
• 29
Contextual Document Embeddings
Paper
• 2410.02525
• Published
• 24
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
Paper
• 2410.05970
• Published
• 1
READoc: A Unified Benchmark for Realistic Document Structured Extraction
Paper
• 2409.05137
• Published
Xmodel-1.5: An 1B-scale Multilingual LLM
Paper
• 2411.10083
• Published
• 14
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
Paper
• 2411.06176
• Published
• 45
CC1984/mall_receipt_extraction_dataset
Viewer
• Updated
• 1.8k • 38
• 2
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper
• 2412.10704
• Published
• 16
DoPTA: Improving Document Layout Analysis using Patch-Text Alignment
Paper
• 2412.12902
• Published
Predicting the Original Appearance of Damaged Historical Documents
Paper
• 2412.11634
• Published
• 4
SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction
Paper
• 2412.04262
• Published
• 4
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper
• 2408.09174
• Published
• 52
A Token-level Text Image Foundation Model for Document Understanding
Paper
• 2503.02304
• Published
• 4
More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG
Paper
• 2503.04388
• Published
• 17
SAGE: A Framework of Precise Retrieval for RAG
Paper
• 2503.01713
• Published
• 7
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval
Paper
• 2503.04644
• Published
• 21
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper
• 2503.11576
• Published
• 147
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
Paper
• 2503.15893
• Published
• 2
CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper
• 2509.16506
• Published
• 22