DocAI - a cooleel Collection

Updated Oct 13, 2025

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Paper • 2410.21169 • Published Oct 28, 2024 • 30

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Paper • 2409.02889 • Published Sep 4, 2024 • 54

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper • 2411.04952 • Published Nov 7, 2024 • 29

Contextual Document Embeddings
Paper • 2410.02525 • Published Oct 3, 2024 • 24

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
Paper • 2410.05970 • Published Oct 8, 2024 • 1

READoc: A Unified Benchmark for Realistic Document Structured Extraction
Paper • 2409.05137 • Published Sep 8, 2024

Xmodel-1.5: An 1B-scale Multilingual LLM
Paper • 2411.10083 • Published Nov 15, 2024 • 14

M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
Paper • 2411.06176 • Published Nov 9, 2024 • 45

CC1984/mall_receipt_extraction_dataset
Viewer • Updated Aug 31, 2023 • 1.8k • 118 • 3

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published Dec 14, 2024 • 16

DoPTA: Improving Document Layout Analysis using Patch-Text Alignment
Paper • 2412.12902 • Published Dec 17, 2024

Predicting the Original Appearance of Damaged Historical Documents
Paper • 2412.11634 • Published Dec 16, 2024 • 4

SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction
Paper • 2412.04262 • Published Dec 5, 2024 • 4

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published Aug 17, 2024 • 53

A Token-level Text Image Foundation Model for Document Understanding
Paper • 2503.02304 • Published Mar 4, 2025 • 4

More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG
Paper • 2503.04388 • Published Mar 6, 2025 • 17

SAGE: A Framework of Precise Retrieval for RAG
Paper • 2503.01713 • Published Mar 3, 2025 • 7

IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval
Paper • 2503.04644 • Published Mar 6, 2025 • 21

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper • 2503.11576 • Published Mar 14, 2025 • 160

UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
Paper • 2503.15893 • Published Mar 20, 2025 • 2

CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper • 2509.16506 • Published Sep 20, 2025 • 22