DocLLM: A layout-aware generative language model for multimodal document understanding
2024-01-08
DocLLM is a novel extension of large language models that improves document understanding by incorporating both textual and spatial layout information, without the need for heavy image encoders. It outperforms state-of-the-art models on several document intelligence tasks.