phizzog-ai/pdf_parser

A robust PDF parsing pipeline that extracts text, tables, and images from PDF documents into structured JSON format. Designed as the first stage in a multimodal RAG (Retrieval-Augmented Generation) ...
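
To make "structured JSON" concrete, here is a minimal sketch of how extracted text, tables, and images could be represented before being passed to a later RAG stage. The schema is an assumption for illustration only: the class names `Element` and `ParsedDocument`, the fields `page`, `type`, and `content`, and the sample values are hypothetical and not taken from the repository.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Any


@dataclass
class Element:
    """Hypothetical schema: one item extracted from a PDF page."""
    page: int
    type: str      # "text", "table", or "image"
    content: Any   # string, list of rows, or image metadata


@dataclass
class ParsedDocument:
    """Hypothetical container for everything extracted from one PDF."""
    source: str
    elements: list[Element] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2)


# Sample values are made up; a real parser would fill these in.
doc = ParsedDocument(source="example.pdf")
doc.elements.append(Element(page=1, type="text", content="Invoice #123"))
doc.elements.append(
    Element(page=1, type="table", content=[["Item", "Total"], ["Widget", "1200.00"]])
)
doc.elements.append(
    Element(page=2, type="image", content={"path": "images/page2_img0.png"})
)

print(doc.to_json())
```

Keeping every element tagged with its page and type lets a downstream retrieval step treat text, tables, and images differently without re-parsing the PDF.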
Upload several PDF documents (invoices, receipts, contracts, etc.), specify which fields you want extracted or any filters to apply (e.g., date range, totals > 1000) in the prompt, run the parser, and ...
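
The kind of filter mentioned above (a date range combined with totals > 1000) can be sketched as a plain function over already-extracted records. This illustrates only what the filter means, not how the tool applies it from a prompt; the record fields `file`, `date`, and `total`, the function `filter_records`, and the sample data are all hypothetical.

```python
from datetime import date

# Hypothetical records, as if the fields had already been extracted from PDFs.
records = [
    {"file": "invoice_a.pdf", "date": date(2024, 1, 15), "total": 1250.0},
    {"file": "receipt_b.pdf", "date": date(2024, 3, 2), "total": 80.5},
    {"file": "invoice_c.pdf", "date": date(2023, 11, 30), "total": 4300.0},
]


def filter_records(records, start, end, min_total):
    """Keep records dated within [start, end] whose total exceeds min_total."""
    return [
        r for r in records
        if start <= r["date"] <= end and r["total"] > min_total
    ]


# Documents from 2024 with totals > 1000.
matches = filter_records(records, date(2024, 1, 1), date(2024, 12, 31), 1000)
print([r["file"] for r in matches])
```

Here `invoice_c.pdf` is excluded by the date range and `receipt_b.pdf` by the total threshold.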