Story Details

  • Show HN: Morphik – Open-source RAG that understands PDF images, runs locally

    Posted: 2025-04-22 16:18:41

    Morphik is an open-source Retrieval Augmented Generation (RAG) engine designed for local execution. It differentiates itself by incorporating optical character recognition (OCR), enabling it to understand and process information contained within PDF images, not just text-based PDFs. This allows users to build knowledge bases from scanned documents and image-heavy files, querying them semantically via a natural language interface. Morphik offers a streamlined setup process and prioritizes data privacy by keeping all information local.

    Summary of Comments ( 1 )
    https://news.ycombinator.com/item?id=43763814

    HN users generally expressed interest in Morphik, praising its local operation and potential for privacy. Some questioned the licensing (AGPLv3) and its suitability for commercial applications. Several commenters discussed the challenges of accurate OCR, particularly with complex or unusual PDFs, and hoped for future improvements in this area. Others compared it to existing tools, with some suggesting integration with tools like LlamaIndex. There was significant interest in its ability to handle images within PDFs, a feature lacking in many other RAG solutions. A few users pointed out potential use cases, such as academic research and legal document analysis. Overall, the reception was positive, with many eager to experiment with Morphik and contribute to its development.