hackslash dot org

Show HN: Morphik – Open-source RAG that understands PDF images, runs locally

Posted: 2025-04-22 16:18:41

Morphik is an open-source Retrieval Augmented Generation (RAG) engine designed for local execution. It differentiates itself by incorporating optical character recognition (OCR), enabling it to understand and process information contained within PDF images, not just text-based PDFs. This allows users to build knowledge bases from scanned documents and image-heavy files, querying them semantically via a natural language interface. Morphik offers a streamlined setup process and prioritizes data privacy by keeping all information local.

Summary of Comments ( 1 )
https://news.ycombinator.com/item?id=43763814

HN users generally expressed interest in Morphik, praising its local operation and potential for privacy. Some questioned the licensing (AGPLv3) and its suitability for commercial applications. Several commenters discussed the challenges of accurate OCR, particularly with complex or unusual PDFs, and hoped for future improvements in this area. Others compared it to existing tools, with some suggesting integration with tools like LlamaIndex. There was significant interest in its ability to handle images within PDFs, a feature lacking in many other RAG solutions. A few users pointed out potential use cases, such as academic research and legal document analysis. Overall, the reception was positive, with many eager to experiment with Morphik and contribute to its development.

The Hacker News post "Show HN: Morphik – Open-source RAG that understands PDF images, runs locally" (https://news.ycombinator.com/item?id=43763814) has generated a modest number of comments, primarily focusing on the practicalities and potential applications of the Morphik project.

One commenter expressed enthusiasm for the project, highlighting the challenge of extracting information from image-based PDFs and appreciating Morphik's local processing capability. They specifically mentioned the difficulty of dealing with scanned documents and the desire for a self-hosted solution, praising Morphik for addressing these needs.

Another commenter questioned the method used for OCR, wondering if it relied on Tesseract or a different approach. This commenter also inquired about the handling of mathematical formulas within the PDFs, indicating an interest in the project's ability to extract and understand complex information.

A further comment delved into the performance aspects of the project, particularly regarding memory usage. The commenter inquired about the RAM requirements, expressing concern about potential memory limitations, especially with large PDF files. They also touched upon scalability and the ability to process a high volume of documents.

One user provided a concise but valuable comment, pointing out a potential licensing issue. They suggested that the project's use of Apache 2.0 licensed Tesseract might conflict with the AGPLv3 license chosen for Morphik. This raises a significant legal consideration for the project maintainers.

Finally, another commenter made a brief, neutral observation about the project's reliance on Docker for deployment. While not expressing an opinion, this comment highlights a technical aspect of Morphik's implementation.

Overall, the comments on Hacker News demonstrate genuine interest in the Morphik project, focusing on its practical utility, technical aspects, and potential licensing issues. They highlight the demand for tools that can effectively process image-based PDFs locally, while also raising important questions about performance, scalability, and licensing compliance.

Show HN: Open-Source DocumentAI with Ollama

permalink

Posted: 2025-03-08 02:12:13

RLama introduces an open-source Document AI platform powered by the Ollama large language model. It allows users to upload documents in various formats (PDF, Word, TXT) and then interact with their content through natural language queries. RLama handles the complex tasks of document parsing, semantic search, and answer synthesis, providing a user-friendly way to extract information and insights from uploaded files. The project aims to offer a powerful, privacy-respecting, and locally hosted alternative to cloud-based document AI solutions.

Summary of Comments ( 27 )
https://news.ycombinator.com/item?id=43296918

Hacker News users discussed the potential of running powerful LLMs locally with tools like Ollama, expressing excitement about the possibilities for privacy and cost savings compared to cloud-based solutions. Some praised the project's clean UI and ease of use, while others questioned the long-term viability of local processing given the resource demands of large models. There was also discussion around specific features, like fine-tuning and the ability to run multiple models concurrently. Some users shared their experiences using the project, highlighting its performance and comparing it to other similar tools. One commenter raised a concern about the potential for misuse of powerful AI models made easily accessible through such projects. The overall sentiment was positive, with many seeing this as a significant step towards democratizing access to advanced AI capabilities.

The Hacker News post titled "Show HN: Open-Source DocumentAI with Ollama" sparked a discussion with several interesting comments. Many commenters expressed enthusiasm for the project and explored its potential applications and limitations.

One commenter pointed out the benefit of using local models for document processing, highlighting the privacy advantages and the ability to work offline. They also touched upon the cost-effectiveness of open-source models compared to proprietary cloud solutions.

Another commenter questioned the performance of open-source models, particularly in comparison to closed-source models like those from Google. They specifically asked about the benchmark comparisons and how Rlama stacks up against commercial offerings.

The discussion delved into the technical aspects of the project, with one commenter mentioning the challenges of working with large language models (LLMs) for specific document tasks. They emphasized the importance of using appropriate model architectures and fine-tuning techniques to achieve optimal performance.

A commenter raised the issue of hallucinations in LLMs and how Rlama addresses this challenge. This sparked further discussion about the reliability and trustworthiness of LLMs in document processing scenarios.

Some commenters expressed interest in specific use cases, like analyzing legal documents or scientific papers. They inquired about the project's roadmap and whether it plans to support such specialized tasks.

A few commenters also praised the simplicity and ease of use of Rlama. They appreciated the intuitive interface and the clear documentation provided by the developers.

Overall, the comments section revealed a generally positive reception to Rlama. Commenters acknowledged the potential of open-source document AI and explored both the advantages and challenges associated with this approach. The discussion also highlighted the need for further development and benchmarking to fully assess the capabilities of Rlama and similar open-source projects.

Show HN: Fork of Claude-code working with local and other LLM providers

permalink

Posted: 2025-03-04 13:35:12

anon-kode is an open-source fork of Claude-code, a large language model designed for coding tasks. This project allows users to run the model locally or connect to various other LLM providers, offering more flexibility and control over model access and usage. It aims to provide a convenient and adaptable interface for utilizing different language models for code generation and related tasks, without being tied to a specific provider.

Summary of Comments ( 17 )
https://news.ycombinator.com/item?id=43254351

Hacker News users discussed the potential of anon-kode, a fork of Claude-code allowing local and diverse LLM usage. Some praised its flexibility, highlighting the benefits of using local models for privacy and cost control. Others questioned the practicality and performance compared to hosted solutions, particularly for resource-intensive tasks. The licensing of certain models like CodeLlama was also a point of concern. Several commenters expressed interest in contributing or using anon-kode for specific applications like code analysis or documentation generation. There was a general sense of excitement around the project's potential to democratize access to powerful coding LLMs.

The Hacker News post "Show HN: Fork of Claude-code working with local and other LLM providers" (https://news.ycombinator.com/item?id=43254351) sparked a brief but interesting discussion with a few key points raised.

One commenter expressed skepticism about the practical usefulness of local LLMs for coding tasks, arguing that the quality difference compared to cloud-based models like GPT-4 is significant enough to negate the benefits of local processing, especially given the increasing availability of cheaper cloud alternatives. They specifically mentioned that even if local models eventually catch up in performance, the convenience and speed of cloud-based models might still be preferable.

Another commenter highlighted the licensing issue, pointing out that closed-source models can't be used commercially. They argued that this is a major drawback, especially for companies, and that this restriction limits the utility of projects like this one. They implied that open-source models are essential for broader adoption in commercial settings.

A third commenter explored the potential advantages of local models for specific niche use cases, suggesting that even with lower quality, they could be valuable for tasks like code suggestion or autocompletion within a local IDE, particularly if the codebase being worked on is sensitive and cannot be shared with external cloud services. They mentioned that speed and privacy are the primary drivers for such use cases.

Finally, the original poster (OP) responded to some of the comments, acknowledging the current limitations of local LLMs compared to cloud-based options but expressing optimism about the rapid pace of improvement in open-source LLMs. They also clarified the project's aim, emphasizing that it’s focused on providing a framework for using different LLMs locally rather than promoting any specific local model. They seem hopeful that this approach will become more compelling as local LLM technology matures.

In summary, the discussion revolved around the trade-offs between cloud-based and local LLMs for coding, with commenters highlighting the current performance gap, licensing restrictions, and potential niche applications of local models. The OP defended the project by focusing on its flexibility and the future potential of local LLMs.

How to Run DeepSeek R1 671B Locally on a $2000 EPYC Server

permalink

Posted: 2025-02-01 09:46:43

This blog post details how to run the DeepSeek R1 671B large language model (LLM) entirely on a ~$2000 server built with an AMD EPYC 7452 CPU, 256GB of RAM, and consumer-grade NVMe SSDs. The author emphasizes affordability and accessibility, demonstrating a setup that avoids expensive server-grade hardware and leverages readily available components. The post provides a comprehensive guide covering hardware selection, OS installation, configuring the necessary software like PyTorch and CUDA, downloading the model weights, and ultimately running inference using the optimized llama.cpp implementation. It highlights specific optimization techniques, including using bitsandbytes for quantization and offloading parts of the model to the CPU RAM to manage its large size. The author successfully achieves a performance of ~2 tokens per second, enabling practical, albeit slower, local interaction with this powerful LLM.

The blog post "How to Run DeepSeek R1 671B Fully Locally on a $2000 EPYC Rig" details the author's successful endeavor to run the large language model DeepSeek R1 671B on a relatively affordable, self-assembled server. The primary motivation behind this project was to achieve cost-effective, private, and locally accessible large language model inference, avoiding the costs and potential privacy concerns associated with cloud-based solutions like OpenAI's API.

The author carefully selected hardware components to balance performance and budget. The centerpiece of the system is an AMD EPYC 7F72 dual-socket server, chosen for its impressive core count (48 cores per CPU, 96 total) and large L3 cache, crucial for handling the substantial memory requirements of the 671B parameter model. The system also includes 512GB of DDR4 ECC RAM, which, while not sufficient to load the entire model into RAM, allows for offloading to NVMe storage and leveraging the CPU's large cache effectively. Three 2TB NVMe SSDs are configured in RAID 0, maximizing read speed for faster model loading and processing. A relatively modest power supply (1000W) was deemed sufficient, further contributing to the cost-effectiveness of the build.

The software setup involved installing Ubuntu 22.04 and meticulously configuring the necessary dependencies, including CUDA drivers, Python libraries, and the specific DeepSeek inference code. The author highlights the importance of accurate driver versions and provides detailed instructions for their installation, addressing potential compatibility issues. They also outline the steps to download and convert the DeepSeek model to a suitable format for local inference. Optimizations, such as using the bitsandbytes library for 8-bit quantization, are implemented to reduce memory footprint and improve performance. This allows the model to be run on the system with the available RAM, albeit with increased processing time.

The post then walks through the process of running the model using the command-line interface, explaining the relevant parameters and demonstrating a basic example of text generation. The author emphasizes that, while performance is slower compared to cloud-based solutions or systems with larger RAM capacity, the setup successfully achieves local inference with a reasonable response time. The post concludes by acknowledging potential improvements, like utilizing larger RAM or implementing more aggressive quantization techniques, and reinforces the overall feasibility and cost-effectiveness of running large language models locally on a budget-conscious server build. The project effectively demonstrates a practical approach to bringing powerful language models within reach of individuals and small teams without relying on external cloud services.

Summary of Comments ( 157 )
https://news.ycombinator.com/item?id=42897205

HN commenters were skeptical about the true cost and practicality of running a 671B parameter model on a $2,000 server. Several pointed out that the $2,000 figure only covered the CPUs, excluding crucial components like RAM, SSDs, and GPUs, which would significantly inflate the total price. Others questioned the performance on such a setup, doubting it would be usable for anything beyond trivial tasks due to slow inference speeds. The lack of details on power consumption and cooling requirements was also criticized. Some suggested cloud alternatives might be more cost-effective in the long run, while others expressed interest in smaller, more manageable models. A few commenters shared their own experiences with similar hardware, highlighting the challenges of memory bandwidth and the potential need for specialized hardware like Infiniband for efficient communication between CPUs.

The Hacker News post discussing running a large language model (LLM) like DeepSeek R1 671B on a relatively inexpensive EPYC server generated a fair amount of discussion. Several commenters focused on the practicality and nuances of the setup described in the article.

One key point of discussion revolved around the actual cost and complexity of the setup. While the article highlights a $2000 server, commenters pointed out that this price likely doesn't encompass the cost of GPUs, which are essential for running such a large model effectively. They argued that the true cost would be significantly higher when factoring in suitable GPUs. Furthermore, the expertise required to set up and maintain such a system was also a topic of conversation, with commenters suggesting that it's not a trivial task and requires specialized knowledge.

Another thread of discussion centered on the performance trade-offs. Running a 671B parameter model on a less powerful setup compared to what's typically used in large-scale deployments would inevitably lead to slower inference speeds. Commenters discussed the impact of this slower performance on practical usability, suggesting that while it might be technically feasible to run the model, the response times could be too long for many applications.

The potential benefits of running a large language model locally were also acknowledged. Commenters mentioned the advantages of data privacy and control, as locally hosted models don't require sending data to external servers. This aspect was particularly relevant for sensitive data or applications where data security is paramount.

Finally, some commenters expressed skepticism about the overall feasibility and practicality of the approach outlined in the article. They questioned whether the performance gains, even with optimized libraries and techniques, would be sufficient to justify the complexity and cost involved in setting up and maintaining a local LLM of this size. They also raised concerns about the power consumption and cooling requirements for such a system. Overall, the comments reflected a mixture of intrigue and pragmatism, acknowledging the potential benefits while also highlighting the challenges and limitations of running large language models on less powerful hardware.

Stories with Tag Local Deployment

Show HN: Morphik – Open-source RAG that understands PDF images, runs locally

Summary of Comments ( 1 ) https://news.ycombinator.com/item?id=43763814

Show HN: Open-Source DocumentAI with Ollama

Summary of Comments ( 27 ) https://news.ycombinator.com/item?id=43296918

Show HN: Fork of Claude-code working with local and other LLM providers

Summary of Comments ( 17 ) https://news.ycombinator.com/item?id=43254351

How to Run DeepSeek R1 671B Locally on a $2000 EPYC Server

Summary of Comments ( 157 ) https://news.ycombinator.com/item?id=42897205

Summary of Comments ( 1 )
https://news.ycombinator.com/item?id=43763814

Summary of Comments ( 27 )
https://news.ycombinator.com/item?id=43296918

Summary of Comments ( 17 )
https://news.ycombinator.com/item?id=43254351

Summary of Comments ( 157 )
https://news.ycombinator.com/item?id=42897205