hackslash dot org

Stories with Tag Information Extraction

Show HN: Knowledge graph of restaurants and chefs, built using LLMs

Posted: 2025-03-03 15:43:20

Théophile Cantelobre has created Foudinge, a knowledge graph connecting restaurants and chefs. Using Large Language Models (LLMs), Foudinge extracts information from online sources such as blogs, guides, and social media to establish relationships between culinary professionals and the establishments they have worked at or own. This supports queries such as finding all restaurants where a specific chef has worked, discovering connections between chefs through shared work experience, and exploring culinary lineage within the restaurant industry. Currently focused on French gastronomy, the project aims to expand its scope geographically and improve data accuracy through community contributions and additional data sources.

Théophile Cantelobre has introduced "Foudinge," a knowledge graph of restaurants and chefs. The project uses Large Language Models (LLMs) to build and populate the graph with information extracted from a range of online sources. Cantelobre describes how Foudinge was built, along with the challenges encountered and the solutions adopted.

Initially, the project aimed to be a comprehensive database of French gastronomy, but it quickly evolved into a more generalized platform capable of representing culinary knowledge globally. The core of Foudinge lies in its ability to identify and link entities such as restaurants and chefs, establishing relationships between them like "Chef X works at Restaurant Y." This linking process is automated using LLMs, which analyze textual data from sources like restaurant websites, blogs, news articles, and social media platforms. This automated approach allows Foudinge to scale rapidly and incorporate information from a vast range of online resources.
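The "Chef X works at Restaurant Y" relationships described above can be pictured as a list of triples. The following is a minimal sketch under stated assumptions, not Foudinge's actual data model: the triple layout, the relation names, and the chef and restaurant names are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical (chef, relation, restaurant) triples; placeholder names, not Foudinge data.
TRIPLES = [
    ("Chef A", "worked_at", "Restaurant X"),
    ("Chef B", "worked_at", "Restaurant X"),
    ("Chef B", "owns", "Restaurant Y"),
    ("Chef C", "worked_at", "Restaurant Y"),
]


def build_index(triples):
    """Index the triples both ways: chef -> restaurants and restaurant -> chefs."""
    by_chef = defaultdict(set)
    by_restaurant = defaultdict(set)
    for chef, _relation, restaurant in triples:
        by_chef[chef].add(restaurant)
        by_restaurant[restaurant].add(chef)
    return by_chef, by_restaurant


def restaurants_of(chef, triples=TRIPLES):
    """All restaurants linked to a chef, regardless of relation type."""
    by_chef, _ = build_index(triples)
    return sorted(by_chef[chef])


def colleagues_of(chef, triples=TRIPLES):
    """Chefs who share at least one restaurant with the given chef."""
    by_chef, by_restaurant = build_index(triples)
    found = set()
    for restaurant in by_chef[chef]:
        found |= by_restaurant[restaurant]
    found.discard(chef)
    return sorted(found)
```

With this layout, "which restaurants is Chef B linked to" and "who shares a kitchen with Chef B" are both simple lookups over the same triples.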

The construction of Foudinge involved several key steps. First, an initial dataset was compiled, encompassing various data points related to restaurants and chefs. This data was then processed using LLMs to extract relevant information and transform it into a structured format suitable for a knowledge graph. The LLMs were instrumental in identifying and disambiguating entities, ensuring that the same chef or restaurant is represented consistently across different sources. Furthermore, the LLMs helped to infer relationships between entities based on the contextual information available in the source material.
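To illustrate what consistent representation across sources means in practice, here is a small sketch of entity disambiguation. The summary says Foudinge uses LLMs for this step; the sketch swaps in plain string normalization (lowercasing, accent stripping, whitespace collapsing) as a deliberately simpler stand-in, and the function names are hypothetical.

```python
import unicodedata


def canonical_key(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def merge_mentions(mentions):
    """Group raw mentions by canonical key; the first spelling seen becomes the display name."""
    entities = {}
    for mention in mentions:
        cleaned = mention.strip()
        entry = entities.setdefault(canonical_key(cleaned), {"name": cleaned, "aliases": set()})
        entry["aliases"].add(cleaned)
    return entities
```

Normalization of this kind only catches surface variants such as a missing accent. Telling apart two different people with the same name, or linking a nickname to a full name, needs the contextual reasoning the summary attributes to the LLM.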

Cantelobre acknowledges the challenges of working with LLMs, such as potential biases in the training data and occasional inaccuracies in the generated output. To mitigate them, Foudinge incorporates a validation process that combines automated checks with manual review. This iterative refinement is intended to improve the accuracy and reliability of the knowledge graph.
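The summary does not say which automated checks Foudinge runs. As one plausible example, the sketch below accepts an extracted triple only if its relation is in an allowed vocabulary and both names literally appear in the source text, and routes everything else to manual review. The relation vocabulary and function names are assumptions made for illustration.

```python
# Assumed relation vocabulary; the real project may use different relations.
ALLOWED_RELATIONS = {"worked_at", "owns"}


def validate_triple(triple, source_text):
    """Return a list of problems found in one extracted triple (empty list means it passed)."""
    if len(triple) != 3:
        return ["expected a (chef, relation, restaurant) triple"]
    chef, relation, restaurant = triple
    problems = []
    if relation not in ALLOWED_RELATIONS:
        problems.append(f"unknown relation {relation!r}")
    haystack = source_text.casefold()
    for label, value in (("chef", chef), ("restaurant", restaurant)):
        if not value.strip():
            problems.append(f"{label} is empty")
        elif value.casefold() not in haystack:
            # A name that never appears in the source is a likely hallucination.
            problems.append(f"{label} {value!r} not found in source")
    return problems


def triage(triples, source_text):
    """Split triples into automatically accepted ones and ones needing manual review."""
    accepted, needs_review = [], []
    for triple in triples:
        problems = validate_triple(triple, source_text)
        if problems:
            needs_review.append((triple, problems))
        else:
            accepted.append(triple)
    return accepted, needs_review
```

A literal substring test is strict: it will also flag legitimate paraphrases, which is why flagged triples go to a reviewer rather than being discarded.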

The long-term vision for Foudinge is to become a valuable resource for culinary enthusiasts, professionals, and researchers. Its structured data and interconnectedness allow for complex queries and analyses, enabling users to explore the culinary landscape in novel ways. For instance, one could trace the career trajectory of a chef, identify restaurants with similar culinary styles, or investigate the influence of specific chefs on regional cuisines. Cantelobre envisions Foudinge as a dynamic and evolving platform, continuously incorporating new information and expanding its coverage of the culinary world. He invites feedback and contributions from the community to further enhance the project and maximize its potential.
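Tracing a career trajectory requires dates on each relationship, which the summary does not say Foudinge records. The sketch below assumes each stint carries a start year and an optional end year; the `Stint` type, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stint:
    chef: str
    restaurant: str
    start_year: int
    end_year: Optional[int] = None  # None means the stint is ongoing


# Placeholder data for illustration only.
STINTS = [
    Stint("Chef A", "Restaurant Y", 2018),
    Stint("Chef A", "Restaurant X", 2012, 2017),
    Stint("Chef B", "Restaurant X", 2015, 2020),
]


def trajectory(chef, stints=STINTS):
    """A chef's stints in chronological order."""
    own = [s for s in stints if s.chef == chef]
    return sorted(own, key=lambda s: s.start_year)


def overlapped(a, b):
    """True if two stints were at the same restaurant during at least one common year."""
    if a.restaurant != b.restaurant:
        return False
    a_end = a.end_year if a.end_year is not None else float("inf")
    b_end = b.end_year if b.end_year is not None else float("inf")
    return a.start_year <= b_end and b.start_year <= a_end
```

With dates attached, "worked at the same restaurant" can be narrowed to "worked there at the same time", which is a stronger signal of influence between chefs.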

Summary of Comments (16)
https://news.ycombinator.com/item?id=43242818

Hacker News users generally expressed skepticism about the value proposition of the presented knowledge graph of restaurants and chefs. Several commenters questioned the accuracy and completeness of the data, especially given its reliance on LLMs. Some doubted the usefulness of connecting chefs to restaurants without further context, like the time period they worked there. Others pointed out the existing prevalence of this information on platforms like Wikipedia and guide sites, questioning the need for a new platform. The lack of a clear use case beyond basic information retrieval was a recurring theme, with some suggesting potential applications like tracking career progression or identifying emerging culinary trends, but ultimately finding the current implementation insufficient. A few commenters appreciated the technical effort, but overall the reception was lukewarm, focused on the need for demonstrable practical application and improved data quality.

The Hacker News post titled "Show HN: Knowledge graph of restaurants and chefs, built using LLMs" generated a moderate amount of discussion, with a focus on the practical application and potential limitations of the project.

Several commenters expressed interest in the project's potential, particularly regarding its use for restaurant recommendations. One commenter highlighted the difficulty of finding good restaurants in unfamiliar cities and suggested the knowledge graph could be helpful in this scenario, particularly if it allowed users to filter by cuisine type and other specific criteria. They also inquired about the possibility of incorporating user reviews or ratings into the system.

Another user echoed this sentiment, pointing out that existing restaurant recommendation platforms often rely on outdated or inaccurate information. They envisioned the project as a valuable tool for both diners and restaurant owners, providing a centralized and up-to-date resource for restaurant information.

However, some commenters expressed concerns about the project's reliance on LLMs. One commenter pointed out the potential for hallucinations and inaccuracies in LLM-generated data, emphasizing the importance of thorough verification and fact-checking. They also questioned the long-term viability of relying solely on LLMs for data collection and maintenance, suggesting that a more robust approach might involve incorporating human input and curation.

The creator of the project engaged with the commenters, acknowledging the challenges of LLM-based data generation and outlining plans to address these concerns. They mentioned plans to implement a feedback mechanism to flag inaccurate information and explore methods for verifying the accuracy of LLM-generated data. They also discussed potential future features, such as incorporating user reviews, dietary information, and real-time menu updates.

A recurring theme in the comments was the need for a practical application or interface for the knowledge graph. Commenters suggested various use cases, including a dedicated search engine for restaurants, a mobile app for on-the-go recommendations, and integration with existing restaurant platforms.

Finally, one commenter raised a broader point about the ethical implications of using LLMs to scrape data from the web, questioning the potential impact on website owners and the overall ecosystem of online information. This sparked a brief discussion about the responsible use of LLMs and the importance of respecting website terms of service. While not directly related to the project itself, this comment highlighted the broader ethical considerations surrounding LLM-driven data collection.

Replace OCR with Vision Language Models

Posted: 2025-02-26 19:29:37

The notebook demonstrates how Vision Language Models (VLMs) like Donut and Pix2Struct can extract structured data from document images, surpassing traditional OCR in accuracy and handling complex layouts. Instead of relying on OCR's text extraction and post-processing, VLMs directly interpret the image and output the desired data in a structured format like JSON, simplifying downstream tasks. This approach proves especially effective for invoices, receipts, and forms where specific information needs to be extracted and organized. The examples showcase how to define the desired output structure using prompts and how VLMs effectively handle various document layouts and complexities, eliminating the need for complex OCR pipelines and post-processing logic.

The Jupyter Notebook titled "Replace OCR with Vision Language Models" explores extracting structured information from documents, specifically forms, using Vision Language Models (VLMs) in place of traditional Optical Character Recognition (OCR). The notebook demonstrates how VLMs, which process both visual and textual information, can interpret the content and layout of a document image directly and extract key-value pairs and other structured data without an intermediate OCR step.

The core argument presented is that OCR often struggles with complex layouts, noisy images, and handwritten text, introducing errors that propagate downstream in data processing pipelines. VLMs, on the other hand, can reason about the document's structure and context, enabling them to more accurately identify and extract relevant information even in challenging scenarios. This capability eliminates the need for complex post-processing steps typically required to clean up OCR output, simplifying the overall information extraction process.

The notebook provides a detailed walkthrough of using the vlmrun library, a specialized tool designed to facilitate interactions with various VLMs. It showcases practical examples of extracting data from different form types, including W-2 tax forms and expense reports. The examples demonstrate how to specify target fields for extraction using prompts and how to customize the extraction process to accommodate different document formats and structures. The vlmrun library streamlines the process of querying the VLM and parsing the results into a structured format like JSON, making it readily usable in downstream applications.
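The general pattern of naming target fields in a prompt and parsing the reply into JSON can be sketched without any particular library. The example below does not use the vlmrun API, whose calls are not shown in this summary; the field names, the prompt wording, and both helper functions are hypothetical, and the model reply is passed in as a plain string.

```python
import json

# Hypothetical target fields for a W-2 style form; the names are illustrative.
W2_FIELDS = {
    "employer_name": str,
    "employee_name": str,
    "wages": float,
    "federal_tax_withheld": float,
}


def build_prompt(fields):
    """Describe the desired output structure to the model in plain language."""
    listing = ", ".join(f'"{name}" ({kind.__name__})' for name, kind in fields.items())
    return (
        "Extract the following fields from the document image and "
        "reply with a single JSON object: " + listing
    )


def parse_response(raw, fields):
    """Parse the model's reply and check that every requested field has the expected type."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in fields if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    result = {}
    for name, kind in fields.items():
        value = data[name]
        # Accept JSON integers where a float is expected, but never booleans.
        if kind is float and isinstance(value, (int, float)) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, kind):
            raise ValueError(f"field {name!r} should be {kind.__name__}")
        result[name] = value
    return result
```

Validating the reply against the requested fields matters because a model can return well-formed JSON that still omits a field or changes its type.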

Furthermore, the notebook emphasizes the flexibility and adaptability of VLMs by illustrating how they can be applied to various document layouts and extraction tasks. It highlights how the model can be instructed to extract specific information based on the provided prompt, effectively performing targeted information retrieval. The notebook concludes by showcasing how the extracted structured data can be seamlessly integrated into other systems and workflows, emphasizing the practical benefits of adopting VLM-based document processing for real-world applications. The overall message is that VLMs offer a powerful and efficient alternative to OCR, potentially revolutionizing how we extract information from documents and paving the way for more robust and intelligent document processing systems.

Summary of Comments (4)
https://news.ycombinator.com/item?id=43187209

HN users generally expressed excitement about the potential of Vision-Language Models (VLMs) to replace OCR, finding the demo impressive. Some highlighted VLMs' ability to understand context and structure, going beyond mere text extraction to infer meaning and relationships within a document. However, others cautioned against prematurely declaring OCR obsolete, pointing out potential limitations of VLMs like hallucinations, difficulty with complex layouts, and the need for robust evaluation beyond cherry-picked examples. The cost and speed of VLMs compared to mature OCR solutions were also raised as concerns. Several commenters discussed specific use-cases and potential applications, including data entry automation, accessibility for visually impaired users, and historical document analysis. There was also interest in comparing different VLMs and exploring fine-tuning possibilities.

The Hacker News post "Replace OCR with Vision Language Models," linking to a Jupyter Notebook demonstrating the use of Vision Language Models (VLMs) for information extraction from documents, generated a moderate discussion with several insightful comments.

A significant point of discussion revolved around the comparison between VLMs and traditional OCR. One commenter highlighted the different strengths of each approach, suggesting that OCR excels at accurately transcribing text, while VLMs are better suited for understanding the meaning of the document. They noted OCR's struggles with complex layouts and poor quality scans, situations where a VLM might perform better due to its ability to reason about the document's structure and context. This commenter provided a practical example: extracting information from an invoice with varying layouts, where OCR might struggle but a VLM could potentially identify key fields regardless of their position.

Expanding on this theme, another user emphasized that VLMs are particularly useful when dealing with visually noisy or distorted documents. They proposed that the optimal solution might be a hybrid approach: using OCR to get an initial text representation and then using a VLM to refine the results and extract semantic information. This combined approach, they argued, draws on the strengths of both technologies.
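The hybrid approach this commenter proposed can be sketched as a small pipeline. The OCR engine and the VLM are passed in as plain callables, so no real library API is assumed; `run_ocr`, `run_vlm`, and the cross-check step are hypothetical additions for illustration and were not described by the commenter.

```python
def normalize(text):
    """Casefold and collapse whitespace so comparisons ignore case and layout."""
    return " ".join(str(text).casefold().split())


def hybrid_extract(image_bytes, run_ocr, run_vlm):
    """Run OCR first, give its text to the VLM, then flag fields the OCR text does not support.

    run_ocr(image_bytes) -> str and run_vlm(image_bytes, ocr_text) -> dict are
    caller-supplied stand-ins for whatever OCR engine and VLM are actually used.
    """
    ocr_text = run_ocr(image_bytes)
    fields = run_vlm(image_bytes, ocr_text)
    haystack = normalize(ocr_text)
    # A value that never appears in the OCR text may be a hallucination or an OCR miss.
    unsupported = sorted(
        name for name, value in fields.items() if normalize(value) not in haystack
    )
    return {"fields": fields, "unsupported": unsupported}
```

Fields listed under `unsupported` are not necessarily wrong, since OCR itself can misread a value, but they are reasonable candidates for a second look.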

Addressing the practical implementation of VLMs, a commenter pointed out the current computational cost and resource requirements, suggesting that these models aren't yet readily accessible to the average user. They expressed hope for further development and optimization, making VLMs more practical for everyday applications.

Another user concurred with the resource intensity concern but also mentioned that open-source models like Donut are making strides in this area. They further suggested that the choice between OCR and VLMs depends heavily on the specific task. For tasks requiring perfect textual accuracy, OCR remains the better choice. However, when the goal is information extraction and understanding, VLMs offer a powerful alternative, especially for documents with complex or inconsistent layouts.

Finally, some comments focused on specific applications, like using VLMs to parse structured documents such as forms. One user highlighted the potential for pre-training VLMs on specific document types to improve accuracy and efficiency. Another commenter mentioned the challenges of evaluating the performance of VLMs on complex layouts, suggesting the need for more robust evaluation metrics.

In summary, the comments section explores the trade-offs between OCR and VLMs, highlighting the strengths and weaknesses of each approach. The discussion also touches upon practical considerations such as resource requirements and the potential for hybrid solutions combining OCR and VLMs. While acknowledging the current limitations of VLMs, the overall sentiment expresses optimism for their future development and wider adoption in various document processing tasks.
