hackslash dot org

Show HN: In-Browser Graph RAG with Kuzu-WASM and WebLLM

Posted: 2025-03-10 15:12:57

This blog post demonstrates a Retrieval Augmented Generation (RAG) pipeline running entirely within a web browser. It uses Kuzu-WASM, a WebAssembly build of the Kuzu graph database, to store and query a knowledge graph, and WebLLM, a library for running large language models (LLMs) client-side. The demo allows users to query the graph using natural language, with Kuzu translating the query into its native query language and retrieving relevant information. This retrieved context is then fed to a local LLM (currently, a quantized version of Flan-T5), which generates a natural language response. This in-browser approach offers potential benefits in terms of privacy, reduced latency, and offline functionality, enabling new possibilities for interactive and personalized AI applications.

This blog post introduces a novel approach to implementing Retrieval Augmented Generation (RAG) entirely within a web browser, leveraging the power of Kuzu-WASM, a WebAssembly port of the Kuzu graph database, and WebLLM, a library for running large language models (LLMs) client-side. The post demonstrates how these technologies can be combined to create a powerful and privacy-preserving question-answering system that operates without server-side components.

The core concept revolves around using Kuzu-WASM to store and query a knowledge graph directly in the browser. This eliminates the need for a remote database server and keeps sensitive data localized to the user's machine. The post uses a movie dataset as an example, showcasing how relationships between actors, movies, and genres can be represented within the graph. Queries written in Cypher, KuzuDB's query language, retrieve relevant information from this local graph based on user questions.

WebLLM then enters the scene, taking the results retrieved by Kuzu-WASM and feeding them to a locally running LLM. This LLM uses the retrieved information as context to generate a comprehensive and accurate answer to the user's query. The post highlights the use of a smaller, quantized LLM model optimized for browser execution, emphasizing the potential for performance and efficiency in this client-side architecture.

The post details the technical steps involved in setting up this in-browser RAG pipeline. It covers loading Kuzu-WASM and the chosen LLM model, populating the graph database with the movie dataset, and constructing the logic that connects user queries, graph traversal with Cypher, and LLM-powered answer generation. Code snippets are provided to illustrate the implementation.

The authors emphasize the benefits of this approach, particularly its privacy implications. By keeping data and processing local, user information is never transmitted to a server, offering a significantly more private user experience. Furthermore, the post hints at the potential for offline functionality, suggesting that this architecture could enable powerful knowledge-based applications even without an internet connection. Finally, the post encourages readers to explore and experiment with this technology, positioning it as an exciting development in the evolution of web-based applications and AI.

Summary of Comments ( 4 )
https://news.ycombinator.com/item?id=43321523

HN commenters generally expressed excitement about the potential of in-browser graph RAG, praising the demo's responsiveness and the possibilities it opens up for privacy-preserving, local AI applications. Several users questioned the performance and scalability with larger datasets, highlighting the current limitations of WASM and browser storage. Some suggested potential applications, like analyzing personal knowledge graphs or interacting with codebases. Concerns were raised about the security implications of running LLMs client-side, and the challenge of keeping WASM binaries up-to-date. The closed-source nature of KuzuDB also prompted discussion, with some advocating for open-source alternatives. Several commenters expressed interest in trying the demo and exploring its capabilities further.

The Hacker News post discussing in-browser graph RAG with Kuzu-WASM and WebLLM has generated several comments, offering a range of perspectives on the project.

One commenter expresses excitement about the potential of WebAssembly for database applications, specifically highlighting the possibility of running complex queries client-side without server dependencies. They see this as a significant step toward enabling powerful and responsive web applications. They also inquire about the feasibility of using this technology with larger datasets, acknowledging the current limitations of browser storage.

Another commenter raises a practical concern about the performance implications of handling large graph datasets within the browser. They question whether the current implementation can efficiently manage substantial graphs and suggest that server-side processing might be more suitable for complex graph operations on large datasets. This comment highlights a common trade-off between client-side convenience and server-side performance when dealing with data-intensive applications.

A further comment delves into the specifics of the technology, mentioning the use of Apache Arrow for data serialization. They posit that this choice could be contributing to performance bottlenecks, particularly when transferring data between JavaScript and WebAssembly. They suggest exploring alternative serialization methods or optimizing the data transfer process to improve overall efficiency.

Another individual inquires about the licensing of the project, expressing interest in its potential applications. This highlights the importance of clear licensing information for open-source projects to encourage adoption and collaboration.

The discussion also touches upon the security implications of running database queries within the browser environment. One comment raises the concern of potential vulnerabilities arising from client-side execution and suggests that careful consideration should be given to security best practices.

Finally, a commenter expresses enthusiasm for the project's potential to democratize access to graph databases, making them more accessible to developers and users without requiring specialized server infrastructure. They see this as a positive step towards empowering individuals and smaller organizations to leverage the power of graph technology.

In summary, the comments on the Hacker News post reflect a general interest in the project while also raising important questions and concerns regarding performance, scalability, security, and licensing. The discussion highlights the potential benefits and challenges of bringing graph database technology to the browser environment.

Show HN: Knowledge graph of restaurants and chefs, built using LLMs

permalink

Posted: 2025-03-03 15:43:20

Theophile Cantelo has created Foudinge, a knowledge graph connecting restaurants and chefs. Leveraging Large Language Models (LLMs), Foudinge extracts information from various online sources like blogs, guides, and social media to establish relationships between culinary professionals and the establishments they've worked at or own. This allows for complex queries, such as finding all restaurants where a specific chef has worked, discovering connections between different chefs through shared work experiences, and exploring the culinary lineage within the restaurant industry. Currently focused on French gastronomy, the project aims to expand its scope geographically and improve data accuracy through community contributions and additional data sources.

Théophile Cantelobre has introduced "Foudinge," a novel knowledge graph specifically focused on the culinary world, encompassing restaurants and chefs. This project leverages the power of Large Language Models (LLMs) to construct and populate the graph with information extracted from diverse online sources. Cantelobre details the process of building Foudinge, highlighting the challenges and solutions encountered along the way.

Initially, the project aimed to be a comprehensive database of French gastronomy, but it quickly evolved into a more generalized platform capable of representing culinary knowledge globally. The core of Foudinge lies in its ability to identify and link entities such as restaurants and chefs, establishing relationships between them like "Chef X works at Restaurant Y." This linking process is automated using LLMs, which analyze textual data from sources like restaurant websites, blogs, news articles, and social media platforms. This automated approach allows Foudinge to scale rapidly and incorporate information from a vast range of online resources.

The construction of Foudinge involved several key steps. First, an initial dataset was compiled, encompassing various data points related to restaurants and chefs. This data was then processed using LLMs to extract relevant information and transform it into a structured format suitable for a knowledge graph. The LLMs were instrumental in identifying and disambiguating entities, ensuring that the same chef or restaurant is represented consistently across different sources. Furthermore, the LLMs helped to infer relationships between entities based on the contextual information available in the source material.

Cantelobre acknowledges the inherent challenges of working with LLMs, such as potential biases in the training data and occasional inaccuracies in the generated output. To mitigate these challenges, Foudinge incorporates a validation process involving both automated checks and manual review. This iterative refinement process ensures the accuracy and reliability of the knowledge graph.

The long-term vision for Foudinge is to become a valuable resource for culinary enthusiasts, professionals, and researchers. Its structured data and interconnectedness allow for complex queries and analyses, enabling users to explore the culinary landscape in novel ways. For instance, one could trace the career trajectory of a chef, identify restaurants with similar culinary styles, or investigate the influence of specific chefs on regional cuisines. Cantelobre envisions Foudinge as a dynamic and evolving platform, continuously incorporating new information and expanding its coverage of the culinary world. He invites feedback and contributions from the community to further enhance the project and maximize its potential.

Summary of Comments ( 16 )
https://news.ycombinator.com/item?id=43242818

Hacker News users generally expressed skepticism about the value proposition of the presented knowledge graph of restaurants and chefs. Several commenters questioned the accuracy and completeness of the data, especially given its reliance on LLMs. Some doubted the usefulness of connecting chefs to restaurants without further context, like the time period they worked there. Others pointed out the existing prevalence of this information on platforms like Wikipedia and guide sites, questioning the need for a new platform. The lack of a clear use case beyond basic information retrieval was a recurring theme, with some suggesting potential applications like tracking career progression or identifying emerging culinary trends, but ultimately finding the current implementation insufficient. A few commenters appreciated the technical effort, but overall the reception was lukewarm, focused on the need for demonstrable practical application and improved data quality.

The Hacker News post titled "Show HN: Knowledge graph of restaurants and chefs, built using LLMs" generated a moderate amount of discussion, with a focus on the practical application and potential limitations of the project.

Several commenters expressed interest in the project's potential, particularly regarding its use for restaurant recommendations. One commenter highlighted the difficulty of finding good restaurants in unfamiliar cities and suggested the knowledge graph could be helpful in this scenario, particularly if it allowed users to filter by cuisine type and other specific criteria. They also inquired about the possibility of incorporating user reviews or ratings into the system.

Another user echoed this sentiment, pointing out that existing restaurant recommendation platforms often rely on outdated or inaccurate information. They envisioned the project as a valuable tool for both diners and restaurant owners, providing a centralized and up-to-date resource for restaurant information.

However, some commenters expressed concerns about the project's reliance on LLMs. One commenter pointed out the potential for hallucinations and inaccuracies in LLM-generated data, emphasizing the importance of thorough verification and fact-checking. They also questioned the long-term viability of relying solely on LLMs for data collection and maintenance, suggesting that a more robust approach might involve incorporating human input and curation.

The creator of the project engaged with the commenters, acknowledging the challenges of LLM-based data generation and outlining plans to address these concerns. They mentioned plans to implement a feedback mechanism to flag inaccurate information and explore methods for verifying the accuracy of LLM-generated data. They also discussed potential future features, such as incorporating user reviews, dietary information, and real-time menu updates.

A recurring theme in the comments was the need for a practical application or interface for the knowledge graph. Commenters suggested various use cases, including a dedicated search engine for restaurants, a mobile app for on-the-go recommendations, and integration with existing restaurant platforms.

Finally, one commenter raised a broader point about the ethical implications of using LLMs to scrape data from the web, questioning the potential impact on website owners and the overall ecosystem of online information. This sparked a brief discussion about the responsible use of LLMs and the importance of respecting website terms of service. While not directly related to the project itself, this comment highlighted the broader ethical considerations surrounding LLM-driven data collection.

Representing Graphs in PostgreSQL

permalink

Posted: 2025-02-17 12:15:01

This blog post explores different ways to represent graph data within PostgreSQL. It primarily focuses on the adjacency list model, using a simple table with "source" and "target" columns to define relationships between nodes. The author demonstrates how to perform common graph operations like finding neighbors and traversing paths using recursive CTEs (Common Table Expressions). While acknowledging other models like adjacency matrix and nested sets, the post emphasizes the adjacency list's simplicity and efficiency for many graph use cases within a relational database context. It also briefly touches on performance considerations and the potential for using materialized views for complex or frequently executed queries.

This blog post by Richard Towers explores different methods for representing graph data structures within a PostgreSQL database. It begins by acknowledging the increasing prevalence of graph data in various applications and the consequent need for efficient storage and querying within relational databases. The post then systematically presents three primary approaches to representing graphs in PostgreSQL, evaluating each method's strengths and weaknesses.

The first method discussed is the adjacency list, a classic graph representation. This approach uses a single table with two columns, one representing the source node and the other representing the target node of each edge. The post highlights the simplicity and efficiency of this representation for basic graph traversal queries, especially when using recursive Common Table Expressions (CTEs). However, it also points out the limitations of adjacency lists when dealing with more complex graph properties like edge weights or directedness. The post demonstrates how to add additional columns to the adjacency list table to accommodate such properties, albeit with a slight increase in complexity.

Next, the post introduces the edge list representation, which is fundamentally similar to the adjacency list. The key distinction is a more explicit naming convention for the columns, often using 'source' and 'target' to clearly identify the nodes connected by each edge. This semantic clarity can improve readability and maintainability, especially for larger and more intricate graphs. Functionally, the edge list operates similarly to the adjacency list in terms of query performance and capabilities.

The third and final method presented is the adjacency matrix. This approach employs a table where both rows and columns represent nodes. The presence of a value (typically '1' or 'true') at the intersection of a row and column signifies an edge between the corresponding nodes. The absence of a value indicates no edge. The post emphasizes the advantages of adjacency matrices for certain graph algorithms and operations, particularly those involving dense graphs where checking for the existence of an edge is frequent. However, it also underscores the significant drawbacks of adjacency matrices, specifically their increased storage requirements, especially for sparse graphs, and the potential performance implications when dealing with large graphs. The author notes the difficulty of representing weighted graphs with a simple adjacency matrix and suggests possible workarounds, such as using a separate table to store edge weights.

In conclusion, the post offers a concise overview of three distinct strategies for storing graph data within PostgreSQL. It provides practical SQL examples for each method, enabling readers to experiment and choose the most appropriate representation for their specific use case. The post implicitly encourages developers to carefully consider the trade-offs between simplicity, storage efficiency, and query performance when selecting a graph representation within a relational database like PostgreSQL.

Summary of Comments ( 63 )
https://news.ycombinator.com/item?id=43078100

Hacker News users discussed the practicality and performance implications of representing graphs in PostgreSQL. Several commenters highlighted the existence of specialized graph databases like Neo4j and questioned the suitability of PostgreSQL for complex graph operations, especially at scale. Concerns were raised about the performance of recursive queries and the difficulty of managing deeply nested relationships. Some suggested that while PostgreSQL can handle simpler graph scenarios, dedicated graph databases offer better performance and features for more complex graph use cases. A few commenters mentioned alternative approaches within PostgreSQL, such as using JSON fields or the extension pg_graphql. Others pointed out the benefits of using PostgreSQL for graphs when the graph aspect is secondary to other relational data needs already served by the database.

The Hacker News post "Representing Graphs in PostgreSQL" discussing the linked blog post has generated several comments, exploring different facets of graph representation and database choices.

One commenter highlights the performance benefits of specialized graph databases like Neo4j, especially when dealing with deep traversals, a known weakness of relational databases. They acknowledge PostgreSQL's capabilities for simpler graph operations but advise considering dedicated graph databases for complex graph structures and queries.

Another comment emphasizes the importance of choosing the right tool for the job, echoing the previous sentiment. They suggest that while PostgreSQL can handle graph-like relationships, using a dedicated graph database might be more suitable and efficient for complex graph operations. They point out that the choice depends on the specific use case and the complexity of the graph data and queries.

A different commenter shares their experience with using PostgreSQL for representing a large graph, specifically a social network. They found PostgreSQL's JSONField type to be quite efficient for their needs, storing additional data within the nodes. This suggests that PostgreSQL, while not a dedicated graph database, can be a practical solution for specific graph use cases with appropriate data structuring.

Adding to the discussion of specialized databases, another commenter mentions Amazon Neptune, highlighting its focus on graph data and suggesting it as an alternative for those seeking a managed graph database solution. This broadens the scope of the discussion beyond self-hosted options like Neo4j and PostgreSQL.

One commenter questions the blog post's claim about adjacency lists being simpler, arguing that an adjacency matrix representation could be more straightforward for certain use cases involving dense graphs. They suggest that the choice between adjacency lists and matrices depends on the sparsity or density of the graph data being represented.

Further contributing to the performance discussion, a commenter points out that recursive CTEs (Common Table Expressions) in PostgreSQL, often used for graph traversals, can be significantly slower than dedicated graph databases. They reiterate the advice to choose the right tool based on the complexity of the graph operations.

Finally, a commenter brings up the concept of hypergraphs and the difficulty of representing them efficiently in relational databases. This introduces a more specialized aspect of graph representation, highlighting the limitations of relational databases for certain graph structures.

In summary, the comments on Hacker News offer a diverse range of perspectives on representing graphs in PostgreSQL. While acknowledging PostgreSQL's flexibility, they emphasize the importance of considering the complexity of the graph data and queries when choosing between a relational database and a dedicated graph database. They discuss performance considerations, alternative database solutions, and the nuances of representing different graph structures.

Stories with Tag Graph Databases

Show HN: In-Browser Graph RAG with Kuzu-WASM and WebLLM

Summary of Comments ( 4 ) https://news.ycombinator.com/item?id=43321523

Show HN: Knowledge graph of restaurants and chefs, built using LLMs

Summary of Comments ( 16 ) https://news.ycombinator.com/item?id=43242818

Representing Graphs in PostgreSQL

Summary of Comments ( 63 ) https://news.ycombinator.com/item?id=43078100

Summary of Comments ( 4 )
https://news.ycombinator.com/item?id=43321523

Summary of Comments ( 16 )
https://news.ycombinator.com/item?id=43242818

Summary of Comments ( 63 )
https://news.ycombinator.com/item?id=43078100