LumoSQL is an experimental project aiming to improve SQLite performance and extensibility by rewriting it in a modular fashion using the Lua programming language. It leverages Lua's JIT compiler and flexible nature to potentially surpass SQLite's speed while maintaining compatibility. This modular architecture allows for easier experimentation with different storage engines, virtual table implementations, and other components. LumoSQL emphasizes careful benchmarking and measurement to ensure performance gains are real and significant. The project's current focus is demonstrating performance improvements, after which features like improved concurrency and new functionality will be explored.
PostgreSQL 18 introduces asynchronous I/O (AIO) for reading data from disk, significantly improving performance, especially for workloads involving large scans and random access. Previously, reading data from disk was a synchronous process, stalling other database operations. Now, with AIO, PostgreSQL can initiate multiple disk read requests concurrently and continue processing other tasks while waiting, minimizing idle time and latency. This results in substantial speedups for read-heavy workloads, potentially improving performance by up to 3x in some cases. While initially focused on relation data files, future versions aim to extend AIO support to other areas like WAL files and temporary files, further enhancing PostgreSQL's performance.
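For readers who want to confirm the new subsystem on their own instance, here is a minimal sketch. It assumes psycopg2 and a reachable PostgreSQL 18 server; io_method and io_workers are the configuration names described for the AIO work, and the connection string is a placeholder.

```python
import psycopg2  # assumes psycopg2 is installed and a PostgreSQL 18 instance is reachable

# Placeholder DSN; adjust for your environment.
conn = psycopg2.connect("dbname=postgres user=postgres host=localhost")
with conn, conn.cursor() as cur:
    # io_method (sync | worker | io_uring) and io_workers are the settings
    # documented for the new asynchronous I/O subsystem.
    for setting in ("io_method", "io_workers", "effective_io_concurrency"):
        cur.execute(f"SHOW {setting}")
        print(setting, "=", cur.fetchone()[0])
conn.close()
```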
Hacker News users generally expressed excitement about PostgreSQL 18's asynchronous I/O, hoping it would significantly improve performance, especially for read-heavy workloads. Some questioned the potential impact on latency and CPU usage, and whether the benefits would be noticeable in real-world scenarios. A few users discussed the complexities of implementing async I/O effectively and the potential for unintended consequences. Several commenters also mentioned other performance improvements in PostgreSQL 18, and looked forward to benchmarking the new features. There was also some discussion about the challenges of comparing benchmarks and interpreting results, and the importance of testing with realistic workloads.
A decade after its last update and 12 years after its initial release, the Asus P8P67 Deluxe motherboard, a Sandy Bridge-era platform, has received a new BIOS update. This surprisingly recent update adds NVMe M.2 SSD boot support through a PCIe adapter card, breathing new life into this aging yet still capable hardware. While not supporting the full speed of modern NVMe drives, this update allows users to significantly upgrade their boot drive performance and extend the lifespan of their Sandy Bridge systems.
Hacker News commenters generally expressed appreciation for the dedication and ingenuity involved in updating a 12-year-old motherboard to support modern NVMe drives. Several users shared similar experiences of reviving older hardware, highlighting the satisfaction of extending the lifespan of functional components. Some questioned the practical benefits given the age of the platform, suggesting a full system upgrade might be more worthwhile for performance gains. Others pointed out the potential value for specific use cases like home servers or retro gaming rigs where maintaining compatibility with older hardware is desirable. A few users also discussed the technical challenges involved in such updates, including BIOS limitations and potential compatibility issues.
Flatpaks consume significant disk space because each application bundles its dependencies, including libraries and runtimes. This avoids dependency conflicts but leads to redundancy, especially when multiple Flatpaks rely on the same common libraries. While OSTree deduplicates identical files via hardlinks and shared runtimes cover many common dependencies, plenty of applications still ship their own copies of libraries that other apps bundle too. This "bundle everything" approach, while beneficial for consistent behavior and cross-distribution compatibility, produces a larger storage footprint than traditional package managers that lean on shared system libraries. Furthermore, Flatpak keeps previous versions of applications around for rollback, which increases disk usage further.
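To see how much the hardlink-based deduplication actually saves on a given machine, one rough measurement is to compare the summed size of every path under the Flatpak installation with the size counted once per inode. A sketch, assuming a Linux system with the default system-wide install under /var/lib/flatpak:

```python
import os

ROOT = "/var/lib/flatpak"  # default system-wide location; user installs use ~/.local/share/flatpak

apparent = 0   # total if every path were an independent copy
unique = {}    # (device, inode) -> size, so hardlinked duplicates are counted once

for dirpath, _dirs, files in os.walk(ROOT):
    for name in files:
        try:
            st = os.lstat(os.path.join(dirpath, name))
        except OSError:
            continue
        apparent += st.st_size
        unique[(st.st_dev, st.st_ino)] = st.st_size

actual = sum(unique.values())
print(f"apparent size:          {apparent / 1e9:.2f} GB")
print(f"deduplicated (on disk): {actual / 1e9:.2f} GB")
```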
HN commenters generally agree that Flatpak's disk space usage is a valid concern, especially for users with limited storage. Several point out that the deduplication system, while theoretically efficient, doesn't always work as intended, leading to redundant libraries and inflated app sizes. Some suggest that the benefits of Flatpak, like sandboxing and consistent runtime environments, outweigh the storage costs, particularly for less experienced users. Others argue that alternative packaging formats like .deb or .rpm are more space-efficient and sufficient for most use cases. A few commenters mention potential solutions, such as improved deduplication or allowing users to share runtimes across different distributions, but acknowledge the complexity of implementing these changes. The lack of clear communication about Flatpak's disk usage and the absence of easy tools to manage it are also criticized.
The blog post explores the history of Apple's rumored adoption of ZFS, the advanced file system. While Apple engineers internally prototyped and tested ZFS integration, ultimately licensing and legal complexities, combined with performance concerns specific to Apple's hardware (particularly flash storage) and the desire for full control over the file system's development, prevented its official adoption. Though ZFS offered appealing features, Apple chose to focus on its own in-house solutions, culminating in APFS. The post debunks claims of a fully functioning "ready to ship" ZFS implementation within OS X 10.5, clarifying it was experimental and never intended for release.
HN commenters discuss Apple's exploration and ultimate rejection of ZFS. Some highlight the licensing incompatibility as the primary roadblock, with ZFS's CDDL clashing with Apple's restrictive approach. Others speculate about Apple's internal politics and the potential "not invented here" syndrome influencing the decision. A few express disappointment, believing ZFS would have significantly benefited macOS, while some counter that APFS, Apple's eventual solution, adequately addresses their needs. The potential performance implications of ZFS on Apple hardware are also debated, with some arguing that Apple's hardware is uniquely suited to ZFS's strengths. Finally, the technical challenges of integrating ZFS, especially regarding snapshots and Time Machine, are mentioned as potential reasons for Apple's decision.
Google Cloud has expanded its AI infrastructure with new offerings focused on speed and scale. The A3 VMs, based on Nvidia H100 GPUs, are designed for training and serving large language models and other generative AI workloads, providing significantly improved performance compared to previous generations. Google is also improving its networking infrastructure with the introduction of the Cross-Cloud Network platform, allowing easier and more secure connections between Google Cloud and on-premises environments. Furthermore, Google Cloud is enhancing data and storage capabilities with updates to Cloud Storage and Dataproc Spark, boosting data access speeds and enabling faster processing for AI workloads.
HN commenters are skeptical of Google's "AI hypercomputer" announcement, viewing it more as a marketing push than a substantial technical advancement. They question the vagueness of the term "hypercomputer" and the lack of concrete details on its architecture and capabilities. Several point out that Google is simply catching up to existing offerings from competitors like AWS and Azure in terms of interconnected GPUs and high-speed networking. Others express cynicism about Google's track record of abandoning cloud projects. There's also discussion about the actual cost-effectiveness and accessibility of such infrastructure for smaller research teams, with doubts raised about whether the benefits will trickle down beyond large, well-funded organizations.
That white stuff on your cheese might not be mold! It could be calcium lactate crystals, a harmless byproduct of aging. These crystals, often found on aged cheeses like cheddar, Gouda, and Parmesan, form when lactic acid reacts with calcium in the cheese. They usually appear as small, white, gritty or crunchy spots and indicate a well-aged and flavorful cheese. While they might look unusual, calcium lactate crystals are safe to eat and contribute to the cheese's unique texture and taste. So, before you toss that block of cheese, consider that the "white stuff" might actually be a sign of quality.
Hacker News users discuss the various types of "white stuff" that can appear on cheese, beyond just mold. Several commenters point out that the article fails to mention tyrosine crystals, which are common on aged cheeses and contribute to their flavor. Calcium lactate is also mentioned as another common, harmless crystalline formation. Some users express concern about the author's seeming encouragement to just scrape off the mold and eat the cheese, with several arguing that this is unsafe for certain molds that penetrate deeply. Others note the article conflates "safe" with "harmless", pointing out that even harmless molds might not be palatable. Finally, a few comments offer additional resources for identifying cheese molds and determining their safety.
Werner Vogels argues that while Amazon S3's simplicity was initially a key differentiator and driver of its widespread adoption, maintaining that simplicity in the face of ever-increasing scale and feature requests is an ongoing challenge. He emphasizes that adding features doesn't equate to improving the customer experience and that preserving S3's core simplicity—its fundamental object storage model—is paramount. This involves thoughtful API design, backwards compatibility, and a focus on essential functionality rather than succumbing to the pressure of adding complexity for its own sake. S3's continued success hinges on keeping the service easy to use and understand, even as the underlying technology evolves dramatically.
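The "fundamental object storage model" really is small: put a blob under a key, get it back by key, list keys by prefix. A sketch using boto3, where the bucket name and keys are hypothetical and AWS credentials are assumed to be configured in the environment:

```python
import boto3  # assumes credentials are available via the usual AWS mechanisms

s3 = boto3.client("s3")
bucket = "example-bucket"  # hypothetical bucket

# The core model: a flat namespace of keys mapping to byte blobs.
s3.put_object(Bucket=bucket, Key="reports/2024/q1.csv", Body=b"id,total\n1,42\n")
body = s3.get_object(Bucket=bucket, Key="reports/2024/q1.csv")["Body"].read()

# There are no real directories: "reports/2024/" is just a key prefix, and
# listing with a delimiter simulates folders on the client side.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="reports/", Delimiter="/")
print(body, [p["Prefix"] for p in resp.get("CommonPrefixes", [])])
```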
Hacker News users largely agreed with the premise of the article, emphasizing that S3's simplicity is its greatest strength, while also acknowledging areas where improvements could be made. Several commenters pointed out the hidden complexities of S3, such as eventual consistency and subtle performance gotchas. The discussion also touched on the trade-offs between simplicity and more powerful features, with some arguing that S3's simplicity forces users to build solutions on top of it, leading to more robust architectures. The lack of a true directory structure and efficient renaming operations were also highlighted as pain points. Some users suggested potential improvements like native support for symbolic links or atomic renaming, but the general consensus was that any added features should be carefully considered to avoid compromising S3's core simplicity. A few comments compared S3 to other storage solutions, noting that while some offer more advanced features, none have matched S3's simplicity and ubiquity.
The blog post "IO Devices and Latency" explores the significant impact of I/O operations on overall database performance, emphasizing that optimizing queries alone isn't enough. It breaks down the various types of latency involved in storage systems, from the physical limitations of different storage media (like NVMe drives, SSDs, and HDDs) to the overhead introduced by the operating system and file system layers. The post highlights the performance benefits of using direct I/O, which bypasses the OS page cache, for predictable, low-latency access to data, particularly crucial for database workloads. It also underscores the importance of understanding the characteristics of your storage hardware and software stack to effectively minimize I/O latency and improve database performance.
Hacker News users discussed the challenges of measuring and mitigating I/O latency. Some questioned the blog post's methodology, particularly its reliance on fio and the potential for misleading results due to caching effects. Others offered alternative tools and approaches for benchmarking storage performance, emphasizing the importance of real-world workloads and the limitations of synthetic tests. Several commenters shared their own experiences with storage latency issues and offered practical advice for diagnosing and resolving performance bottlenecks. A recurring theme was the complexity of the storage stack and the need to understand the interplay of various factors, including hardware, drivers, file systems, and application behavior. The discussion also touched on the trade-offs between performance, cost, and complexity when choosing storage solutions.
ParadeDB, a YC S23 startup building a distributed, relational, NewSQL database in Rust, is hiring a Rust Database Engineer. This role involves designing and implementing core database components like query processing, transaction management, and distributed consensus. Ideal candidates have experience building database systems, are proficient in Rust, and possess a strong understanding of distributed systems concepts. They will contribute significantly to the database's architecture and development, working closely with the founding team. The position is remote and offers competitive salary and equity.
HN commenters discuss ParadeDB's hiring post, expressing skepticism about the wisdom of choosing Rust for a database due to its complexity and potential performance overhead compared to C++. Some question the value proposition of yet another database, wondering what niche ParadeDB fills that isn't already addressed by existing solutions. Others suggest focusing on a specific problem domain rather than building a general-purpose database. There's also discussion about the startup's name and logo, with some finding them unmemorable or confusing. Finally, a few commenters offer practical advice on hiring, suggesting reaching out to university research groups or specialized job boards.
Backblaze's 12-year hard drive failure rate analysis, visualized through interactive charts, reveals interesting trends. While drive sizes have increased significantly, failure rates haven't followed a clear pattern related to size. Different manufacturers demonstrate varying reliability, with some models showing notably higher or lower failure rates than others. The data allows exploration of failure rates over time, by manufacturer, model, and size, providing valuable insights into drive longevity for large-scale deployments. The visualization highlights the complexity of predicting drive failure and the importance of ongoing monitoring.
Hacker News users discussed the methodology and presentation of the Backblaze data drive statistics. Several commenters questioned the lack of confidence intervals or error bars, making it difficult to draw meaningful conclusions about drive reliability, especially regarding less common models. Others pointed out the potential for selection bias due to Backblaze's specific usage patterns and purchasing decisions. Some suggested alternative visualizations, like Kaplan-Meier survival curves, would be more informative. A few commenters praised the long-term data collection and its value for the community, while also acknowledging its limitations. The visualization itself was generally well-received, with some suggestions for improvements like interactive filtering.
Catalytic computing, a theoretical framework, aims to overcome the limitations of small working memory by leveraging the full storage capacity of a device, such as a hard drive, for computation. The key idea is that memory already full of other data can still be borrowed as workspace, provided its contents are restored to their original state when the computation finishes, much as a chemical catalyst participates in a reaction without being consumed. This could drastically expand the complexity and scale of computations possible, potentially enabling efficient processing of massive datasets that are currently intractable for conventional machines. While still theoretical, catalytic computing represents a fundamental shift in thinking about computation, promising to unlock untapped potential in existing hardware.
Hacker News users discussed the potential and limitations of catalytic computing. Some expressed skepticism about the practicality and scalability of the approach, questioning the overhead and energy costs involved in repeatedly reading and writing data. Others highlighted the potential benefits, particularly for applications involving massive datasets that don't fit in RAM, drawing parallels to memory mapping and virtual memory. Several commenters pointed out that the concept isn't entirely new, referencing existing techniques like using SSDs as swap space or leveraging database indexing. The discussion also touched upon the specific use cases where catalytic computing might be advantageous, like bioinformatics and large language models, while acknowledging the need for further research and development to overcome current limitations. A few commenters also delved into the theoretical underpinnings of the concept, comparing it to other computational models.
Backblaze's 2024 hard drive stats reveal a continued decline in annualized failure rates (AFR) across most drive models. The overall AFR for 2024 was 0.83%, the lowest ever recorded by Backblaze. Larger capacity drives, particularly 16TB and larger, demonstrated remarkably low failure rates, with some models exhibiting AFRs below 0.5%. While some older drives experienced higher failure rates as expected, the data suggests increasing drive reliability overall. Seagate drives dominated Backblaze's data centers, comprising the majority of drives and continuing to perform reliably. The report highlights the ongoing trend of larger drives becoming more dependable, contributing to the overall improvement in data storage reliability.
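For reference, the annualized failure rate Backblaze reports is simply failures per drive-year expressed as a percentage. A small sketch, with hypothetical fleet numbers chosen to reproduce the 0.83% headline figure:

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Failures per drive-year, expressed as a percentage."""
    return 100.0 * failures / (drive_days / 365.0)

# Hypothetical fleet: 30,000 drives online for a full year, 249 failures.
print(f"AFR = {annualized_failure_rate(249, 30_000 * 365):.2f}%")  # -> AFR = 0.83%
```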
Hacker News users discuss Backblaze's 2024 drive stats, focusing on the high failure rates of WDC drives, especially the 16TB and 18TB models. Several commenters question Backblaze's methodology and data interpretation, suggesting their usage case (consumer drives in enterprise settings) skews the results. Others point out the difficulty in comparing different drive models directly due to varying usage and deployment periods. Some highlight the overall decline in drive reliability and express concerns about the industry trend of increasing capacity at the expense of longevity. The discussion also touches on SMART stats, RMA processes, and the potential impact of SMR technology. A few users share their personal experiences with different drive brands, offering anecdotal evidence that contradicts or supports Backblaze's findings.
Reports are surfacing about new Seagate hard drives, predominantly sold through Chinese online marketplaces, exhibiting suspiciously long power-on hours and high usage statistics despite being advertised as new. This suggests potential fraud, where used or refurbished drives are being repackaged and sold as new. While Seagate has acknowledged the issue and is investigating, the extent of the problem remains unclear, with speculation that the drives might originate from cryptocurrency mining operations or other data centers. Buyers are urged to check SMART data upon receiving new Seagate drives to verify their actual usage.
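Checking the SMART data is straightforward where smartmontools is installed. The sketch below shells out to smartctl and pulls the Power_On_Hours attribute from ATA-style output; the device path is a placeholder, the command typically needs root, and raw-value formats vary by vendor and interface (NVMe output looks different), so treat it as a starting point rather than a definitive check.

```python
import re
import subprocess

DEVICE = "/dev/sdX"  # placeholder device node; usually requires root to query

def power_on_hours(device: str):
    """Return the Power_On_Hours raw value from smartctl's ATA attribute table, if present."""
    out = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    ).stdout
    match = re.search(r"^\s*9\s+Power_On_Hours.*?(\d+)\s*$", out, re.MULTILINE)
    return int(match.group(1)) if match else None

hours = power_on_hours(DEVICE)
print(f"{DEVICE}: {hours} power-on hours" if hours is not None else "Power_On_Hours not found")
```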
Hacker News users discuss potential explanations for unexpectedly high reported runtime hours on seemingly new Seagate hard drives. Some suggest these drives are refurbished units falsely marketed as new, with inflated SMART data to disguise their prior use. Others propose the issue stems from quality control problems leading to extended testing periods at the factory, or even the use of drives in cryptocurrency mining operations before being sold as new. Several users share personal anecdotes of encountering similar issues with Seagate drives, reinforcing suspicion about the company's practices. Skepticism also arises about the reliability of SMART data as an indicator of true drive usage, with some arguing it can be manipulated. Some users suggest buying hard drives from more reputable retailers or considering alternative brands to avoid potential issues.
German consumers are reporting that Seagate hard drives advertised and sold as new were actually refurbished drives with heavy prior usage. Some drives reportedly logged tens of thousands of power-on hours and possessed SMART data indicating significant wear, including reallocated sectors and high spin-retry counts. This affects several models, including IronWolf and Exos enterprise-grade drives purchased through various retailers. While Seagate has initiated replacements for some affected customers, the extent of the issue and the company's official response remain unclear. Concerns persist regarding the potential for widespread resale of used drives as new, raising questions about Seagate's quality control and refurbishment practices.
Hacker News commenters express skepticism and concern over the report of Seagate allegedly selling used hard drives as new in Germany. Several users doubt the veracity of the claims, suggesting the reported drive hours could be a SMART reporting error or a misunderstanding. Others point out the potential for refurbished drives to be sold unknowingly, highlighting the difficulty in distinguishing between genuinely new and refurbished drives. Some commenters call for more evidence, suggesting analysis of the drive's physical condition or firmware versions. A few users share anecdotes of similar experiences with Seagate drives failing prematurely. The overall sentiment is one of caution towards Seagate, with some users recommending alternative brands.
Dan Luu's "Working with Files Is Hard" explores the surprising complexity of file I/O. While seemingly simple, file operations are fraught with subtle difficulties stemming from the interplay of operating systems, filesystems, programming languages, and hardware. The post dissects various common pitfalls, including partial writes, renaming and moving files across devices, unexpected caching behaviors, and the challenges of ensuring data integrity in the face of interruptions. Ultimately, the article highlights the importance of understanding these complexities and employing robust strategies, such as atomic operations and careful error handling, to build reliable file-handling code.
HN commenters largely agree with the premise that file handling is surprisingly complex. Many shared anecdotes reinforcing the difficulties encountered with different file systems, character encodings, and path manipulation. Some highlighted the problems of hidden characters causing issues, the challenges of cross-platform compatibility (especially Windows vs. *nix), and the subtle bugs that can arise from incorrect assumptions about file sizes or atomicity. A few pointed out the relative simplicity of dealing with files in Plan 9, and others mentioned more modern approaches like using memory-mapped files or higher-level libraries to abstract away some of the complexity. The lack of libraries to handle text files reliably across platforms was a recurring theme. A top comment emphasizes how corner cases, like filenames containing newlines or other special characters, are often overlooked until they cause real-world problems.
The blog post argues that file systems, particularly hierarchical ones, are a form of hypermedia that predates the web. It highlights how directories act like web pages, containing links (files and subdirectories) that can lead to other content or executable programs. This linking structure, combined with metadata like file types and modification dates, allows for navigation and information retrieval similar to browsing the web. The post further suggests that the web's hypermedia capabilities essentially replicate and expand upon the fundamental principles already present in file systems, emphasizing a deeper connection between these two technologies than commonly recognized.
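The analogy is easy to make concrete: a few lines can render any directory as a page of links, which is essentially what early web servers' directory indexes did. A toy sketch:

```python
import html
import os

def directory_as_page(path: str = ".") -> str:
    """Render a directory listing as a minimal HTML page of links."""
    items = []
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        label = entry.name + ("/" if entry.is_dir(follow_symlinks=False) else "")
        items.append(f'<li><a href="{html.escape(entry.name)}">{html.escape(label)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

print(directory_as_page("."))
```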
Hacker News users largely praised the article for its clear explanation of file systems as a foundational hypermedia system. Several commenters highlighted the elegance and simplicity of this concept, often overlooked in the modern web's complexity. Some discussed the potential of leveraging file system principles for improved web experiences, like decentralized systems or simpler content management. A few pointed out limitations, such as the lack of inherent versioning in basic file systems and the challenges of metadata handling. The discussion also touched on related concepts like Plan 9 and the semantic web, contrasting their approaches to linking and information organization with the basic file system model. Several users reminisced about early computing experiences and the directness of navigating files and folders, suggesting a potential return to such simplicity.
The author migrated away from Bcachefs due to persistent performance issues and instability despite extensive troubleshooting. While initially impressed with Bcachefs's features, they experienced slowdowns, freezes, and data corruption, especially under memory pressure. Attempts to identify and fix the problems through kernel debugging and communication with the developers were unsuccessful, leaving the author with no choice but to switch back to ZFS. Although acknowledging Bcachefs's potential, the author concludes it's not currently production-ready for their workload.
HN commenters generally express disappointment with Bcachefs's lack of mainline inclusion in the kernel, viewing it as a significant barrier to adoption and a potential sign of deeper issues. Some suggest the lengthy development process and stalled upstreaming might indicate fundamental flaws or maintainability problems within the filesystem itself. Several commenters express a preference for established filesystems like ZFS and btrfs, despite their own imperfections, due to their maturity and broader community support. Others question the wisdom of investing time in a filesystem unlikely to become a standard, citing concerns about future development and maintenance. While acknowledging Bcachefs's technically intriguing features, the consensus leans toward caution and skepticism about its long-term viability. A few offer more neutral perspectives, suggesting the author's experience might not be universally applicable and hoping for the project's eventual success.
This spreadsheet documents a personal file system designed to mitigate data loss at home. It outlines a tiered backup strategy using various methods and media, including cloud storage (Google Drive, Backblaze), local network drives (NAS), and external hard drives. The system emphasizes redundancy by storing multiple copies of important data in different locations, and incorporates a structured approach to file organization and a regular backup schedule. The author categorizes their data by importance and sensitivity, employing different strategies for each category, reflecting a focus on preserving critical data in the event of various failure scenarios, from accidental deletion to hardware malfunction or even house fire.
Several commenters on Hacker News expressed skepticism about the practicality and necessity of the "Home Loss File System" presented in the linked Google Doc. Some questioned the complexity introduced by the system, suggesting simpler solutions like cloud backups or RAID would be more effective and less prone to user error. Others pointed out potential vulnerabilities related to security and data integrity, especially concerning the proposed encryption method and the reliance on physical media exchange. A few commenters questioned the overall value proposition, arguing that the risk of complete home loss, while real, might be better mitigated through insurance rather than a complex custom file system. The discussion also touched on potential improvements to the system, such as using existing decentralized storage solutions and more robust encryption algorithms.
Obsidian-textgrams is a plugin that allows users to create and embed ASCII diagrams directly within their Obsidian notes. It leverages code blocks and a custom renderer to display the diagrams, offering features like syntax highlighting and the ability to store diagram source code within the note itself. This provides a convenient way to visualize information using simple text-based graphics within the Obsidian environment, eliminating the need for external image files or complex drawing tools.
HN users generally expressed interest in the Obsidian Textgrams plugin, praising its lightweight approach compared to alternatives like Excalidraw or Mermaid. Some suggested improvements, including the ability to embed rendered diagrams as images for compatibility with other Markdown editors, and better text alignment within shapes. One commenter highlighted the usefulness for quickly mocking up system designs or diagrams, while another appreciated its simplicity for note-taking. The discussion also touched upon alternative tools like PlantUML and Graphviz, but the consensus leaned towards appreciating Textgrams' minimalist and fast rendering capabilities within Obsidian. A few users expressed interest in seeing support for more complex shapes and connections.
Summary of Comments (77)
https://news.ycombinator.com/item?id=44105619
Hacker News users discussed LumoSQL's approach of compiling SQL to native code via LLVM, expressing interest in its potential performance benefits, particularly for read-heavy workloads. Some questioned the practical advantages over existing optimized databases and raised concerns about the complexity of the compilation process and debugging. Others noted the project's early stage and the need for more benchmarks to validate performance claims. Several commenters were curious about how LumoSQL handles schema changes and concurrency control, with some suggesting comparisons to SQLite's approach. The tight integration with SQLite was also a topic of discussion, with some seeing it as a strength for leveraging existing tooling while others wondered about potential limitations.
The Hacker News post titled "LumoSQL" (https://news.ycombinator.com/item?id=44105619) has a modest number of comments, discussing the project's approach, potential benefits, and some concerns.
Several commenters express interest in the project's goal of building a more reliable and verifiable SQLite. One commenter praises the project's focus on stability and the removal of legacy code, viewing it as a valuable contribution. They specifically mention that the careful approach to backwards compatibility is a wise decision. Another commenter highlights the potential of LumoSQL to serve as a reliable foundation for other projects. The use of SQLite as a base is seen as a strength due to its wide usage and established reputation.
There's a discussion around the use of Lua for extensions. One commenter points out the potential security implications of using Lua, particularly concerning untrusted inputs. They emphasize the importance of careful sandboxing to mitigate these risks. Another commenter acknowledges the security concerns but also mentions Lua's speed and ease of integration as potential benefits.
The licensing of LumoSQL also comes up. One commenter questions the specific terms of the license and its implications for commercial use. Another clarifies that the project uses the same license as SQLite, addressing the initial concern.
One commenter expresses skepticism about the long-term viability of the project, questioning whether it will gain enough traction to sustain itself. They also mention the challenge of attracting contributors and maintaining momentum.
Performance is also a topic of discussion, with one commenter inquiring about any performance benchmarks comparing LumoSQL to SQLite. This comment, however, remains unanswered.
Finally, there are comments focusing on the technical aspects of the project. One commenter asks about the project's approach to compilation, particularly regarding static versus dynamic linking. Another commenter inquires about the rationale behind specific architectural choices. These technical questions generally receive responses from individuals involved with the LumoSQL project, providing further clarification and insights.