The author describes using OpenAI's o3 model to discover CVE-2025-37899, a remotely reachable zero-day vulnerability in ksmbd, the Linux kernel's in-kernel SMB3 server. The flaw is a use-after-free in the handling of the SMB2 LOGOFF command: a session's user object can be freed while a worker thread servicing another connection still holds a reference to it, opening a window for kernel memory corruption. Notably, o3 found the bug from the source code alone, without fuzzing or bespoke tooling, which the author argues signals a real shift in what large language models can contribute to vulnerability research. The issue was reported, a fix was accepted upstream, and distributions subsequently released updates addressing it.
Jazzberry, a Y Combinator-backed startup, has launched an AI-powered agent designed to automatically find and reproduce bugs in software. It integrates with existing testing workflows and claims to reduce debugging time significantly by autonomously exploring different application states and pinpointing the steps leading to a failure. Jazzberry then provides a detailed report with reproduction steps, stack traces, and contextual information, allowing developers to quickly understand and fix the issue.
The Hacker News comments on Jazzberry, an AI bug-finding agent, express skepticism and raise practical concerns. Several commenters question the value proposition, particularly for complex or nuanced bugs that require deep code understanding. Some doubt the AI's ability to surpass existing static analysis tools or experienced human developers. Others highlight the potential for false positives and the challenge of integrating such a tool into existing workflows. A few express interest in seeing concrete examples or a public beta to assess its real-world capabilities. The lack of readily available information about Jazzberry's underlying technology and methodology further fuels the skepticism. Overall, the comments reflect a cautious wait-and-see attitude towards this new tool.
The blog post advocates using unit tests as a powerful debugging tool for logic errors in Java, particularly when traditional debuggers fall short. It emphasizes writing focused tests around the suspected faulty logic, isolating the problem area and allowing for systematic exploration of different inputs and expected outputs. This approach provides a clear, reproducible way to understand the bug's behavior and verify the fix, offering a more efficient and less frustrating debugging experience compared to stepping through complex code. The post demonstrates this with an example of a faulty binary search implementation, showcasing how targeted tests pinpoint the error and guide the correction process. Finally, it highlights the added benefit of expanding the test suite, providing future protection against regressions and enhancing overall code quality.
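The post's worked example is in Java, but the technique is language-agnostic. Below is a minimal, hypothetical Python sketch of the same idea: a binary search with a deliberate off-by-one in its loop condition, plus a handful of focused tests whose pass/fail pattern narrows the bug down to the moment the search window shrinks to a single element. None of this code is from the original post.

```python
import unittest

def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo < hi:  # BUG: should be `lo <= hi`, so a one-element window is never checked
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

class BinarySearchBugHunt(unittest.TestCase):
    """Focused tests around the suspect logic, working down from a failing report."""

    def test_only_element(self):
        self.assertEqual(binary_search([5], 5), 0)         # fails: returns -1

    def test_last_of_two(self):
        self.assertEqual(binary_search([1, 3], 3), 1)      # fails: returns -1

    def test_first_of_two(self):
        self.assertEqual(binary_search([1, 3], 1), 0)      # passes

    def test_absent_value(self):
        self.assertEqual(binary_search([1, 3, 5], 4), -1)  # passes

if __name__ == "__main__":
    unittest.main()
```

The failing cases all involve the window collapsing to one element, which points directly at the loop condition; changing `lo < hi` to `lo <= hi` makes every test pass, and the tests then remain in the suite as regression protection.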
Hacker News users generally agreed with the premise of using tests as a debugging tool. Several commenters emphasized that Test-Driven Development (TDD) naturally leads to this approach, as writing tests before the code forces a clearer understanding of the desired behavior and facilitates faster identification of logic errors. Some pointed out that debuggers are still valuable tools, especially for complex issues, but tests provide a more structured and repeatable debugging process. One commenter highlighted the benefit of "mutation testing" to ensure test suite effectiveness. Another user cautioned that while tests are helpful, relying solely on them for debugging might mask deeper architectural issues. There's also a brief discussion about the differences and benefits of unit vs. integration tests in this context.
Well-Typed's blog post introduces Falsify, a new property-based testing library for Haskell. Falsify shrinks failing test cases using internal shrinking in the style of Python's Hypothesis: rather than shrinking the generated values themselves, as QuickCheck-style shrinkers do, it shrinks the underlying random samples a generator consumes, so shrinking composes automatically through monadic generators while still producing minimal, reproducible counterexamples. This approach lets Falsify handle complex data structures and custom types without hand-written shrinkers, significantly improving the debugging experience for Haskell developers. Furthermore, Falsify's design promotes composability and integration with existing Haskell testing libraries.
Hacker News users discussed Falsify's approach to property-based testing, praising the cleverness of its shrinking design and noting its potential advantages over traditional shrinking methods. Some commenters expressed interest in similar tools for other languages, while others questioned the performance implications of its Haskell implementation. Several pointed out the connection to Hedgehog's integrated shrinking and to Hypothesis, whose approach Falsify builds on. The overall sentiment was positive, with many expressing excitement about the potential improvements Falsify could bring to property-based testing workflows. A few commenters also discussed specific examples and potential use cases, showcasing practical applications of the library.
The blog post argues that speedrunners possess many of the same skills and mindsets as vulnerability researchers. They both meticulously analyze systems, searching for unusual behavior and edge cases that can be exploited for an advantage, whether that's saving milliseconds in a game or bypassing security measures. Speedrunners develop a deep understanding of a system's inner workings through experimentation and observation, often uncovering unintended functionality. This makes them naturally suited to vulnerability research, where finding and exploiting these hidden flaws is the primary goal. The author suggests that with some targeted training and a shift in focus, speedrunners could easily transition into security research, offering a fresh perspective and valuable skillset to the field.
HN commenters largely agree with the premise that speedrunners possess skills applicable to vulnerability research. Several highlighted the meticulous understanding of game mechanics and the ability to manipulate code execution paths as key overlaps. One commenter mentioned the "arbitrary code execution" goal of both speedrunners and security researchers, while another emphasized the creative problem-solving mindset required for both disciplines. A few pointed out that speedrunners already perform a form of vulnerability research when discovering glitches and exploits. Some suggested that formalizing a pathway for speedrunners to transition into security research would be beneficial. The potential for identifying vulnerabilities before game release through speedrunning techniques was also raised.
This blog post explores the challenges of creating a robust test suite for Time-Based One-Time Password (TOTP) algorithms. The author highlights the difficulty in balancing the need for deterministic, repeatable tests with the time-sensitive nature of TOTP codes. They propose using a fixed timestamp and shared secret as a starting point, then exploring variations in time steps and time drift to ensure the algorithm handles edge cases correctly. The post concludes with a call for collaboration and shared test vectors to improve the overall security and reliability of TOTP implementations.
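The post itself doesn't include code, but the fixed-timestamp idea it describes maps directly onto the published test vectors in RFC 6238 (Appendix B). As a hedged illustration, here is a minimal Python sketch of a TOTP implementation checked against those vectors (HMAC-SHA1, 8 digits, the ASCII secret `12345678901234567890`), plus one time-step boundary case; the function and test names are illustrative, not taken from the post.

```python
import hashlib
import hmac
import struct

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 8) -> str:
    """RFC 6238 TOTP: HOTP (RFC 4226) applied to the number of time steps since the epoch."""
    counter = struct.pack(">Q", unix_time // step)              # 8-byte big-endian step counter
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                                   # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# SHA-1 test vectors from RFC 6238, Appendix B (secret is the ASCII string below).
SECRET = b"12345678901234567890"
RFC_VECTORS = {
    59:         "94287082",
    1111111109: "07081804",
    1111111111: "14050471",
    1234567890: "89005924",
    2000000000: "69279037",
}

def test_rfc6238_vectors():
    for t, expected in RFC_VECTORS.items():
        assert totp(SECRET, t) == expected, (t, totp(SECRET, t))

def test_time_step_boundary():
    # Same 30-second step -> same code; crossing the boundary should change it.
    assert totp(SECRET, 30) == totp(SECRET, 59)
    assert totp(SECRET, 60) != totp(SECRET, 59)

if __name__ == "__main__":
    test_rfc6238_vectors()
    test_time_step_boundary()
    print("all TOTP vectors pass")
```

Variations on the same scaffold — different time steps, drift windows, and the SHA-256/SHA-512 variants — are where the edge cases the post worries about tend to live.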
The Hacker News comments discuss the practicality and usefulness of the proposed TOTP test suite. Several commenters point out that existing libraries like oathtool already provide robust implementations and question the need for a new test suite, suggesting that focusing on testing against these established libraries would be more effective. Others highlight the potential value in testing edge cases and different implementations, particularly for less common languages or when implementing TOTP from scratch. The difficulty in obtaining a diverse and representative set of real-world TOTP secrets for testing is also mentioned. Finally, some commenters express concern about the security implications of publishing a comprehensive test suite, fearing it could be misused for malicious purposes.
Roark, a Y Combinator-backed startup, launched a platform to simplify voice AI testing. It addresses the challenges of building and maintaining high-quality voice experiences by providing automated testing tools for conversational flows, natural language understanding (NLU), and speech recognition. Roark allows developers to create test cases, run them across different voice platforms (like Alexa and Google Assistant), and analyze results through a unified dashboard, ultimately reducing manual testing efforts and improving the overall quality and reliability of voice applications.
The Hacker News comments express skepticism and raise practical concerns about Roark's value proposition. Some question whether voice AI testing is a significant enough pain point to warrant a dedicated solution, suggesting existing tools and methods suffice. Others doubt the feasibility of effectively testing the nuances of voice interactions, like intent and emotion, expressing concern about automating such subjective evaluations. The cost and complexity of implementing Roark are also questioned, with some users pointing out the potential overhead and the challenge of integrating it into existing workflows. There's a general sense that while automated testing is valuable, Roark needs to demonstrate more clearly how it addresses the specific challenges of voice AI in a way that justifies its adoption. A few comments offer alternative approaches, like crowdsourced testing, and some ask for clarification on Roark's pricing and features.
Testtrim, a tool designed to reduce the size of test suites while maintaining coverage, ironically struggled to effectively test itself due to its reliance on ptrace for syscall tracing. This limitation prevented Testtrim from analyzing nested calls, leading to incomplete coverage data and hindering its ability to confidently trim its own test suite. A recent update introduces a novel approach using eBPF, enabling Testtrim to accurately trace nested syscalls. This breakthrough allows Testtrim to thoroughly analyze its own behavior and finally optimize its test suite, demonstrating its newfound self-testing capability and reinforcing its effectiveness as a test suite reduction tool.
The Hacker News comments discuss the complexity of testing tools like Testtrim, which aim to provide comprehensive syscall tracing. Several commenters appreciate the author's deep dive into the technical challenges and the clever solution involving a VM and intercepting the vmexit instruction. Some highlight the inherent difficulties in testing tools that operate at such a low level, where the very act of observation can alter the behavior of the system. One commenter questions the practical applications, suggesting that existing tools like strace and ptrace might be sufficient in most scenarios. Others point out that Testtrim's targeted approach, specifically focusing on nested virtualization, addresses a niche but important use case not covered by traditional tools. The discussion also touches on the value of learning obscure assembly instructions and the excitement of low-level debugging.
Matt Keeter describes how an aesthetically pleasing test suite, visualized as colorful 2D and 3D renders, drives development and debugging of his implicit CAD system. He emphasizes the psychological benefit of attractive tests, arguing they encourage more frequent and thorough testing. By visually confirming expected behavior and quickly pinpointing failures through color-coded deviations, the tests guide implementation and accelerate the iterative design process. This approach has proven invaluable in tackling complex geometry problems, allowing him to confidently refactor and extend his system while ensuring correctness.
HN commenters largely praised the author's approach to test-driven development and the resulting elegance of the code. Several appreciated the focus on geometric intuition and visualization, finding the interactive, visual tests particularly compelling. Some pointed out the potential benefits of this approach for education, suggesting it could make learning geometry more engaging. A few questioned the scalability and maintainability of such a system for larger projects, while others noted the inherent limitations of relying solely on visual tests. One commenter suggested exploring formal verification methods like TLA+ to complement the visual approach. There was also a brief discussion on the choice of Python and its suitability for such computationally intensive tasks.
rqlite's testing strategy employs a multi-layered approach. Unit tests cover individual components and functions. Integration tests, leveraging Docker Compose, verify interactions between rqlite nodes in various cluster configurations. Property-based tests, using Hypothesis, automatically generate and run diverse test cases to uncover unexpected edge cases and ensure data integrity. Finally, end-to-end tests simulate real-world scenarios, including node failures and network partitions, focusing on cluster stability and recovery mechanisms. This comprehensive testing regime aims to guarantee rqlite's reliability and robustness across diverse operating environments.
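rqlite itself is written in Go and its real test suite is not shown here, but as a rough sketch of what a Hypothesis-style property-based test for data integrity can look like, the following Python example round-trips arbitrary rows through an in-memory SQLite database standing in for a single node's store. The schema and property are hypothetical, purely for illustration.

```python
import sqlite3
from hypothesis import given, strategies as st

# Arbitrary rows: unique 64-bit keys (SQLite INTEGER range) mapped to 64-bit values.
rows = st.lists(
    st.tuples(
        st.integers(min_value=-2**63, max_value=2**63 - 1),
        st.integers(min_value=-2**63, max_value=2**63 - 1),
    ),
    unique_by=lambda kv: kv[0],
)

@given(rows)
def test_insert_then_read_back(pairs):
    # Fresh in-memory database per generated case, as a stand-in for one node's store.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v INTEGER)")
    con.executemany("INSERT INTO kv VALUES (?, ?)", pairs)
    got = sorted(con.execute("SELECT k, v FROM kv").fetchall())
    assert got == sorted(pairs)
```

The property says nothing about any particular input: Hypothesis generates many cases, including empty lists and extreme integers, and shrinks any failure to a minimal counterexample, which is the kind of edge-case coverage the summary attributes to this layer of testing.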
HN commenters generally praised the rqlite testing approach for its simplicity and reliance on real-world SQLite. Several noted the clever use of Docker to orchestrate a realistic distributed environment for testing. Some questioned the level of test coverage, particularly around edge cases and failure scenarios, and suggested adding property-based testing. Others discussed the benefits and drawbacks of integration testing versus unit testing in this context, with some advocating for a more balanced approach. The author of rqlite also participated, responding to questions and clarifying details about the testing strategy and future plans. One commenter highlighted the educational value of the article, appreciating its clear explanation of the testing process.
Rishi Mehta reflects on the key contributions and learnings from AlphaProof, his AI research project focused on automated theorem proving. He highlights the successes of AlphaProof in tackling challenging mathematical problems, particularly in abstract algebra and group theory, emphasizing its unique approach of combining language models with symbolic reasoning engines. The post delves into the specific techniques employed, such as the use of chain-of-thought prompting and iterative refinement, and discusses the limitations encountered. Mehta concludes by emphasizing the significant progress made in bridging the gap between natural language and formal mathematics, while acknowledging the open challenges and future directions for research in automated theorem proving.
Hacker News users discuss AlphaProof's approach to testing, questioning its reliance on property-based testing and mutation testing for catching subtle bugs. Some commenters express skepticism about the effectiveness of these techniques in real-world scenarios, arguing that they might not be as comprehensive as traditional testing methods and could lead to a false sense of security. Others suggest that AlphaProof's methodology might be better suited for specific types of problems, such as concurrency bugs, rather than general software testing. The discussion also touches upon the importance of code review and the potential limitations of automated testing tools. Some commenters found the examples provided in the original article unconvincing, while others praised AlphaProof's innovative approach and the value of exploring different testing strategies.
This paper introduces a new fuzzing technique called Dataflow Fusion (DFusion) specifically designed for complex interpreters like PHP. DFusion addresses the challenge of efficiently exploring deep execution paths within interpreters by strategically combining coverage-guided fuzzing with taint analysis. It identifies critical dataflow paths and generates inputs that maximize the exploration of these paths, leading to the discovery of more bugs. The researchers evaluated DFusion against existing PHP fuzzers and demonstrated its effectiveness in uncovering previously unknown vulnerabilities, including crashes and memory safety issues, within the PHP interpreter. Their results highlight the potential of DFusion for improving the security and reliability of interpreted languages.
Hacker News users discussed the potential impact and novelty of the PHP fuzzer described in the linked paper. Several commenters expressed skepticism about the significance of the discovered vulnerabilities, pointing out that many seemed related to edge cases or functionalities rarely used in real-world PHP applications. Others questioned the fuzzer's ability to uncover truly impactful bugs compared to existing methods. Some discussion revolved around the technical details of the fuzzing technique, "dataflow fusion," with users inquiring about its specific advantages and limitations. There was also debate about the general state of PHP security and whether this research represents a meaningful advancement in securing the language.
Summary of Comments (178): https://news.ycombinator.com/item?id=44081338
Hacker News users discussed the efficacy of using large language models like o3 for vulnerability research, with some praising its potential while acknowledging it's not a silver bullet. Several commenters pointed out that the vulnerability seemed relatively simple to spot, questioning the need for o3 in this specific case. The conversation also touched on the disclosure process and the discoverer's decision to publish details before a patch was widely available, sparking debate about responsible disclosure practices. Some users criticized aspects of the write-up itself, such as claims about the novelty of o3's capabilities. Finally, the prevalence of memory-safety issues in C code and the role of languages like Rust in mitigating such vulnerabilities were also discussed.
The Hacker News post discussing the blog post about CVE-2025-37899 has generated a substantial number of comments, many of which delve into various technical aspects of the vulnerability and the process used to discover it.
Several commenters commend the author's approach of using OpenAI's o3 model to uncover the vulnerability. They note the ingenuity of leveraging a tool not traditionally associated with security research for this purpose, and some discuss how large language models, while not purpose-built for bug hunting, can surface latent defects by reasoning about code paths that manual review and conventional tooling tend to miss.
A few comments delve into the specific details of the vulnerability, discussing the memory-management error that ultimately leads to the exploitable condition and the circumstances under which the bug manifests.
The use of KASAN (Kernel Address Sanitizer) is also highlighted in the comments, with users praising its efficacy in pinpointing the source of such problems. The discussion touches on the importance of robust sanitizers in modern software development, especially for complex systems like the Linux kernel.
Some commenters express concern about the implications of this discovery, pointing out the potential severity of a remote zero-day in such a widely used component. They discuss the potential impact on various systems and the importance of prompt patching.
There's also a discussion around the responsible disclosure process, with commenters expressing appreciation for the author's approach and the timely patching of the vulnerability. The comments highlight the importance of coordinated disclosure to minimize potential harm while ensuring that users have access to necessary updates.
A recurring theme in the comments is the relative simplicity of the vulnerability once it was uncovered. This leads to some speculation about why it wasn't discovered earlier, with suggestions ranging from the complexity of the codebase to the limitations of traditional testing methods.
Finally, some commenters share their own experiences with similar vulnerabilities and discuss the challenges of finding and fixing bugs in complex systems. They offer insights into various debugging techniques and tools, contributing to a broader conversation about software security and best practices.