Researchers have built a 32-bit RISC-V processor using a monolayer of molybdenum disulfide (MoS₂), a two-dimensional semiconductor. This achievement demonstrates the potential of 2D materials for creating extremely thin and energy-efficient transistors, pushing the boundaries of Moore's Law. While slower and larger than state-of-the-art silicon chips, this prototype represents a significant step towards practical applications of 2D semiconductors in computing. The processor, dubbed RV16XNano, successfully executed instructions and represents a promising foundation for future development of more complex and powerful 2D-material-based circuits.
Zentool is a utility for manipulating the microcode of AMD Zen CPUs. It allows researchers and security analysts to extract, inject, and modify microcode updates directly from the processor, bypassing the typical update mechanisms provided by the operating system or BIOS. This enables detailed examination of microcode functionality, identification of potential vulnerabilities, and development of mitigations. Zentool supports various AMD Zen CPU families and provides options for specifying the target CPU core and displaying microcode information. While offering significant research opportunities, it also carries inherent risks, as improper microcode modification can lead to system instability or permanent damage.
Hacker News users discussed the potential security implications and practical uses of Zentool. Some expressed concern about the possibility of malicious actors using it to compromise systems, while others highlighted its potential for legitimate purposes like performance tuning and bug fixing. The ability to modify microcode raises concerns about secure boot and the trust chain, with commenters questioning the verifiability of microcode updates. Several users pointed out the lack of documentation regarding which specific CPU instructions are affected by changes, making it difficult to assess the full impact of modifications. The discussion also touched upon the ethical considerations of such tools and the potential for misuse, with a call for responsible disclosure practices. Some commenters found the project fascinating from a technical perspective, appreciating the insight it provides into low-level CPU operations.
Ken Shirriff's blog post details the surprisingly complex circuitry the Pentium CPU uses for multiplication by three. Instead of simply adding a number to itself twice (A + A + A), the Pentium employs a Booth recoding optimization followed by a Wallace tree of carry-save adders and a final carry-lookahead adder. This approach, while requiring more transistors, allows for faster multiplication compared to repeated addition, particularly with larger numbers. Shirriff reverse-engineered this process by analyzing die photos and tracing the logic gates involved, showcasing the intricate optimizations employed in seemingly simple arithmetic operations within the Pentium.
Hacker News users discussed the complexity of the Pentium's multiply-by-three circuit, with several expressing surprise at its intricacy. Some questioned the necessity of such a specialized circuit, suggesting simpler alternatives like shifting and adding. Others highlighted the potential performance gains achieved by this dedicated hardware, especially in the context of the Pentium's era. A few commenters delved into the historical context of Booth's multiplication algorithm and its potential relation to the circuit's design. The discussion also touched on the challenges of reverse-engineering hardware and the insights gained from such endeavors. Some users appreciated the detailed analysis presented in the article, while others found the explanation lacking in certain aspects.
Chips and Cheese investigated Zen 5's AVX-512 behavior and found that while AVX-512 is enabled and functional, using these instructions significantly reduces clock speeds. Their testing shows a consistent frequency drop across various AVX-512 workloads, with performance ultimately worse than using AVX2 despite the higher theoretical throughput of AVX-512. This suggests that AMD likely enabled AVX-512 for compatibility rather than performance, and users shouldn't expect a performance uplift from applications leveraging these instructions on Zen 5. The power consumption also significantly increases with AVX-512 workloads, exceeding even AMD's own TDP specifications.
Hacker News users discussed the potential implications of the observed AVX-512 frequency behavior on Zen 5. Some questioned the benchmarks, suggesting they might not represent real-world workloads and pointed out the importance of considering power consumption alongside frequency. Others discussed the potential benefits of AVX-512 despite the frequency drop, especially for specific workloads. A few comments highlighted the complexity of modern CPU design and the trade-offs involved in balancing performance, power efficiency, and heat management. The practicality of disabling AVX-512 for higher clock speeds was also debated, with users considering the potential performance hit from switching instruction sets. Several users expressed interest in further benchmarks and a more in-depth understanding of the underlying architectural reasons for the observed behavior.
The F8 is a new 8-bit computer architecture designed for efficiency in both code size and memory usage, especially when programming in C. It aims to achieve performance comparable to 16-bit systems while maintaining the simplicity and resource efficiency of 8-bit designs. This is accomplished through features like a hybrid stack/register-based architecture, variable-width instructions, and dedicated instructions for common C operations like pointer manipulation and function calls. The F8 also emphasizes practical applications with features like a built-in bootloader and support for direct connection to peripherals.
Hacker News users discussed the F8 architecture's unusual design choices. Several commenters questioned the practical applications given the performance tradeoffs for memory efficiency, particularly with modern memory availability. Some debated the value of 8-bit architectures in niche applications like microcontrollers, while others pointed out existing alternatives like AVR. The unusual register structure and lack of hardware stack were also discussed, with some suggesting it might hinder C compiler optimization. A few expressed interest in the unique approach, though skepticism about real-world viability was prevalent. Overall, the comments reflected a cautious curiosity towards F8 but with reservations about its usefulness compared to established architectures.
The blog post details troubleshooting high CPU usage attributed to the writeback
process in a Linux kernel. After initial investigations pointed towards cgroups and specifically the cpu.cfs_period_us
parameter, the author traced the issue to a tight loop within the cgroup writeback mechanism. This loop was triggered by a large number of cgroups combined with a specific workload pattern. Ultimately, increasing the dirty_expire_centisecs
kernel parameter, which controls how long dirty data stays in memory before being written to disk, provided the solution by significantly reducing the writeback activity and lowering CPU usage.
Commenters on Hacker News largely discuss practical troubleshooting steps and potential causes of the high CPU usage related to cgroups writeback described in the linked blog post. Several suggest using tools like perf
to profile the kernel and pinpoint the exact function causing the issue. Some discuss potential problems with the storage layer, like slow I/O or a misconfigured RAID, while others consider the possibility of a kernel bug or an interaction with specific hardware or drivers. One commenter shares a similar experience with NFS and high CPU usage related to writeback, suggesting a potential commonality in networked filesystems. Several users emphasize the importance of systematic debugging and isolation of the problem, starting with simpler checks before diving into complex kernel analysis.
A high-severity vulnerability, dubbed "SQUIP," affects AMD EPYC server processors. This flaw allows attackers with administrative privileges to inject malicious microcode updates, bypassing AMD's signature verification mechanism. Successful exploitation could enable persistent malware, data theft, or system disruption, even surviving operating system reinstalls. While AMD has released patches and updated documentation, system administrators must apply the necessary BIOS updates to mitigate the risk. This vulnerability underscores the importance of secure firmware update processes and highlights the potential impact of compromised low-level system components.
Hacker News users discussed the implications of AMD's microcode signature verification vulnerability, expressing concern about the severity and potential for exploitation. Some questioned the practical exploitability given the secure boot process and the difficulty of injecting malicious microcode, while others highlighted the significant potential damage if exploited, including bypassing hypervisors and gaining kernel-level access. The discussion also touched upon the complexity of microcode updates and the challenges in verifying their integrity, with some users suggesting hardware-based solutions for enhanced security. Several commenters praised Google for responsibly disclosing the vulnerability and AMD for promptly addressing it. The overall sentiment reflected a cautious acknowledgement of the risk, balanced by the understanding that exploitation likely requires significant resources and sophistication.
T1 is an open-source, research-oriented implementation of a RISC-V vector processor. It aims to explore the microarchitecture tradeoffs of the RISC-V vector extension (RVV) by providing a configurable and modular platform for experimentation. The project includes a synthesizable core written in SystemVerilog, a software toolchain, and a cycle-accurate simulator. T1 allows researchers to modify various parameters, such as vector register file size, number of functional units, and memory subsystem configuration, to evaluate their impact on performance and area. Its primary goal is to advance RISC-V vector processing research and foster collaboration within the community.
Hacker News users discuss the open-sourced T1 RISC-V vector processor, expressing excitement about its potential and implications. Several commenters praise its transparency, contrasting it with proprietary vector extensions. The modular and scalable design is highlighted, making it suitable for diverse applications. Some discuss the potential impact on education, enabling hands-on learning of vector processor design. Others express interest in seeing benchmark comparisons and exploring potential uses in areas like AI acceleration and HPC. Some question its current maturity and performance compared to existing solutions. The lack of clear licensing information is also raised as a concern.
Stats is a free and open-source macOS menu bar application that provides a comprehensive overview of system performance. It displays real-time information on CPU usage, memory, network activity, disk usage, battery health, and fan speeds, all within a customizable and compact menu bar interface. Users can tailor the displayed modules and their appearance to suit their needs, choosing from various graph styles and refresh rates. Stats aims to be a lightweight yet powerful alternative to larger system monitoring tools.
Hacker News users generally praised Stats' minimalist design and useful information display in the menu bar. Some suggested improvements, including customizable refresh rates, more detailed CPU information (like per-core usage), and GPU temperature monitoring for M1 Macs. Others questioned the need for another system monitor given existing options, with some pointing to iStat Menus as a more mature alternative. The developer responded to several comments, acknowledging the suggestions and clarifying current limitations and future plans. Some users appreciated the open-source nature of the project and the developer's responsiveness. There was also a minor discussion around the chosen license (GPLv3).
Researchers have revealed new speculative execution attacks impacting all modern Apple CPUs. These attacks, named "Macchiato" and "Espresso," exploit speculative access to virtual memory and the memory management unit (MMU), respectively. Unlike previous speculative execution vulnerabilities, Macchiato can leak data cross-process, while Espresso can bypass memory isolation protections entirely, potentially allowing malicious apps to access kernel memory. While mitigations exist, they come with a performance cost. These attacks highlight the ongoing challenge of securing modern processors against increasingly sophisticated side-channel attacks.
HN commenters discuss the practicality and impact of the speculative execution attacks detailed in the linked article. Some doubt the real-world exploitability, citing the complexity and specific conditions required. Others express concern about the ongoing nature of these vulnerabilities and the difficulty in mitigating them fully. A few highlight the cat-and-mouse game between security researchers and hardware vendors, with mitigations often leading to new attack vectors. The lack of concrete proof-of-concept exploits is also a point of discussion, with some arguing it diminishes the severity of the findings while others emphasize the potential for future exploitation. The overall sentiment leans towards cautious skepticism, acknowledging the research's importance while questioning the immediate threat level.
SiFive's P550 is a high-performance RISC-V CPU microarchitecture designed for applications needing high single-threaded performance. It achieves this through a deep, out-of-order execution pipeline with a 13-stage front-end and a 7-stage back-end. Key features include a large reorder buffer, sophisticated branch prediction, and a high-bandwidth memory subsystem. While inheriting some features from the P550's predecessor (the U74), the P550 boasts significant IPC improvements, increased clock speeds, and enhanced vector performance, positioning it competitively against Arm's Cortex-A75. The microarchitecture prioritizes performance density, aiming to deliver high throughput within a reasonable area footprint.
Hacker News users discuss SiFive's P550 microarchitecture, generally praising its performance and efficiency gains. Several commenters note the clever innovations, like the register renaming scheme and the out-of-order execution improvements. Some express interest in seeing comparisons against Arm's Cortex-A710, while others focus on the potential of RISC-V and its open-source nature to disrupt the established processor landscape. A few users raise questions about the microarchitecture's power consumption and its suitability for specific applications, such as mobile devices. The overall sentiment appears positive, with many anticipating further developments and wider adoption of RISC-V based designs.
A quirk in the Motorola 68030 processor inadvertently enabled the Mac Classic II to boot despite its ROM lacking proper 32-bit addressing support. The Classic II's ROM mistakenly used a "MOVEA" instruction with a 32-bit address, which should have caused a failure on the 24-bit address bus. However, the 68030, when configured for a 24-bit bus, ignores the upper byte of the 32-bit address in this specific instruction. This unintentional compatibility allowed the flawed ROM to function, making the Classic II's boot process seemingly normal despite the underlying programming error.
Hacker News commenters on the Mac Classic II boot anomaly generally express fascination with the technical details and the serendipitous nature of the discovery. Several commenters delve into the specifics of 680x0 instruction sets and how an invalid instruction could inadvertently lead to a successful boot, speculating about memory initialization and undocumented behavior. Some share anecdotes about similar unexpected behaviors encountered during their own retrocomputing explorations. A few commenters also highlight the importance of such stories in preserving computer history and understanding the quirks of older hardware. The overall sentiment reflects appreciation for the ingenuity and occasional happy accidents that shaped early computing.
Chips and Cheese's analysis of AMD's Zen 5 architecture reveals the performance impact of its op-cache and clustered decoder design. By disabling the op-cache, they demonstrated a significant performance drop in most benchmarks, confirming its effectiveness in reducing instruction fetch traffic. Their investigation also highlighted the clustered decoder structure, showing how instructions are distributed and processed within the core. This clustering likely contributes to the core's increased instruction throughput, but the authors note further research is needed to fully understand its intricacies and potential bottlenecks. Overall, the analysis suggests that both the op-cache and clustered decoder play key roles in Zen 5's performance improvements.
Hacker News users discussed the potential implications of Chips and Cheese's findings on Zen 5's op-cache. Some expressed skepticism about the methodology, questioning the use of synthetic benchmarks and the lack of real-world application testing. Others pointed out that disabling the op-cache might expose underlying architectural bottlenecks, providing valuable insight for future CPU designs. The impact of the larger decoder cache also drew attention, with speculation on its role in mitigating the performance hit from disabling the op-cache. A few commenters highlighted the importance of microarchitectural deep dives like this one for understanding the complexities of modern CPUs, even if the specific findings aren't directly applicable to everyday usage. The overall sentiment leaned towards cautious curiosity about the results, acknowledging the limitations of the testing while appreciating the exploration of low-level CPU behavior.
The blog post details the creation of an extremely fast phrase search algorithm leveraging the AVX-512 instruction set, specifically the VPCONFLICTM
instruction. This instruction, designed to detect hash collisions, is repurposed to efficiently find exact occurrences of phrases within a larger text. By cleverly encoding both the search phrase and the text into a format suitable for VPCONFLICTM
, the algorithm can rapidly compare multiple sections of the text against the phrase simultaneously. This approach bypasses the character-by-character comparisons typical in other string search methods, resulting in significant performance gains, particularly for short phrases. The author showcases impressive benchmarks demonstrating substantial speed improvements compared to existing techniques.
Several Hacker News commenters express skepticism about the practicality of the described AVX-512 phrase search algorithm. Concerns center around the limited availability of AVX-512 hardware, the potential for future deprecation of the instruction set, and the complexity of the code making it difficult to maintain and debug. Some question the benchmark methodology and the real-world performance gains compared to simpler SIMD approaches or existing optimized libraries. Others discuss the trade-offs between speed and portability, suggesting that the niche benefits might not outweigh the costs for most use cases. There's also a discussion of alternative approaches and the potential for GPUs to outperform CPUs in this task. Finally, some commenters express fascination with the cleverness of the algorithm despite its practical limitations.
VexRiscv is a highly configurable 32-bit RISC-V CPU implementation written in SpinalHDL, specifically designed for FPGA integration. Its modular and customizable architecture allows developers to tailor the CPU to their specific application needs, including features like caches, MMU, multipliers, and various peripherals. This flexibility offers a balance between performance and resource utilization, making it suitable for a wide range of embedded systems. The project provides a comprehensive ecosystem with simulation tools, examples, and pre-configured configurations, simplifying the process of integrating and evaluating the CPU.
Hacker News users discuss VexRiscv's impressive performance and configurability, highlighting its usefulness for FPGA projects. Several commenters praise its clear documentation and ease of customization, with one mentioning successful integration into their own projects. The minimalist design and the ability to tailor it to specific needs are seen as major advantages. Some discussion revolves around comparisons with other RISC-V implementations, particularly regarding performance and resource utilization. There's also interest in the SpinalHDL language used to implement VexRiscv, with some inquiries about its learning curve and benefits over traditional HDLs like Verilog.
Ken Shirriff reverse-engineered interesting BiCMOS circuits within the Intel Pentium processor, specifically focusing on the clock driver and the bus transceiver. He discovered a clever BiCMOS clock driver design that utilizes both bipolar and CMOS transistors to achieve high speed and low power consumption. This driver employs a push-pull output stage with bipolar transistors for fast switching and CMOS transistors for level shifting. Shirriff also analyzed the Pentium's bus transceiver, revealing a BiCMOS circuit designed for bidirectional communication with external memory. This transceiver leverages the benefits of both technologies to achieve both high speed and strong drive capability. Overall, the analysis showcases the sophisticated circuit design techniques employed in the Pentium to balance performance and power efficiency.
HN commenters generally praised the article for its detailed analysis and clear explanations of complex circuitry. Several appreciated the author's approach of combining visual inspection with simulations to understand the chip's functionality. Some pointed out the rarity and value of such in-depth reverse-engineering work, particularly on older hardware. A few commenters with relevant experience added further insights, discussing topics like the challenges of delayering chips and the evolution of circuit design techniques. One commenter shared a similar decapping endeavor revealing the construction of a different Intel chip. Overall, the discussion expressed admiration for the technical skill and dedication involved in this type of reverse-engineering project.
The AMD Radeon Instinct MI300A boasts a massive, unified memory subsystem, key to its performance as an APU designed for AI and HPC workloads. It combines 128GB of HBM3 memory with 8 stacks of 16GB each, offering impressive bandwidth. This memory is unified across the CPU and GPU dies, simplifying programming and boosting efficiency. AMD achieves this through a sophisticated design involving a combination of Infinity Fabric links, memory controllers integrated into the CPU dies, and a complex scheduling system to manage data movement. This architecture allows the MI300A to access and process large datasets efficiently, crucial for the demanding tasks it's targeted for.
Hacker News users discussed the complexity and impressive scale of the MI300A's memory subsystem, particularly the challenges of managing coherence across such a large and varied memory space. Some questioned the real-world performance benefits given the overhead, while others expressed excitement about the potential for new kinds of workloads. The innovative use of HBM and on-die memory alongside standard DRAM was a key point of interest, as was the potential impact on software development and optimization. Several commenters noted the unusual architecture and speculated about its suitability for different applications compared to more traditional GPU designs. Some skepticism was expressed about AMD's marketing claims, but overall the discussion was positive, acknowledging the technical achievement represented by the MI300A.
Qualcomm has prevailed in a significant licensing dispute with Arm. A confidential arbitration ruling affirmed Qualcomm's right to continue licensing Arm's instruction set architecture for its Nuvia-designed chips under existing agreements. This victory allows Qualcomm to proceed with its plans to incorporate these custom-designed processors into its products, potentially disrupting the server chip market. Arm had argued that the licenses were non-transferable after Qualcomm acquired Nuvia, but the arbitrator disagreed. Financial details of the ruling remain undisclosed.
Hacker News commenters largely discuss the implications of Qualcomm's legal victory over Arm. Several express concern that this decision sets a dangerous precedent, potentially allowing companies to sub-license core technology they don't fully own, stifling innovation and competition. Some speculate this could push other chip designers to RISC-V, an open-source alternative to Arm's architecture. Others question the long-term viability of Arm's business model if they cannot control their own licensing. Some commenters see this as a specific attack on Nuvia's (acquired by Qualcomm) custom core designs, with Qualcomm leveraging their market power. Finally, a few express skepticism about the reporting and suggest waiting for further details to emerge.
Summary of Comments ( 39 )
https://news.ycombinator.com/item?id=43621378
Hacker News users discuss the implications of a RISC-V processor built with a 2D semiconductor. Several express excitement about the potential for flexible electronics and extremely low power consumption, envisioning applications in wearables and IoT devices. Some question the practicality due to the current limitations in clock speed and memory integration, while others point out the significant achievement of creating a functional processor with this technology at all. A few commenters delve into the specifics of the fabrication process and the challenges of scaling this technology for commercial production. Concerns about the fragility of the material and the potential difficulty in handling and packaging are also raised. Overall, the sentiment leans towards cautious optimism about the long-term possibilities of 2D semiconductors in computing.
The Hacker News post "A 32-bit processor made with an atomically thin semiconductor" discussing an Ars Technica article about a RISC-V processor built using a 2D semiconductor, generated a moderate number of comments, many of which delve into the technical details and potential implications of the research.
Several commenters focused on the performance aspects. One noted the extremely low clock speed (1 kHz) and questioned the practical applications given this limitation. Another commenter built on this, explaining that the low clock speed is likely due to the high resistance of the thin semiconductor material. They further elaborated that while the transistor density could theoretically be much higher, the interconnect resistance would become a bottleneck.
The discussion also touched upon the challenges of manufacturing and scaling this technology. A commenter pointed out that creating larger, more complex chips using this 2D material would be difficult due to defects. They questioned whether it would be possible to scale this to create a commercially viable product. Another commenter highlighted the specific challenges in achieving uniformity and consistency in a large-scale manufacturing process for atomically thin materials.
The potential advantages of 2D semiconductors were also discussed. One commenter mentioned the possibility of flexible electronics, suggesting that this technology could pave the way for devices that are bendable or even foldable. Another commenter mentioned potential applications in areas where power consumption is extremely important since reducing the thickness to the atomic level can impact a device's energy requirements.
Some comments delved into the specifics of the RISC-V architecture. One commenter pointed out that while the processor is a 32-bit RISC-V design, it lacks features commonly found in modern processors, making it more of a proof-of-concept rather than a practical processor.
Finally, a few commenters expressed skepticism, suggesting that this research, while interesting, is a long way from commercial viability. They emphasized that the current limitations in performance and manufacturing make it unlikely to replace existing silicon technology in the near future.
In summary, the comments section explored the technical complexities, potential benefits, and significant challenges associated with using 2D semiconductors for processor design. While excitement was expressed for the potential of this technology, many commenters remained realistic about the long road ahead for commercialization.