Ken Shirriff's blog post details the surprisingly complex circuitry the Pentium CPU uses to multiply by three. Rather than relying on plain repeated addition (A + A + A), the design described combines Booth recoding with a Wallace tree of carry-save adders and a final carry-lookahead adder. This costs more transistors but is faster than repeated addition, particularly for wide operands. Shirriff reverse-engineered the circuit by analyzing die photos and tracing its logic gates, showing how much optimization hides behind a seemingly simple arithmetic operation.
Ken Shirriff reverse-engineered BiCMOS circuits within the Intel Pentium processor, focusing on the clock driver and the bus transceiver. The clock driver combines bipolar and CMOS transistors to achieve high speed at low power: a push-pull output stage uses bipolar transistors for fast switching, while CMOS transistors handle level shifting. Shirriff also analyzed the Pentium's bus transceiver, a BiCMOS circuit designed for bidirectional communication with external memory that draws on both technologies for high speed and strong drive capability. Overall, the analysis shows the circuit design techniques the Pentium used to balance performance and power efficiency.
HN commenters generally praised the article for its detailed analysis and clear explanations of complex circuitry. Several appreciated the author's approach of combining visual inspection with simulations to understand the chip's functionality. Some pointed out the rarity and value of such in-depth reverse-engineering work, particularly on older hardware. A few commenters with relevant experience added further insights, discussing topics like the challenges of delayering chips and the evolution of circuit design techniques. One commenter shared a similar decapping endeavor revealing the construction of a different Intel chip. Overall, the discussion expressed admiration for the technical skill and dedication involved in this type of reverse-engineering project.
Summary of Comments (62)
https://news.ycombinator.com/item?id=43233143
Hacker News users discussed the complexity of the Pentium's multiply-by-three circuit, with several expressing surprise at its intricacy. Some questioned the necessity of such a specialized circuit, suggesting simpler alternatives like shifting and adding. Others highlighted the potential performance gains achieved by this dedicated hardware, especially in the context of the Pentium's era. A few commenters delved into the historical context of Booth's multiplication algorithm and its potential relation to the circuit's design. The discussion also touched on the challenges of reverse-engineering hardware and the insights gained from such endeavors. Some users appreciated the detailed analysis presented in the article, while others found the explanation lacking in certain aspects.
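The "shifting and adding" alternative that some commenters suggested can be sketched in a few lines. This is a minimal illustration of the arithmetic identity 3x = 2x + x, not a model of the Pentium's circuit; the function name `times_three` is hypothetical.

```python
def times_three(x: int) -> int:
    """Compute 3*x as (x << 1) + x: one left shift plus one addition."""
    doubled = x << 1      # 2*x; in hardware a shift by a constant is just wiring
    return doubled + x    # the addition is the costly part: carries must propagate


if __name__ == "__main__":
    for value in (0, 7, -5, 1 << 20):
        print(value, times_three(value))
```

The shift costs nothing in hardware, so the whole expense of this approach sits in the full-width addition. Speeding up that addition is where carry-handling techniques such as carry-lookahead come in.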
The Hacker News post titled "The Pentium contains a complicated circuit to multiply by three" drew a lively discussion. Many commenters focused on the trade-off between speed and gate count in early processor design.
One commenter pointed out the historical context, noting that in the era of the Pentium, saving even a single gate could mean substantial cost savings when multiplied across millions of chips. This reinforces the author's point about the lengths designers went to optimize for gate count, even if it resulted in complex logic for seemingly simple operations like multiplication by three.
Another commenter delved into the specifics of the "Booth recoding" technique mentioned in the article, explaining how it efficiently handles signed multiplication. They highlighted that while multiplying by three might appear simple, it becomes more complex when dealing with signed numbers represented in two's complement. Booth recoding, they argued, helps simplify the necessary logic and potentially reduce the overall gate count.
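To make the Booth recoding idea concrete, here is a minimal sketch of the textbook radix-8 variant, chosen as an assumption for illustration because it is the form in which a multiple of three appears. It recodes a signed two's-complement multiplier into digits in the range -4 to 4. The function names and the `bits` parameter are hypothetical, and the code models only the arithmetic, not any particular chip.

```python
def booth_radix8_digits(y: int, bits: int) -> list[int]:
    """Recode a signed `bits`-wide value into radix-8 Booth digits (-4..4).

    Digits are returned least significant first, so that
    sum(d * 8**k for k, d in enumerate(digits)) == y.
    """
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")

    def bit(i: int) -> int:
        # Bit i of y in two's complement; Python's >> sign-extends negatives.
        return 0 if i < 0 else (y >> i) & 1

    n_digits = -(-bits // 3)  # ceil(bits / 3)
    digits = []
    for k in range(n_digits):
        i = 3 * k
        # Overlapping 4-bit window: three new bits plus the top bit of the last group.
        digits.append(-4 * bit(i + 2) + 2 * bit(i + 1) + bit(i) + bit(i - 1))
    return digits


def booth_multiply(x: int, y: int, bits: int) -> int:
    """Multiply x by y by summing one pre-computed multiple of x per Booth digit."""
    # 1x, 2x and 4x are shifts; 3x is the only multiple that needs a real addition.
    multiples = {0: 0, 1: x, 2: x << 1, 3: (x << 1) + x, 4: x << 2}
    total = 0
    for k, d in enumerate(booth_radix8_digits(y, bits)):
        partial = multiples[abs(d)]
        total += (-partial if d < 0 else partial) << (3 * k)
    return total
```

Because each digit covers three multiplier bits, the number of partial products drops to roughly a third of the bit width. Negative digits handle signed operands without a separate correction step.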
Several commenters discussed the practical implications of such optimizations, particularly in the context of performance-critical code. One pointed out that multiplication by small constants is a common operation in many algorithms. Optimizing these operations, even slightly, could lead to noticeable performance gains overall. They suggested that this kind of optimization was particularly relevant in the early days of computing when processor speeds were significantly lower than they are today.
The complexities of carry-save adders and Wallace trees were also discussed, with commenters explaining how these structures contribute to faster addition, which is a fundamental component of multiplication. One commenter explained how carry-save adders delay the handling of carry bits, allowing for faster addition of multiple numbers. Another commenter linked this back to the original article, suggesting that the Pentium's complex multiplication circuit likely incorporated these techniques to maximize performance.
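The carry-save idea described above can be sketched on plain integers. This is a minimal illustration that assumes non-negative inputs; the names `carry_save_add` and `add_many` are hypothetical. Each step reduces three numbers to two without propagating carries across bit positions. The loop applies those steps one after another, whereas a Wallace tree would arrange them in parallel layers, so the sketch shows the arithmetic but not the parallelism.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression: return (sum_bits, carry_bits) with a + b + c == sum_bits + carry_bits."""
    sum_bits = a ^ b ^ c                          # per-bit sum, ignoring carries
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, moved up one place
    return sum_bits, carry_bits


def add_many(values: list[int]) -> int:
    """Add several numbers using carry-save steps and one final ordinary addition."""
    terms = list(values)
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        terms.extend(carry_save_add(a, b, c))     # three terms in, two terms out
    return sum(terms)                             # the only carry-propagating addition


if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))
    print(add_many([1, 2, 3, 4, 5]))
```

Each output bit of `carry_save_add` depends only on the three input bits in one position, so a step takes the same time regardless of operand width. Only the final addition has to wait for carries to ripple or be looked ahead.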
Some commenters expressed a sense of admiration for the ingenuity of the engineers who designed these circuits. They acknowledged the difficulty of optimizing for both speed and gate count, especially given the limitations of the technology at the time.
Finally, a few commenters touched on the evolution of processor design, contrasting the optimizations used in the Pentium with modern approaches. They noted that with the increasing density and speed of transistors, the focus has shifted somewhat from minimizing gate count to optimizing for other factors like power consumption and thermal management. However, they also acknowledged that the fundamental principles of logic optimization remain relevant even today.