The author meticulously debugged a mysterious issue where transferring Apple DOS 3.3 system files to a blank diskette sometimes resulted in a bootable disk, and sometimes a non-bootable one, despite seemingly identical procedures. Through painstaking analysis of the DOS 3.3 source code and assembly-level debugging, they discovered the culprit: a timing-sensitive bug within the SYS.COM program related to how it handled track zero formatting. Specifically, SYS.COM occasionally failed to wait for the drive head to settle after seeking to track zero before writing, resulting in corrupted data on the disk. This timing issue was sensitive to drive mechanics and environmental factors, explaining the intermittent nature of the problem. The author's fix involved adding a small delay within SYS.COM to ensure the drive head had stabilized before writing, resolving the frustrating bug and guaranteeing consistent creation of bootable disks.
This blog post by Eric Brutman recounts a fascinating deep dive into the inner workings of Apple DOS 3.3's SYS.COM utility, a program designed to make a disk bootable. The author sets the scene by describing the seemingly simple task of transferring SYS.COM to a newly formatted disk, a process which unexpectedly failed when attempted on an Apple IIe. This anomaly sparked a multi-faceted investigation into the precise mechanics of the utility and the underlying reasons for its failure.
Brutman begins by meticulously outlining the expected behavior of SYS.COM. It should copy specific sectors containing the boot loader and DOS kernel from itself to the target disk, making it bootable. He leverages a disk imaging tool to analyze the contents of both a known working boot disk and the malfunctioning disk created on the Apple IIe. This comparative analysis reveals a critical difference: a single byte discrepancy in the boot sector of the problematic disk. This tiny error, a value of 00 where 03 was expected, effectively rendered the disk unbootable.
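The comparison step described above can be sketched in a few lines of Python. This is only an illustration of diffing the first sector of two disk images, not the tool the author used. The function name diff_sector, the 256-byte sector size, and the byte offset in the demo are all assumptions made for the example.

```python
from typing import List, Tuple


def diff_sector(good: bytes, bad: bytes, sector_size: int = 256) -> List[Tuple[int, int, int]]:
    """Return (offset, expected, actual) for every differing byte in the first sector.

    sector_size is an assumption; pass the value that matches the image format.
    """
    if len(good) < sector_size or len(bad) < sector_size:
        raise ValueError("both images must contain at least one full sector")
    first_good = good[:sector_size]
    first_bad = bad[:sector_size]
    return [
        (offset, expected, actual)
        for offset, (expected, actual) in enumerate(zip(first_good, first_bad))
        if expected != actual
    ]


if __name__ == "__main__":
    # Synthetic data: the offset 0x10 is hypothetical, chosen only for the demo.
    working = bytearray(256)
    working[0x10] = 0x03
    broken = bytearray(256)  # same sector, but the byte is 00 instead of 03

    for offset, expected, actual in diff_sector(bytes(working), bytes(broken)):
        print(f"offset {offset:#04x}: expected {expected:02X}, found {actual:02X}")
```

With real images, the two arguments would be the raw contents of the working and the failing disk image files, read in binary mode.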
The quest to pinpoint the root cause of this byte corruption takes Brutman down a rabbit hole of assembly code analysis. He painstakingly disassembles the SYS.COM utility, scrutinizing every instruction related to the writing of the boot sector. The investigation leads him to a section of code responsible for moving data from memory to a specific location on the disk, a process involving the Apple II's sophisticated memory addressing modes. Specifically, the bug is traced to an indirect indexed addressing mode instruction interacting with zero page usage and a subtle side effect related to how the code handles track sector lists. This particular instruction, designed to calculate the memory address for the next sector to be written, inadvertently modifies a zero page location used by the disk write routine, ultimately corrupting the crucial byte in the boot sector.
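To make the class of bug described above more concrete, here is a small Python model of 6502 indirect indexed addressing, written (zp),Y in assembly. It is a sketch of the general hazard only and is not taken from the code under discussion. The zero page addresses PTR and SHARED, the helper names, and the values stored are all hypothetical. The point it shows is that a 16-bit pointer occupies two consecutive zero page bytes, so setting it up overwrites whatever another routine kept in the second byte.

```python
PTR = 0x3E     # hypothetical zero page address of a 16-bit pointer (low byte)
SHARED = 0x3F  # hypothetical location another routine uses; overlaps the pointer's high byte


def store_pointer(mem: bytearray, zp: int, address: int) -> None:
    """Store a 16-bit address little-endian at zp and zp+1, as 6502 code would."""
    mem[zp] = address & 0xFF
    mem[(zp + 1) & 0xFF] = (address >> 8) & 0xFF  # high byte lands in the next zero page cell


def indirect_indexed_address(mem: bytearray, zp: int, y: int) -> int:
    """Effective address of a (zp),Y access: pointer read from zero page, plus Y."""
    low = mem[zp]
    high = mem[(zp + 1) & 0xFF]
    return ((high << 8) + low + y) & 0xFFFF


if __name__ == "__main__":
    mem = bytearray(0x10000)

    # Another routine leaves a value it will need later.
    mem[SHARED] = 0x03

    # Pointer setup for the next buffer silently overwrites that value.
    store_pointer(mem, PTR, 0x00F0)

    print(f"effective address: {indirect_indexed_address(mem, PTR, 0x10):#06x}")
    print(f"shared byte is now {mem[SHARED]:02X}")
```

In this model the address calculation itself is correct; the damage comes from the pointer and the other routine's variable sharing a zero page byte.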
Brutman further clarifies that this bug manifests only on the Apple IIe, a machine with a slightly different memory map compared to its predecessors. He pinpoints the root cause to a hardware change: the relocation of the firmware involved in disk I/O operations to a different memory address. This shift, coupled with the specific way SYS.COM was written, creates the unforeseen side effect that corrupts the boot sector.
The post concludes with a proposed solution: a patched version of SYS.COM that circumvents the problematic instruction by using a less ambiguous addressing mode. This corrected version successfully creates bootable disks on the Apple IIe, resolving the initial mystery. Brutman emphasizes the value of understanding the intricacies of hardware and software interactions, showcasing how even a single byte error can unravel the functionality of a seemingly straightforward utility. The entire investigation stands as a testament to the power of meticulous debugging and the complexities hidden beneath the surface of vintage computing systems.
Summary of Comments (5)
https://news.ycombinator.com/item?id=43154451
Several Hacker News commenters praised the author's clear and detailed write-up of the bug hunt, appreciating the methodical approach and the insights into early DOS development. Some shared their own experiences with similar bugs and debugging processes in other systems. One commenter pointed out the historical significance of relying on undocumented behavior, a common practice at the time due to limited documentation. Others discussed the challenges of working with older hardware and software, and the satisfaction of successfully solving such intricate problems. The overall sentiment reflects admiration for the detective work involved and nostalgia for the era of simpler, yet more opaque, computing.
The Hacker News post "The DOS 3.3 Sys.com Bug Hunt" has a modest number of comments, generating a discussion focused primarily on the technical details of the bug and the debugging process.
Several commenters express admiration for the author's detective work in tracking down the obscure bug. One commenter describes the process as "a fantastic example of really digging into a problem and not giving up until it was solved." Another echoes this sentiment, highlighting the satisfaction derived from "carefully isolating a subtle, difficult to reproduce bug" and emphasizing the methodical approach required in an environment with limited debugging tools.
The technical discussion delves into the intricacies of the 6502 processor and Apple II disk controller hardware. One comment explains the bug's root cause as an unexpected interaction between interrupt handling, DMA timing, and the specific behavior of the RWTS (Read/Write Track Sector) routine in the DOS code. This comment details how the bug manifested as a race condition that only occurred under very specific circumstances related to track sector reads and the interrupt timing. Another commenter adds further technical depth, explaining how the exact timing of the DMA and interrupt sequence could lead to the corruption of a critical variable used in the DOS file system operations.
Beyond the technical analysis, some comments reflect on the challenges of debugging in the early days of personal computing. One user recounts personal experiences with similar debugging struggles on the Apple II, emphasizing the patience and creativity required in such resource-constrained environments. Another comment points out the historical context, noting that the availability of source code and detailed hardware documentation, which are taken for granted today, were valuable resources that greatly aided the author's investigation.
While the majority of comments focus on the technical aspects of the bug, some users share anecdotal experiences related to the Apple II and early DOS versions, contributing to a sense of nostalgia for the early days of personal computing.