Story Details

  • Recommendations for designing magic numbers of binary file formats

    Posted: 2025-03-14 20:05:53

    To minimize the risk of file format ambiguity, choose magic numbers for binary files that are uncommon and easily distinguishable. Favor longer magic numbers (at least 4 bytes), and make them asymmetric and random-looking while avoiding printable ASCII characters. Consider including a version number in the magic to ease future evolution of the format, and consider placing the magic at both the beginning and the end of the file for stronger validation. These measures help distinguish your file format from existing ones, reducing the likelihood of misidentification and improving long-term compatibility.
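
    The recommendations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the byte values in MAGIC_BASE, the one-byte version field, and the helper names (make_magic, wrap, unwrap) are all hypothetical and do not come from the article.

```python
from typing import Tuple

# Hypothetical 7-byte base: random-looking, no repeated bytes, not a
# palindrome, and every byte lies outside the printable ASCII range.
MAGIC_BASE = bytes.fromhex("c71e9b02f08da3")
FORMAT_VERSION = 1  # assumed to fit in a single byte


def make_magic(version: int = FORMAT_VERSION) -> bytes:
    """Return the 8-byte magic: the base followed by one version byte."""
    return MAGIC_BASE + bytes([version])


def wrap(payload: bytes, version: int = FORMAT_VERSION) -> bytes:
    """Place the same magic at the beginning and at the end of the data."""
    magic = make_magic(version)
    return magic + payload + magic


def unwrap(blob: bytes) -> Tuple[int, bytes]:
    """Validate both magics and return (version, payload)."""
    n = len(MAGIC_BASE) + 1
    if len(blob) < 2 * n:
        raise ValueError("too short to hold both magics")
    head, tail = blob[:n], blob[-n:]
    if not head.startswith(MAGIC_BASE):
        raise ValueError("leading magic not recognized")
    if tail != head:
        # A missing or different trailing magic suggests truncation or damage.
        raise ValueError("trailing magic does not match leading magic")
    return head[-1], blob[n:-n]
```

    With these helpers, unwrap(wrap(b"hello")) returns the version and the original payload, while a blob that has lost its final byte is rejected because the trailing magic no longer matches.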

    Summary of Comments ( 18 )
    https://news.ycombinator.com/item?id=43366671

    HN users discussed various strategies for handling magic numbers in binary file formats. Several commenters emphasized using longer, more distinctive magic numbers to minimize the chance of collisions with other file types. Suggestions included incorporating version numbers, checksums, or even reserved bytes within the magic number sequence. The use of human-readable ASCII characters within the magic number was debated: some advocated it for easier identification in hex dumps, while others prioritized maximizing entropy for stronger collision resistance. Using an initial "container" format with metadata, plus a secondary magic number for the embedded data, was also proposed as a way to handle versioning and complex file structures. Finally, the discussion touched on the importance of registering new magic numbers to avoid conflicts, and on the practical reality that collisions can often be resolved from context, even with shorter magic numbers.
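
    Several of the commenters' suggestions (a version field, reserved bytes, a checksum, and a container with a secondary magic for the embedded data) can be combined as in the following Python sketch. The header layout, both magic values, and the function names are assumptions made for illustration; the discussion does not prescribe any particular layout.

```python
import struct
import zlib

# Hypothetical values, chosen only for this example.
OUTER_MAGIC = bytes.fromhex("c71e9b02f08da3e4")  # container magic, 8 bytes
INNER_MAGIC = bytes.fromhex("9d04eb17")          # secondary magic, 4 bytes

# Big-endian header: magic, version, reserved (must be 0), CRC-32 of the body.
HEADER = struct.Struct(">8sHHI")


def pack_container(data: bytes, version: int = 1) -> bytes:
    """Wrap data in a container whose body starts with the secondary magic."""
    body = INNER_MAGIC + data
    return HEADER.pack(OUTER_MAGIC, version, 0, zlib.crc32(body)) + body


def unpack_container(blob: bytes):
    """Validate the container and return (version, data)."""
    if len(blob) < HEADER.size:
        raise ValueError("too short for a container header")
    magic, version, reserved, crc = HEADER.unpack_from(blob)
    if magic != OUTER_MAGIC:
        raise ValueError("not a container file")
    if reserved != 0:
        raise ValueError("reserved field is non-zero")
    body = blob[HEADER.size:]
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    if not body.startswith(INNER_MAGIC):
        raise ValueError("embedded data has an unknown secondary magic")
    return version, body[len(INNER_MAGIC):]
```

    Keeping the version and reserved fields outside the magic itself lets the outer magic stay fixed across revisions, while the checksum catches corruption that a matching magic alone would not.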