Story Details

  • Show HN: I modeled the Voynich Manuscript with SBERT to test for structure

    Posted: 2025-05-18 16:09:01

    The author applied Sentence-BERT (SBERT), a sentence-embedding model built for semantic similarity, to the Voynich Manuscript to test for hidden structure. They treated each line of "Voynichese" as a separate sentence, embedded the lines with SBERT, and projected the embeddings into 2D with UMAP. Visually intriguing patterns emerged, suggesting some level of semantic organization within sections of the manuscript, but the author acknowledges that this does not necessarily mean the text is meaningful or decipherable. They released their code and data and invited the community to explore and analyze further. The project demonstrated a novel application of SBERT to a historical mystery but stopped short of cracking the code.
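
    The underlying idea, turning each line into a vector and checking whether lines from the same section sit closer together, can be sketched with the standard library alone. This is a stand-in and not the author's code: it swaps SBERT embeddings for character trigram counts and skips the UMAP projection entirely. The section names and sample lines are invented placeholders, not text from any real transcription.

```python
from collections import Counter
from itertools import combinations, product
from math import sqrt


def ngram_vector(line: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in one line (a crude stand-in for an embedding)."""
    text = " ".join(line.split())  # normalise whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values) -> float:
    """Arithmetic mean of an iterable; 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy data: two made-up "sections" of made-up lines.
sections = {
    "A": ["qoke dain chol", "qokal dain chor", "qoke chol dain"],
    "B": ["shedy otar ykal", "otar shedy ykar", "ykal shedy otal"],
}

vectors = {name: [ngram_vector(l) for l in lines] for name, lines in sections.items()}

# Average similarity of line pairs inside the same section...
within = mean(
    cosine(a, b) for vecs in vectors.values() for a, b in combinations(vecs, 2)
)
# ...versus pairs drawn from different sections.
across = mean(cosine(a, b) for a, b in product(vectors["A"], vectors["B"]))

print(f"within-section: {within:.2f}, across-section: {across:.2f}")
```

    A gap between the two averages only says that lines in a section share surface patterns. As the author notes for the SBERT result, that alone does not show the text carries meaning.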

    Summary of Comments (28)
    https://news.ycombinator.com/item?id=44022353

    HN commenters are generally skeptical of the analysis presented. Several point out the small sample size and the risk of overfitting when dealing with such limited data. One commenter notes that previous NLP analysis using Markov chains produced similar results, suggesting the observed "structure" might be an artifact of the method rather than a genuine feature of the manuscript. Another expresses concern that the approach doesn't account for potential cipher keys or transformations, making the comparison to known languages potentially meaningless. There's a general feeling that while interesting, the analysis doesn't provide strong evidence for or against any particular theory about the Voynich Manuscript's origins. A few commenters request more details about the methodology and specific findings to better assess the claims.
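
    One way to probe the "artifact of the method" concern is a null model: generate pseudo-text from a simple Markov chain and run the same analysis on it. If the generated text shows comparable clustering, the structure may come from local word statistics rather than meaning. The sketch below is an assumption about how such a baseline could look, not the method used in the earlier analysis the commenter mentions. The corpus lines are invented placeholders and the function names are hypothetical.

```python
import random
from collections import defaultdict


def train_bigram_model(lines):
    """Record line-initial words and, for each word, the words that followed it."""
    starts = []
    transitions = defaultdict(list)
    for line in lines:
        words = line.split()
        if not words:
            continue
        starts.append(words[0])
        for current, following in zip(words, words[1:]):
            transitions[current].append(following)
    return starts, dict(transitions)


def generate_line(starts, transitions, rng, max_words=8):
    """Random-walk the chain from a line-initial word until a dead end or max_words."""
    word = rng.choice(starts)
    out = [word]
    while len(out) < max_words and word in transitions:
        word = rng.choice(transitions[word])
        out.append(word)
    return " ".join(out)


# Hypothetical toy corpus; a real run would use a full transcription.
corpus = [
    "qoke dain chol dain",
    "qokal dain chor shedy",
    "shedy otar ykal dain",
    "otar shedy qoke chol",
]

starts, transitions = train_bigram_model(corpus)
rng = random.Random(0)  # fixed seed so runs are repeatable
fake_lines = [generate_line(starts, transitions, rng) for _ in range(5)]

for line in fake_lines:
    print(line)
```

    The generated lines reuse only word pairs seen in the corpus, so they preserve local statistics while carrying no intended meaning, which is what makes them usable as a baseline.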