Story Details

  • Pre-Trained Large Language Models Use Fourier Features for Addition (2024)

    Posted: 2025-02-06 10:31:06

    This paper investigates how pre-trained large language models (LLMs) perform integer addition. It finds that, despite never being explicitly trained on arithmetic, LLMs learn to leverage Fourier-feature-based positional encodings to represent numbers internally. This lets them achieve surprisingly good accuracy on addition tasks, particularly for numbers within the range covered by their training data. The authors demonstrate this by analyzing attention patterns and by comparing LLM performance with models that use alternative positional encodings. They also show that manipulating or ablating these Fourier features directly degrades the models' ability to add, strongly suggesting that LLMs have implicitly learned a form of Fourier-based arithmetic.
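
    As a rough illustration of what "Fourier features for addition" means, the sketch below (plain NumPy, not taken from the paper; the chosen periods are arbitrary assumptions) encodes integers as sines and cosines at a few periods and shows that adding two integers corresponds to adding phase angles in that representation:

    ```python
    import numpy as np

    # Toy sketch: represent integers as Fourier features (cos/sin pairs at a
    # few periods) and show that integer addition becomes angle addition in
    # that feature space. The periods below are illustrative assumptions.
    PERIODS = [2, 5, 10, 100]

    def fourier_features(n: int) -> np.ndarray:
        """Encode an integer as [cos, sin] pairs, one pair per period."""
        angles = 2 * np.pi * n / np.array(PERIODS, dtype=float)
        return np.concatenate([np.cos(angles), np.sin(angles)])

    def add_in_feature_space(a: int, b: int) -> np.ndarray:
        """Adding integers adds their phases: rotate a's features by b's."""
        k = len(PERIODS)
        fa, fb = fourier_features(a), fourier_features(b)
        cos_a, sin_a = fa[:k], fa[k:]
        cos_b, sin_b = fb[:k], fb[k:]
        # Angle-addition identities: cos(x + y) and sin(x + y).
        return np.concatenate([cos_a * cos_b - sin_a * sin_b,
                               sin_a * cos_b + cos_a * sin_b])

    # The rotated features of (17, 25) match the features of 42 exactly.
    assert np.allclose(add_in_feature_space(17, 25), fourier_features(42))
    ```

    In a representation like this, zeroing out one frequency component makes numbers that differ only at that period indistinguishable, which loosely mirrors the ablation experiments described above.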

    Summary of Comments (15)
    https://news.ycombinator.com/item?id=42960989

    Hacker News users discussed the surprising finding that LLMs appear to use Fourier features internally to perform addition, as indicated by the linked paper. Several commenters expressed fascination with this emergent behavior, highlighting how LLMs discover and utilize mathematical concepts without explicit instruction. Some questioned the paper's methodology and the strength of its conclusions, suggesting alternative explanations or calling for further research to solidify the claims. A few users also discussed the broader implications of this discovery for understanding how LLMs function and how they might be improved. The potential link to the Fourier-based positional encoding used in Transformer models was also noted as a possible contributing factor.
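
    For reference, the "Fourier-based positional encoding" commenters mention is the sinusoidal scheme from the original Transformer paper ("Attention Is All You Need", Vaswani et al., 2017), in which each position is encoded by sines and cosines at geometrically spaced frequencies. A minimal NumPy sketch of that encoding (the dimensions shown are illustrative):

    ```python
    import numpy as np

    def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> np.ndarray:
        """Standard sinusoidal encoding: PE[pos, 2i]   = sin(pos / 10000^(2i/d)),
                                          PE[pos, 2i+1] = cos(pos / 10000^(2i/d)).
        Assumes d_model is even."""
        positions = np.arange(num_positions)[:, None]            # shape (pos, 1)
        dims = np.arange(0, d_model, 2)[None, :]                 # shape (1, d_model / 2)
        angles = positions / np.power(10000.0, dims / d_model)   # shape (pos, d_model / 2)
        pe = np.zeros((num_positions, d_model))
        pe[:, 0::2] = np.sin(angles)   # even dimensions: sine components
        pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine components
        return pe

    pe = sinusoidal_positional_encoding(num_positions=128, d_model=16)
    print(pe.shape)  # (128, 16)
    ```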