The blog post explores the potential downsides of using polynomial features in machine learning, particularly their instability in high dimensions. While polynomial expansion can improve model fit by capturing non-linear relationships, it can also make the model extremely sensitive to input changes, causing wild oscillations and poor generalization. The author demonstrates the issue with visualizations of simple polynomials raised to high powers, showing how even small perturbations in the input can drastically alter the output. They suggest Bernstein polynomials as a more stable alternative, highlighting properties such as non-negativity and the partition of unity, which contribute to smoother behavior and better extrapolation. The post concludes that while polynomial features can be beneficial, their inherent instability requires careful consideration and may warrant exploring alternative basis functions such as Bernstein polynomials.
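To make the sensitivity concrete, here is a minimal NumPy sketch (not taken from the post) showing how a single high-degree monomial feature turns a 1% change in the input into a much larger change in the feature value; the degree and perturbation size are arbitrary illustrative choices:

```python
import numpy as np

# A degree-15 monomial feature amplifies a 1% change in the input
# into a much larger relative change in the feature value.
degree = 15
x = 1.0
x_perturbed = 1.01  # 1% perturbation of the input

f = x ** degree
f_perturbed = x_perturbed ** degree

print(f"x**{degree} at 1.00: {f:.4f}")
print(f"x**{degree} at 1.01: {f_perturbed:.4f}")
print(f"relative change: {(f_perturbed - f) / f:.2%}")  # ~16% for a 1% input change
```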
"Understanding Machine Learning: From Theory to Algorithms" provides a comprehensive overview of machine learning, bridging the gap between theoretical principles and practical applications. The book covers a wide range of topics, from basic concepts like supervised and unsupervised learning to advanced techniques like Support Vector Machines, boosting, and dimensionality reduction. It emphasizes the theoretical foundations, including statistical learning theory and PAC learning, to provide a deep understanding of why and when different algorithms work. Practical aspects are also addressed through the presentation of efficient algorithms and their implementation considerations. The book aims to equip readers with the necessary tools to both analyze existing learning algorithms and design new ones.
HN users largely praised Shai Shalev-Shwartz and Shai Ben-David's "Understanding Machine Learning" as a highly accessible and comprehensive introduction to the field. Commenters highlighted the book's clear explanations of fundamental concepts, its rigorous yet approachable mathematical treatment, and the helpful inclusion of exercises. Several pointed out its value for both beginners and those with prior ML experience seeking a deeper theoretical understanding. Some compared it favorably to other popular ML resources, noting its superior balance between theory and practice. A few commenters also shared specific chapters or sections they found particularly insightful, such as the treatment of PAC learning and the VC dimension. There was a brief discussion on the book's coverage (or lack thereof) of certain advanced topics like deep learning, but the overall sentiment remained strongly positive.
Summary of Comments (1)
https://news.ycombinator.com/item?id=43764101
HN users discuss potential downsides of polynomial features, particularly in the context of overfitting and interpretability issues. Some argue against their broad categorization as "evil," suggesting they can be valuable when applied judiciously and with proper regularization techniques. One commenter points out their usefulness in approximating non-linear functions and highlights the importance of understanding the underlying data and model behavior. Others discuss alternatives like splines, which offer more local control and flexibility, and the role of feature scaling in mitigating potential problems with polynomial features. The trade-off between complexity and interpretability is a recurring theme, with commenters emphasizing the importance of selecting the right tool for the specific problem and dataset.
The Hacker News post "Are polynomial features the root of all evil? (2024)" with ID 43764101 sparked a discussion with several interesting comments. The discussion centers on the article's claim that polynomial features often lead to overfitting and on its proposal of Bernstein polynomials as a superior alternative. Commenters generally agree that overfitting is a valid concern with polynomial features but offer diverse perspectives on the proposed solution and on the nuances of feature engineering in general.
One compelling comment points out that the core issue isn't polynomial features themselves, but rather the unchecked growth of the hypothesis space they create. This commenter argues that any basis expansion, including Bernstein polynomials, can lead to overfitting if not properly regularized. They suggest techniques like L1 or L2 regularization as effective ways to mitigate this risk, regardless of the specific polynomial basis used.
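As a rough illustration of that point, the following scikit-learn sketch fits the same degree-12 polynomial expansion with and without an L2 (ridge) penalty; the synthetic data, degree, and penalty strength are assumptions chosen for illustration, not values from the thread:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * X).ravel() + rng.normal(scale=0.2, size=60)

# Same degree-12 basis expansion; only the L2 penalty differs.
plain = make_pipeline(PolynomialFeatures(degree=12, include_bias=False),
                      StandardScaler(), LinearRegression())
ridge = make_pipeline(PolynomialFeatures(degree=12, include_bias=False),
                      StandardScaler(), Ridge(alpha=1.0))

plain.fit(X, y)
ridge.fit(X, y)

# The unpenalized fit assigns very large coefficients to the highly
# collinear high-degree terms; the ridge penalty shrinks them.
print("max |coef|, unregularized:", np.abs(plain[-1].coef_).max())
print("max |coef|, ridge:        ", np.abs(ridge[-1].coef_).max())
```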
Another insightful comment highlights the importance of understanding the underlying data generating process. The commenter argues that if the true relationship is indeed polynomial, using polynomial features is perfectly reasonable. However, they caution against blindly applying polynomial transformations without considering the nature of the data. They propose exploring other basis functions, like trigonometric functions or splines, depending on the specific problem.
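For the spline suggestion specifically, a minimal sketch using scikit-learn's SplineTransformer shows how a B-spline basis differs from a global monomial basis; the knot count and degree are illustrative choices, not values discussed in the thread:

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer, PolynomialFeatures

x = np.linspace(0, 1, 5).reshape(-1, 1)

# Cubic B-spline basis with a handful of knots: each basis function has
# local support, so a change in x only affects a few features.
spline = SplineTransformer(n_knots=5, degree=3)
print(spline.fit_transform(x).round(3))

# Global monomial basis for comparison: every feature responds to every x.
poly = PolynomialFeatures(degree=3, include_bias=False)
print(poly.fit_transform(x).round(3))
```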
Several comments discuss the practical implications of using Bernstein polynomials. One commenter questions their computational efficiency, particularly for high-degree polynomials and large datasets. Another points out that while Bernstein polynomials might offer better extrapolation properties near the boundaries of the input space, they might not necessarily improve interpolation performance within the observed data range.
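For concreteness, here is one way a Bernstein feature map could be written for inputs already scaled to [0, 1]; the helper name bernstein_features and the chosen degree are hypothetical, and the cost is simply one column per basis function, i.e. degree + 1 features per input dimension:

```python
import numpy as np
from scipy.special import comb

def bernstein_features(x, degree):
    """Bernstein basis B_{k,n}(x) = C(n, k) * x**k * (1 - x)**(n - k)
    for x already scaled to [0, 1]. Returns shape (len(x), degree + 1)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    k = np.arange(degree + 1)
    return comb(degree, k) * x**k * (1.0 - x) ** (degree - k)

X = bernstein_features(np.linspace(0, 1, 7), degree=10)
print(X.shape)                                        # (7, 11): one column per basis function
print(X.min() >= 0, np.allclose(X.sum(axis=1), 1.0))  # non-negative, rows sum to one
```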
One commenter provides a more theoretical perspective, suggesting that the benefits of Bernstein polynomials might stem from their ability to form a partition of unity. This property ensures that the basis functions sum to one at every point of the domain, which can lead to more stable and predictable behavior, especially in the context of interpolation and approximation.
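The identity behind the partition-of-unity property is the binomial theorem (the standard argument, not something spelled out in the comment):

$$\sum_{k=0}^{n} \binom{n}{k}\, x^{k} (1-x)^{n-k} \;=\; \bigl(x + (1-x)\bigr)^{n} \;=\; 1 \quad \text{for all } x.$$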
Finally, a recurring theme in the comments is the importance of cross-validation and proper evaluation metrics. Commenters emphasize that the effectiveness of any feature engineering technique, whether polynomial features or Bernstein polynomials, should be empirically assessed using robust evaluation procedures. Simply observing a good fit on the training data is not sufficient to guarantee generalization performance. Therefore, rigorous cross-validation is crucial for selecting the best approach and avoiding the pitfalls of overfitting.
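A minimal sketch of that workflow, assuming scikit-learn and a synthetic one-dimensional dataset (the degree grid and ridge penalty are illustrative choices):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(80, 1))
y = np.sin(3 * X).ravel() + rng.normal(scale=0.2, size=80)

# Held-out performance, not training fit, decides the degree.
for degree in (1, 3, 6, 12):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1e-3))
    cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"degree {degree:2d}: mean CV R^2 = {cv_r2:.3f}")
```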