The blog post "Kelly Can't Fail," authored by John Mount and published on the Win-Vector LLC website, delves into the oft-misunderstood concept of the Kelly criterion, a formula used to determine optimal bet sizing in scenarios with known probabilities and payoffs. The author meticulously dismantles the common misconception that the Kelly criterion guarantees success, emphasizing that its proper application merely optimizes the long-run growth rate of capital, not its absolute preservation. He accomplishes this by rigorously demonstrating, through mathematical derivation and illustrative simulations coded in R, that even when the Kelly criterion is correctly applied, the possibility of experiencing substantial drawdowns, or losses, remains inherent.
Mount begins by establishing the mathematical foundations of the Kelly criterion, showing how it maximizes the expected logarithmic growth rate of wealth. He then constructs a series of simulations of a biased coin-flip game with favorable odds. These simulations depict the stochastic nature of Kelly betting: even in a statistically advantageous scenario, significant capital fluctuations are not only possible but probable. The plotted trajectories span a wide range of outcomes, including runs where wealth declines substantially before eventually recovering and growing, underscoring the volatility inherent in the strategy.
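The post's simulations are written in R; purely as an illustration of the kind of experiment described, here is a minimal Python sketch assuming an even-money coin with win probability 0.55 and a bettor staking the full Kelly fraction on every flip (all parameter names and values here are invented, not the post's):

```python
import random

def simulate_kelly(p=0.55, n_flips=1000, wealth=1.0, kelly_multiplier=1.0, seed=None):
    """Simulate repeated even-money bets on a biased coin with Kelly-sized stakes."""
    rng = random.Random(seed)
    fraction = kelly_multiplier * (2 * p - 1)  # full Kelly for an even-money bet is 2p - 1
    trajectory = [wealth]
    for _ in range(n_flips):
        stake = fraction * wealth
        wealth += stake if rng.random() < p else -stake
        trajectory.append(wealth)
    return trajectory

def max_drawdown(trajectory):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = trajectory[0], 0.0
    for w in trajectory:
        peak = max(peak, w)
        worst = max(worst, (peak - w) / peak)
    return worst

if __name__ == "__main__":
    runs = [simulate_kelly(seed=s) for s in range(200)]
    finals = sorted(run[-1] for run in runs)
    print("median final wealth:", finals[len(finals) // 2])
    print("worst max drawdown: {:.0%}".format(max(max_drawdown(run) for run in runs)))
```

Even with a 55% edge, individual trajectories often give back a large fraction of their peak wealth before recovering, which is the volatility the post emphasizes.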
The core argument of the post revolves around the distinction between maximizing expected logarithmic growth and guaranteeing absolute profits. While the Kelly criterion excels at the former, it offers no guarantee of the latter. This vulnerability to large drawdowns, Mount argues, stems from the criterion's aggressive exploitation of favorable odds: statistically advantageous in the long run, but exposing the bettor to significant short-term losses. He underscores the point by contrasting full Kelly betting with a more conservative fractional Kelly strategy, showing how reducing the bet size, while slowing the growth rate, can significantly reduce the severity of drawdowns.
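One commonly cited way to quantify that trade-off (a standard continuous-time approximation, not a derivation from the post) is that betting a fraction λ of the full Kelly stake scales the bet's volatility by λ while retaining most of the growth:

```latex
% Lognormal / continuous-time approximation; g* denotes the full-Kelly growth rate.
g(\lambda) \approx \lambda\,(2 - \lambda)\, g^{*}
\qquad\text{so half-Kelly } (\lambda = 1/2) \text{ keeps roughly } 3/4 \text{ of the growth rate.}
```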
In conclusion, Mount's post provides a nuanced and technically robust explanation of the Kelly criterion, dispelling the myth of its infallibility. He illustrates, using both mathematical derivation and computational simulation, that while the Kelly criterion is a powerful tool for optimizing long-term growth, it offers no guarantee against substantial, and potentially psychologically challenging, temporary losses. This serves as a reminder that even statistically sound betting strategies are subject to the volatility of probabilistic outcomes and require weighing risk tolerance alongside potential reward.
This blog post presents a different perspective on deriving Shannon entropy, distinct from the traditional axiomatic approach. Instead of starting with desired properties and deducing the entropy formula, it begins with a fundamental problem: quantifying the average number of bits needed to optimally represent outcomes from a probabilistic source. The author argues this approach provides a more intuitive and grounded understanding of why the entropy formula takes the shape it does.
The post constructs this derivation step by step. It starts by considering a source emitting symbols from a finite alphabet, each with an associated probability. The core idea is to group these symbols into sets based on their probabilities, specifically targeting sets whose cumulative probability is a negative power of two, i.e., of the form 2^-k. This allows for efficient representation with binary codes, since each set can be uniquely identified by a binary prefix.
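The precise grouping rule is the article's own; as a rough stand-in, the sketch below uses the classic Shannon code lengths ⌈−log₂ pᵢ⌉, which likewise round each probability down to a power of two and are always consistent with the Kraft inequality a prefix code must satisfy (the distribution is invented for illustration):

```python
import math

# Hypothetical example distribution; not taken from the article.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Shannon code lengths: round each probability down to the nearest power of two.
lengths = {sym: math.ceil(-math.log2(p)) for sym, p in probs.items()}

# Kraft inequality: a binary prefix-free code with these lengths exists iff the sum is <= 1.
kraft_sum = sum(2 ** -length for length in lengths.values())

print("code lengths:", lengths)   # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print("Kraft sum:", kraft_sum)    # 1.0, so a prefix code with these lengths exists
```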
The process begins with the most probable symbol and continues iteratively, grouping less probable symbols into progressively larger sets until all symbols are assigned. The author demonstrates how this grouping mirrors the process of building a Huffman code, a well-known algorithm for creating optimal prefix-free codes.
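For concreteness, here is a standard textbook Huffman construction in Python (not code from the article), which builds an optimal prefix-free code for a given distribution; the probabilities are again hypothetical:

```python
import heapq
import itertools

def huffman_code(probs):
    """Build an optimal prefix-free (Huffman) code for a {symbol: probability} dict."""
    tie_breaker = itertools.count()  # keeps the heap from ever comparing dicts
    heap = [(p, next(tie_breaker), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)  # the two least probable subtrees...
        p2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie_breaker), merged))  # ...are merged
    return heap[0][2]

if __name__ == "__main__":
    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # hypothetical distribution
    code = huffman_code(probs)
    avg_bits = sum(probs[sym] * len(word) for sym, word in code.items())
    print(code)                                   # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
    print("average bits per symbol:", avg_bits)   # 1.75, matching the entropy of this source
```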
The post then analyzes the expected number of bits required to encode a symbol under this scheme. The expectation is a sum over sets: the number of bits assigned to each set (roughly the negative base-2 logarithm of the set's cumulative probability) weighted by that cumulative probability.
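In symbols (the notation here is illustrative, not necessarily the article's): if the construction produces sets S_k with cumulative probabilities P(S_k), each assigned roughly −log₂ P(S_k) bits, the expected code length is

```latex
\bar{L} = \sum_{k} P(S_k)\,\ell_k, \qquad \ell_k \approx -\log_2 P(S_k).
```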
Through a series of mathematical manipulations and approximations, using properties of logarithms and the behavior of the probabilities as the number of samples grows, the author shows that this expected number of bits converges to the familiar Shannon entropy formula: the negative sum, over symbols, of each symbol's probability multiplied by the base-2 logarithm of that probability.
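Written out, that limiting formula is

```latex
H(X) = -\sum_{i} p_i \log_2 p_i .
```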
Crucially, the derivation highlights the relationship between optimal coding and entropy. It demonstrates that Shannon entropy represents the theoretical lower bound on the average number of bits needed to encode messages from a given source, achievable through optimal coding schemes like Huffman coding. This construction emphasizes that entropy is not just a measure of uncertainty or information content, but intrinsically linked to efficient data compression and representation. The post concludes by suggesting this alternative construction offers a more concrete and less abstract understanding of Shannon entropy's significance in information theory.
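The usual textbook statement of that lower bound (the noiseless source coding theorem, quoted in standard form rather than from the post) is that no uniquely decodable code can beat the entropy on average, while an optimal prefix code such as Huffman's comes within one bit of it:

```latex
H(X) \;\le\; \bar{L}_{\text{optimal}} \;<\; H(X) + 1 .
```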
The Hacker News post titled "An alternative construction of Shannon entropy," linking to an article exploring a different way to derive Shannon entropy, has generated a moderate amount of discussion with several interesting comments.
One commenter highlights the pedagogical value of the approach presented in the article. They appreciate how it starts with desirable properties for a measure of information and derives the entropy formula from those, contrasting this with the more common axiomatic approach where the formula is presented and then shown to satisfy the properties. They believe this method makes the concept of entropy more intuitive.
Another commenter focuses on the historical context, mentioning that Shannon's original derivation was indeed based on desired properties. They point out that the article's approach is similar to the one Shannon employed, further reinforcing the pedagogical benefit of seeing the formula emerge from its intended properties rather than the other way around. They link to a relevant page within a book on information theory which seemingly discusses Shannon's original derivation.
A third commenter questions the novelty of the approach, suggesting that it seems similar to standard treatments of the topic. They wonder if the author might be overselling the "alternative construction" aspect. This sparks a brief exchange with another user who defends the article, arguing that while the fundamental ideas are indeed standard, the specific presentation and the emphasis on the grouping property could offer a fresh perspective, especially for educational purposes.
Another commenter delves into more technical details, discussing the concept of entropy as a measure of average code length and relating it to Kraft's inequality. They connect this idea to the article's approach, demonstrating how the desired properties lead to a formula that aligns with the coding interpretation of entropy.
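For readers who have not seen it, Kraft's inequality (stated here in its standard form; the comment itself is only summarized above) says that codeword lengths ℓ₁, …, ℓₙ are achievable by some binary prefix-free code exactly when

```latex
\sum_{i=1}^{n} 2^{-\ell_i} \;\le\; 1 .
```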
Finally, a few comments touch on related concepts such as cross-entropy and Kullback-Leibler divergence, briefly extending the discussion beyond the scope of the original article. One commenter gives an example of entropy's practical use: optimizing for log-loss in a neural network can be interpreted as pushing the predicted distribution to closely match the true distribution.
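The connection that commenter is gesturing at is the standard decomposition of expected log-loss (cross-entropy) into the true distribution's entropy plus a divergence term, so minimizing log-loss amounts to minimizing the KL divergence between the true distribution p and the model q:

```latex
H(p, q) = -\sum_i p_i \log q_i = H(p) + D_{\mathrm{KL}}(p \,\|\, q).
```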
Overall, the comments section provides a valuable supplement to the article, offering different perspectives on its significance, clarifying some technical points, and connecting it to broader concepts in information theory. While not groundbreaking, the discussion reinforces the importance of pedagogical approaches that derive fundamental formulas from their intended properties.
Summary of Comments (120)
https://news.ycombinator.com/item?id=42466676
The Hacker News comments discuss the limitations and practical challenges of applying the Kelly criterion. Several commenters point out that the Kelly criterion assumes perfect knowledge of the probability distribution of outcomes, which is rarely the case in real-world scenarios. Others emphasize the difficulty of estimating the "edge" accurately, and how even small errors can lead to substantial drawdowns. The emotional toll of large swings, even if theoretically optimal, is also discussed, with some suggesting fractional Kelly strategies as a more palatable approach. Finally, the computational complexity of Kelly for portfolios of correlated assets is brought up, making its implementation challenging beyond simple examples. A few commenters defend Kelly, arguing that its supposed failures often stem from misapplication or overlooking its long-term nature.
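The portfolio version the commenters allude to is usually posed (in its standard form, not quoted from the thread) as a concave maximization over a vector of bet fractions f, given the joint distribution of the per-period return vector r and whatever leverage constraints apply; with correlated assets it generally has no closed form and must be solved numerically:

```latex
f^{*} = \arg\max_{f}\; \mathbb{E}\!\left[\log\left(1 + f^{\top} r\right)\right].
```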
The Hacker News post "Kelly Can't Fail" (linking to a Win-Vector blog post about the Kelly Criterion) generated several comments discussing the nuances and practical applications of the Kelly Criterion.
One commenter highlights the importance of understanding the difference between "fraction of wealth" and "fraction of bankroll," particularly in situations involving leveraged bets. They emphasize that Kelly Criterion calculations should be based on the total amount at risk (the bankroll), not just the portion of wealth allocated to a specific betting or investment strategy. Ignoring leverage can lead to overbetting and potential ruin, even if the Kelly formula is applied correctly to the initial capital.
Another commenter raises concerns about the practical challenge of estimating the parameters the Kelly Criterion needs (specifically, the probabilities of winning and losing). They argue that inaccuracies in these estimates can drastically change the Kelly fraction, leading to suboptimal or even dangerous bet sizes. This commenter advocates a more conservative approach, suggesting the calculated Kelly fraction be reduced to mitigate the impact of estimation errors.
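A small worked example (mine, not from the thread) shows how sharp that sensitivity can be for an even-money bet, where the Kelly fraction is f = 2p − 1:

```latex
\hat{p} = 0.55 \;\Rightarrow\; \hat{f} = 0.10,
\qquad
p_{\text{true}} = 0.51 \;\Rightarrow\; f^{*} = 0.02,
```

so a four-percentage-point overestimate of the edge leads to staking five times the truly optimal fraction.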
Another point of discussion revolves around the emotional difficulty of adhering to the Kelly Criterion. Even when correctly applied, Kelly can lead to significant drawdowns, which can be psychologically challenging for investors. One commenter notes that the discomfort associated with these drawdowns can lead people to deviate from the strategy, thus negating the long-term benefits of Kelly.
A further comment thread delves into the application of Kelly to a broader investment context, specifically index funds. Commenters discuss the difficulties in estimating the parameters needed to apply Kelly in such a scenario, given the complexities of market behavior and the long time horizons involved. They also debate the appropriateness of using Kelly for investments with correlated returns.
Finally, several commenters share additional resources for learning more about the Kelly Criterion, including links to academic papers, books, and online simulations. This suggests a general interest among the commenters in understanding the concept more deeply and exploring its practical implications.