The author, initially enthusiastic about AI's potential to revolutionize scientific discovery, realized that current AI/ML tools are primarily useful for accelerating specific, well-defined tasks within existing scientific workflows, rather than driving paradigm shifts or independently generating novel hypotheses. While AI excels at tasks like optimizing experiments or analyzing large datasets, its dependence on existing data and human-defined parameters limits its capacity for true scientific creativity. The author concludes that focusing on augmenting scientists with these powerful tools, rather than replacing them, is a more realistic and beneficial approach, acknowledging that genuine scientific breakthroughs still rely heavily on human intuition and expertise.
The claim that kerosene saved sperm whales from extinction is a myth. While kerosene replaced sperm whale oil in lamps and other applications, this shift occurred after whale populations had already drastically declined due to overhunting. The demand for whale oil, not its eventual replacement, drove whalers to hunt sperm whales to near-extinction. Kerosene's rise simply made continued whaling less profitable; it did nothing to undo the damage already done. The article emphasizes that technological replacements rarely save endangered species; rather, conservation efforts are crucial.
HN users generally agree with the author's debunking of the "kerosene saved the sperm whales" myth. Several commenters provide further details on whale oil uses beyond lighting, such as lubricants and industrial processes, reinforcing the idea that the decline in demand involved more than a single replacement product. Some discuss the impact of petroleum on other industries and the historical context of resource transitions. A few express appreciation for the well-researched article and the author's clear writing style, while others point to additional resources and related historical narratives, including the history of whaling and the environmental impacts of different industries. A small side discussion touches on the difficulty of predicting technological advancements and their impact on existing markets.
The blog post argues that SQLite, often perceived as a lightweight embedded database, is surprisingly well-suited for large-scale server deployments, even outperforming traditional client-server databases in certain scenarios. It posits that SQLite's simplicity, file-based nature, and lack of a separate server process translate to reduced operational overhead, easier scaling through horizontal sharding, and superior performance for read-heavy workloads, especially when combined with efficient caching mechanisms. While acknowledging limitations for complex joins and write-heavy applications, the author contends that SQLite's strengths make it a compelling, often overlooked option for modern web backends, particularly those focusing on serving static content or leveraging serverless functions.
Hacker News users discussed the practicality and nuance of using SQLite as a server-side database, particularly at scale. Several commenters challenged the author's assertion that SQLite is better at hyper-scale than micro-scale, pointing out that its single-writer nature introduces bottlenecks in heavily write-intensive applications, precisely the kind of workload that becomes common as systems grow. Some argued the benefits of SQLite, like simplicity and ease of deployment, are more valuable in microservices and serverless architectures, where scale is addressed through horizontal scaling and data sharding. The discussion also touched on SQLite's reliability and its suitability for read-heavy workloads, with some users suggesting its effectiveness for data warehousing and analytics. Several commenters offered their own experiences, some highlighting successful use cases of SQLite at scale, while others pointed to limitations encountered in production environments.
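Neither summary includes code, but a minimal Python sketch of the single-file, WAL-mode setup both sides are describing may make the trade-off concrete; the filename, table, and queries below are illustrative, not taken from the post.

```python
import sqlite3

# Open (or create) a local database file; there is no separate server process.
conn = sqlite3.connect("app.db")

# WAL (write-ahead logging) lets many readers run concurrently with the single
# writer, which is the usual configuration for read-heavy workloads.
conn.execute("PRAGMA journal_mode=WAL")

# Writes are serialized through one connection...
conn.execute("CREATE TABLE IF NOT EXISTS pages (path TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", ("/about", "<p>hello</p>"))
conn.commit()

# ...while any number of reader connections can query the same file.
row = conn.execute("SELECT body FROM pages WHERE path = ?", ("/about",)).fetchone()
print(row[0])
conn.close()
```

The single writer is exactly the constraint the commenters flag: reads scale well against one file, but every write still funnels through one connection.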
The blog post "Vpternlog: When three is 100% more than two" explores the confusion surrounding ternary logic's perceived 50% increase in information capacity compared to binary. The author argues that while a ternary digit (trit) can hold three values versus a bit's two, this represents a 100% increase (three being twice as much as 1.5, which is the midpoint between 1 and 2) in potential values, not 50%. The post delves into the logarithmic nature of information capacity and uses the example of how many bits are needed to represent the same range of values as a given number of trits, demonstrating that the increase in capacity is closer to 63%, calculated using log base 2 of 3. The core point is that measuring increases in information capacity requires logarithmic comparison, not simple subtraction or division.
Hacker News users discuss the nuances of ternary logic's efficiency compared to binary. Several commenters point out that the article's claim of ternary being "100% more" than binary is misleading, arguing that the relevant metric is information density, calculated using log base 2, which shows a trit carrying only about 58% more information than a bit. Commenters also discuss the practical implementation challenges of ternary systems, citing noise margins and the relative ease and maturity of binary technology. Some mention the historical use of ternary computers, like Setun, while others debate the theoretical advantages and whether these outweigh the practical difficulties. A few also explore alternative bases beyond ternary and binary.
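The ~58% figure falls straight out of the logarithm; a quick check (mine, not the post's):

```python
import math

bits_per_trit = math.log2(3)              # ≈ 1.585 bits of information per ternary digit
increase_over_bit = bits_per_trit - 1.0   # ≈ 0.585, i.e. roughly 58% more than one bit
print(f"{bits_per_trit:.3f} bits per trit ({increase_over_bit:.1%} more than a bit)")
```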
Cosine similarity, while popular for comparing vectors, can be misleading when vector magnitudes carry significant meaning. The blog post demonstrates how cosine similarity focuses solely on the angle between vectors, ignoring their lengths. This can lead to counterintuitive results, particularly in recommendation systems: a vector encoding a single interaction scores exactly the same as one encoding hundreds of interactions in the same categories, because only direction is compared. The author advocates for considering alternatives like the dot product or Euclidean distance, especially when vector magnitude represents important information such as purchase count or user engagement. Ultimately, the choice of similarity metric should depend on the specific application and the meaning encoded within the vector data.
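A toy example, not taken from the post, shows how cosine similarity discards the magnitude information that the dot product and Euclidean distance retain; the vectors are invented.

```python
import numpy as np

def cosine(a, b):
    # Angle-only comparison: the vector lengths cancel out of the score.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

item       = np.array([1.0, 0.0, 1.0])
light_user = np.array([1.0, 0.0, 1.0])      # one interaction per category
heavy_user = np.array([100.0, 0.0, 100.0])  # a hundred interactions per category

print(cosine(light_user, item), cosine(heavy_user, item))  # 1.0 and 1.0: engagement level is invisible
print(np.dot(light_user, item), np.dot(heavy_user, item))  # 2.0 and 200.0: dot product keeps magnitude
print(np.linalg.norm(light_user - item),                   # 0.0
      np.linalg.norm(heavy_user - item))                   # ≈ 140.0: Euclidean distance keeps it too
```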
Hacker News users generally agreed with the article's premise, cautioning against blindly applying cosine similarity. Several commenters pointed out that its effectiveness depends heavily on the specific use case and data distribution. Some highlighted the importance of normalization and feature scaling, noting that cosine similarity is sensitive to these choices. Others offered alternative measures, such as Euclidean distance or Manhattan distance, suggesting they might be more appropriate in certain situations. One compelling comment underscored the importance of understanding the underlying data and problem before choosing a similarity metric, since no single metric is universally superior. Another stressed preprocessing, highlighting TF-IDF and BM25 as useful steps before computing cosine similarity on text. A few users provided concrete examples where cosine similarity produced misleading results, further reinforcing the author's warning.
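As a sketch of that preprocessing point, here is a TF-IDF-then-cosine pipeline in scikit-learn; the commenters name the techniques, not this library, and the example documents are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "sqlite is a small embedded database",
    "the database the database the database",  # raw counts would overweight repeated common words
    "kerosene lamps replaced whale oil",
]

# TF-IDF downweights terms that appear in every document and length-normalizes
# each vector, so the cosine scores reflect distinctive vocabulary rather than
# raw word counts.
tfidf = TfidfVectorizer().fit_transform(docs)
print(cosine_similarity(tfidf))  # 3x3 matrix of pairwise similarities
```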
Summary of Comments (200)
https://news.ycombinator.com/item?id=44037941
Several commenters on Hacker News agreed with the author's sentiment about the hype surrounding AI in science, pointing out that the "low-hanging fruit" has already been plucked and that significant advancements are becoming increasingly difficult. Some highlighted the importance of domain expertise and the limitations of relying solely on AI, emphasizing that AI should be a tool used by experts rather than a replacement for them. Others discussed the issue of reproducibility and the "black box" nature of some AI models, making scientific validation challenging. A few commenters offered alternative perspectives, suggesting that AI still holds potential but requires more realistic expectations and a focus on specific, well-defined problems. The misleading nature of visualizations generated by AI was also a point of concern, with commenters noting the potential for misinterpretations and the need for careful validation.
The Hacker News post titled "I got fooled by AI-for-science hype–here's what it taught me" generated a moderate discussion with several insightful comments. Many commenters agreed with the author's core premise that AI hype in science, particularly regarding drug discovery and materials science, often oversells the current capabilities.
Several users highlighted the distinction between using AI for discovery versus optimization. One commenter pointed out that AI excels at optimizing existing solutions, making incremental improvements based on vast datasets. However, they argued it's less effective at genuine discovery, where novel concepts and breakthroughs are needed. This was echoed by another who mentioned that drug discovery often involves an element of "luck" and creative leaps that AI struggles to replicate.
Another recurring theme was the "garbage in, garbage out" problem. Commenters stressed that AI models are only as good as the data they're trained on. In scientific domains, this can be problematic due to limited, biased, or noisy datasets. One user specifically discussed materials science, explaining that the available data is often incomplete or inconsistent, hindering the effectiveness of AI models. Another mentioned that even within drug discovery, datasets are often proprietary and not shared, further limiting the potential of large-scale AI applications.
Some commenters offered a more nuanced perspective, acknowledging the hype while also recognizing the potential of AI. One suggested that AI could be a valuable tool for scientists, particularly for automating tedious tasks and analyzing complex data, but it shouldn't be seen as a replacement for human expertise and intuition. Another commenter argued that AI's role in science is still evolving, and while current applications may be overhyped, future breakthroughs are possible as the technology matures and datasets improve.
A few comments also touched on the economic incentives driving the AI hype. One user suggested that venture capital and media attention create pressure to exaggerate the potential of AI, leading to unrealistic expectations and inflated claims. Another mentioned the "publish or perish" culture in academia, which can incentivize researchers to oversell their results to secure funding and publications.
Overall, the comments section presents a generally skeptical view of the current state of AI-for-science, highlighting the limitations of existing approaches and cautioning against exaggerated claims. However, there's also a recognition that AI holds promise as a scientific tool, provided its limitations are acknowledged and expectations are tempered.