During its early beta phase, Spotify reportedly used unlicensed MP3 files sourced from various locations, including The Pirate Bay, according to TorrentFreak. The files were apparently utilized as placeholders while the company secured proper licensing agreements with rights holders. This practice allegedly allowed Spotify to quickly build a vast music library for testing and development purposes before its official launch. While the company later replaced these files with licensed versions, the revelation sheds light on the challenges faced by nascent streaming services in navigating complex copyright issues.
Meta is arguing that its torrenting of pirated books was not illegal because, it claims, there is no evidence it was "seeding" (actively uploading and distributing) the copyrighted material. It contends it was merely "leeching" (downloading), which it argues isn't copyright infringement. This defense comes as authors sue Meta for allegedly downloading vast quantities of pirated books to train its AI models, claiming significant harm. Meta asserts that the plaintiffs haven't demonstrated that the company contributed to the distribution of the infringing content to other users.
Hacker News users discuss Meta's defense against accusations of book piracy, with many expressing skepticism towards Meta's "we're just a leech" argument. Several commenters point out the flaw in this logic, arguing that downloading constitutes an implicit form of seeding, as portions of the file are often shared with other peers during the download process. Others highlight the potential hypocrisy of Meta's position, given their aggressive stance against copyright infringement on their own platforms. Some users also question the article's interpretation of the legal arguments, and suggest that Meta's stance may be more nuanced than portrayed. A few commenters draw parallels to previous piracy cases involving other companies. Overall, the consensus leans towards disbelief in Meta's defense and anticipates further legal challenges.
A US judge ruled in favor of Thomson Reuters in its copyright suit against Ross Intelligence, an early and significant decision in AI copyright law. The court held that Ross infringed Thomson Reuters' copyright by using Westlaw headnotes to train a competing AI-powered legal research tool. It rejected Ross's fair use defense, in part because the resulting product competed directly with Westlaw rather than serving a distinct, transformative purpose. The decision suggests that training AI on copyrighted data may not qualify as fair use when the resulting product substitutes for the original source material.
HN commenters generally agree that Westlaw's terms of service likely prohibit scraping, regardless of copyright implications. Several point out that training data is generally considered fair use, and question whether the judge's decision will hold up on appeal. Some suggest the ruling might create a chilling effect on open-source LLMs, while others argue that large companies will simply absorb the licensing costs. A few commenters see this as a positive outcome, forcing AI companies to pay for the data they use. The discussion also touches upon the potential for increased competition and innovation if smaller players can access data more affordably than licensing Westlaw's content.
OpenAI alleges that DeepSeek AI, a Chinese AI company, improperly used outputs from one of OpenAI's large language models, likely GPT-3 or a related model, to train its own competing model, "DeepSeek Coder." OpenAI claims to have found substantial code overlap and distinctive formatting patterns suggesting that DeepSeek scraped outputs from OpenAI's model and used them as training data. Such use would violate OpenAI's terms of service, and OpenAI is reportedly considering legal action. The incident highlights growing concerns around intellectual property protection in the rapidly evolving AI field.
Several Hacker News commenters express skepticism of OpenAI's claims against DeepSeek, questioning the strength of their evidence and suggesting the move is anti-competitive. Some argue that reproducing the output of a model doesn't necessarily imply direct copying of the model weights, and point to the possibility of convergent evolution in training large language models. Others discuss the difficulty of proving copyright infringement in machine learning models and the broader implications for open-source development. A few commenters also raise concerns about the legal precedent this might set and the chilling effect it could have on future AI research. Several commenters call for OpenAI to release more details about their investigation and evidence.
Summary of Comments (189)
https://news.ycombinator.com/item?id=43169461
Hacker News users discuss the implications of Spotify using pirated MP3s during its beta phase. Some commenters downplay the issue, suggesting it was a pragmatic approach in a pre-streaming era, using readily available files for testing functionality, and likely involving low-quality, variable bitrate MP3s unsuitable for a final product. Others express skepticism that Spotify didn't know the files' source, highlighting the easily identifiable metadata associated with Pirate Bay releases. Several users question the legal ramifications, particularly if Spotify benefited commercially from using these pirated files, even temporarily. The possibility of embedded metadata revealing the piracy is also raised, leading to discussions about user privacy implications. A few commenters point out that the article doesn't accuse Spotify of serving pirated content to users, focusing instead on their internal testing.
The Hacker News thread on the TorrentFreak article about Spotify's beta allegedly using pirated MP3 files offers a range of perspectives on the situation.
Several commenters express skepticism about the TorrentFreak article's claims, questioning the evidence presented. One commenter points out the lack of specific details in the article, such as the exact number of pirated files allegedly used or how they were identified. They argue that the article relies heavily on speculation and doesn't provide concrete proof. Another user echoes this sentiment, suggesting the article's phrasing is designed to be sensationalist rather than factual. They propose alternative, more plausible explanations for the findings, such as Spotify using third-party libraries that might have inadvertently included some pirated content.
Some comments discuss the technical aspects of audio fingerprinting and how it might be prone to errors, leading to false positives. They explain how slight variations in encoding or metadata could cause a file to be misidentified as pirated, even if it's from a legitimate source. This raises questions about the reliability of the methods used to identify the allegedly pirated files within Spotify's beta.
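The sensitivity to metadata that these commenters describe can be shown with a toy sketch. This is not an audio fingerprinting algorithm and not the method anyone used on Spotify's files. It only assumes, for illustration, that a file is identified either by hashing all of its bytes or by hashing the audio payload with a trailing ID3v1 tag removed. The helper names and the sample comment string are hypothetical.

```python
import hashlib

ID3V1_SIZE = 128  # an ID3v1 tag is a fixed 128-byte trailer starting with b"TAG"


def make_id3v1(title: str, comment: str) -> bytes:
    """Build a minimal ID3v1 tag (artist, album, and year left blank)."""
    def field(text: str, size: int) -> bytes:
        # Truncate or pad each field with NUL bytes to its fixed width.
        return text.encode("latin-1")[:size].ljust(size, b"\x00")

    return (
        b"TAG"
        + field(title, 30)
        + field("", 30)       # artist
        + field("", 30)       # album
        + field("", 4)        # year
        + field(comment, 30)
        + b"\xff"             # genre byte: 255 means "unset"
    )


def strip_id3v1(data: bytes) -> bytes:
    """Return the file bytes without a trailing ID3v1 tag, if one is present."""
    if len(data) >= ID3V1_SIZE and data[-ID3V1_SIZE:].startswith(b"TAG"):
        return data[:-ID3V1_SIZE]
    return data


def whole_file_hash(data: bytes) -> str:
    """Hash every byte, metadata included."""
    return hashlib.sha256(data).hexdigest()


def payload_hash(data: bytes) -> str:
    """Hash only the bytes that precede the ID3v1 tag."""
    return hashlib.sha256(strip_id3v1(data)).hexdigest()


# Stand-in bytes for encoded audio frames (not a real MP3 stream).
audio = bytes(range(256)) * 16

# Same audio payload, different comment field in the tag.
file_a = audio + make_id3v1("Track", "ripped by EXAMPLE")  # hypothetical marker
file_b = audio + make_id3v1("Track", "")

print(whole_file_hash(file_a) == whole_file_hash(file_b))  # False
print(payload_hash(file_a) == payload_hash(file_b))        # True
```

In this sketch the two files are distinguishable only through their tags. A metadata-based check would therefore depend entirely on whether the tags were preserved or edited, while a payload-only check could not tell the two sources apart at all. Real fingerprinting systems work on decoded audio features and have their own error modes, which this example does not model.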
A few commenters delve into the legal ramifications of the situation, discussing the potential consequences for Spotify if the allegations are proven true. They mention copyright infringement and the possibility of lawsuits from rights holders. Others discuss the complexities of music licensing and the challenges faced by streaming services in ensuring all their content is legally obtained.
Other commenters express a cynical view of the music industry, suggesting that such practices might be more common than acknowledged. They speculate about the pressures faced by streaming platforms to acquire content quickly and cheaply, which might lead them to cut corners.
Finally, a handful of comments offer more lighthearted takes, making jokes about the irony of a music streaming service using pirated content or reminiscing about the early days of file sharing.
Overall, the comments section reflects a mixture of skepticism, technical analysis, legal considerations, and cynical humor. While some accept the article's claims at face value, many express reservations about the evidence and offer alternative interpretations. The thread also touches on the broader complexities of digital music distribution and the challenges of enforcing copyright online.