hackslash dot org

Long Read: Lessons from Building Semantic Search for GitHub and Why I Failed

Posted: 2025-03-08 12:23:46

The author attempted to build a free, semantic search engine for GitHub using a Sentence-BERT model and FAISS for vector similarity search. While initial results were promising, scaling proved insurmountable due to the massive size of the GitHub codebase and associated compute costs. Indexing every repository became computationally and financially prohibitive, particularly as the model struggled with context fragmentation from individual code snippets. Ultimately, the project was abandoned due to the unsustainable balance between cost, complexity, and the limited resources of a solo developer. Despite the failure, the author gained valuable experience in large-scale data processing, vector databases, and the limitations of current semantic search technology when applied to a vast and diverse codebase like GitHub.

This extensive blog post chronicles the author's attempt to create and launch a free, publicly available semantic search engine for GitHub repositories, which ended with the project's discontinuation. The author details each stage of development, from the initial inspiration (a desire to improve on keyword-based search and draw on the wealth of code and documentation available on GitHub) through the technical challenges encountered and the eventual reasons for its failure.

The project's core functionality revolved around utilizing advanced natural language processing techniques, specifically transformer models, to understand the semantic meaning behind search queries and match them with relevant code snippets, repositories, and documentation. The author explains the process of selecting and fine-tuning pre-trained models, including experimenting with different model architectures and datasets to optimize search performance. This included meticulous data preparation involving cleaning, filtering, and transforming GitHub data into a suitable format for training and indexing. A significant portion of the post delves into the complexities of vector embedding generation, a crucial step in enabling semantic search by representing code and text as numerical vectors that capture their underlying meaning.
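To make the embedding step concrete, here is a minimal sketch of vector similarity search. It is not the author's code: the three-dimensional vectors and file names are made up for illustration, and a brute-force cosine scan stands in for both the Sentence-BERT model (which would produce the vectors) and the FAISS index (which would replace the linear scan at scale).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]

# Hypothetical embeddings; a real system would get these from a model.
toy_index = {
    "parse_json.py": [0.9, 0.1, 0.0],
    "http_client.py": [0.1, 0.9, 0.1],
    "json_utils.py": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of a JSON-related query
results = search(query, toy_index)
print([doc_id for doc_id, _ in results])
```

The linear scan costs one comparison per indexed vector for every query, which is the part an approximate nearest-neighbor index is meant to avoid.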

The author transparently discusses the infrastructure challenges faced in building and maintaining such a computationally intensive service. Hosting and scaling the search index, managing the computational resources required for inference, and handling the anticipated query load proved to be significant hurdles. The blog post details the various cloud computing platforms and technologies explored, the associated costs, and the trade-offs considered in attempting to balance performance and affordability.

A major contributing factor to the project's downfall was the unexpected and substantial financial burden. The author candidly shares the escalating costs of cloud computing resources, particularly the expenses associated with storing and querying the vast vector embeddings database required for semantic search. Despite exploring various optimization strategies, the financial strain became unsustainable, ultimately forcing the decision to discontinue the project.
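A back-of-the-envelope calculation shows why storage grows quickly. The figures below are assumptions chosen for illustration, not numbers from the post: 100 million embedded snippets, 384 dimensions per vector, and 4 bytes per float32 value.

```python
def index_size_gib(num_vectors, dim, bytes_per_value=4):
    """Raw size of an uncompressed embedding matrix in GiB."""
    return num_vectors * dim * bytes_per_value / (1024 ** 3)

# Hypothetical workload: 100 million snippets, 384-dimensional float32 vectors.
size = index_size_gib(num_vectors=100_000_000, dim=384)
print(f"{size:.1f} GiB before any index overhead or replication")
```

Under these assumptions the raw vectors alone come to roughly 143 GiB, before index structures, metadata, or replicas are counted.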

Beyond the financial constraints, the author also reflects on other lessons learned throughout the process. These include the complexities of managing large-scale data processing pipelines, the challenges of achieving optimal search relevance and performance, and the importance of considering long-term sustainability and cost-effectiveness from the outset. The post concludes with a thoughtful analysis of the project's shortcomings and offers valuable insights for anyone embarking on similar endeavors in the realm of semantic search and large language model applications. The author also expresses gratitude for the support received from the open-source community and acknowledges the valuable experience gained despite the project's ultimate outcome.

Summary of Comments ( 4 )
https://news.ycombinator.com/item?id=43299659

HN commenters largely praised the author's transparency and detailed write-up of their project. Several pointed out the inherent difficulties and nuances of semantic search, particularly within the vast and diverse codebase of GitHub. Some suggested alternative approaches, like focusing on a smaller, more specific domain within GitHub or utilizing existing tools like Elasticsearch with careful tuning. The cost of running such a service and the challenges of monetization were also discussed, with some commenters skeptical of the free model. A few users shared their own experiences with similar projects, echoing the author's sentiments about the complexity and resource intensity of semantic search. Overall, the comments reflected an appreciation for the author's journey and the lessons learned, contributing further insights into the challenges of building and scaling a semantic search engine.

The Hacker News post discussing the article "What I Learned Building a Free Semantic Search Tool for GitHub and Why I Failed" has generated a number of comments exploring different facets of the author's experience.

Several commenters discuss the challenges of building and maintaining free products. One commenter points out the often unsustainable nature of offering free services, especially when substantial infrastructure costs are involved. They highlight the difficulty of balancing the desire to provide a valuable tool to the community with the financial realities of operating such a service. Another commenter echoes this sentiment, emphasizing the considerable effort required to handle scaling and infrastructure for a free product, often leading to burnout for the developer. This commenter suggests alternative models like a "sponsorware" approach where users are encouraged to contribute financially if they find the tool valuable.

The conversation also delves into the technical aspects of semantic search. One commenter questions the choice of using Sentence-BERT embeddings, suggesting that other embedding methods might be more suitable for code search, particularly those that understand the structure and syntax of code rather than just the natural language elements. They also suggest that fine-tuning a more general model on code-specific data would likely yield better results. Another comment thread discusses the difficulties of achieving high accuracy and relevance in semantic search, especially in the context of code where specific terminology and context are crucial.

The business model and potential paths to monetization are also discussed. Some suggest exploring options like paid tiers with enhanced features or focusing on a niche market within the developer community. One commenter mentions the success of GitHub's own code search, which leverages significant resources and data, highlighting the competitive landscape for such a tool. Another commenter proposes partnering with a company that could benefit from such a search tool, potentially integrating it into their existing platform or workflow.

Finally, several commenters express appreciation for the author's transparency and willingness to share their learnings, acknowledging the value of such post-mortems for the broader developer community. They commend the author for documenting the challenges and insights gained from the project, even though it ultimately didn't achieve its initial goals.

Show HN: Transform Your Codebase into a Single Markdown Doc for Feeding into AI

Posted: 2025-02-14 13:23:23

CodeWeaver is a tool that transforms an entire codebase into a single, navigable markdown document designed for AI interaction. It aims to improve code analysis by providing AI models with comprehensive context, including directory structures, filenames, and code within files, all linked for easy navigation. This approach enables large language models (LLMs) to better understand the relationships within the codebase, perform tasks like code summarization, bug detection, and documentation generation, and potentially answer complex queries that span multiple files. CodeWeaver also offers various formatting and filtering options for customizing the generated markdown to suit specific LLM needs and optimize token usage.

The Hacker News post titled "Show HN: Transform Your Codebase into a Single Markdown Doc for Feeding into AI" introduces a new tool called CodeWeaver designed to facilitate improved interaction between large codebases and Large Language Models (LLMs). The author posits that current methods of feeding code to LLMs, such as providing snippets or limited files, are insufficient for tasks requiring comprehensive codebase understanding. These limitations, they argue, prevent LLMs from effectively performing complex tasks like comprehensive refactoring, accurate code analysis, and the generation of meaningful documentation.

CodeWeaver addresses this problem by converting an entire codebase into a single, structured Markdown document. This document organizes the code's components, including files, classes, functions, and their associated documentation, into a hierarchical and interconnected representation. The structure uses Markdown's headings, subheadings, and lists to delineate the relationships between different code elements. The tool also embeds key metadata, such as file paths and function signatures, within the Markdown structure, so that the LLM receives a complete and contextualized view of the codebase. This approach aims to give the LLM a holistic view, enabling it to grasp the connections and dependencies within the code.
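The general idea can be sketched in a few lines. This is not CodeWeaver's implementation; the function name `codebase_to_markdown`, the heading layout, and the extension filter are all assumptions made for illustration.

```python
from pathlib import Path

FENCE = "`" * 3  # built at runtime so this listing contains no literal fence

def codebase_to_markdown(root, extensions=(".py",)):
    """Concatenate matching files under `root` into one Markdown string."""
    root = Path(root).resolve()
    lines = [f"# {root.name}", ""]
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # The relative path becomes a heading, so file locations are preserved.
        rel = path.relative_to(root).as_posix()
        lines.append(f"## {rel}")
        lines.append("")
        lines.append(FENCE + path.suffix.lstrip("."))
        lines.append(path.read_text(encoding="utf-8", errors="replace").rstrip("\n"))
        lines.append(FENCE)
        lines.append("")
    return "\n".join(lines)
```

Calling `codebase_to_markdown(".")` returns one string with a heading per file. A real tool would presumably also need ignore rules and size limits, which this sketch omits.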

The post highlights several potential use cases for CodeWeaver, emphasizing its ability to help LLMs perform more sophisticated tasks. These include generating comprehensive project documentation, performing in-depth code analysis to identify potential bugs or areas for improvement, and executing substantial code refactoring across the entire codebase. The author suggests that this holistic representation allows LLMs to analyze and manipulate code with a level of understanding previously unattainable using traditional, fragmented input methods.

Finally, the post presents a live demo of CodeWeaver hosted on their website, tesserato.web.app, inviting users to explore the functionality and test its capabilities. The demo allows users to process their own codebases and visualize the resulting Markdown output. The author encourages feedback and contributions, suggesting a keen interest in community involvement in further development and refinement of the tool.

Summary of Comments ( 61 )
https://news.ycombinator.com/item?id=43048027

HN users discussed the practical applications and limitations of converting a codebase into a single Markdown document for AI processing. Some questioned the usefulness for large projects, citing potential context window limitations and the loss of structural information like file paths and module dependencies. Others suggested alternative approaches like using embeddings or tree-based structures for better code representation. Several commenters expressed interest in specific use cases, such as generating documentation, code analysis, and refactoring suggestions. Concerns were also raised about the computational cost and potential inaccuracies of processing large Markdown files. There was some skepticism about the "one giant markdown file" approach, with suggestions to explore other methods for feeding code to LLMs. A few users shared their own experiences and alternative tools for similar tasks.

The Hacker News post "Show HN: Transform Your Codebase into a Single Markdown Doc for Feeding into AI" generated a moderate amount of discussion, with a focus on the practicality and potential pitfalls of the approach.

Several commenters questioned the usefulness of converting an entire codebase into a single Markdown document for AI consumption. One commenter argued that this approach loses valuable structural information inherent in the code's organization and relationships between files, which are crucial for accurate analysis by Large Language Models (LLMs). They suggested that preserving the directory structure and using tools designed for code analysis would be more beneficial. Another user expressed concern about the potential for exceeding context limits of LLMs with such large documents, leading to truncated or inaccurate analyses. They also raised the issue of losing context between disparate files when they're flattened into a single document.
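The context-limit concern can be checked before sending anything to a model. The sketch below uses a rough rule of thumb of about four characters per token, which is an assumption and varies by tokenizer and by language; the window size and the reserved answer budget are also hypothetical.

```python
import math

def estimate_tokens(text, chars_per_token=4.0):
    """Crude token estimate; a real tokenizer will give different counts."""
    return math.ceil(len(text) / chars_per_token)

def fits_in_context(text, context_window, reserved_for_answer=1000):
    """True if the estimated prompt size leaves room for the model's reply."""
    return estimate_tokens(text) <= context_window - reserved_for_answer

document = "a" * 8000  # stand-in for a flattened codebase
print(estimate_tokens(document), fits_in_context(document, context_window=2000))
```

A document that fails this check would have to be filtered, split, or summarized, which is where the commenters' worry about lost cross-file context comes in.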

Other comments highlighted alternative approaches that might be more effective. One commenter suggested leveraging tools specifically designed for code comprehension and querying, such as tree-sitter, which can parse code into an abstract syntax tree (AST). This structured representation maintains the code's organization and relationships, enabling more precise and insightful AI-driven analysis. Another commenter pointed out that many LLMs are already capable of interacting directly with codebases in their native format, making the Markdown conversion step potentially redundant.
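As a rough illustration of what a syntax tree offers over flat text, the sketch below lists function definitions with their parameters and line numbers. It uses Python's standard `ast` module as a stand-in for tree-sitter, so it only handles Python source; the sample code is made up.

```python
import ast

def list_functions(source):
    """Return (name, parameter names, line number) for each function, in file order."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = [arg.arg for arg in node.args.args]
            found.append((node.name, params, node.lineno))
    return sorted(found, key=lambda item: item[2])

sample = (
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "class Greeter:\n"
    "    def greet(self, name):\n"
    "        return 'hi ' + name\n"
)
for name, params, lineno in list_functions(sample):
    print(f"line {lineno}: {name}({', '.join(params)})")
```

Because the tree records where each definition starts, a tool can retrieve one function at a time instead of a whole file.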

There was also skepticism regarding the scalability and maintainability of the proposed solution. One user questioned the feasibility of managing and updating such a large Markdown document as the codebase evolves, suggesting that it would quickly become unwieldy. Another comment suggested that existing documentation tools and practices, combined with targeted AI queries, might be a more pragmatic approach.

While some commenters expressed interest in exploring the concept further or suggested potential use cases for specific scenarios like documentation generation, the overall sentiment leaned towards skepticism. Many felt the proposed method was not the optimal way to leverage AI for code analysis and offered alternative, potentially more robust and scalable solutions.

Phind 2: AI search with visual answers and multi-step reasoning

Posted: 2025-02-13 18:20:29

Phind 2, a new AI search engine, significantly upgrades its predecessor with enhanced multi-step reasoning capabilities and the ability to generate visual answers, including diagrams and code flowcharts. It utilizes a novel method called "grounded reasoning" which allows it to access and process information from multiple sources to answer complex questions, offering more comprehensive and accurate responses. Phind 2 also features an improved conversational mode and an interactive code interpreter, making it a more powerful tool for both technical and general searches. This new version aims to provide clearer, more insightful answers than traditional search engines, moving beyond simply listing links.

Phind, an AI-powered search engine, has announced a significant upgrade with the release of Phind 2. This new iteration boasts substantial advancements in several key areas, pushing the boundaries of what's possible with AI-driven information retrieval. The core enhancements focus on providing more comprehensive, visually rich, and logically reasoned responses to user queries.

One of the most striking new features is the incorporation of visual answers. Phind 2 can now generate diagrams, charts, graphs, and other visual aids directly within the search results, enriching the user experience and facilitating a deeper understanding of complex topics. This visual component is not merely decorative; it's designed to provide substantive information, clarifying intricate concepts and presenting data in an easily digestible format. Imagine searching for the differences between various sorting algorithms; Phind 2 might present a visual animation of each algorithm in action, showcasing their distinct approaches and efficiencies.

Beyond visual enhancements, Phind 2 introduces advanced multi-step reasoning capabilities. This means the AI can now tackle complex questions requiring multiple logical steps or calculations to arrive at a solution. It can break down intricate problems, process information from various sources, and synthesize a coherent and accurate answer. For example, a user could inquire about the optimal trajectory for a rocket launch considering specific atmospheric conditions, and Phind 2 could perform the necessary calculations and present a detailed explanation alongside visual representations.

The underlying architecture of Phind 2 has also undergone substantial refinement. Leveraging recent advancements in large language models (LLMs), Phind 2 incorporates a modified version of the powerful Gemini Pro model, further optimized for information retrieval and complex reasoning tasks. This allows for more nuanced understanding of user intent and the ability to synthesize information from vast datasets with greater accuracy and efficiency. The improvements are not limited to the model itself; the entire system, including the indexing and retrieval mechanisms, has been meticulously optimized to provide faster and more relevant results.

Phind emphasizes a commitment to providing authoritative and trustworthy information. The platform prioritizes sourcing information from reputable sources and actively combats the spread of misinformation. This dedication to accuracy is reflected in the rigorous testing and validation processes employed during the development of Phind 2.

Furthermore, Phind 2 demonstrates improved code generation capabilities, able to produce more accurate and efficient code snippets in various programming languages. This feature is invaluable for developers seeking solutions to coding challenges or looking for examples of specific functionalities. This improvement also extends to explaining complex code, making it easier for users to understand the logic and purpose behind specific code segments.

In essence, Phind 2 represents a significant leap forward in AI-powered search, offering a more intuitive, comprehensive, and visually engaging experience for users seeking information, understanding complex topics, and solving intricate problems. The combination of visual answers, multi-step reasoning, and an enhanced underlying architecture positions Phind 2 as a powerful tool for navigating the ever-expanding landscape of digital information.

Summary of Comments ( 21 )
https://news.ycombinator.com/item?id=43039308

Hacker News users discussed Phind 2's potential, expressing both excitement and skepticism. Some praised its ability to synthesize information and provide visual aids, especially for coding-related queries. Others questioned the reliability of its multi-step reasoning and cited instances where it hallucinated or provided incorrect code. Concerns were also raised about the lack of source citations and the potential for over-reliance on AI tools, hindering deeper learning. Several users compared it favorably to other AI search engines like Perplexity AI, noting its cleaner interface and improved code generation capabilities. The closed-source nature of Phind 2 also drew criticism, with some advocating for open-source alternatives. The pricing model and potential for future monetization were also points of discussion.

The Hacker News post titled "Phind 2: AI search with visual answers and multi-step reasoning" generated significant discussion. Several users focused on the apparent improvements in Phind's ability to handle complex, multi-step reasoning problems, often comparing it favorably to other search engines and AI chatbots like Google, Bing, and ChatGPT. Some users shared specific examples of queries where Phind excelled, demonstrating its capacity for coding tasks, explanations of complex topics, and providing visual aids.

A prominent theme in the comments was the perceived superiority of Phind's coding-related capabilities. Users reported that Phind could generate, debug, and explain code more effectively than alternatives. This led to speculation about the underlying model and training data used by Phind, with some suggesting a heavier emphasis on code compared to other models.

Several commenters discussed the potential impact of tools like Phind on the future of search and software development. Some envisioned a shift away from traditional search engines toward AI-powered tools that offer more comprehensive and interactive answers. Others discussed the implications for programmers, suggesting that these tools could automate certain coding tasks, increasing productivity and potentially changing the nature of software development work.

The quality of Phind's visual answers was also a topic of conversation. Users appreciated the inclusion of diagrams and visuals, finding them helpful for understanding complex information. However, there were also mentions of occasional inaccuracies or limitations in the visuals, indicating that this aspect of Phind is still under development.

While many praised Phind 2, some commenters expressed caution and skepticism. Some questioned the long-term viability of the platform, mentioning the high computational costs associated with running such a powerful AI model. Others raised concerns about the potential for bias in the answers and the need for transparency in the underlying workings of the system. The discussion also touched on the broader societal implications of advanced AI, including the potential for job displacement and the importance of responsible development and deployment of these technologies.

Finally, some users shared their personal experiences with Phind, offering anecdotal evidence of its usefulness for various tasks. These personal accounts provided valuable insights into the practical applications of the tool and contributed to a more nuanced understanding of its strengths and weaknesses. Overall, the comments reflected a mixture of excitement, curiosity, and caution about the potential of Phind 2 and the broader implications of advancements in AI-powered search.

Evaluating Code Embedding Models

Posted: 2025-02-01 02:06:08

Voyage's blog post details their evaluation of various code embedding models for code retrieval tasks. They emphasize the importance of using realistic datasets and evaluation metrics like Mean Reciprocal Rank (MRR) tailored for code search scenarios. Their experiments demonstrate that retrieval performance varies significantly across datasets and model architectures, with specialized models like CodeT5 consistently outperforming general-purpose embedding models. They also found that retrieval effectiveness plateaus as embedding dimensionality increases beyond a certain point, suggesting diminishing returns for larger embeddings. Finally, they introduce a novel evaluation dataset derived from Voyage's internal codebase, aimed at providing a more practical benchmark for code retrieval models in real-world settings.
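For readers unfamiliar with the metric, Mean Reciprocal Rank averages 1/rank of the first relevant result over all queries, counting a query with no relevant result as 0. The sketch below uses made-up query and document IDs and is not taken from the Voyage post.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """ranked_results: query_id -> ordered doc ids; relevant: query_id -> set of doc ids."""
    if not ranked_results:
        return 0.0
    total = 0.0
    for query_id, ranking in ranked_results.items():
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant.get(query_id, set()):
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_results)

# Hypothetical rankings for three queries.
ranked = {
    "q1": ["a", "b", "c"],  # relevant at rank 1
    "q2": ["d", "e", "f"],  # relevant at rank 2
    "q3": ["g", "h", "i"],  # no relevant result
}
relevant = {"q1": {"a"}, "q2": {"e"}, "q3": {"z"}}
mrr = mean_reciprocal_rank(ranked, relevant)
print(mrr)  # (1 + 0.5 + 0) / 3 = 0.5
```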

The Voyage AI blog post, "Evaluating Code Embedding Models," delves into the complexities of assessing the effectiveness of code embedding models, particularly for the task of code retrieval. Code embedding models transform code snippets into vector representations, allowing for semantic similarity searches. This is crucial for tasks like finding relevant code examples, identifying duplicated code, or suggesting potential fixes. The post emphasizes the importance of robust evaluation methodologies to accurately gauge the performance of these models.

The authors argue that commonly used metrics like Mean Average Precision (MAP) or Normalized Discounted Cumulative Gain (NDCG), while useful, can be insufficient for capturing the nuances of code retrieval. They highlight the issue of "easy negatives" – code examples that are trivially dissimilar to the query – which can inflate performance metrics. These metrics might indicate high accuracy even if the model isn't truly understanding the semantic meaning of the code.
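For reference, NDCG for a single query can be computed as below, using the common form DCG = sum of rel_i / log2(i + 1) over 1-based rank i. This is a generic sketch of the metric, not Voyage's evaluation code, and the relevance lists are invented.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for relevance grades in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([1, 0, 0]))  # relevant item ranked first
print(ndcg([0, 1]))     # relevant item ranked second
```

Padding a candidate pool with trivially irrelevant items leaves such scores high as long as the relevant item stays near the top, which is the inflation effect the authors describe.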

To address this, Voyage AI introduces a novel evaluation framework centered around two key concepts: "hard negative mining" and "domain adaptation." Hard negative mining involves specifically selecting negative examples that are semantically similar to the query but not the correct answer. This forces the model to distinguish between subtly different code snippets and thus demonstrates a deeper understanding of code semantics. The blog post details how they generate these hard negatives using a combination of techniques, including leveraging abstract syntax trees (ASTs) and identifying code snippets with similar functionalities but different implementations.
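One simple way to pick hard negatives is to rank the non-matching candidates by similarity to the query and keep the closest ones. Note that this sketch swaps in plain embedding similarity for the AST-based techniques the post describes; the vectors and snippet names are hypothetical.

```python
def dot(a, b):
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))

def mine_hard_negatives(query_vec, candidates, positive_ids, k=2):
    """Return the k non-positive candidates most similar to the query."""
    negatives = [(dot(query_vec, vec), doc_id)
                 for doc_id, vec in candidates.items()
                 if doc_id not in positive_ids]
    negatives.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in negatives[:k]]

# Hypothetical embeddings for four snippets.
candidates = {
    "quicksort_v1": [1.0, 0.0],
    "quicksort_v2": [0.9, 0.1],
    "mergesort": [0.7, 0.3],
    "http_get": [0.0, 1.0],
}
hard = mine_hard_negatives([1.0, 0.0], candidates, positive_ids={"quicksort_v1"})
print(hard)
```

Here `http_get` is an easy negative and is left out, while the near-duplicates that a weak model would confuse with the correct answer are kept.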

Domain adaptation, the second core element of their framework, tackles the challenge of evaluating models on diverse coding styles and conventions found across different codebases or projects. The post explains that a model trained on one type of code might not perform well on another. Therefore, they advocate for evaluating models on multiple datasets representing different domains, providing a more holistic and realistic assessment of performance.

The post further elucidates the practical implications of their evaluation framework by showcasing its application in comparing different code embedding models. They demonstrate how their approach reveals performance disparities that would be obscured by traditional metrics alone. This nuanced evaluation allows for more informed decisions when selecting or developing code embedding models for specific tasks and codebases. Ultimately, the post champions a more rigorous and comprehensive approach to evaluating code embedding models, emphasizing the importance of considering both hard negatives and domain adaptation for a truly insightful understanding of model performance and its real-world applicability.

Summary of Comments ( 0 )
https://news.ycombinator.com/item?id=42894939

Hacker News users discussed the methodology of Voyage's code retrieval evaluation, particularly questioning the reliance on HumanEval and MBPP benchmarks. Some argued these benchmarks don't adequately reflect real-world code retrieval scenarios, suggesting alternatives like retrieving code from a large corpus based on natural language queries. The lack of open-sourcing for Voyage's evaluated models and datasets also drew criticism, hindering reproducibility and broader community engagement. There was a brief discussion on the usefulness of keyword search as a strong baseline and the potential benefits of integrating semantic search techniques. Several commenters expressed interest in seeing evaluations based on more realistic use cases, including bug fixing or adding new features within existing codebases.

Why DeepSeek had to be open source

Posted: 2025-01-29 15:37:31

DeepSeek, a platform offering encoder APIs for developers, chose to open-source its core technology due to the inherent difficulty in building trust with users regarding data privacy and security when handling sensitive information like codebases and internal documentation. By open-sourcing, DeepSeek aims to foster transparency and allow users to self-host, ensuring complete control over their data. This approach mitigates concerns around vendor lock-in and allows the community to contribute to the project's development and security, ultimately building greater trust and fostering wider adoption.

The blog post "Why DeepSeek had to be open source," published by Lago, details the strategic rationale behind DeepSeek's decision to embrace an open-source model for their encoder technology. DeepSeek, a company specializing in AI-powered code search, faced the formidable challenge of establishing trust and widespread adoption within the developer community, a group known for its preference for open and transparent tools. The closed-source approach presented a significant obstacle to achieving this goal, as developers are often hesitant to entrust proprietary systems with access to their valuable and often sensitive codebases.

The blog post articulates that open-sourcing the DeepSeek encoder allows developers to thoroughly inspect and understand the underlying mechanisms of the code search technology, fostering trust and confidence in its operation. This transparency eliminates the "black box" effect inherent in closed-source solutions, allowing developers to verify the encoder's security, efficiency, and accuracy firsthand. By providing full visibility into the code, DeepSeek empowers the community to actively contribute to the project, identifying potential vulnerabilities or areas for improvement, leading to a more robust and reliable system. This collaborative development model also benefits DeepSeek directly by leveraging the collective expertise of the open-source community, accelerating the pace of innovation and refinement.

Furthermore, the open-source approach directly addresses the critical issue of data privacy, a major concern for developers when utilizing third-party code analysis tools. By making the encoder's source code publicly available, DeepSeek demonstrates a commitment to transparency and allows developers to verify that the encoder does not exfiltrate sensitive data or intellectual property. This reassurance is essential for gaining the trust of organizations and individual developers, paving the way for wider adoption of the technology.

The post also emphasizes the strategic advantage of open-sourcing the encoder while maintaining the proprietary nature of the vector database technology. This approach allows DeepSeek to offer a commercially viable product while simultaneously benefiting from the open-source community's contributions to the encoder. This dual approach strikes a balance between fostering community engagement and ensuring the long-term sustainability of the business.

Finally, the blog post positions the open-sourcing of the DeepSeek encoder as a crucial step in establishing a robust ecosystem around their technology. By encouraging community involvement and contributions, DeepSeek aims to cultivate a vibrant and active developer ecosystem, driving further innovation and accelerating the adoption of AI-powered code search tools. The open-source model is presented as a catalyst for growth and collaboration, laying the foundation for a thriving community that benefits both developers and DeepSeek.

Summary of Comments ( 242 )
https://news.ycombinator.com/item?id=42866201

Hacker News users discussed the open-sourcing of DeepSeek, primarily focusing on the challenges of monetizing open-source AI infrastructure. Many commenters were skeptical of Lago's business model, questioning how they could successfully build a proprietary offering on top of an open-source core, especially given the intense competition in the vector database space. Some suggested that open-sourcing DeepSeek was a necessary move due to the difficulty of attracting paying customers for a closed-source product. Others pointed out potential advantages, such as faster iteration and community contributions, but remained unconvinced of long-term viability. Several users expressed a desire for more technical details about DeepSeek's implementation and performance compared to existing solutions. The most compelling comments revolved around the inherent tension between open-sourcing and profitability in the current AI landscape.

The Hacker News post "Why DeepSeek had to be open source" (linking to a blog post about the open-sourcing of a vector database called DeepSeek) generated a moderate amount of discussion, with several commenters focusing on the challenges and tradeoffs inherent in open-sourcing complex infrastructure software.

One compelling line of discussion revolved around the difficulty of monetizing open-source infrastructure projects. A commenter pointed out the "challenging economics" of open-sourcing core infrastructure, noting that "it's hard to build a business on top of open core, especially for infrastructure software" and suggested that open-sourcing could be a last resort due to difficulties in acquiring customers. This spurred further discussion about the potential downsides of "open-core" business models, with some expressing skepticism about their long-term viability.

Another commenter highlighted the specific complexities of vector databases, stating that they are "notoriously hard to operate" and require significant expertise. This raises the question of whether open-sourcing DeepSeek might actually hinder its adoption due to the increased burden on users to manage and maintain the database themselves. They further suggested that a managed service offering would likely be more appealing to many potential users, echoing the sentiment about the difficulties of the open-core model in this space.

Several comments touched upon the competitive landscape of vector databases, mentioning alternatives like Pinecone, Weaviate, and Qdrant. One commenter expressed surprise that DeepSeek hadn't already been acquired, suggesting that the vector database space is attracting significant interest and investment.

Finally, a few commenters questioned the blog post's premise that DeepSeek "had to be" open-sourced, suggesting that this framing might be a marketing tactic rather than a genuine necessity. They proposed alternative explanations, such as the possibility that the company was struggling to attract paying customers or that open-sourcing was a way to gain community contributions and improve the software's quality.

In summary, the comments on Hacker News primarily focused on the business implications of open-sourcing DeepSeek, the technical challenges of running vector databases, and the competitive dynamics of the market. Several commenters expressed skepticism about the viability of open-sourcing complex infrastructure software and suggested that a managed service might be a more successful approach.
