Defuddle is an open-source command-line tool that converts HTML to Markdown, aiming to be a simpler and more robust alternative to Readability. It focuses on extracting the main content from web pages while preserving basic formatting like headings, lists, and code blocks, outputting clean Markdown suitable for archiving, note-taking, or further processing. Unlike Readability, which primarily targets article-like content, Defuddle attempts to handle a wider variety of HTML structures. It's written in Go and prioritizes speed and predictable output.
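Defuddle itself isn't reproduced below; purely to illustrate what HTML-to-Markdown extraction looks like in practice, here is a minimal Python sketch using the third-party html2text package (an assumption: html2text is installed and is unrelated to Defuddle's own implementation).

```python
# Not Defuddle itself: a minimal sketch of HTML-to-Markdown conversion
# using the third-party html2text package (pip install html2text).
import html2text

html = """
<article>
  <h1>Example heading</h1>
  <p>Some <strong>content</strong> with a <a href="https://example.com">link</a>.</p>
  <ul><li>first item</li><li>second item</li></ul>
</article>
"""

converter = html2text.HTML2Text()
converter.body_width = 0  # don't hard-wrap the output lines
markdown = converter.handle(html)
print(markdown)
```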
Far is a command-line find and replace tool inspired by Sublime Text's powerful search functionality. It allows for regular expression searches and replacements across multiple files and directories, offering features like case sensitivity toggling, whole word matching, and previewing changes before applying them. Far aims to provide a fast, intuitive, and versatile command-line experience for efficiently manipulating text within files, similar to the ease and control offered by Sublime Text's editor.
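Far's actual flags and syntax aren't shown here; the sketch below only illustrates the underlying idea, a regex search-and-replace across files with a dry-run preview, using just the Python standard library.

```python
# Conceptual sketch (not Far's implementation): regex find/replace across
# files with an optional dry-run preview, standard library only.
import re
from pathlib import Path

def find_and_replace(root, pattern, replacement, suffix=".txt", dry_run=True):
    regex = re.compile(pattern)
    for path in Path(root).rglob(f"*{suffix}"):
        text = path.read_text(encoding="utf-8")
        new_text, count = regex.subn(replacement, text)
        if count == 0:
            continue
        if dry_run:
            print(f"{path}: {count} match(es) would be replaced")
        else:
            path.write_text(new_text, encoding="utf-8")
            print(f"{path}: {count} match(es) replaced")

# Example: preview replacing 'colour' with 'color' in all .md files.
find_and_replace(".", r"\bcolour\b", "color", suffix=".md", dry_run=True)
```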
Hacker News users generally praised far for its speed and minimalist design, drawing favorable comparisons to Sublime Text's search functionality. Several commenters appreciated its keyboard-centric approach and the ability to easily integrate it into existing workflows. Some suggested improvements like adding support for regular expressions, while others noted potential conflicts with existing tools using the same name. The discussion also touched upon the benefits of using Rust for such tools, highlighting its performance characteristics. Some users expressed interest in similar tools for other operating systems besides Linux.
The "emoji problem" describes the difficulty of reliably rendering emoji across different platforms and devices. Due to variations in emoji fonts, operating systems, and even software versions, the same emoji codepoint can appear drastically different, potentially leading to miscommunication or altered meaning. This inconsistency stems from the fact that Unicode only defines the meaning of an emoji, not its specific visual representation, leaving individual vendors to design their own glyphs. The post emphasizes the complexity this introduces for developers, particularly when trying to ensure consistent experiences or accurately interpret user input containing emoji.
HN commenters generally found the "emoji problem" interesting and well-presented. Several appreciated the clear explanation of the mathematical concepts, even for those without a strong math background. Some discussed the practical implications, particularly regarding Unicode complexity and potential performance issues arising from combinatorial explosions when handling emoji modifiers. One commenter pointed out the connection to the "billion laughs" XML attack, highlighting the potential for abuse of such combinatorial systems. Others debated the merits of the proposed solutions, focusing on complexity and performance trade-offs. A few users shared their own experiences with emoji-related programming challenges, including issues with rendering and parsing.
The "Turkish İ Problem" arises from the difference in how the Turkish language handles the lowercase "i" and its uppercase counterpart. Unlike many languages, Turkish has two distinct uppercase forms: "İ" (with a dot) corresponding to lowercase "i," and "I" (without a dot) corresponding to the lowercase undotted "ı". This causes problems in string comparisons and other operations, especially in software that assumes a one-to-one mapping between uppercase and lowercase letters. Failing to account for this linguistic nuance can lead to bugs, data corruption, and security vulnerabilities, particularly when dealing with user authentication, sorting, or database lookups involving Turkish text. The post highlights the importance of proper Unicode handling and culturally-aware programming to avoid such issues and create truly internationalized applications.
Hacker News users discuss various aspects of the Turkish İ problem. Several commenters highlight how this issue exemplifies broader Unicode and character encoding challenges faced by developers. One points out the importance of understanding normalization and case folding for correct string comparisons, referencing Python's locale.strxfrm() as a useful tool. Others share anecdotes of encountering similar problems with other languages, emphasizing the need for robust Unicode handling. The discussion also touches on the role of language-specific sorting rules and the complexities they introduce, with one commenter specifically mentioning issues with the German "ß" character. A few users suggest using libraries that handle Unicode correctly, emphasizing that these problems underscore the importance of proper internationalization and localization practices in software development.
TextQuery is a web application that allows users to query CSV, JSON, and XLSX files using SQL. It simplifies data analysis by providing a familiar SQL interface to explore and filter data directly within the browser, eliminating the need for specialized software or complex scripting. Users can upload their files, write SQL queries against them, and instantly view the results in a tabular format. The service aims to be a quick and easy way to analyze structured data, particularly for those already comfortable with SQL.
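TextQuery's internals aren't described, so the sketch below merely illustrates the underlying idea of pointing SQL at CSV data, using Python's built-in csv and sqlite3 modules with made-up sample rows.

```python
# Concept sketch (not TextQuery's implementation): load a CSV into an
# in-memory SQLite table and query it with plain SQL.
import csv
import io
import sqlite3

csv_data = io.StringIO(
    "city,population\n"
    "Lisbon,545000\n"
    "Porto,232000\n"
    "Braga,193000\n"
)

rows = list(csv.DictReader(csv_data))
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, population INTEGER)")
conn.executemany("INSERT INTO cities VALUES (:city, :population)", rows)

query = (
    "SELECT city, population FROM cities "
    "WHERE population > 200000 ORDER BY population DESC"
)
for city, pop in conn.execute(query):
    print(city, pop)
```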
HN users generally expressed interest in TextQuery, praising its simplicity and potential usefulness for quick data analysis. Some compared it to other similar tools like q and visidata, suggesting TextQuery differentiates itself with a more approachable SQL interface beneficial for non-technical users. Several commenters brought up potential improvements, including support for larger files, more advanced SQL features like joins, and the ability to handle different delimiters in CSV files. One commenter highlighted the licensing model as a potential drawback, preferring a self-hosted or open-source option. Concerns about privacy and data security for cloud-based solutions were also raised.
Xan is a command-line tool designed for efficient manipulation of CSV and tabular data. It focuses on speed and simplicity, leveraging Rust's performance for tasks like searching, filtering, transforming, and aggregating. Xan aims to be a modern alternative to traditional tools like awk and sed, offering a more intuitive syntax specifically geared toward working with structured data in a terminal environment. Its features include column selection, filtering based on various criteria, data type conversion, statistical computations, and outputting in various formats, including JSON.
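Xan's actual command syntax isn't reproduced here; as a rough sketch of the select/filter/aggregate pipeline it provides at the command line, here is the same idea expressed with Python's csv module over made-up data.

```python
# Concept sketch (not Xan's syntax): select columns, filter rows, and
# aggregate a CSV stream without loading everything into memory.
import csv
import io
from collections import defaultdict

csv_data = io.StringIO(
    "name,team,score\n"
    "ana,red,10\n"
    "bo,blue,7\n"
    "cal,red,5\n"
)

totals = defaultdict(int)
for row in csv.DictReader(csv_data):
    if int(row["score"]) >= 5:                 # filter rows
        totals[row["team"]] += int(row["score"])  # aggregate by team

for team, total in sorted(totals.items()):
    print(f"{team},{total}")                   # blue,7 then red,15
```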
Hacker News users discuss XAN's potential, particularly its speed and ease of use for data manipulation tasks compared to traditional tools like awk and sed. Some express excitement about its CSV parsing capabilities and the ability to leverage Python's power. Concerns are raised regarding the dependency on Python, potential performance bottlenecks, and the limited feature set compared to more established data wrangling tools like Pandas. The discussion also touches upon the project's early stage of development, with some users interested in contributing and others suggesting potential improvements like better documentation and integration with other command-line tools. Several comments compare XAN favorably to other similar tools like jq and miller, emphasizing its niche in CSV manipulation.
Krep is a fast string search utility written in C, designed for performance-sensitive tasks. It utilizes SIMD instructions and optimized algorithms to achieve speeds significantly faster than grep and other similar tools, especially when searching large files or codebases. Krep supports regular expressions via PCRE2, various output formats including JSON and CSV, and features like ignoring binary files and following symbolic links. The project is open-source and aims to provide a robust and efficient alternative for command-line text searching.
HN users generally praised Krep for its speed and clean implementation. Several commenters compared it favorably to other popular search tools like ripgrep and grep, with some noting its superior performance in specific scenarios. One user suggested incorporating SIMD instructions for potential further speed improvements. Discussion also touched on the nuances of benchmarking and the importance of real-world test cases, with one commenter sharing their own benchmark results where krep excelled. A few users inquired about specific features, like support for PCRE (Perl Compatible Regular Expressions) or Unicode character classes. Overall, the reception was positive, acknowledging krep as a promising tool for efficient string searching.
The author is seeking recommendations for a Markdown to PDF conversion tool that handles complex formatting well, specifically callouts (like admonitions), diagrams using Mermaid or PlantUML, and math using LaTeX or KaTeX. They require a command-line interface for automation and prefer open-source solutions or at least freely available ones for non-commercial use. Existing tools like Pandoc are falling short in areas like callout styling and consistent rendering across different environments. Ideally, the tool would offer a high degree of customizability and produce clean, visually appealing PDFs suitable for documentation.
The Hacker News comments discuss various Markdown to PDF conversion tools, focusing on the original poster's requirements of handling code blocks, math, and images well while being ideally open-source and CLI-based. Pandoc is overwhelmingly recommended as the most powerful and flexible option, though some users caution about its complexity. Several commenters suggest simpler alternatives like md-to-pdf, glow, and Typora for less demanding use cases. Some discussion revolves around specific features, like LaTeX integration for math rendering and the challenges of perfectly replicating web-based Markdown rendering in a PDF. A few users mention using custom scripts or web services, while others highlight the benefits of tools like Marked 2 for macOS. The overall consensus seems to be that while a perfect solution might not exist, Pandoc with custom templates or simpler dedicated tools can often meet specific needs.
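As a sketch of the Pandoc route most commenters recommend, the snippet below shells out to pandoc from Python. The file names and flags are illustrative assumptions; it presumes pandoc and a LaTeX engine such as xelatex are installed, and callouts or Mermaid/PlantUML diagrams would still need filters not covered here.

```python
# Minimal sketch of the Pandoc-based workflow: convert a Markdown file to
# PDF via a LaTeX engine. Assumes `pandoc` and `xelatex` are on PATH;
# diagram and callout filters are not covered here.
import subprocess

subprocess.run(
    [
        "pandoc",
        "notes.md",
        "-o", "notes.pdf",
        "--pdf-engine=xelatex",        # LaTeX engine handles math rendering
        "-V", "geometry:margin=2.5cm", # pass a layout variable to the template
    ],
    check=True,
)
```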
mdq is a command-line tool, inspired by jq, that allows users to process and manipulate Markdown files using CSS-like selectors. It can extract specific elements from Markdown, such as headings, paragraphs, or code blocks, and output them in various formats, including Markdown, HTML, and text. This facilitates tasks like extracting specific sections of a document, reformatting content, and generating summaries, offering a powerful way to automate Markdown workflows.
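mdq's actual selector syntax isn't shown here; the snippet below only illustrates the kind of extraction it automates, pulling headings and links out of Markdown with plain regular expressions from the Python standard library.

```python
# Concept sketch (not mdq's selector syntax): pull headings and links out of
# a Markdown document using simple regular expressions.
import re

markdown = """\
# Project notes
Some intro text with a [docs link](https://example.com/docs).

## Tasks
- [issue tracker](https://example.com/issues)
"""

headings = re.findall(r"^(#{1,6})\s+(.*)$", markdown, flags=re.MULTILINE)
links = re.findall(r"\[([^\]]+)\]\(([^)]+)\)", markdown)

for hashes, title in headings:
    print(f"heading level {len(hashes)}: {title}")
for text, url in links:
    print(f"link: {text} -> {url}")
```

A naive regex approach like this breaks quickly on nested or edge-case Markdown, which is exactly the parsing difficulty commenters raise below.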
Hacker News users generally praised mdq for its potential usefulness, comparing it favorably to jq for JSON. Several commenters expressed interest in using it for tasks like extracting links or reformatting Markdown files. Some suggested improvements, such as adding support for YAML frontmatter and improving error handling. Others highlighted the complexities of parsing Markdown reliably due to its flexible nature and the potential challenges of handling variations and edge cases. One user pointed out the limitations of existing markdown parsers and the difficulties in accurately representing markdown as a data structure, while another cautioned against over-engineering the tool for simple tasks that could be accomplished with grep, sed, or awk.
Kreuzberg is a new Python library designed for efficient and modern asynchronous document text extraction. It leverages asyncio and supports a range of file formats, including PDF, DOCX, and common image types, through integration with OCR engines like Tesseract. The library aims for a clean and straightforward API, enabling developers to easily extract text from multiple documents concurrently, thereby significantly improving processing speed. It also offers features like automatic OCR language detection and integrates seamlessly with existing async Python codebases.
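Kreuzberg's exact API isn't spelled out in the summary, so the extract_file coroutine below is an assumed, hypothetical name; the sketch only shows the asyncio.gather fan-out pattern that concurrent document extraction relies on.

```python
# Sketch of the concurrent-extraction pattern described above. The
# `extract_file` coroutine is an ASSUMED, hypothetical stand-in for the
# library's async extraction call, used only to show asyncio.gather fan-out.
import asyncio

async def extract_file(path: str) -> str:
    await asyncio.sleep(0.1)  # simulate I/O-bound OCR/parsing work
    return f"text extracted from {path}"

async def main() -> None:
    paths = ["report.pdf", "contract.docx", "scan.png"]
    results = await asyncio.gather(*(extract_file(p) for p in paths))
    for path, text in zip(paths, results):
        print(path, "->", text)

asyncio.run(main())
```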
Hacker News users discussed Kreuzberg's potential, praising its modern, async approach and clean API. Several questioned its advantages over existing libraries like unstructured and langchain, prompting the author to clarify Kreuzberg's focus on smaller documents and ease of use for specific tasks like title and metadata extraction. Some expressed interest in benchmarks and broader language support, while others appreciated its minimalist design and MIT license. The small size of the library and its reliance on readily available packages like beautifulsoup4 and selectolax were also highlighted as positive aspects. A few commenters pointed to the lack of support for complex layouts and OCR, suggesting areas for future development.
Ropey is a Rust library providing a "text rope" data structure optimized for efficient manipulation and editing of large UTF-8 encoded text. It represents text as a tree of smaller strings, enabling operations like insertion, deletion, and slicing to be performed in logarithmic time complexity rather than the linear time of traditional string representations. This makes Ropey particularly well-suited for applications dealing with large text documents, code editors, and other text-heavy tasks where performance is critical. It also provides convenient methods for indexing and iterating over grapheme clusters, ensuring correct handling of Unicode characters.
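Ropey's Rust API isn't shown here; as a toy Python illustration of the chunked representation described above, the sketch below keeps text as a list of small chunks so that an edit rewrites only one chunk. Ropey arranges such chunks in a balanced tree, which is what makes its edits logarithmic rather than linear.

```python
# Toy illustration (not Ropey): keep text as a list of small chunks so an
# insertion only rewrites one chunk instead of copying the whole document.
CHUNK_SIZE = 8

def make_chunks(text):
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]

def insert(chunks, index, new_text):
    offset = 0
    for i, chunk in enumerate(chunks):
        if offset + len(chunk) >= index:
            local = index - offset
            chunks[i] = chunk[:local] + new_text + chunk[local:]
            return
        offset += len(chunk)
    chunks.append(new_text)  # insertion at the very end

chunks = make_chunks("The quick brown fox jumps over the lazy dog")
insert(chunks, 4, "very ")
print("".join(chunks))  # The very quick brown fox jumps over the lazy dog
```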
HN commenters generally praise Ropey's performance and design, particularly its handling of UTF-8 and its focus on efficient editing of large text files. Some compare it favorably to alternatives like String and ropes in other languages, noting Ropey's speed and lower memory footprint. A few users discuss its potential applications in text editors and IDEs, highlighting its suitability for tasks involving syntax highlighting and code completion. One commenter suggests improvements to the documentation, while another inquires about the potential for adding support for bidirectional text. Overall, the comments express appreciation for the library's functionality and its potential value for projects requiring performant text manipulation.
Summary of Comments (55): https://news.ycombinator.com/item?id=44067409
HN commenters generally praised Defuddle for its simplicity and effectiveness in converting HTML to Markdown, particularly for archiving web pages. Several appreciated its focus on content extraction over perfect formatting, finding the resulting Markdown more usable. Some suggested improvements like better image handling, code block formatting, and handling of certain HTML elements. One commenter highlighted its usefulness for researchers and academics, while others compared it favorably to other similar tools, noting Defuddle's speed and accuracy. The project's open-source nature and reliance on a single Go binary were also lauded.
The Hacker News post about "Defuddle, an HTML-to-Markdown alternative to Readability" generated a moderate number of comments, mostly focused on comparing Defuddle to existing tools, discussing potential use cases, and exploring technical aspects.
Several commenters compared Defuddle to Readability, noting that while Readability aims to create a clean reading experience, Defuddle focuses on preserving the original structure and converting it to Markdown. This distinction was highlighted as potentially useful for archiving web pages and making them easily editable. One user specifically mentioned preferring Markdown over the output of Readability for archiving purposes.
The discussion also touched upon alternative tools like pandoc and its limitations with complex HTML. Some commenters suggested that Defuddle might be a better choice for certain websites where pandoc struggles. Another user proposed combining lynx (a text-based web browser) with pandoc as a potential alternative workflow.

The technical implementation of Defuddle was also a topic of interest. One commenter inquired about the choice of Python over Javascript for the project, to which the author (kepano) responded by explaining their preference for Python's ecosystem and the availability of robust HTML parsing libraries. The author also highlighted their choice of Beautiful Soup 4 for HTML parsing and addressed questions regarding the handling of specific elements like <pre> tags and code blocks.

One commenter explored the potential use case of integrating Defuddle into a note-taking workflow, envisioning a scenario where web content could be easily converted to Markdown and incorporated into notes. They also suggested exploring the use of Readability's API to improve the cleaning process, while acknowledging potential cost implications.
Finally, some users shared their positive experiences with Defuddle, praising its simplicity and effectiveness. One commenter even reported successful usage on a challenging website where other tools had failed.
In summary, the comments section offered a valuable discussion around Defuddle, comparing it to existing tools, exploring its potential uses, and delving into some of its technical aspects. The comments generally highlighted the potential of Defuddle as a useful tool for converting HTML to Markdown, especially for archiving and editing web content.