GitHub Actions workflows, especially those involving Node.js projects, can suffer from significant disk I/O bottlenecks, primarily during dependency installation (npm install). These bottlenecks stem from the limited I/O performance of the virtual machines used by GitHub Actions runners. This leads to dramatically slower execution times compared to local machines with faster disks. The blog post explores this issue by benchmarking npm install operations across various runner types and demonstrates substantial performance improvements when using self-hosted runners or alternative CI/CD platforms with better I/O capabilities. Ultimately, developers should be aware of these potential bottlenecks and consider optimizing their workflows, exploring different runner options, or utilizing caching strategies to mitigate the performance impact.
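For reference, the caching strategy mentioned above is usually only a couple of lines of workflow configuration. A minimal sketch, assuming a standard Node.js project with a package-lock.json (names and versions are illustrative):

```yaml
# Minimal sketch: reuse npm's download cache across runs so `npm ci`
# avoids re-fetching every package, reducing both network and disk work.
name: build
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm        # built-in ~/.npm caching keyed on package-lock.json
      - run: npm ci         # clean, reproducible install from the lockfile
```

Note that this only avoids repeated downloads; unpacking node_modules still hits the runner's disk, which is why caching alone may not eliminate the bottleneck.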
GitHub Actions' opaque nature makes it difficult to verify the provenance of the code being executed in your workflows. While Actions marketplace listings link to source code, the actual runner environment often uses pre-built distributions hosted by GitHub, with no guarantee they precisely match the public repository. This discrepancy creates a potential security risk, as malicious actors could alter the distributed code without updating the public source. Therefore, auditing the integrity of Actions is crucial, but currently complex. The post advocates for reproducible builds and improved transparency from GitHub to enhance trust and security within the Actions ecosystem.
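A commonly recommended partial mitigation is to pin each action to a full commit SHA rather than a mutable tag, which at least freezes the repository content a workflow fetches. A sketch (the SHA below is a placeholder, not a vetted commit):

```yaml
steps:
  # A tag like @v4 can be re-pointed at different code later; a full
  # 40-character commit SHA cannot. Placeholder SHA for illustration only.
  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567  # v4 (placeholder)
```

Pinning does not by itself prove the code was audited; it only guarantees that what ran yesterday is what runs today.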
HN users largely agreed with the author's concerns about the opacity of third-party GitHub Actions. Several highlighted the potential security risks of blindly trusting external code, with some suggesting that reviewing the source of each action should be standard practice, despite the impracticality. Some argued for better tooling or built-in mechanisms within GitHub Actions to improve transparency and security. The potential for malicious actors to introduce vulnerabilities through seemingly benign actions was also a recurring theme, with users pointing to the risk of supply chain attacks and the difficulty in auditing complex dependencies. Some suggested using self-hosted runners or creating internal action libraries for sensitive projects, although this introduces its own management overhead. A few users countered that similar trust issues exist with any third-party library and that the benefits of using pre-built actions often outweigh the risks.
The popular GitHub Action tj-actions/changed-files was compromised and used to inject malicious code into projects that utilized it. The attacker gained access to the action's repository and added code that exfiltrated environment variables, secrets, and other sensitive information during workflow runs. This action, used by over 23,000 repositories, became a supply chain vulnerability, potentially affecting numerous downstream projects. The maintainers have since regained control and removed the malicious code, but users are urged to review their workflows and rotate any potentially compromised secrets.
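For anyone auditing their exposure, the remediation pattern is the same SHA pinning sketched earlier, applied to this specific action after verifying a clean commit. Illustrative only; the SHA shown is a placeholder, not a known-good revision:

```yaml
steps:
  # Replace any floating tag (e.g. @v45) with a commit you have verified,
  # then rotate every secret that workflows using this action could read.
  - uses: tj-actions/changed-files@0123456789abcdef0123456789abcdef01234567  # placeholder SHA
```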
Hacker News users discussed the implications of the tj-actions/changed-files compromise, focusing on the surprising longevity of the vulnerability (two years) and the potential impact on the 23,000+ repositories using it. Several commenters questioned the security practices of relying on third-party GitHub Actions without thorough vetting, emphasizing the need for auditing dependencies and using pinned versions. The ease with which a seemingly innocuous action could be compromised highlighted the broader security risks within the software supply chain. Some users pointed out the irony of a security-focused action being the source of a vulnerability, while others discussed the challenges of maintaining open-source projects and the pressure to keep dependencies updated. A few commenters also suggested alternative approaches for achieving similar functionality without relying on third-party actions.
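One practical answer to the staleness concern raised above is GitHub's own Dependabot, which can open pull requests when pinned actions have new releases. A minimal configuration sketch:

```yaml
# .github/dependabot.yml — keep pinned actions from going stale by
# having Dependabot propose updates on a schedule.
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
```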
The author details a frustrating experience with GitHub Actions where a seemingly simple workflow to build and deploy a static website became incredibly complex and time-consuming due to caching issues. Despite attempting various caching strategies and workarounds, builds remained slow and unpredictable, ultimately leading to increased costs and wasted developer time. The author concludes that while GitHub Actions might be suitable for straightforward tasks, its caching mechanism's unreliability makes it a poor choice for more complex projects, especially those involving static site generation. They ultimately opted to migrate to a self-hosted solution for improved control and predictability.
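For context, the kind of caching configuration the author describes wrestling with typically looks like the sketch below (paths and keys are illustrative); the article's complaint is that even reasonable setups like this produced slow, unpredictable builds:

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/cache@v4
    with:
      # Illustrative path: a static site generator's incremental build state.
      path: .cache
      key: site-${{ runner.os }}-${{ hashFiles('content/**') }}
      # Fall back to the newest cache whose key shares this prefix.
      restore-keys: site-${{ runner.os }}-
  - run: npm run build   # illustrative build command
```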
Hacker News users generally agreed with the author's sentiment about GitHub Actions' complexity and unreliability. Many shared similar experiences with flaky builds, obscure error messages, and difficulty debugging. Several commenters suggested exploring alternatives like GitLab CI, Drone CI, or self-hosted runners for more control and predictability. Some pointed out the benefits of GitHub Actions, such as its tight integration with GitHub and the availability of pre-built actions, but acknowledged the frustrations raised in the article. The discussion also touched upon the trade-offs between convenience and control when choosing a CI/CD solution, with some arguing that the ease of use initially offered by GitHub Actions can be overshadowed by the difficulties encountered as projects grow more complex. A few users offered specific troubleshooting tips or workarounds for common issues, highlighting the community-driven nature of problem-solving around GitHub Actions.
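Switching to a self-hosted runner, as several commenters suggested, is a one-line change once a machine is registered; the labels below are examples assigned at registration time:

```yaml
jobs:
  build:
    # Jobs are routed to any registered runner carrying all of these labels.
    runs-on: [self-hosted, linux, x64]
    steps:
      - uses: actions/checkout@v4
      - run: make build   # illustrative build command
```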
Summary of Comments (16)
https://news.ycombinator.com/item?id=43506574
HN users discussed the surprising performance disparity between GitHub-hosted and self-hosted runners, with several suggesting network latency as a significant factor beyond raw disk I/O. Some pointed out the potential impact of ephemeral runner environments and the overhead of network file systems. Others highlighted the benefits of using actions/cache or alternative CI providers with better I/O performance for specific workloads. A few users shared their experiences, with one noting significant improvements from self-hosting and another mentioning the challenges of optimizing build processes within GitHub Actions. The general consensus leaned towards self-hosting for I/O-bound tasks, while acknowledging the convenience of GitHub's hosted runners for less demanding workflows.
The Hacker News post titled "Disk I/O bottlenecks in GitHub Actions" (https://news.ycombinator.com/item?id=43506574) has generated a moderate number of comments, discussing various aspects of the linked blog post about disk I/O performance issues in GitHub Actions.
Several commenters corroborate the author's findings, sharing their own experiences with slow disk I/O in GitHub Actions. One user mentions observing significantly improved performance after switching to self-hosted runners, highlighting the potential benefits of having more control over the execution environment. They specifically mention the use of tmpfs for build directories as a contributing factor to the improved speeds.
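A sketch of the tmpfs technique that commenter describes, assuming the runner allows passwordless sudo (GitHub's Ubuntu images do; sizes and paths are illustrative):

```yaml
steps:
  - uses: actions/checkout@v4
  - name: Build from a RAM-backed directory
    run: |
      sudo mkdir -p /mnt/ram
      # mode=1777 makes the mount world-writable, like /tmp.
      sudo mount -t tmpfs -o size=2g,mode=1777 tmpfs /mnt/ram
      cp -a . /mnt/ram/src
      cd /mnt/ram/src
      npm ci && npm run build   # illustrative build commands
```

The obvious trade-off: everything in the tmpfs is lost when the job ends and counts against the runner's RAM.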
Another commenter points out that the observed I/O bottlenecks are likely not unique to GitHub Actions, suggesting that similar issues might exist in other CI/CD environments that rely on virtualized or containerized runners. They argue that understanding the underlying hardware and storage configurations is crucial for optimizing performance in any CI/CD pipeline.
A more technically inclined commenter discusses the potential impact of different filesystem layers and virtualization technologies on I/O performance. They suggest that the choice of filesystem within the runner's container, as well as the virtualization technology used by the underlying infrastructure, could play a significant role in the observed performance differences.
One commenter questions the methodology used in the original blog post, specifically the use of dd for benchmarking. They argue that dd might not accurately reflect the real-world I/O patterns encountered in typical CI/CD workloads, and propose alternative benchmarking tools and techniques that could provide more relevant insight into the performance characteristics of the storage system.

Finally, some commenters discuss potential workarounds and mitigation strategies for dealing with slow disk I/O in GitHub Actions, including using RAM disks, optimizing build processes to minimize disk access, and leveraging caching mechanisms to reduce the amount of data read from or written to disk. They also weigh the trade-offs of each approach, such as the limited size of RAM disks and the potential complexity of implementing custom caching solutions.
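To make the benchmarking criticism concrete: dd measures large sequential writes, while a tool like fio can approximate the many small, random accesses a dependency install actually performs. A hedged sketch of both as a workflow step (sizes and flags are illustrative):

```yaml
- name: Compare naive and workload-shaped disk benchmarks
  run: |
    # Sequential 1 GiB write: the access pattern dd flatters.
    dd if=/dev/zero of=seq.bin bs=1M count=1024 oflag=direct
    # Random 4 KiB mixed reads/writes: closer to what npm install does.
    sudo apt-get update && sudo apt-get install -y fio
    fio --name=randrw --rw=randrw --bs=4k --size=512m --direct=1 --runtime=30 --time_based
```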