Infosec Tools

A list of information security tools I use for assessments, investigations and other cybersecurity tasks.

Also worth checking out is CISA’s list of free cybersecurity services and tools.

Jump to Section


OSINT / Reconnaissance

Network Tools (IP, DNS, WHOIS)

Breaches, Incidents & Leaks

FININT (Financial Intelligence)

  • GSA eLibrary - Source for the latest GSA contract award information

GEOINT (Geographical Intelligence)

HUMINT (Human & Corporate Intelligence)

  • No-Nonsense Intel - List of keywords you can use to screen for adverse media, military links, political connections, sources of wealth, asset tracing, etc.
  • CheckUser - Check desired usernames across social network sites
  • CorporationWiki - Find and explore relationships between people and companies
  • Crunchbase - Discover innovative companies and the people behind them
  • Find Email - Find email addresses from any company
  • Info Sniper - Search property owners, deeds & more
  • Library of Leaks - Search documents, companies and people
  • LittleSis - Who-knows-who at the heights of business and government
  • Minerva - Find TRACES of anyone’s email
  • NAMINT - Shows possible name and login search patterns
  • OpenCorporates - Legal-entity database
  • That’s Them - Find addresses, phones, emails and much more
  • TruePeopleSearch - People search service
  • WhatsMyName - Enumerate usernames across many websites
  • Whitepages - Find people, contact info & background checks

IMINT (Imagery/Maps Intelligence)

MASINT (Measurement and Signature Intelligence)

SOCMINT (Social Media Intelligence)

Email

Code Search

  • grep.app - Search across a half million git repos
  • PublicWWW - Find any alphanumeric snippet, signature or keyword in web pages' HTML, JS and CSS code
  • searchcode - Search 75 billion lines of code from 40 million projects

Scanning / Enumeration / Attack Surface


Offensive Security

Exploits

  • Bug Bounty Hunting Search Engine - Search for writeups, payloads, bug bounty tips, and more…
  • BugBounty.zip - Your all-in-one solution for domain operations
  • CP-R Evasion Techniques
  • CVExploits - Comprehensive database for CVE exploits
  • DROPS - Dynamic CheatSheet/Command Generator
  • Exploit Notes - Hacking techniques and tools for penetration testing, bug bounty, CTFs
  • ExploitDB - Huge repository of exploits from Offensive Security
  • files.ninja - Upload any file and find similar files
  • Google Hacking Database (GHDB) - A list of Google search queries used in the OSINT phase of penetration testing
  • GTFOArgs - Curated list of Unix binaries that can be manipulated for argument injection
  • GTFOBins - Curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems
  • Hijack Libs - Curated list of DLL Hijacking candidates
  • Living Off the Living Off the Land - A great collection of resources to thrive off the land
  • Living Off the Pipeline - CI/CD lolbin
  • Living Off Trusted Sites (LOTS) Project - Repository of popular, legitimate domains that can be used to conduct phishing, C2, exfiltration & tool downloading while evading detection
  • LOFLCAB - Living off the Foreign Land Cmdlets and Binaries
  • LoFP - Living off the False Positive
  • LOLBAS - Curated list of Windows binaries that can be used to bypass local security restrictions in misconfigured systems
  • LOLC2 - Collection of C2 frameworks that leverage legitimate services to evade detection
  • LOLESXi - Living Off The Land ESXi
  • LOLOL - A great collection of resources to thrive off the land
  • LOLRMM - Remote Monitoring and Management (RMM) tools that could potentially be abused by threat actors
  • LOOBins - Living Off the Orchard: macOS Binaries (LOOBins) is designed to provide detailed information on various built-in macOS binaries and how they can be used by threat actors for malicious purposes
  • LOTTunnels - Living Off The Tunnels
  • Microsoft Patch Tuesday Countdown
  • offsec.tools - A vast collection of security tools
  • Shodan Exploits
  • SPLOITUS - Exploit search database
  • VulnCheck XDB - An index of exploit proof of concept code in git repositories
  • XSSed - Information on and an archive of Cross-Site-Scripting (XSS) attacks

Red Team

  • ArgFuscator - Generates obfuscated command lines for common system tools
  • ARTToolkit - Interactive cheat sheet, containing a useful list of offensive security tools and their respective commands/payloads, to be used in red teaming exercises
  • Atomic Red Team - A library of simple, focused tests mapped to the MITRE ATT&CK matrix
  • C2 Matrix - Select the best C2 framework for your needs based on your adversary emulation plan and the target environment
  • ExpiredDomains.net - Expired domain name search engine
  • Living Off The Land Drivers - Curated list of Windows drivers used by adversaries to bypass security controls and carry out attacks
  • Unprotect Project - Search Evasion Techniques
  • WADComs - Curated list of offensive security tools and their respective commands, to be used against Windows/AD environments

Web Security

  • Invisible JavaScript - Execute invisible JavaScript by abusing Hangul filler characters
  • INVISIBLE.js - A super compact (116-byte) bootstrap that hides JavaScript using a Proxy trap to run code

Security Advisories

  • CISA Alerts - Providing information on current security issues, vulnerabilities and exploits
  • ICS Advisory Project - DHS CISA ICS Advisories data visualized as a Dashboard and in Comma Separated Value (CSV) format to support vulnerability analysis for the OT/ICS community

Attack Libraries

A more comprehensive list of Attack Libraries can be found here.

  • ATLAS - Adversarial Threat Landscape for Artificial-Intelligence Systems is a knowledge base of adversary tactics and techniques based on real-world attack observations and realistic demonstrations from AI red teams and security groups
  • ATT&CK
  • Risk Explorer for Software Supply Chains - A taxonomy of known attacks and techniques to inject malicious code into open-source software projects.

Vulnerability Catalogs & Tools

Risk Assessment Models

A more comprehensive list of Risk Assessment Models and tools can be found here.


Blue Team

CTI & IoCs

  • Alien Vault OTX - Open threat intelligence community
  • BAD GUIDs EXPLORER
  • Binary Edge - Real-time threat intelligence streams
  • CLOAK - Concealment Layers for Online Anonymity and Knowledge
  • Cloud Threat Landscape - A comprehensive threat intelligence database of cloud security incidents, actors, tools and techniques. Powered by Wiz Research
  • CTI AI Toolbox - AI-assisted CTI tooling
  • CTI.fyi - Content shamelessly scraped from ransomwatch
  • CyberOwl - Stay informed on the latest cyber threats
  • Dangerous Domains - Curated list of malicious domains
  • HudsonRock Threat Intelligence Tools - Cybercrime intelligence tools
  • InQuest Labs - Indicator Lookup
  • IOCParser - Extract Indicators of Compromise (IOCs) from different data sources
  • Malpuse - Scan, Track, Secure: Proactive C&C Infrastructure Monitoring Across the Web
  • ORKL - Library of collective past achievements in the realm of CTI reporting.
  • Pivot Atlas - Educational pivoting handbook for cyber threat intelligence analysts
  • Pulsedive - Threat intelligence
  • ThreatBook TI - Search for IP address, domain
  • threatfeeds.io - Free and open-source threat intelligence feeds
  • ThreatMiner - Data mining for threat intelligence
  • TrailDiscover - Repository of CloudTrail events with detailed descriptions, MITRE ATT&CK insights, real-world incidents references, other research references and security implications
  • URLAbuse - Open URL abuse blacklist feed
  • urlquery.net - Free URL scanner that performs analysis for web-based malware

URL Analysis

Static / File Analysis

  • badfiles - Enumerate bad, malicious, or potentially dangerous file extensions
  • CyberChef - The cyber swiss army knife
  • DocGuard - Static scanner that brings a unique perspective to static and structural analysis
  • dogbolt.org - Decompiler Explorer
  • EchoTrail - Threat hunting resource used to search for a Windows filename or hash
  • filescan.io - File and URL scanning to identify IOCs
  • filesec.io - Latest file extensions being used by attackers
  • Kaspersky TIP
  • Manalyzer - Static analysis on PE executables to detect undesirable behavior
  • PolySwarm - Scan Files or URLs for threats
  • VirusTotal - Analyze suspicious files and URLs to detect malware

Dynamic / Malware Analysis

Forensics

  • DFIQ - Digital Forensics Investigative Questions and the approaches to answering them

Phishing / Email Security


Assembly / Reverse Engineering


OS / Scripting / Programming

Regex


Password


AI

  • OWASP AI Exchange - Comprehensive guidance and alignment on how to protect AI against security threats

Assorted

OpSec / Privacy

  • Awesome Privacy - Find and compare privacy-respecting alternatives to popular software and services
  • Device Info - A web browser security testing, privacy testing, and troubleshooting tool
  • Digital Defense (Security List) - Your guide to securing your digital life and protecting your privacy
  • DNS Leak Test
  • EFF | Tools from EFF’s Tech Team - Solutions to the problems of sneaky tracking, inconsistent encryption, and more
  • Privacy Guides - Non-profit, socially motivated website that provides information for protecting your data security and privacy
  • Privacy.Sexy - Privacy related configurations, scripts, improvements for your device
  • PrivacyTests.org - Open-source tests of web browser privacy
  • switching.software - Ethical, easy-to-use and privacy-conscious alternatives to well-known software
  • What’s My IP Address? - A number of interesting tools including port scanners, traceroute, ping, whois, DNS, IP identification and more
  • WHOER - Get your IP

Jobs

  • infosec-jobs - Find awesome jobs and talents in InfoSec / Cybersecurity

Conferences / Meetups

Infosec / Cybersecurity Research & Blogs

Funny

Walls of Shame

  • Audit Logs Wall of Shame - A list of vendors that don’t prioritize high-quality, widely-available audit logs for security and operations teams
  • Dumb Password Rules - A compilation of sites with dumb password rules
  • The SSO Wall of Shame - A list of vendors that treat single sign-on as a luxury feature, not a core security requirement
  • ssotax.org - A list of vendors that have SSO locked up in a subscription tier that is more than 10% more expensive than the standard price
  • Why No IPv6? - Wall of shame for IPv6 support

Other

Dynamization of Jekyll

Jekyll is a framework for creating websites/blogs using static plain-text files. Jekyll is used by GitHub Pages, which is also the current hosting provider for Shellsharks.com. I’ve been using GitHub Pages since the inception of my site and for the most part have no complaints. With that said, a purely static site has some limitations in terms of the types of content one can publish/expose.

I recently got the idea to create a dashboard-like page which could display interesting quantitative data points (and other information) related to the site. Examples of these statistics include the total number of posts, the age of my site, when my blog was last updated, overall word count across all posts, etc. Out of the box, Jekyll is limited in its ability to generate this information in a dynamic fashion. The Jekyll-infused GitHub Pages engine generates the site via an inherent pages-build-deployment GitHub Action (more on this later) upon commit. The site will then stay static until the next build. As such, it has limited native ability to update content in-between builds/manual-commits.

To solve for this issue, I’ve started using a variety of techniques/technologies (listed below) to introduce more dynamic functionality to my site (and more specifically, the aforementioned statboard).

Jekyll Liquid

Though not truly “dynamic”, the Liquid* templating language is an easy, Jekyll-native way to generate static content in a quasi-dynamic way at site build time. As an example, if I wanted to denote the exact date and time that a blog post was published, I might first try to use the Liquid template {{ site.time }}. What this actually ends up giving me is a timestamp for when the site was built (e.g. 2025-05-14 20:13:13 -0400), rather than the last updated date of the post itself. So instead, I can harness the post’s custom front matter, such as “updated:”, and access that value using the tag {{ page.updated }} (so we get, __).
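As a minimal sketch of what that looks like (the layout, title and date format below are just placeholders; the “updated:” key is the custom front matter field described above):

---
layout: post
title: "Example post"
updated: 2025-05-14   # custom front matter key read by Liquid
---

Last updated: {{ page.updated | date: "%B %-d, %Y" }}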

One component on the (existing) Shellsharks statboard calculates the age of the site using the last updated date of the site (maintained in the change log), minus the publish date of the first-ever Shellsharks post. Since a static, Jekyll-based, GitHub Pages site is only built (and thus only updated) when I actually physically commit an update, this component will be out of date if I do not commit at least daily. So how did I solve for this? Enter GitHub Actions.

* Learn more about the tags, filters and other capabilities of Liquid here.

JavaScript & jQuery

Before we dive into the power of GitHub Actions, it’s worth mentioning the ability to add dynamism by simply dropping straight-up, in-line JavaScript directly into the page/post Markdown (.md) files. Remember, Jekyll produces .html files directly from static, text-based files (like Markdown). So the inclusion of raw JS syntax will translate into embedded, executable JS code in the final, generated HTML files. The usual rules for in-page JS apply here.

One component idea I had for the statboard was to have a counter of named vulnerabilities. So how could I grab that value from the page? At first, I tried fetching the DOM element with the id in which the count was exposed. However, this failed because fetching that element alone meant not fetching the JS and other HTML content used to actually generate that count. To solve for this, I used jQuery to load the entire page into a temporary <div> tag, then iterated through the list (<li>) elements within that div (similar to how I calculate it on the origin page), and then finally set the dashboard component to the calculated count!

$('<div></div>').load('/infosec-blogs', function () {
  var blogs = $("li",this).length;
  $("#iblogs").html(blogs);
});
Additional notes on the use of JS and jQuery
  • I used Google’s Hosted Libraries to reference jQuery <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>.
  • Be wary of adding JS comments // in Markdown files as I noticed the Jekyll parsing engine doesn’t do a great job of new-lining, and thus everything after a comment will end up being commented.
  • When using Liquid tags in in-line JS, ensure quotes (‘’,””) are added around the templates so that the JS code will recognize those values as strings where applicable (see the short sketch after this list).
  • The ability to add raw, arbitrary JS means there is a lot of untapped capability to add dynamic content to an otherwise static page. Keep in mind though that JS code is client-side, so you are still limited in that typical server-side functionality is not available in this context.
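A quick sketch of the quoting point above (the element id and variable names are just placeholders):

<script>
  // After Jekyll renders the page, the Liquid output becomes part of the JS source.
  var lastUpdated = "{{ page.updated }}";      // quoted: treated as a string literal
  var postCount = {{ site.posts | size }};     // numeric output can stay unquoted
  document.getElementById("last-updated").textContent = lastUpdated;
</script>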

GitHub Actions

Thanks to the scenario I detailed in the Jekyll Liquid section, I was introduced to the world of GitHub Actions. Essentially, I needed a way to force an update / regeneration of my site such that one of my statically generated Liquid tags would update at some minimum frequency (in this case, at least daily). After some Googling, I came across this action which allowed me to do just that! In short, it forces a blank build using a user-defined schedule as the trigger.

# File: .github/workflows/refresh.yml
name: Refresh

on:
  schedule:
    - cron:  '0 3 * * *' # Runs every day at 3am

jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger GitHub pages rebuild
        run: |
          curl --fail --request POST \
            --url https://api.github.com/repos/${{ github.repository }}/pages/builds \
            --header "Authorization: Bearer $USER_TOKEN"
        env:
          # You must create a personal token with repo access as GitHub does
          # not yet support server-to-server page builds.
          USER_TOKEN: ${{ secrets.USER_TOKEN }}

In order to get this Action going, follow these steps…

  1. Log into your GitHub account and go to Settings (in the top right) –> Developer settings –> Personal access tokens.
  2. Generate new token and give it full repo access scope (More on OAuth scopes). I set mine to never expire, but you can choose what works best for you.
  3. Navigate to your GitHub Pages site repo, ***.github.io –> Settings –> Secrets –> Actions section. Here you can add a New repository secret where you give it a unique name and set the value to the personal access token generated earlier.
  4. In the root of your local site repository, create a .github/workflows/ folder (if one doesn’t already exist).
  5. Create a <name of your choice>.yml file where you will have the actual Action code (like what was provided above).
  6. Commit this Action file and you should be able to see run details in your repo –> Actions section within GitHub.
Additional Considerations for GitHub Actions
  • When using the Liquid tag {{ site.time }} with a Git Action triggered build, understand that it will use the time of the server which is generating the HTML, in this case the GitHub servers themselves, which means the date will be in UTC (Conversion help).
  • Check out this reference for information on how to specify the time zone in the front matter of a page or within the Jekyll config file (a minimal config example follows this list).
  • GitHub Actions are awesome and powerful, but there are limitations to be aware of. Notably, it is important to understand the billing considerations. Free tier accounts get 2,000 minutes/month while Pro tier accounts (priced at about $44/user/year) get 3,000.
  • For reference, the refresh action (provided above) was running (for me) at about 13 seconds per trigger. This means you could run that action over 9,000 times without exceeding the minute cap for a Free-tier account.
  • With the above said, also consider that the default pages-build-deployment Action used by GitHub Pages to actually generate and deploy your site upon commit will also consume those allocated minutes. Upon looking at my Actions pane, I am seeing about 1m run-times for each build-and-deploy action trigger.
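On the time zone point above, Jekyll supports a timezone setting in _config.yml; a minimal example (the zone shown is just a placeholder, pick your own IANA zone name):

# File: _config.yml
timezone: America/New_York   # applied when Jekyll renders site.time and post dates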

What’s Next

I’ve only just started to scratch the surface of how I can further extend and dynamize my Jekyll-based site. In future updates to this guide (or in future posts), I plan to cover more advanced GitHub Action capabilities as well as how else to add server-side functionality (maybe through serverless!) to the site. Stay tuned!

GitHub Issues search now supports nested queries and boolean operators: Here’s how we (re)built it

Originally, Issues search was limited by a simple, flat structure of queries. But with advanced search syntax, you can now construct searches using logical AND/OR operators and nested parentheses, pinpointing the exact set of issues you care about.

Building this feature presented significant challenges: ensuring backward compatibility with existing searches, maintaining performance under high query volume, and crafting a user-friendly experience for nested searches. We’re excited to take you behind the scenes to share how we took this long-requested feature from idea to production.

Here’s what you can do with the new syntax and how it works behind the scenes

Issues search now supports building queries with logical AND/OR operators across all fields, with the ability to nest query terms. For example, is:issue state:open author:rileybroughten (type:Bug OR type:Epic) finds all issues that are open AND were authored by rileybroughten AND are either of type bug or epic.

Screenshot of an Issues search query involving the logical OR operator.

How did we get here?

Previously, as mentioned, Issues search only supported a flat list of query fields and terms, which were implicitly joined by a logical AND. For example, the query assignee:@me label:support new-project translated to “give me all issues that are assigned to me AND have the label support AND contain the text new-project.”

But the developer community has been asking for more flexibility in issue search, repeatedly, for nearly a decade now. They wanted to be able to find all issues that had either the label support or the label question, using the query label:support OR label:question. So, we shipped an enhancement towards this request in 2021, when we enabled an OR style search using a comma-separated list of values.

However, they still wanted the flexibility to search this way across all issue fields, and not just the labels field. So we got to work. 

Technical architecture and implementation

The architecture of the Issues search system (and the changes needed to build this feature).

From an architectural perspective, we swapped out the existing search module for Issues (IssuesQuery) with a new search module (ConditionalIssuesQuery) that was capable of handling nested queries while continuing to support existing query formats.

This involved rewriting IssuesQuery, the search module that parsed query strings and mapped them into Elasticsearch queries.

Search Architecture

To build a new search module, we first needed to understand the existing search module, and how a single search query flowed through the system. At a high level, when a user performs a search, there are three stages in its execution:

  1. Parse: Breaking the user input string into a structure that is easier to process (like a list or a tree)
  2. Query: Transforming the parsed structure into an Elasticsearch query document, and making a query against Elasticsearch.
  3. Normalize: Mapping the results obtained from Elasticsearch (JSON) into Ruby objects for easy access and pruning the results to remove records that had since been removed from the database.

Each stage presented its own challenges, which we’ll explore in more detail below. The Normalize step remained unchanged during the re-write, so we won’t dive into that one.

Parse stage

The user input string (the search phrase) is first parsed into an intermediate structure. The search phrase could include:

  • Query terms: The relevant words the user is trying to find more information about (ex: “models”)
  • Search filters: These restrict the set of returned search documents based on some criteria (ex: “assignee:Deborah-Digges”)

Example search phrases:

  • Find all issues assigned to me that contain the word “codespaces”:
    • is:issue assignee:@me codespaces
  • Find all issues with the label documentation that are assigned to me:
    • assignee:@me label:documentation

The old parsing method: flat list

When only flat, simple queries were supported, it was sufficient to parse the user’s search string into a list of search terms and filters, which would then be passed along to the next stage of the search process.

The new parsing method: abstract syntax tree

As nested queries may be recursive, parsing the search string into a list was no longer sufficient. We changed this component to parse the user’s search string into an Abstract Syntax Tree (AST) using the parsing library parslet.

We defined a grammar (a PEG or Parsing Expression Grammar) to represent the structure of a search string. The grammar supports both the existing query syntax and the new nested query syntax, to allow for backward compatibility.

A simplified grammar for a boolean expression described by a PEG grammar for the parslet parser is shown below:

class Parser < Parslet::Parser
  rule(:space)  { match[" "].repeat(1) }
  rule(:space?) { space.maybe }

  rule(:lparen) { str("(") >> space? }
  rule(:rparen) { str(")") >> space? }

  rule(:and_operator) { str("and") >> space? }
  rule(:or_operator)  { str("or")  >> space? }

  rule(:var) { str("var") >> match["0-9"].repeat(1).as(:var) >> space? }

  # The primary rule deals with parentheses.
  rule(:primary) { lparen >> or_operation >> rparen | var }

  # Note that following rules are both right-recursive.
  rule(:and_operation) { 
    (primary.as(:left) >> and_operator >> 
      and_operation.as(:right)).as(:and) | 
    primary }
    
  rule(:or_operation)  { 
    (and_operation.as(:left) >> or_operator >> 
      or_operation.as(:right)).as(:or) | 
    and_operation }

  # We start at the lowest precedence rule.
  root(:or_operation)
end
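As a rough usage sketch (not GitHub’s production code), the simplified grammar above can be exercised directly; note that it expects the lowercase and/or operators it defines:

require "parslet"

tree = Parser.new.parse("var1 and (var2 or var3)")
# tree is a nested hash, roughly:
# {:and=>{:left=>{:var=>"1"}, :right=>{:or=>{:left=>{:var=>"2"}, :right=>{:var=>"3"}}}}}
# (the leaf values are Parslet::Slice objects that also carry source positions)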

For example, this user search string:
is:issue AND (author:deborah-digges OR author:monalisa)
would be parsed into the following AST:

{
  "root": {
    "and": {
      "left": {
        "filter_term": {
          "attribute": "is",
          "value": [
            {
              "filter_value": "issue"
            }
          ]
        }
      },
      "right": {
        "or": {
          "left": {
            "filter_term": {
              "attribute": "author",
              "value": [
                {
                  "filter_value": "deborah-digges"
                }
              ]
            }
          },
          "right": {
            "filter_term": {
              "attribute": "author",
              "value": [
                {
                  "filter_value": "monalisa"
                }
              ]
            }
          }
        }
      }
    }
  }
}

Query

Once the query is parsed into an intermediate structure, the next steps are to:

  1. Transform this intermediate structure into a query document that Elasticsearch understands
  2. Execute the query against Elasticsearch to obtain results

Executing the query in step 2 remained the same between the old and new systems, so let’s only go over the differences in building the query document below.

The old query generation: linear mapping of filter terms using filter classes

Each filter term (Ex: label:documentation) has a class that knows how to convert it into a snippet of an Elasticsearch query document. During query document generation, the correct class for each filter term is invoked to construct the overall query document.
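A rough sketch of that idea (the class, method, and field names here are hypothetical, not GitHub’s actual implementation):

# Each filter term knows how to emit its own Elasticsearch fragment.
class LabelFilter
  def initialize(value)
    @value = value
  end

  # label:documentation => { term: { "labels.name" => "documentation" } }
  def to_es_query
    { term: { "labels.name" => @value } }
  end
end

# The old generator walked the flat list of parsed filter terms and
# collected each fragment into a single bool/must query.
must_clauses = [LabelFilter.new("documentation")].map(&:to_es_query)
query_document = { query: { bool: { must: must_clauses } } }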

The new query generation: recursive AST traversal to generate Elasticsearch bool query

We recursively traversed the AST generated during parsing to build an equivalent Elasticsearch query document. The nested structure and boolean operators map nicely to Elasticsearch’s boolean query, with the AND, OR, and NOT operators mapping to the must, should, and must_not clauses.

We re-used the building blocks for the smaller pieces of query generation to recursively construct a nested query document during the tree traversal.
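A simplified sketch of how such a traversal could look (helper names are hypothetical, and symbol keys stand in for the JSON keys shown earlier; this is not GitHub’s actual code):

def build_es_query(node)
  if node[:and]
    { bool: { must: [build_es_query(node[:and][:left]),
                     build_es_query(node[:and][:right])] } }
  elsif node[:or]
    { bool: { should: [build_es_query(node[:or][:left]),
                       build_es_query(node[:or][:right])],
              minimum_should_match: 1 } }
  elsif node[:filter_term]
    # Leaf node: reuse the same per-filter builders the flat queries already used.
    filter_clause_for(node[:filter_term])   # hypothetical helper
  else
    raise ArgumentError, "unknown AST node: #{node.keys}"
  end
end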

Continuing from the example in the parsing stage, the AST would be transformed into a query document that looked like this:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "bool": {
                  "must": {
                    "prefix": {
                      "_index": "issues"
                    }
                  }
                }
              },
              {
                "bool": {
                  "should": {
                    "terms": {
                      "author_id": [
                        "<DEBORAH_DIGGES_AUTHOR_ID>",
                        "<MONALISA_AUTHOR_ID>"
                      ]
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }
    // SOME TERMS OMITTED FOR BREVITY
  }
}

With this new query document, we execute a search against Elasticsearch. This search now supports logical AND/OR operators and parentheses to search for issues in a more fine-grained manner.

Considerations

Issues is one of the oldest and most heavily used features on GitHub. Changing core functionality like Issues search, a feature with an average of nearly 2000 queries per second (QPS)—that’s almost 160M queries a day!—presented a number of challenges to overcome.

Ensuring backward compatibility

Issue searches are often bookmarked, shared among users, and linked in documents, making them important artifacts for developers and teams. Therefore, we wanted to introduce this new capability for nested search queries without breaking existing queries for users. 

We validated the new search system before it even reached users by:

  • Testing extensively: We ran our new search module against all unit and integration tests for the existing search module. To ensure that the GraphQL and REST API contracts remained unchanged, we ran the tests for the search endpoint both with the feature flag for the new search system enabled and disabled.
  • Validating correctness in production with dark-shipping: For 1% of issue searches, we ran the user’s search against both the existing and new search systems in a background job, and logged differences in responses. By analyzing these differences we were able to fix bugs and missed edge cases before they reached our users.
    • We weren’t sure at the outset how to define “differences,” but we settled on “number of results” for the first iteration. In general, a user would likely be surprised by the new search capability if the same search, run against both systems within a second or so of each other, returned a different number of results.

Preventing performance degradation

We expected more complex nested queries to use more resources on the backend than simpler queries, so we needed to establish a realistic baseline for nested queries, while ensuring no regression in the performance of existing, simpler ones.

For 1% of Issue searches, we ran equivalent queries against both the existing and the new search systems. We used scientist, GitHub’s open source Ruby library for carefully refactoring critical paths, to compare the performance of equivalent queries and ensure that there was no regression.
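A minimal sketch of what a scientist experiment along those lines could look like (the experiment name, constructor arguments, and results/size calls are assumptions, not the actual experiment code):

require "scientist"

class IssueSearchExperiment
  include Scientist

  def search(query_string, user)
    science "issues-nested-search" do |experiment|
      experiment.use { IssuesQuery.new(query_string, user).results }            # control: existing module
      experiment.try { ConditionalIssuesQuery.new(query_string, user).results } # candidate: new module
      experiment.compare { |control, candidate| control.size == candidate.size }
    end
  end
end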

Preserving user experience

We didn’t want users to have a worse experience than before just because more complex searches were possible.

We collaborated closely with product and design teams to ensure usability didn’t decrease as we added this feature by:

  • Limiting the number of nested levels in a query to five. From customer interviews, we found this to be a sweet spot for both utility and usability.
  • Providing helpful UI/UX cues: We highlight the AND/OR keywords in search queries, and provide users with the same auto-complete feature for filter terms in the UI that they were accustomed to for simple flat queries.

Minimizing risk to existing users

For a feature that is used by millions of users a day, we needed to be intentional about rolling it out in a way that minimized risk to users.

We built confidence in our system by:

  • Limiting blast radius: To gradually build confidence, we only integrated the new system in the GraphQL API and the Issues tab for a repository in the UI to start. This gave us time to collect, respond to, and incorporate feedback without risking a degraded experience for all consumers. Once we were happy with its performance, we rolled it out to the Issues dashboard and the REST API.
  • Testing internally and with trusted partners: As with every feature we build at GitHub, we tested this feature internally for the entire period of its development by shipping it to our own team during the early days, and then gradually rolling it out to all GitHub employees. We then shipped it to trusted partners to gather initial user feedback.

And there you have it, that’s how we built, validated, and shipped the new and improved Issues search!

Feedback

Want to try out this exciting new functionality? Head to our docs to learn about how to use boolean operators and parentheses to search for the issues you care about!

If you have any feedback for this feature, please drop us a note on our community discussions.

Acknowledgements

Special thanks to AJ Schuster, Riley Broughten, Stephanie Goldstein, Eric Jorgensen, Mike Melanson and Laura Lindeman for the feedback on several iterations of this blog post!


Documentation done right: A developer’s guide

With all the work involved in creating and maintaining a project, sometimes writing documentation can slip through the cracks. However, good docs are a huge asset to any project. Consider the benefits:

  • Better collaboration: Clear, consistent documentation ensures everyone’s on the same page, from your immediate team to outside stakeholders. Additionally, docs promote independent problem solving, saving core contributors the time and effort of answering every question.
  • Smoother onboarding: By providing ways to get started, explaining core concepts, and including tutorial-style content, good documentation allows new team members to ramp up quickly.
  • Increased adoption: The easier it is to understand, set up, and run your project, the more likely someone will use it.

With these benefits in mind, let’s take a look at some important principles of documentation, then dive into how you can quickly create effective docs for your project.

Key tenets of documentation

There are three key principles you should follow as you document your project.

Keep it simple

Use plain language that’s easy to understand. The goal is to make your documentation as accessible as possible. A good guideline is to ask yourself if there are any acronyms or technical terms in your documentation that some folks in your target audience won’t understand. If that’s the case, either swap them for simpler language, or make sure they’re defined in your document.

Keep it concise

Document only necessary information. Trying to cover every possible edge case will overwhelm your readers. Instead, write docs that help the vast majority of readers get started, understand core concepts, and use your project.

Additionally, keep each document focused on a particular topic or task. If you find yourself including information that isn’t strictly necessary, move it into separate, smaller documents and link to them when it’s helpful.

Keep it structured

Consider the structure of each document as you write it to make sure it is easy to scan and understand:

  • Put the most important information first to help readers quickly understand if a document is relevant to them.
  • Use headings and a table of contents to tell your readers where to find specific information. We suggest using documentation templates with common headings to quickly and consistently create structured content.
  • Use text highlighting like boldface and formatting elements like bulleted lists to help readers scan content. Aim for 10% or less text highlighting to make sure emphasized text stands out.
  • Be consistent with your styling. For example, if you put important terminology in bold in one document, do the same in your other content.

Organizing your documentation

Just as there are principles to follow when writing individual documents, you should also follow a framework for organizing documents in your repo. 

There are many approaches to organizing documentation in your repo, but one that we’ve used for several projects and recommend is the Diátaxis framework. This is a systematic approach to organizing all the documents relevant to your project. 

Applying a systematic approach to documenting your repositories can make it easier for users to know where to go to find the information that they need. This reduces frustration and gets folks contributing to your project faster. 

Diátaxis divides documents based on their purpose into four categories: 

  • Tutorials: Learning-oriented documents
  • How-to guides: Goal-oriented instructions for specific tasks
  • Explanation: Discussions providing understanding of the project
  • Reference: Technical specifications and information

Each document in your repository should fit into one of these categories. This helps users quickly find the appropriate resource for their current situation, whether they need to learn a new concept, solve a specific problem, understand underlying principles, or look up technical details.

This can also be a helpful guide to identify which documentation your repository is missing. Is there a tool your repository uses that doesn’t have a reference document? Are there enough tutorials for contributors to get started with your repository? Are there how-to guides to explain some of the common tasks that need to be accomplished within your repository? 

Organizing your documentation according to this framework helps ensure you’re taking a holistic approach to building and maintaining key content for your project.


Managing music with rclone

I've been listening to my music via Navidrome for a bit now and it's working quite well. To manage my music, I use rclone locally and on my host machine.

When I add new music, I update the tags in Mp3tag, save the art to the same directory as the audio files and maintain a directory structure of Artist/Release Year-Album-Name/Encoding format/Files. These files reside on a drive I've creatively named Storage at /Volumes/Storage/Music which is backed up to Backblaze B2 via Arq. My music bucket is mounted to the server running Navidrome and made available to Navidrome as the music volume for the container, for example:

services:
  navidrome:
    image: deluan/navidrome:latest
    user: 1000:1000 # should be owner of volumes
    ports:
      - "4533:4533"
    restart: unless-stopped
    environment:
      # Optional: put your config options customization here. Examples:
      # ND_LOGLEVEL: debug
    volumes:
      - "/path/to/data:/data"
      - "/mnt/music:/music:ro" # I'm a mount that has your music in it

The B2 mount is run as a systemd service:

[Unit]
Description=Mount B2 /Music with Rclone
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStartPre=/bin/sh -c 'if mountpoint -q /mnt/music; then fusermount -u /mnt/music; fi'
ExecStart=/usr/bin/rclone mount b2:cdransf-music /mnt/music \
    --allow-other \
    --async-read=true \
    --dir-cache-time=72h \
    --vfs-cache-mode full \
    --vfs-cache-max-size 50G \
    --vfs-read-chunk-size 512M \
    --vfs-read-chunk-size-limit 5G \
    --buffer-size 1G \
    --vfs-read-ahead 4G \
    --vfs-cache-poll-interval 5m \
    --tpslimit 10 \
    --tpslimit-burst 20 \
    --poll-interval 5m \
    --no-modtime
ExecStop=/bin/fusermount -uz /mnt/music
Restart=always
User=root
Group=root

[Install]
WantedBy=multi-user.target

We've got a description, plus directives telling the service when to start (after the network is online; it wants the network up but won't fail if it isn't).

Then a check to see if the mount point is already mounted, unmounting it if so. And then we start the mount.

  • --allow-other: let other users access the mount.
  • --async-read=true: read asynchronously for better performance.
  • --dir-cache-time=72h: cache the directory structure for 3 days.
  • --vfs-cache-mode full: allows full file caching and helps with seeking and access.
  • --vfs-cache-max-size 50G: a generous cache but safely within the limits of the host machine.
  • --vfs-read-chunk-size 512M: initial chunk read size.
  • --vfs-read-chunk-size-limit 5G: max chunk read size.
  • --buffer-size 1G: the amount of data to buffer in memory per file.
  • --vfs-read-ahead 4G: how far to read ahead for streaming.
  • --vfs-cache-poll-interval 5m: how often to check the cache and clean it up.
  • --tpslimit 10 and --tpslimit-burst 20: request throttling so things don't get out of hand.
  • --poll-interval 5m: check B2 every 5 minutes for changes.
  • --no-modtime: disable modtime support because B2 doesn't support it particularly well.

Finally, we unmount when the service stops and restart the service automatically (handy when I reboot the host).

The final lines run the service as root and ensure the service is started at system boot.


Uploading music

To upload music from my local machine to B2 I also use rclone via a function sourced into my zsh config. This — again — creatively titled upload_music function does a few things:

local src="$1" is the path to the local directory I want to upload. I typically run this using upload_music . after navigating to the directory that contains the files I'm uploading.

base is the hardcoded root directory for my stored music.

Next, if I pass an empty or invalid directory, the script errors out with a message explaining how to run it properly.

We use realpath to resolve our $src and $base paths, allowing us to determine that the former is within the latter (and erroring out if not). We then compute the relative path and use all of the above to upload the music to my B2 bucket, preserving a directory structure that mirrors what I have locally.[1]

# upload music to navidrome b2 bucket
upload_music() {
    local src="$1"
    local base="/Volumes/Storage/Media/Music"

    if [[ -z "$src" || ! -d "$src" ]]; then
        echo "Usage: upload_music <local_directory>"
        return 1
    fi

    # resolve absolute paths
    src="$(realpath "$src")"
    base="$(realpath "$base")"

    if [[ "$src" != "$base"* ]]; then
        echo "Error: '$src' is not inside base directory '$base'"
        return 1
    fi

    # compute relative path
    local rel_path="${src#$base/}"

    echo "Syncing '$src' to 'b2:cdransf-music/$rel_path'..."
    rclone sync "$src" "b2:cdransf-music/$rel_path" --progress --transfers=8 --checkers=16 --bwlimit=off
}

Finally, now that the music has been uploaded and the service is defined and running, I'll run a script from the host machine to prompt the whole thing to read in the new music:

#!/bin/bash

echo "Restarting the rclone mount..."
systemctl restart rclone-b2-music.service

echo "Restarting the Navidrome container..."

# find container name running from the navidrome image
navidrome_container=$(docker ps -a --filter "ancestor=deluan/navidrome:latest" --format "{{.Names}}" | head -n 1)

if [[ -n "$navidrome_container" ]]; then
    docker restart "$navidrome_container"
    echo "Navidrome container '$navidrome_container' restarted."
else
    echo "Error: Could not find a container using the image 'deluan/navidrome:latest'."
fi

echo "New music should now be available in Navidrome!"

We restart the mount to invalidate the cache and restart the Navidrome container so that it recognizes that the mount has been restarted. Once Navidrome restarts (which it does quickly) it kicks off a new scan and the music is available when the scan concludes.


  1. One of the nice parts about using rclone for this is that it will only update files after the initial upload. If I change metadata (or anything else minor) the subsequent upload run is extremely quick. ↩︎

GitHub Availability Report: April 2025

In April, we experienced three incidents that resulted in degraded performance across GitHub services.

April 11 03:05 UTC (lasting 39 minutes)

On April 11, 2025, from 03:05 UTC to 03:44 UTC, approximately 75% of Codespaces users faced create and start failures. These were caused by manual configuration changes to an internal dependency that escaped our test coverage. Our monitors and detection mechanism triggered, which helped us triage, revert the changes, and restore service health.

We are working on building additional gates and safer mechanisms for testing and rolling out such configuration changes. We expect no further disruptions.

April 23 07:00 UTC (lasting 20 minutes)

On April 23, 2025, between 07:00 UTC and 07:20 UTC, multiple GitHub services experienced degradation caused by resource contention on database hosts. The resulting error rates, which ranged from 2–5% of total requests, led to intermittent service disruption for users. The issue was triggered by an interaction between query load and an ongoing schema change that led to connection saturation. The incident recovered after the schema migration was completed.

Our prior investments in monitoring and improved playbooks helped us effectively organize our first responder teams, leading to faster triaging of the incident. We have also identified a regression in our schema change tooling that led to increased resource utilization during schema changes, and have reverted to a previous stable version.

To prevent similar issues in the future, we are reviewing the capacity of the database, improving monitoring and alerting systems, and implementing safeguards to reduce time to detection and mitigation. 

April 23 19:13 UTC (lasting 42 minutes)

On April 23, 2025, between 19:13:50 UTC and 22:11:00 UTC, GitHub’s Migration service experienced elevated failures caused by a configuration change that removed access for repository migration workers. During this time, 837 migrations across 57 organizations were affected. Impacted migrations required a retry after the log message “Git source migration failed. Error message: An error occurred. Please contact support for further assistance.” was displayed. Once access was restored, normal operations resumed without further interruption.

As a result of this incident, we have implemented enhanced test coverage and refined monitoring thresholds to help prevent similar disruptions in the future.


Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.

