
Finding Secrets with Regular Expressions

(Originally published here)

Gitleaks uses regular expressions to search for secrets. In this blog post I'll go over the structure of Gitleaks's regular expressions found in the default Gitleaks configuration file. Here is an example regex from one of the rules. In this case we will be looking at the adafruit rule:
(?i)(?:adafruit)(?:[0-9a-z\-_\t .]{0,20})(?:[\s|']|[\s|"]){0,3}(?:=|>|:=|\|\|:|<=|=>|:)(?:'|\"|\s|=|\x60){0,5}([a-z0-9_-]{32})(?:['|\"|\n|\r|\s|\x60]|$)
One thing to note before going through the sections one by one: a group prefixed with (?: is a non-capturing group, so whatever it matches is not stored. We do not need to capture the non-sensitive parts of a match. Okay, now let's break this regex down by logical sections:
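To see the whole rule in action before taking it apart, here is a minimal sketch in Python. Gitleaks itself is written in Go, so this is only an approximation of how the rule behaves; the helper name find_secret is made up for this example, and the pattern is typed with a plain ASCII hyphen in 0-9.

```python
import re

# The adafruit rule, split across three raw strings for readability.
ADAFRUIT_RULE = re.compile(
    r"""(?i)(?:adafruit)(?:[0-9a-z\-_\t .]{0,20})(?:[\s|']|[\s|"]){0,3}"""
    r"""(?:=|>|:=|\|\|:|<=|=>|:)(?:'|\"|\s|=|\x60){0,5}"""
    r"""([a-z0-9_-]{32})(?:['|\"|\n|\r|\s|\x60]|$)"""
)


def find_secret(line):
    """Return the captured secret, or None if the rule does not match."""
    match = ADAFRUIT_RULE.search(line)
    return match.group(1) if match else None


sample = 'adafruit_api_token = "22lyl_8yoba93u0__1e7l70ft-6jnjv2"'
print(find_secret(sample))

# Only one group captures; every (?:...) group is skipped.
print(ADAFRUIT_RULE.groups)
```

Because every other group is non-capturing, group(1) is the secret itself and nothing else.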

1. (?i)

(?i) ignores case. That means the regex that follows will be case insensitive. We don't have to write [A-Za-z] for letters; we can just use [a-z].
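A quick way to convince yourself of this, sketched with Python's re module (the sample line is invented for illustration):

```python
import re

line = 'ADAFRUIT_API_TOKEN = "example"'

# With the (?i) prefix the lowercase pattern matches the uppercase text.
with_flag = re.search(r"(?i)adafruit", line)

# Without it, the same pattern finds nothing.
without_flag = re.search(r"adafruit", line)

# Under (?i), [a-z] accepts uppercase letters as well.
upper_ok = re.fullmatch(r"(?i)[a-z]+", "AdaFruit")

print(bool(with_flag), bool(without_flag), bool(upper_ok))
```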

2. (?:adafruit)

I call this group the "identifier" group. Most leaked secrets will follow some form of: ProviderKey="deadb33f". Secrets can occur in all sorts of places like comments and as string literals in function arguments, but for most cases, secrets will follow something like {identifier} {assignment} {string}.
Check out the Regex101 for this step: https://regex101.com/r/uRTOfo/1

3. (?:[0-9a-z\-_\t .]{0,20})

This group can be thought of as a catch-all for extended identifier names. In our adafruit example, if we have a secret like: adafruit_api_token = "22lyl_8yoba93u0__1e7l70ft-6jnjv2" then _api_token would match this extended identifier group.
Regex101 for this step: https://regex101.com/r/WMkPOz/1
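If you would rather poke at this step locally, the sketch below temporarily turns the extended identifier group into a capturing group so we can print what it consumed. That change is only for inspection and is not how the rule is written; the second sample line is made up.

```python
import re

# Same two sections as the rule, but the second group captures here
# so that its match can be inspected.
EXTENDED = re.compile(r"(?i)(?:adafruit)([0-9a-z\-_\t .]{0,20})")

short_line = 'adafruit_api_token = "22lyl_8yoba93u0__1e7l70ft-6jnjv2"'
long_line = "adafruit_this_is_a_very_long_identifier_name = 1"

short_ext = EXTENDED.search(short_line).group(1)
long_ext = EXTENDED.search(long_line).group(1)

# repr() makes trailing whitespace visible.
print(repr(short_ext))
print(repr(long_ext), len(long_ext))
```

Two details show up here: the space before the operator is part of the character class, so the greedy match takes it along with _api_token, and the {0,20} bound stops a long identifier after 20 characters.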

4. (?:[\s|']|[\s|"]){0,3}

This section accounts for spaces and quotes after the identifier. The reason we include quotes in this group is so that common key/value syntax is supported (think JSON, Golang maps, Python dicts, etc.). The range {0,3} after the group is there to provide some buffer after the identifier word ends. For example, in "key" : "value", this section would match the second " and the space (\s) after it.
Regex101 for this step: https://regex101.com/r/FgF3Ku/1
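Here is a small Python sketch of this step on a JSON-style line. As before, the group under discussion is wrapped in an extra capturing group purely so we can look at it; the sample line is invented.

```python
import re

# Identifier + extended identifier + the quote/space section (captured
# here for inspection only).
PREFIX = re.compile(
    r"""(?i)(?:adafruit)(?:[0-9a-z\-_\t .]{0,20})((?:[\s|']|[\s|"]){0,3})"""
)

json_line = '"adafruit_key" : "value"'
consumed = PREFIX.search(json_line).group(1)
print(repr(consumed))

# Side note: inside square brackets | is a literal pipe, not "or",
# so this section also accepts a | character.
pipe_is_literal = re.fullmatch(r"[\s|']", "|") is not None
print(pipe_is_literal)
```

The section stops at the colon, which is left for the operator group that follows.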

5. (?:=|>|:=|\|\|:|<=|=>|:)

Can you guess what this group is? It's the operator! This group covers common assignment/association operators seen in many programming languages. If you think one is missing, please open a PR!
Regex101 for this step: https://regex101.com/r/ds5rEa/1
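To check which operators this group accepts on its own, a rough Python sketch could look like this (the list of rejected candidates is just a few examples picked for illustration):

```python
import re

OPERATOR = re.compile(r"(?:=|>|:=|\|\|:|<=|=>|:)")

operators = ["=", ">", ":=", "||:", "<=", "=>", ":"]
others = ["->", "==", "<-"]

# fullmatch requires the whole string to be exactly one operator.
accepted = [op for op in operators if OPERATOR.fullmatch(op)]
rejected = [op for op in others if not OPERATOR.fullmatch(op)]

print(accepted)
print(rejected)
```

Note that == is rejected by this group in isolation, but the complete rule can still get past it: the first = is taken as the operator and the second one by the section that comes next, which also allows =.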

6. (?:'|\"|\s|=|\x60){0,5}

This next section is similar to section 4 in that it is present to match whitespace, quotes, and backticks (\x60) after the assignment operator; it also allows an extra =. This will match the beginning of a string literal, for example.
Regex101 for this step: https://regex101.com/r/kvLi9R/1
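The following Python sketch applies just this section to the text that comes right after an operator. The helper leading_filler and the sample inputs are hypothetical, and the group is made capturing only so the result can be printed.

```python
import re

# \x60 is the escape for a backtick.
AFTER_OPERATOR = re.compile(r"""((?:'|\"|\s|=|\x60){0,5})""")


def leading_filler(text):
    """Return what this section would consume at the start of text."""
    return AFTER_OPERATOR.match(text).group(1)


print(repr(leading_filler(' "abc')))   # space, then opening double quote
print(repr(leading_filler(" `abc")))   # space, then backtick
print(repr(leading_filler("= 'abc")))  # extra =, space, single quote
print(repr(leading_filler("abc")))     # nothing to consume
print(len(leading_filler("      x")))  # six spaces offered, {0,5} caps it
```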

7. ([a-z0-9_-]{32})

Yay! Our first capture group and the most important part of the regular expression. This group will capture the credential. Some secrets have known lengths, like adafruit's 32 characters ({32}), but others do not. This is why it is important to generate multiple test secrets when developing new rules.
Regex101 for this step: https://regex101.com/r/G1bfI6/1
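One possible way to generate test secrets is sketched below in Python. The alphabet is read off the character class in the rule, not taken from any Adafruit documentation, and fake_secret is a made-up helper name.

```python
import re
import secrets
import string

# Characters allowed by [a-z0-9_-]; an assumption derived from the rule.
ALPHABET = string.ascii_lowercase + string.digits + "_-"
SECRET = re.compile(r"(?i)[a-z0-9_-]{32}")


def fake_secret(length=32):
    """Build a random string of the given length from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


samples = [fake_secret() for _ in range(5)]

# Every 32-character sample should satisfy the capture group...
all_match = all(SECRET.fullmatch(s) for s in samples)

# ...while a 31-character string should not.
too_short = SECRET.fullmatch(fake_secret(31))

print(all_match, too_short)
```

Running the candidate rule against a batch of random samples like these is a cheap way to catch a character class or length that is too strict.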

8. (?:['|\"|\n|\r|\s|\x60]|$)

This group matches the end of a secret. It will match (but, being non-capturing, not capture) a secret's end quotation, space, newline, or backtick. The $ symbol will match the end of a string or line.
Regex101 for this step: https://regex101.com/r/gmSe5F/1
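To see why the closing group matters, here is a short Python sketch using only the last two sections of the rule, anchored at the start of the text with re.match. In the full rule the preceding sections pin down where the secret starts; the token is the sample secret from section 3.

```python
import re

TAIL = re.compile(r"""(?i)([a-z0-9_-]{32})(?:['|\"|\n|\r|\s|\x60]|$)""")

token = "22lyl_8yoba93u0__1e7l70ft-6jnjv2"

closed_by_quote = TAIL.match(token + '"')  # terminated by a closing quote
closed_by_eol = TAIL.match(token)          # terminated by end of string ($)
run_on = TAIL.match(token + "x")           # a 33rd character follows

# The terminator is part of the overall match but not of the capture.
print(repr(closed_by_quote.group(0)))
print(repr(closed_by_quote.group(1)))
print(bool(closed_by_eol), run_on)
```

Without this group, the first 32 characters of a longer string would be reported as a secret.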