How to reduce false positives while scanning for secrets

BluBracket
February 25, 2021
PALO ALTO, CALIFORNIA

Secrets in code are a pervasive and ever increasing attack vector in modern software companies. If you’ve ever used a secret scanning tool to detect secrets in your code, you’ve probably had to deal with the overwhelming amount of false positives.

In some cases, the level of noise is so high that it can be difficult or impossible to find actionable risk amongst the tremendous volume of data that your scanning tool produces.

To demonstrate this issue, we recently performed a benchmark scan on an organization containing 115 repositories. We performed this benchmark with BluBracket Community Edition, a premium free-to-use secret detection tool, and two popular open source scanning tools, TruffleHog and GitLeaks. The scan detected 17 secrets with Blubracket, 144 secrets with TruffleHog, and 339,275 secrets with GitLeaks.

Indeed, there was a large variation in the results and in the false positives each scanning tool produced.

There are many reasons why secret scanning tools can produce false positives. In this article, we will cover the following:

They do no deduplicate the same instance of a secret across different branches
They use regular expression patterns which are inaccurately defined, or defined too broadly
They make loose assumptions about how entropy should be used to evaluate detected secrets
They do not leverage activeness checks

As the results above demonstrate, BluBracket’s Community Edition was built from the ground up to handle challenges like these gracefully, and to reduce false positives found in more traditional scanning tools.

Let’s go into more detail on why some of these false positives occur and how BluBracket suppresses them:

Deduplicating secrets

Most scanning tools report the exact same instance of a secret multiple times. One example of this kind of duplication is when a developer introduces a secret into the code via a commit.

Let’s take any example commit C1:
password = “MyPassword1234”

Every time another developer merges this commit into their feature branch, say B1, the same lines of code, and therefore the same secrets, will technically appear again. In traditional scanning tools, the existence of this same secret from commit C1 is detected as separate instances in both the original branch and the new branch B1. Therefore, it will be reported twice.

On the contrary, BluBracket Community Edition scans the entire history of commits, creates unique identifiers for secrets, and reports them as a single distinct entity. This secret deduplication functionality allows for a substantial reduction in noise.

In the benchmark example, BluBracket found 4 private keys. All of those appeared to be real private keys. On the other hand, TruffleHog found 37 and GitLeaks found 33. The additional instances found by the other two tools were due to duplicate reports of the same 4 private keys.

How to reduce false positives while scanning for secrets 1

Broad or inaccurate Regular Expressions

Most scanning tools use a set of regular expressions to detect various types of secrets. In many cases the regular expression patterns are too broad or inaccurately defined, such that in addition to being able to detect real secrets, they alert on many similar patterns which are not real or meaningful. As an example, the regular expression for GitHub tokens in a commonly used open source tool is defined as:

[g|G][i|I][t|T][h|H][u|U][b|B].*[‘|\”][0-9a-zA-Z]{35,40}[‘|\”]

Unfortunately, the .* portion can match anything and incorrectly matches any GitHub URL, such as:

https://github.com/kubernetes/kubernetes/commit/a04b6e4b1671810ede5b8cacf4527741781d6fb9

as well as dependencies in Carthage dependency lock files:

Cartfile.resolved
github “jspahrsummers/xcconfigs” “3d9d99634cae6d586e272543d527681283b33eb0”

BluBracket has a large number of sophisticated regular expressions out of the box for every type of secret. After finding matches for regular expressions, BluBracket executes post-processing algorithms based on each type of secret, applying various heuristics to rule out additional false positives where the regular expressions alone did not suffice.

Inaccurate assumptions about how Entropy is used to detect secrets

Password Entropy is a concept used to assign a numerical score to how unpredictable a password is or the likelihood of highly random data in a string of characters. It is sometimes applied to predict how difficult a given password would be to crack through guessing, brute force, dictionary attacks, or other common methods. There are various Password Entropy formulas that can be used to make the calculation. For example, using the Pleacher Entropy Algorithm, the following string of characters

Lucky123!

has an entropy score of 37.2

While the following string of characters: 6e649ca13a7df3faacdc8bbb280aa9a6602d22fd9d545336077e573a1f4ff3b8

has an entropy score of 255.3

Notice that although in the first example above the string Lucky123! is shorter and has lower entropy, it is an example of a valid secret that we have found in scans of public open source repositories, whereas the second example is a hash string and therefore a false positive we found in a python pipfile lock file.

Most secret scanning tools lean heavily on higher entropy scores as an indication of the occurrence of a real secret. In the benchmark scan above, one of the popular open source scanning tools reported 210,421 instances of secrets detected based on entropy. The other scanning tool in our benchmark does not implement an entropy check. The first 50 of these 210,421 secrets we investigated were false positives. At the same time, the scanning tool failed to detect several real patterns as valid secrets because they were not above the entropy threshold required by the tool to be detected as secrets. Perhaps there are a few real secrets amongst the 210,421, but the task of finding real secrets amongst the large set of ~210k reported secrets is quite overwhelming. It is likely that most people would ignore the overwhelming number of secrets found through entropy alone and may spend their time investigating other categories of secrets such as private keys or AWS credentials which are more actionable.

It’s important to note that entropy scores were originally designed to measure the likelihood of highly random data, or the strength of a secret, and were not meant to be used to determine how likely a string of characters is a secret. To draw a direct correlation between highly random data or high entropy to a real secret would be to assume that all secrets were made of highly random data or strong passwords. While entropy is helpful for detecting some secret patterns such as private keys, there are many secret patterns such as passwords and credentials, which are generated at the discretion of the user generating the secret. Most products that require password authentication such as web-based email clients, encourage the user to select strong passwords. However, they still allow for short, trivial strings to be used as passwords. Therefore, shorter passwords with low entropy are still allowed and are still common in most systems today.

Basing a secret detection tool on entropy alone will likely produce a lot of false positives. BluBracket Community Edition uses other factors such as the repository metadata, filename, file type, secret type, and common dictionary words in conjunction with entropy. Then, the overall score is determined based on the various factors and their weights. This allows BluBracket to make a much more accurate estimation of the likelihood that the secret is real.

How to reduce false positives while scanning for secrets 2

Activeness Checks

An activeness check is a process by which a secret is used to make a request to a vendor’s API to determine if the secret is active. A secret is defined to be active if it has not expired, been deleted, or revoked. In other words, if the secret could still be used to authenticate against or otherwise access the services of that vendor. Many vendors such as Slack and GitHub provide APIs which can be leveraged to determine whether a secret could still be used to access their services. Oftentimes detected secrets are no longer active and can be ruled out as less severe or important. Many secret scanning tools do not leverage activeness checks to determine the validity of a detected secret, and therefore report all secrets as being of equal severity.

BluBracket Community Edition uses activeness checks to ensure that detected secrets are reported with the proper severity.

In Summary

There are a lot of great open source secret scanning tools out there. Although they are much better than running your organization with no protection, many of them produce a significant noise to signal ratio. In some cases, the ratio can be as high as 10 false positives to every real secret. Such a high noise to signal ratio often results in unusable data, or a significant engineering overhead to try and further distill the data.

Given the techniques outlined above, BluBracket Community Edition is able to automatically detect and distill secrets down to a meaningful and manageable batch. This gives your organization peace of mind without the hidden cost of spending engineering resources to investigate and make sense of secrets in your code. Plus it’s free. Join our community and let us know what you think.

Share this post!