Finding secret data in the source code

root · Apr 15, 2021

Original author: Vickie Li

When developers add sensitive data like passwords and API keys directly into the source code, that data can easily reach public repositories.

As a developer, I admit that I have let secrets slip into public GitHub repositories myself. Hardcoded data of this kind has always been a problem in various organizations. When I run a penetration test to assess a company's security, the first thing I do is examine its code for secrets. If a developer puts something like a password into the code, it can end up in a public repository or in an application package, and from there fall into the hands of attackers.

As microservice architectures and API-centric applications become more widespread, developers often need programmatic mechanisms for exchanging credentials and other secrets. This means that programmers who handle such data can sometimes make mistakes.

Consider a real example of credentials leaked through source code: a bug report concerning reverb.com. The researcher found credentials for accessing Cloudinary in the code. The secret key was present in the source code of the Reverb Android application. Anyone who downloaded the app could retrieve the credentials and then read, edit, and delete the files stored in the corresponding Cloudinary instance:

private static final java.lang.String CONFIG = "cloudinary://434762629765715:█████@reverb";

Such vulnerabilities are not at all uncommon. As a penetration tester, I have found all kinds of secrets in the public code and compiled files of many organizations, including credentials for various services, AWS keys, and GitHub API keys. Sometimes an attacker who wants to break into a company only needs to search its GitHub repository for accidentally committed credentials and use them to log into various systems.

Using regular expressions

How can you detect secrets in code before they become public and cause a data leak? The easiest and most straightforward way to find hardcoded secrets is to use text search tools and regular expressions.

If your code contains API keys, encryption keys, or database passwords, you can often find them with keyword search tools like grep. For example, you can search for key, secret, password, or aws. This approach searches by identifiers, such as the variable names used to store the data of interest. Similarly, you can use text search to find file names and format-specific markers associated with secrets. For example, you can search for the string -----BEGIN RSA PRIVATE KEY-----.
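The keyword approach can be sketched in a few lines of Python as a stand-in for grep. This is only an illustration: the keyword list and the private key header come from the examples above, while the function name find_suspicious_lines and the sample text are made up for the demo.

```python
import re

# Keywords taken from the examples in the text; extend the list for your own code base.
KEYWORD_PATTERN = re.compile(r"key|secret|password|aws", re.IGNORECASE)
# Header that marks an RSA private key in PEM format.
PEM_HEADER = "-----BEGIN RSA PRIVATE KEY-----"


def find_suspicious_lines(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that mention a keyword or the PEM header."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if KEYWORD_PATTERN.search(line) or PEM_HEADER in line:
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample input: only the second line mentions a keyword.
sample = '''db_host = "localhost"
db_password = "hunter2"
timeout = 30
'''

for number, line in find_suspicious_lines(sample):
    print(f"{number}: {line}")
```

A search like this is noisy by design: it flags every line that mentions a keyword, so each hit still needs a manual look.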

Many API keys also follow a specific format. You can find such keys with a regular expression search. For example, AWS Access Key IDs typically start with the string AKIA followed by 16 alphanumeric characters. Therefore, searching with the regular expression AKIA[0-9A-Z]{16} finds the corresponding keys in the code.
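As a rough sketch, the AWS rule (AKIA followed by 16 uppercase letters or digits) could be applied in Python like this. The word boundaries and the function name find_aws_key_ids are additions for the demo, and the sample value is a placeholder in the documented key format, not a real key.

```python
import re

# "AKIA" followed by exactly 16 uppercase letters or digits.
# The \b anchors are an added assumption to avoid matching inside longer tokens.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_aws_key_ids(text: str) -> list[str]:
    """Return every substring that looks like an AWS Access Key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


# Placeholder value that only follows the key format.
sample = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'
print(find_aws_key_ids(sample))
```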

Keys for the Twilio API start with SK followed by 32 alphanumeric characters. This means that you can find them using the regular expression SK[a-z0-9]{32}. Passwords in URLs can be found by searching for patterns that match the basic syntax of URL-based authentication: [a-zA-Z]{3,15}:\/\/[^\/\\:@]+:[^\/\\:@]+@.{1,100}. This pattern finds credentials embedded in a URL of the form protocol://username:password@example.com. To search for secrets in your own code, find out how the keys you use are structured, and then write regular expressions that match them.
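The two rules in this paragraph, the Twilio key format and credentials embedded in a URL, can be combined into one small scanner. The sketch below assumes Python's re module; the PATTERNS table, the scan function, and the sample values are hypothetical, and the fake Twilio key is assembled at run time so that no key-like literal appears in the example itself.

```python
import re

# Named patterns; the names are made up for this demo.
PATTERNS = {
    # "SK" followed by 32 lowercase letters or digits.
    "twilio_api_key": re.compile(r"SK[a-z0-9]{32}"),
    # scheme://username:password@rest-of-url
    "url_with_password": re.compile(r"[a-zA-Z]{3,15}://[^/\\:@]+:[^/\\:@]+@.{1,100}"),
}


def scan(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) for every match of every pattern."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings


# Synthetic inputs that only follow the formats described in the text.
fake_twilio_key = "SK" + "0123456789abcdef" * 2
sample = (
    f"token = {fake_twilio_key}\n"
    "url = postgres://admin:s3cr3t@db.example.com/app\n"
)

for name, value in scan(sample):
    print(name, value)
```

Keeping the patterns in a dictionary makes it easy to add one entry per key format used in your own code.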

The two strategies described above will find the bulk of the secrets in your code. But if we rely only on text search, we risk missing secrets that do not follow a known format. This is where entropy analysis comes to the rescue.

Let's talk about entropy

We can think of entropy as a measure of how "random" and "unpredictable" data is. For example, a string made of a single repeated character, like aaaaa, has very low entropy. A string that contains many different characters, like wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY, has higher entropy. You can check such strings and see how the entropy value is calculated using this Shannon entropy calculator.
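For readers who prefer code to an online calculator, Shannon entropy over the characters of a string can be computed as in the sketch below. It assumes the usual per-character definition, in bits per character, where each character's probability is its relative frequency in the string; the function name shannon_entropy is chosen for the demo.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    length = len(text)
    # p = relative frequency of each distinct character; sum p * log2(1 / p).
    return sum(
        (count / length) * math.log2(length / count)
        for count in Counter(text).values()
    )


low = shannon_entropy("aaaaa")
high = shannon_entropy("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
print(low, high)  # the repeated character scores 0.0, the key-like string scores far higher
```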

The entropy value is a good way to find highly random, complex strings. By calculating this metric for the string literals in your code, you can detect suspicious strings of any format.
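One possible way to apply this to source code is to pull out string literals and flag those whose entropy exceeds a threshold. The sketch below makes several simplifying assumptions that are not from the article: it only looks at double-quoted literals without escape sequences, ignores literals shorter than 20 characters, and uses a cut-off of 4.0 bits per character that would need tuning on real code.

```python
import math
import re
from collections import Counter

# Assumed simplification: double-quoted literals of 20+ characters, no escapes.
STRING_LITERAL = re.compile(r'"([^"\\]{20,})"')
# Assumed cut-off in bits per character; tune it on your own code base.
ENTROPY_THRESHOLD = 4.0


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    length = len(text)
    return sum(
        (count / length) * math.log2(length / count)
        for count in Counter(text).values()
    )


def high_entropy_literals(source: str) -> list[str]:
    """Return string literals whose entropy is above the threshold."""
    return [
        literal
        for literal in STRING_LITERAL.findall(source)
        if shannon_entropy(literal) > ENTROPY_THRESHOLD
    ]


# Hypothetical source snippet: one repetitive literal and one key-like literal.
sample = '''
greeting = "hello hello hello hello hello"
secret = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
'''
print(high_entropy_literals(sample))
```

The repetitive greeting uses only five distinct characters, so it stays well below the threshold, while the key-like string is flagged.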

What's next?

Code pushed to public repositories should be checked for accidentally committed secrets. If a secret has become publicly accessible, treat it as stolen: the corresponding keys, passwords, and so on need to be changed.

Of course, not all code is open source, and not all hardcoded secrets end up in public repositories. But the practice can still cause problems, because the data can leave the company in application executables, in logs, or if the source code is stolen. A good strategy for minimizing the risk of sensitive data leaks is to scan the code using pattern search and entropy analysis before it reaches production. The secrets themselves should be stored either in configuration files or in dedicated services designed to manage such data.
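As an illustration of the configuration file option, the sketch below reads an API key from an INI file instead of hardcoding it. The file name secrets.ini, the [service] section, the api_key option, and the load_api_key function are all hypothetical. The demo writes a throwaway file to a temporary directory; in practice the file would live outside version control.

```python
import configparser
import tempfile
from pathlib import Path


def load_api_key(config_path: Path) -> str:
    """Read the API key from an INI file rather than from the source code."""
    parser = configparser.ConfigParser()
    with config_path.open() as handle:
        parser.read_file(handle)
    return parser["service"]["api_key"]


# Demo only: create a temporary config file with a placeholder value.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "secrets.ini"
    config_file.write_text("[service]\napi_key = example-not-a-real-key\n")
    api_key = load_api_key(config_file)

print(len(api_key))  # avoid printing the secret itself
```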

Sometimes it may seem that a secret simply has to be stored in code that ships to end users as part of an application, for example API keys used in mobile applications. In that case, you can take measures to make such data harder to find. For example, avoid giving variables that hold keys names that plainly reveal their contents, such as api_key and password. Obfuscating the code is also recommended, since it complicates extracting secrets from it. Finally, you can simply run the part of the application that accesses third-party services on the server, which avoids shipping this code in the application package delivered to the end user.

Always check your code for sensitive data and consider how this data could fall into the hands of attackers. If such data is in the code on purpose rather than by accident, think about whether it really needs to be there and whether it is well protected.

In conclusion, I can say that static analysis is the most reliable way to detect secrets that have accidentally made their way into code.

Translation source

Sep 19, 2021

This is the first time I've seen such a thing. Marvelous!

Частный детектив. Москва. · Sep 19, 2021

Частный детектив. Армения. Ереван. said:
This is the first time I've seen such a thing. Marvelous!

Same!

Детективное агентство Израиль. · Oct 1, 2021

Interesting post.
