Unable to locate credentials in AWS
The Problem
If you have servers in AWS doing a high volume of AWS service requests, you may occasionally come across frustrating, sporadic credential errors like these:
"Unable to locate credentials"
or if you're using aws-sdk in Node.js:
"CredentialsProviderError: Could not load credentials from any providers"
I'm not totally sure why these errors happen, but I typically see them across multiple services, accounts and regions around the same time, which leads me to believe there can be sporadic flakiness in the instance metadata service used for fetching IAM credentials.
I tried increasing the metadata service retries and adjusting other configuration parameters to prevent this, but they didn't seem to make any difference.
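For reference, the knobs I mean are things like the IMDS retry and timeout settings. The AWS CLI and boto3 read these from environment variables (the values below are just examples); I can't promise they'll help in every case, since they didn't in mine:

# Tell the CLI / boto3 to retry the metadata service more aggressively.
export AWS_METADATA_SERVICE_NUM_ATTEMPTS=5   # default is 1 attempt
export AWS_METADATA_SERVICE_TIMEOUT=5        # seconds per attempt, default is 1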
The Solution
Looking for a solution, I found this buried in the AWS documentation for instance metadata retrieval:
"If you're using the IMDS to retrieve AWS security credentials, avoid querying for credentials during every transaction or concurrently from a high number of threads or processes, as this might lead to throttling. Instead, we recommend that you cache the credentials until they start approaching their expiry time."
Now, I don't think this throttling was the source of all the errors I was seeing, but it may be playing a role. Maybe the metadata service's tolerance before throttling changes over time as demand changes; I don't know.
Either way, this gave me the idea to write a bash script that caches the IAM credentials in ~/.aws/credentials, where they can be picked up by the AWS CLI as well as any Node.js or Python clients accessing AWS services:
#!/bin/bash
set -euo pipefail

# IMDSv1 endpoint for the instance role's temporary credentials.
# (Instances that enforce IMDSv2 would also need a session token header.)
IMDS_URL="http://169.254.169.254/latest/meta-data/iam/security-credentials/"
# Note: "~" is not expanded inside quotes, so use $HOME here.
AWS_CREDENTIALS_PATH="$HOME/.aws/credentials"
PROFILE_NAME="default"
# 4.5 minutes, because new credentials appear 5 minutes before expiry
EXPIRY_BUFFER=270

get_aws_credentials() {
    local role_name response
    role_name=$(curl -s "$IMDS_URL")
    response=$(curl -s "${IMDS_URL}${role_name}")

    local access_key_id secret_access_key token expiration expiration_time
    access_key_id=$(echo "$response" | jq -r '.AccessKeyId')
    secret_access_key=$(echo "$response" | jq -r '.SecretAccessKey')
    token=$(echo "$response" | jq -r '.Token')
    expiration=$(echo "$response" | jq -r '.Expiration')
    expiration_time=$(date -d "$expiration" +%s)

    mkdir -p "$(dirname "$AWS_CREDENTIALS_PATH")"
    {
        echo "[$PROFILE_NAME]"
        echo "aws_access_key_id = $access_key_id"
        echo "aws_secret_access_key = $secret_access_key"
        echo "aws_session_token = $token"
        # Extra key used only by this script's freshness check below.
        echo "expiration = $expiration_time"
    } > "$AWS_CREDENTIALS_PATH"
}

should_fetch_credentials() {
    # Always fetch if the credentials file doesn't exist yet.
    if [[ ! -f $AWS_CREDENTIALS_PATH ]]; then
        return 0
    fi
    # Fetch if the cached credentials are within EXPIRY_BUFFER seconds of expiring.
    local expiration_time current_time
    expiration_time=$(grep '^expiration' "$AWS_CREDENTIALS_PATH" | cut -d ' ' -f 3)
    current_time=$(date +%s)
    if (( current_time + EXPIRY_BUFFER > expiration_time )); then
        return 0
    fi
    return 1
}

if should_fetch_credentials; then
    get_aws_credentials
fi
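Once this file exists, the default credential provider chain used by the AWS CLI, boto3 and the Node.js SDK finds the shared credentials file before it ever falls back to IMDS, so clients stop querying the metadata service on every request. A quick sanity check (standard AWS CLI commands, nothing specific to my setup):

# "aws configure list" should report the credentials as coming from the
# shared-credentials-file rather than from the IAM role via IMDS.
aws configure list
aws sts get-caller-identity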
Since the credentials have to be refreshed every few hours, I set the script up as a cron job that runs every minute and checks whether the expiration time is approaching:
* * * * * /home/ec2-user/credentials.sh > /dev/null 2>&1
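If you're provisioning instances automatically (user data, Ansible, and so on), one way to install that entry non-interactively is something like the following; the script path is just the one from my example above, and the snippet doesn't try to be idempotent:

chmod +x /home/ec2-user/credentials.sh
# Append the schedule to the current user's existing crontab (if any).
(crontab -l 2>/dev/null; echo "* * * * * /home/ec2-user/credentials.sh > /dev/null 2>&1") | crontab -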
Voila! No more credential errors! I hope that helps. Let me know if you've run into the same errors, and if you found this approach useful.