[Technical Blog] How to Ship your AWS Lambda logs to Google Cloud Logging - The Dig Security Way
If you are part of an application development team, chances are you are familiar with multi-cloud environments. According to Gartner (Predicts 2022: Connecting the Digital Enterprise, December 2021), more than 75% of organizations use multiple public cloud services today, and the number is growing rapidly.
Adopting a multi-cloud strategy for application development lets you choose the best tools to deliver faster, scale better, and provide a better experience for your customers. However, with the freedom to choose your tools comes the responsibility to track logs and alerts in a centralized manner.
Keeping logs in a centralized repository
If you work in DevOps like me, keeping all your logs in a centralized repository clearly saves the time otherwise spent checking logs in multiple places and across different interfaces. Log volumes also grow very quickly, so you need a resilient, simple solution that stores them all in one place, such as CloudWatch.
But there are other benefits to centralizing your logs, such as filtering and searching across the different logs when something breaks and you are hunting for the root cause. And then, of course, there is the security aspect: as a company in the cyber security space, implementing “security by design” principles is of the highest importance to us. If you are dealing with sensitive and regulated data such as personal information, secrets, and company IP, imagine what could happen if that data falls into the wrong hands.
Before we start digging into the topic at hand, I should give you some context on what Dig Security is all about and why solving this issue is so important to us. At Dig Security, we are committed to securing cloud data no matter where it resides. Dig provides a solution that discovers all the data in a customer's public-cloud environment, then classifies it to provide visibility into where the most sensitive data lives. In addition, Dig protects sensitive data by highlighting static and dynamic risk indicators that can stop attacks early in the kill chain. Preventing attacks requires a real-time data detection and response capability, also known as DDR.
At Dig, therefore, every application is mission-critical. We run a multi-cloud environment (Microsoft Azure, Google Cloud, AWS) and can’t afford to miss even one line of logging. So imagine how disappointing it was to find that there was no easy way to ship Lambda logs from CloudWatch to Google Cloud Logging (formerly Stackdriver).
AWS Lambda is a popular event-driven serverless platform, which makes it an important part of our DDR capabilities. All of our on-demand code (Lambda functions) is hosted in AWS, while the central logging system lives in Google Cloud Logging. But there is no native solution for exporting Lambda logs (which are automatically stored in CloudWatch log groups) to GCP.
But this story has a happy ending, and once we figured it out for ourselves, we decided to share it with other diggers who might be dealing with the same problem.
Below is a detailed step-by-step explanation of how you can do it yourself. Make sure you are familiar with the following prerequisites before you begin:
- Kubernetes Cluster in any cloud provider
- Docker & Docker repo (ECR / GCR)
- AWS authentication in place (k8s service account / dedicated user with credentials)
- GCP auth JSON
To begin, you need to pre-configure the authentication for the cloud providers:
AWS: implemented via pod IRSA configuration in our case. You can also configure authentication using an AWS access key and secret key directly in the fluentd configuration (see the example later in this post).
The assigned AWS permissions scope:
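A minimal read-only IAM policy along these lines should be sufficient for the fluentd CloudWatch input; the resource ARN below is an illustrative assumption that scopes access to Lambda log groups only — widen or narrow it to match your setup:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:log-group:/aws/lambda/*"
    }
  ]
}
```

Attach this policy to the IRSA role (or the dedicated user) that fluentd will assume.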
GCP: authentication is implemented by creating a dedicated GCP service account, assigning it the “Logs Admin” role, and generating a private key. The key is mounted from a k8s secret into the fluentd pod at an absolute path that is automatically recognized by the fluentd GCP plugin (/etc/google/auth/application_default_credentials.json).
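A sketch of the secret carrying the key (the secret name is illustrative; the key file name must match what the pod mounts at the path above):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: fluentd-gcp-auth
type: Opaque
stringData:
  # Paste the GCP service account's JSON key here (or create the
  # secret from the downloaded file with:
  # kubectl create secret generic fluentd-gcp-auth \
  #   --from-file=application_default_credentials.json=<key-file>)
  application_default_credentials.json: |
    { "type": "service_account" }
```

The pod then mounts this secret as a volume at /etc/google/auth so the file lands at the expected absolute path.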
Build fluentd custom image
Now that our authentication is in place, let’s build a custom docker image and push it into a dedicated ECR (which can be consumed from our AWS EKS clusters):
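A minimal Dockerfile sketch for such an image — the base image tag and plugin list are assumptions, and on the Alpine-based fluentd image the GCP plugin's native extensions (grpc) need build tools installed for the gem install step:

```dockerfile
FROM fluent/fluentd:v1.16-1
USER root
# Build tools are only needed while compiling the plugins' native extensions
RUN apk add --no-cache --virtual .build-deps build-base ruby-dev \
 && gem install fluent-plugin-cloudwatch-logs fluent-plugin-google-cloud --no-document \
 && apk del .build-deps
USER fluent
```

Build and push it with the usual `docker build -t <your-ecr-repo>/fluentd-gcp:latest .` followed by `docker push <your-ecr-repo>/fluentd-gcp:latest` (after `aws ecr get-login-password` authentication).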
Prepare the provided helm custom chart
We now have a ready fluentd image. To deploy it into our k8s clusters, I chose to write a custom helm chart dedicated to this fluentd implementation.
There were a few key considerations:
- GCP auth json: published as a k8s secret and mounted into the fluentd pod
- fluentd config: published as a k8s configmap and mounted into fluentd pod
- fluentd app: published as a statefulset with a dedicated persistent volume for ongoing operations by fluentd
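The three considerations above translate into a pod spec roughly like this (all names are illustrative; the buffer volume would come from the statefulset's volumeClaimTemplates):

```yaml
spec:
  containers:
    - name: fluentd
      image: <your-ecr-repo>/fluentd-gcp:latest
      volumeMounts:
        - name: gcp-auth          # GCP key from the k8s secret
          mountPath: /etc/google/auth
          readOnly: true
        - name: fluentd-config    # generated config from the configmap
          mountPath: /fluentd/etc
        - name: buffer            # persistent volume for fluentd buffers
          mountPath: /fluentd/buffer
  volumes:
    - name: gcp-auth
      secret:
        secretName: fluentd-gcp-auth
    - name: fluentd-config
      configMap:
        name: fluentd-config
```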
Generating the fluentd config
Now to the most important part: the fluentd config. The config is dynamic; it is generated via terraform and calculated from a list of cloudwatch log group names to watch (local.dig_region_lambdas).
The full terraform object which creates the fluentd config can be seen here.
Let’s break down each part of the object:
1. The config map resource itself
The config is generated inside the chart folder as a text file, and it will be consumed by the chart’s configmap template.
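The terraform side might be sketched like this — file paths, template names, and the local variable layout are assumptions about how the chart is organized:

```hcl
locals {
  # Render one <source> block per cloudwatch log group to watch
  fluentd_sources = join("\n", [
    for group in local.dig_region_lambdas :
    templatefile("${path.module}/templates/source.conf.tpl", {
      log_group = group
    })
  ])
}

# Write the assembled config into the chart folder, where the
# chart's configmap template picks it up
resource "local_file" "fluentd_conf" {
  filename = "${path.module}/chart/config/fluent.conf"
  content  = "${local.fluentd_sources}\n${file("${path.module}/templates/filters.conf")}"
}
```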
2. The source block in the config
Terraform will loop and create source blocks (as text) based on the count of cloudwatch group names provided in the local element. I chose to have a specific source per cloudwatch group in order to be able to tag each source event with the lambda name (which will be used in the next sections).
- The “<parse>” block restricts ingestion to JSON-formatted logs only (skipping the AWS raw-text log lines).
- The “<web_identity_credentials>” block is used for the IRSA pod identity configuration (optional; you can replace it with an AWS access key and secret key).
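A sketch of one rendered source block, assuming the fluent-plugin-cloudwatch-logs input plugin; the tag, function name, region, and role ARN are illustrative placeholders:

```
<source>
  @type cloudwatch_logs
  tag lambda.my-function               # tag carries the lambda name for later filters
  log_group_name /aws/lambda/my-function
  region us-east-1
  <parse>
    @type json                         # keep JSON log lines only
  </parse>
  <web_identity_credentials>           # IRSA; swap for aws_key_id/aws_sec_key if needed
    role_arn arn:aws:iam::123456789012:role/fluentd-logs-reader
    web_identity_token_file /var/run/secrets/eks.amazonaws.com/serviceaccount/token
  </web_identity_credentials>
</source>
```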
3. Filters and fluentd.* logs
Filter #1 performs two critical things:
- transforms the AWS cloudwatch JSON log by adding a “severity” field based on the “levelname” field. This is critical, as GCP operations uses “severity” to properly ingest logs
- adds a JSON field called “lambda_name”, based on the fluentd tag on the event (remember that we tagged the event in the source block)
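A sketch of such a filter using the standard record_transformer plugin — the tag pattern and the assumption that “levelname” comes from a Python-style JSON log formatter are illustrative:

```
<filter lambda.**>
  @type record_transformer
  enable_ruby true
  <record>
    # Map the Python logging level onto GCP's severity field
    severity ${record["levelname"] || "INFO"}
    # Second tag part is the lambda name set in the source block
    lambda_name ${tag_parts[1]}
  </record>
</filter>
```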
Filter #2 (actually a match clause) finds all of the logs generated by fluentd itself and routes them to the “null” output, removing unnecessary logs from the system.
Filter #3 adds a unique insert id to each log - critical for GCP operations, as it eliminates log duplication at the target.
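These two could be sketched as follows — the special `logging.googleapis.com/insertId` record key is recognized by the fluentd GCP output plugin, and the Ruby snippet for generating a UUID is one common approach, not the only one:

```
# Drop fluentd's own internal logs
<match fluent.**>
  @type null
</match>

# Attach a unique insertId so Cloud Logging can de-duplicate retried sends
<filter lambda.**>
  @type record_transformer
  enable_ruby true
  <record>
    logging.googleapis.com/insertId ${require 'securerandom'; SecureRandom.uuid}
  </record>
</filter>
```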
4. Target destination (match clause)
This clause matches all of the logs (“**”) and pushes them to the configured GCP project. Notice the usage of labels and zone - these are configurable, and it is recommended to set them according to how you intend to query the logs.
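A sketch of the match clause, assuming the fluent-plugin-google-cloud output plugin; the project, zone, vm_id, and buffer settings are illustrative and should be tuned to your environment:

```
<match **>
  @type google_cloud
  project_id my-gcp-project        # target GCP project
  use_metadata_service false       # we run outside GCP, so no metadata server
  zone us-east1-b                  # pick to match how you intend to query
  vm_id fluentd-aggregator
  <buffer>
    @type file
    path /fluentd/buffer/gcp       # on the statefulset's persistent volume
    flush_interval 5s
  </buffer>
</match>
```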
Deploy image with helm
After going over all of the critical components for the system, let's deploy it using helm.
An example of a values.yaml file to provide helm when deploying the chart:
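Something along these lines — the exact key names depend on how the custom chart is written, so treat every key below as an assumption:

```yaml
image:
  repository: 123456789012.dkr.ecr.us-east-1.amazonaws.com/fluentd-gcp
  tag: latest

serviceAccount:
  create: true
  annotations:
    # IRSA role that grants the CloudWatch read permissions
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/fluentd-logs-reader

# k8s secret holding the GCP key, mounted at /etc/google/auth
gcpAuthSecret: fluentd-gcp-auth

resources:
  requests:
    cpu: 100m
    memory: 256Mi

persistence:
  size: 5Gi   # buffer volume for the statefulset
```

Then deploy with something like `helm upgrade --install fluentd-gcp ./chart -f values.yaml`.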
Once the chart is successfully deployed to your cluster, the fluentd pod will spin up, load the relevant plugins, and start communicating with the source and target (no errors should be visible).
If everything is in order, after a minute or so you will see the fluentd log line “Successfully sent to Stackdriver Logging API”.
A few final notes:
1. Some chart configurations (service account, pod resources, pod secrets) can be customized further and adapted to specific use cases. The chart is generic and can be changed to fit any environment.
2. Fluentd source block config: I chose to build a source block per cloudwatch group. This can be simplified using fluentd config parameters to dynamically locate cloudwatch groups based on each group’s prefix, which can be easier to manage.
Examples of the logs:
1. AWS cloudwatch (lambda):
2. GCP operations suite (target logging destination):
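For illustration, a JSON Lambda log line on the CloudWatch side, and the same record as it would reach Cloud Logging after the filters (field values are made up; “levelname” assumes a Python-style JSON log formatter):

```
# CloudWatch (source) record:
{"levelname": "ERROR", "message": "failed to process event"}

# Cloud Logging (target) record after the filters:
{"severity": "ERROR", "message": "failed to process event", "lambda_name": "my-function"}
```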
That’s it! Very simple indeed. Now you have your Lambda logs in Stackdriver, with better security, visibility, and alerting.
For us - it’s a life-changer.