Redshift Security: Access and Data Flows Explained - an Inside-Out Series

Redshift Security: Access and Data Flows Explained - an Inside-Out Series
Ofir ShatyOfir Shaty
Redshift Security: Access and Data Flows Explained - an Inside-Out Series

TL;DR This is a first in a series of blogs that exposes security risks, possible attack vectors, and how to hunt and prevent them using Data detection and Response (DDR).

When correctly configured, Redshift’s security features and capabilities protect sensitive data from leaks or other security incidents. However, often, organizations struggle to correctly configure and implement these features because the manual processes can be time-consuming and require specialized skills.

To mitigate security and privacy risks, you need to appropriately configure security features and follow practices. Implementing robust access management controls is the first step to protecting sensitive data stored in Redshift.

Why research data security in Redshift?

Redshift is a fully managed data warehouse service (such as Google BigQuery and Snowflake). Based on PostgreSQL, its performance enables extremely fast query execution on extremely big data sets by employing several features such as:

  • Massive Parallel Processing (MPP)
  • Columnar data storage 
  • Data compression

According to Amazon, Redshift delivers business value with two to seven times better price performance on high-concurrency and low-latency real world analytics. A popular service, Redshift, is used by more than 20,000 companies.

As a widely used data warehouse containing sensitive personal and intellectual property data, you need to secure your deployment to comply with privacy laws and mitigate data breach risk.

Implement least privileged access to sensitive data on redshift

Access control is the front gate to the data and the assets containing the data.
Features like RBAC and SSO are the first security guards to limit access to sensitive information inside the organization.

During our work with customers we have seen so many cases of excessive privileges that are difficult to track and exposing their data to a real danger.
Strong access control policy can help to protect against external threat and inside threat by limiting users access to sensitive data.

Using SSO authentication

Redshift offers identity federation as well as database user management.

With the Redshift browser SAML plugin, you can launch a custom AWS SSO SAML application that provides the SAML attributes needed to connect to Amazon Redshift using a JDBC/ODBC client.

When configured, AWS SSO authenticates the user identity against the identity source directory.

Bypass Redshift user authentication with IAM access

In most cases, Redshift users log-in with their database username and password. However, to prevent Redshift administrators from maintaining usernames and passwords, AWS enables you to create user credentials based on IAM credentials to access the databases in Redshift.

The AWS Redshift API GetClusterCredentials returns the user name and a temporary password for the database along with temporary authorization to log in.

To get the credentials, you need to assign a policy with the redshift:GetClusterCredentials to your User/Role.

After the temporary credentials are obtained, Redshift-data api can be used to perform SQL queries.

Usually, you need to specify a user in the request parameters. However, if the "AutoCreate" option is "True", a new user is created automatically.

Pro-Tip

It is recommended to know exactly which users and roles have “redshift:GetClusterCredentials” and “redshift-data” permissions, as they can potentially connect and read/write your sensitive data.

To check if there are policies allowing to generate temporary credentials for Redshift cluster, you can follow these steps and search for policies that allow the action “redshift:GetClusterCredentials”:

After identifying the relevant policies, check the attached users, groups and roles. Further, you should also review inline policies for users and roles which may contain this privilege.

To get temporary credentials to connect to Redshift with cli:

After the credentials are obtained, it is possible to execute queries against Redshift by running the following cli command:

Protecting Data in Redshift with RBAC Authorization

Access to database objects, such as tables and views, is controlled by local users, roles and groups in the Amazon Redshift database. These users are managed separately from AWS standard IAM with local users and roles acting similarly to standard RDBMS mechanisms.

The local users and roles act the same as the standard RDBMS mechanism.
Redshift admin can create users and groups, and assign them the relevant privileges they need. It is recommended to create a user per identity so later in any case of security incident it will be possible to track and monitor what data was accessed or even prevent unauthorized access before it happens.

The default users are the "rdsdb" user, and the user you created in the creation process.
They are both Supeusers, when the user  rdsdb is used internally by Redshift to perform routine administrative and maintenance tasks  and only rdsdb can update the system catalog.

Fine-grained data security with access control

Redshift gives you fine-grained data access controls that allow you to mitigate risk at the column- and row-levels.

Column-level security

You can grant user permissions to a subset of columns.

Similar to managing any database object, column-level access control can be done by using GRANT and REVOKE statements at the column level.

Row-level security (RLS)

In order to ensure that sensitive data is only accessible by those who need it, Redshift provides fine-grained access control at the row level with its row-level security (RLS) feature.

Based on the foundation of role based access control (RBAC), the RLS feature restricts who can access specific records within tables based on security policies defined at the database object level. Unlike other tools that require you to create different views or use different databases and schemas for different users, this approach is sustainable.

An RLS policy can control whether Amazon Redshift returns any existing rows from a table in a query by specifying expressions.

Protecting Sensitive Data on Redshift by Limiting network access

Disable public access

Redshift is not publicly accessible by default so access to redshift clusters is possible only from inside the VPC or through a VPN.
If the Public Access configuration is enabled, it is possible to connect to the Redshift cluster from anywhere without any restrictions.

Pro-Tip

We strongly recommend blocking public access to Redshift clusters.

Disable public access with cli:

Disable public access with AWS console:

Under the “Properties” tab, click “Edit” on the “Network and security settings” section:

Control network access with security group

Another option to control access to the Redshift cluster is by setting the cluster’s security group inbound rules. The security group is like firewall rules for the specific AWS service, by limiting to specific IPs, subnets, ports etc we can add another layer of security to our Redshift cluster.

Pro-Tip

It is recommended to use the security group as a second layer of security and not use it together with public access to avoid mistakes that can easily lead to your data being exposed publicly.

Enable enhanced VPC routing

When enhanced VPC routing is disabled, Amazon Redshift routes traffic through the internet, even traffic to other services within the AWS network.

Pro-Tip

It is recommended to enable enhanced VPC routing to add another layer of security to your data at-transit.

To enable enhanced VPC routing with cli:

To enable enhanced VPC routing with AWS console:
Under the “Properties” tab, on the “Network and security settings”, click on “Edit”.
Change the configuration to “Turn on” then click “Save changes”:

Manage your external data sources

As part of the AWS ecosystem, Redshift can use other services as data sources, to import or export data.

IAM role association with your cluster

Redshift can be associated with IAM roles that authorize the cluster to access external data sources like S3, DynamoDB etc.

Pro-Tip

It is recommended to make sure that the roles are configured correctly with the least privilege principle to avoid data leakage.

We will dig deeper into the data sources risks in our next blog.

Add or remove roles with cli:

Add or remove roles with AWS console:
Under the “Properties” tab, on the “Cluster permissions” section, click on “Manage IAM roles”

Data sharing with Redshift clusters

With the Redshift datashares feature it is possible to share data across Redshift clusters in the same account or in different accounts. The feature gives instant access to data without the need to copy or move it.
It is important to make sure that the cluster shares only data that is intended to be shared.

It is possible to share data on schema db level, schema level or table level.

Keep in mind that any data that leaves the cluster reduces the control on it, do not share sensitive data that shouldn’t leave the cluster.

Pro-Tip

In order to get full visibility to your current data shares status in your account, please follow the next steps.

To show Redshift data shares from cli:

To see your data shares from inside Redshift:

Pro-Tip

reduce the scope of sensitive information shared with other accounts.

To modify the tables or schemas that shared in the datashare you can use the alter command:

Another important aspect when creating a data share is whether your data share can be shared with Redshift clusters that are publicly accessible.

Pro-Tip

It is recommended to limit the data share so it won’t be shared with public clusters.

Below some examples for creating data shares:

In case that the data share is already set as publicly accessible it is possible to modify this setting to false:

Summary and How Dig Security can help with Data Security Posture Management for Redshift

Redshift is an awesome service with great capabilities that offers many security features.
We reviewed many security aspects that need to be considered, reducing the attack surface is the most challenging but important part of prevention.

Appropriately configuring Redshift access controls may be challenging, but it can be done manually. However, continuously monitoring your controls to ensure that they function as intended becomes a challenge, especially as you scale your data analytics usage.

Dig prevents exposure of sensitive data with full data security posture management (DSPM) capabilities, highlighting data misconfigurations, access anomalies and data vulnerabilities that increase the risk of a data breach. Dig enriches DSPM with real time data detection and response (DDR) to ensure an immediate handling of newly discovered data related incidents by integrating with your existing security solutions.

Interested in hearing more? Contact us.

What's next?

In our next blog in the series of Redshift inside-out we will continue to review security features of Redshift including encryption, backups and logs. More to come in our series exploring Redshift possible attack vectors, and how to hunt and prevent them (Data detection and response -DDR).
Stay tuned and secure, Dig Security.

About the Authors

Ofir Shaty is a Senior Security Researcher at Dig Security. A member of the Security Research team, Ofir has over 6 years of experience in Data Security and Web Application Security. When away from work on sunny days Ofir likes to hit the road with his motorcycle.
Ofir Balassiano is leading the security research team at Dig Security.
Ofir has over 8 years of experience in security research specifically in cloud security as well as low-level OS internals research.
Pro-Tip

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed consectetur do eiusmod tempor incididunt eiusmod.