DEV Community

Cover image for Step by Step process to setup Redshift datashare across Redshift clusters
Arvind Toorpu
Arvind Toorpu

Posted on • Edited on

Step by Step process to setup Redshift datashare across Redshift clusters

Creating datashare in redshift clusters :

Introduction:

Every organization has multiple teams – departments that often have to share data to promote unified and accurate decisions about their product or services. Within a classic Data Warehouse, Data Sharing can be time-consuming, stressful, and often dependent on DBA's additional tooling. Additionally, when it comes to accurate testing, users often have to move data themselves or announce well ahead when they will need that data so that DBAs can prepare a testing environment for them.

Finally, AWS has come up with a solution. Amazon Data Sharing is now generally available and with this feature, users can display data created in one cluster to multiple other clusters without any data movement or replication systems you need to buy and set up. Although this is a feature that AWS provides and markets as very simple to use, there are a few things you need to know to set it up and configure it correctly, and this article aims to explain how to do so properly

Commands to Work with Amazon Data Share:
Now that you’ve explored the basics of Amazon Redshift Data Sharing, you can start learning how to use the feature. There are certain processes Redshift users must master to share data effectively. These processes are:

Create Datashare
Alter Datashare
Desc Datashare
Show Datashare
Drop Datashare

Step 1: Create Datashare

The first step to sharing data is to create a datashare. You can create a datashare by entering the following syntax within an Amazon Redshift database:

CREATE DATASHARE datashare_name;
[[SET] PUBLICACCESSIBLE [=] TRUE | FALSE ];
Enter fullscreen mode Exit fullscreen mode

The parameter, [SET] PUBLICACCESSIBLE, states whether you can share the data with clusters that are publicly accessible. This should always be set to FALSE. don't see any reason for you to share publicly.

For example:

create datashare sales_datashare;
Enter fullscreen mode Exit fullscreen mode

Step 2: share schema level all objects in the schema to the consumer.
Set permissions on the database objects that you want to share. For example:
First add schema to datashare, so you can then add objects

alter datashare my source add schema abcd ;

Enter fullscreen mode Exit fullscreen mode

You have the option to add one object or all tables/Views in the schema to Datashare:

For example:

ALTER DATASHARE salesshare ADD ALL TABLES IN SCHEMA abcd;
Enter fullscreen mode Exit fullscreen mode

Step 3: To add/remove objects to datashare
Alter Datashare
This function allows you to add or remove objects from a datashare.
The syntax for this process is:

ALTER DATASHARE datashare_name ADD TABLE schemaname.table_name;

Enter fullscreen mode Exit fullscreen mode

Or

ALTER DATASHARE datashare_name REMOVE TABLE schemaname.table_name;

Enter fullscreen mode Exit fullscreen mode

For example:

ALTER DATASHARE salesshare ADD TABLE abcd.table_name;

Enter fullscreen mode Exit fullscreen mode

Or

ALTER DATASHARE salesshare REMOVE TABLE abcd.table_name;

Enter fullscreen mode Exit fullscreen mode

Note: If you want to ensure that your datashare should include all the future tables that are created under the schema. You should set the parameter includenew = true;

[ SET INCLUDENEW [=] TRUE | FALSE FOR SCHEMA schema ]

Enter fullscreen mode Exit fullscreen mode

A clause that specifies whether to add any future tables, views or SQL user-defined functions (UDFs) created in the specified schema to the datashare. Current tables, views, or SQL UDFs in the specified schema aren't added to the datashare. Only superusers can change this property for each datashare-schema pair. By default, the INCLUDENEW clause is false.

Desc Datashare

Enter fullscreen mode Exit fullscreen mode

This shows all the objects added to a datashare. The syntax for Desc Datashare is:

DESC DATASHARE datashare_name [ OF [ ACCOUNT account_id ] NAMESPACE namespace_guid ]
Enter fullscreen mode Exit fullscreen mode

Account_id indicates the account where the datashare was created.
Namespace_guid is a code number for the datashare.
Show Datashare
Use this function to view the inbound and outbound datashares within a cluster. Here’s how to request Amazon Redshift to show datashares:

SHOW DATASHARES [ LIKE 'namepattern' ]
Enter fullscreen mode Exit fullscreen mode

Namepattern refers to the similar characters that all the requested datashares have.
LIKE is an optional clause that matches the name pattern with the description of the datashares within an account.
Drop Datashare
This deletes a datashare object from a cluster. The syntax for Drop Datashare is:

DROP DATASHARE datashare_name;

Enter fullscreen mode Exit fullscreen mode

Amazon Redshift Data Sharing Use Cases
Now that you have gained a basic understanding of Amazon Redshift Data Sharing capability, below are some of the use cases listed where this feature is commonly used.

Step 4: Grant usage to the consumer cluster
Now to allow a consumer to access data, we need to grant usage on dtashare to that AWS account's Namespace. Allow permissions on the consumer cluster namespace to access the datashare.
For example:

grant usage on datashare salesshare to namespace '2b12345-1234-5678-9012-bb1234567890';
Enter fullscreen mode Exit fullscreen mode

*ON AWS Console We have to authorize the producer and consumer to allow data flow between the clusters *


There is an important step that needs to be done manually on the AWS console datashares tab.

Authorize On Producer AWS Console.
we need to authorize the producer by clicking on

datashare >> datashare name >> select authorize.
**
Associate on Consumer AWS Console.
we need to associate the producer by clicking on datashare name and **select associate in
datashares >> from other accounts. >> datasharename >> associate region/namespace
.


*---------- On Consumer(cluster) side ----------
*

On the consumer cluster, you create a database from the datashare. These steps describe how to share data between two clusters in the same account. For information on sharing data across AWS accounts, see Sharing data across AWS accounts in the Amazon Redshift Database Developer Guide.

You can use SQL commands or the query editor v2 tree-view panel to create the database.

Step 1: ** Create a database that will hold the producer database info
Create a database from the datashare for your account and the namespace of the producer cluster. For example:
**For example use SQL:

create database sales_shared_db from datashare salesshare of account '123456789012' namespace 'p1234567-8765-4321-p10987654321'; 
Enter fullscreen mode Exit fullscreen mode

Set permissions so that users can access the database and the schema.

Step 2 : Create external schemas
Create external schemas under your local database based on your shared database.

CREATE EXTERNAL SCHEMA 'local_schema_name' FROM REDSHIFT DATABASE 'sales_shared_db' SCHEMA 'shared_schema_name';
Enter fullscreen mode Exit fullscreen mode

In our case that would be:

CREATE EXTERNAL SCHEMA 'sales_ext_schema' FROM REDSHIFT DATABASE 'sales_shared_db' SCHEMA 'abcd';
Enter fullscreen mode Exit fullscreen mode

Note: For this option, you won't need to create any additional objects (tables, views) of this schema since they will be automatically created.
Step 3: Grant access to the external schema to :

grant usage on database sales_shared_db to usernames/groups/roles;
grant usage on schema sales_ext_schema to usernames/groups/roles;
Enter fullscreen mode Exit fullscreen mode

Set permissions so that users/groups can access the database and the schema.

Top comments (0)