Your data is the foundation for all the insights and strategies that drive your business forward. But the more data you collect, the higher the chances you encounter problems managing it.
Without a solid data management strategy, you're essentially crossing your fingers in hopes that nothing goes wrong. At best, this approach leads to inefficiencies and increased costs as you scramble to patch up emerging problems. At worst, it triggers a detrimental cycle that erodes trust in the data and in the capabilities of the teams managing it.
Thankfully, we have decades of industry best practices when implementing processes and tools to establish data governance practices that ensure trust in data. Effective data governance includes a wide range of principles and practices that may overlap with availability, usability, integrity, security, and compliance. And a big part of a proper data governance strategy in Snowflake involves using tags.
Why use tags in Snowflake
Think of your data as a city. For fun, let’s call it NYC. Here, your data points are the people of NYC, data tables are skyscrapers, and your data warehouse is the entire metropolitan area. Without any form of organization, finding what you need when you need it would be like trying to locate a single person amongst the 8.5 million who live there (all without knowing anything about where they live or work).
Data classification, in this analogy, is NYC’s zoning laws and addressing system. Residential, commercial, and industrial zones allow people to understand the general purpose and location of different areas. Tagging adds detailed signs and labels to each building, street, and neighborhood which provides additional, specific information beyond the basic structure provided by zoning and addresses.
TLDR: Data classification groups data into categories (e.g., sensitive information, financial records, or customer data). Tagging is a data classification technique that further refines these categories. It enables data stewards to monitor sensitive data for compliance, discovery, protection, and resource usage use cases through either a centralized or decentralized data governance management approach.
In Snowflake, a tag is a schema-level object that is assigned to another object as an arbitrary string value. This object-string match is then stored as a key-value pair that is unique to your schema.
There are multiple benefits to using these in Snowflake:
- Ease of use: You can define a tag once and apply it to as many different objects as desirable.
- Tag lineage: Since tags are inherited, applying the tag to objects higher in the securable objects hierarchy results in the tag being applied to all child objects. For example, if a tag is set on a table, the tag will be inherited by all columns in that table. This makes it easier to track object relations in future audits.
- Sensitive data tracking: Tags simplify identifying sensitive data (e.g. PII, Secrets). Then you add tags to tables, views, and columns, you can easily find many parts of the database that hold sensitive information just by searching for these tags. Once found, data stewards can figure out the best way to share it safely. They might limit who can see certain rows or decide if the data should be tokenized, fully masked, partially masked, or unmasked.
- Resource usage tracking: Tags bring visibility to Snowflake resource usage. With data and metadata in the same system, analysts can quickly determine which resources consume the most Snowflake credits based on the tag definition (e.g. cost_center, department).
To support different management approaches and fulfill regulatory requirements, we recommend starting your tagging journey with a proper strategy.
Tagging Strategy
Tags are extremely versatile; they can be linked to various types of objects, such as warehouses and tables, simultaneously. That’s why you need a tagging strategy before you begin object tagging in your account. This high-level plan should include:
- A list of objects or datasets that need tags
- Tag naming convention
- Use cases for assigning specific tags
Because tags are schema-level objects in Snowflake, they can be created in a single database/schema and assigned to other Snowflake objects across the account. You can choose between two approaches here: centralized or decentralized.
Centralized vs. Decentralized Approach
In a centralized approach to tagging, a single team or department within the organization is responsible for defining, implementing, and managing the tagging strategy. This team sets the standards for how data is categorized, ensures consistency in tagging across all data assets, and monitors compliance with these standards.
While it has plenty of benefits (consistency, control, and efficiency), a centralized approach can also lead to bottlenecks, as the centralized team may become a single point of failure or delay in the tagging process.
Conversely, a decentralized approach allows individual departments or teams within an organization to define and manage their own tags according to their specific needs and use cases. This approach inherently has more flexibility, speed, and customization, but without overarching governance, it often leads to inconsistencies in tagging across the organization, which can complicate efforts to manage data at a company-wide level.
Both approaches have their pros and cons, so the “right approach” really comes down to what’s most important to your organization. Either way, you’ll want to audit the tags periodically and make changes to the tagging plan according to the business context.
How to create and manage tags in Snowflake
When assigning tags, identifiers can either be consistent across multiple objects (e.g., all tagged objects labeled as sales) or vary to reflect different categories like engineering, marketing, or finance. To manage tags for objects in Snowflake using SQL commands, you generally create a tag, assign it to an object, and query that object by its tags. Here’s what each of those steps looks like:
1. Create tags
Creating a tag is a simple query with two parts :
- Use the
CREATE TAG
statement - Specify the tag string value when assigning the tag to an object.
Here’s what that looks like:
CREATE TAG tag_name;
2. Assign a Tag to an Object
After creating a tag, you can assign it to an object such as a table, view, or column. You can do this either when you're creating the object or by altering an existing object.
When creating a new object, you can assign a tag directly in the CREATE
statement by using the TAG clause as follows:
CREATE TABLE table_name (
column_name datatype
) TAG tag_name = 'tag_value';
Replace table_name
, column_name
, datatype
, tag_name
, and tag_value
with your specific details. Note that the tag_value
is the value you want to assign to this tag for the object.
To assign a tag to an existing object, use the ALTER
statement:
ALTER TABLE table_name SET TAG tag_name = 'tag_value';
This example assigns a tag to a table, but you can similarly alter other types of objects.
3. Query Objects by Tags
To find objects that have been assigned a specific tag or to see the tags associated with a particular object, you can use the SHOW TAGS statement or query the INFORMATION_SCHEMA.TAG_REFERENCES
view.
To find objects with a specific tag, use the following command:
SHOW TAGS LIKE 'tag_name';
This will list objects that have been tagged with tag_name.
To query specific tag assignments:
SELECT * FROM INFORMATION_SCHEMA.TAG_REFERENCES
WHERE TAG_NAME = 'tag_name';
This query returns detailed information about objects tagged with tag_name, including the object type and tag value.
How to manage tag quotas
When specifying tags, here are some “quotas” to keep in mind for each object tag:
- The string value for each tag can be up to 256 characters, with the option to specify allowed values for a tag.
- Snowflake allows a maximum number of 50 unique tags that can be set on a single object.
- In a CREATE or ALTER statement 100 is the maximum number of tags that can be specified in a single statement.
- For a table or view and its columns, the maximum number of unique tags that can be specified in a single CREATE or ALTER atement is 100.
Luckily, you can manage the tag quotas for an object easily:
- Query the
TAG_REFERENCES
view to determine the tag assignments. - Unset the tag from the object or column. For objects, you can use the corresponding
ALTER... UNSET TAG
command. For a table or view column, use the correspondingALTER { TABLE | VIEW } ... { ALTER | MODIFY } COLUMN ... UNSET TAG
command. - Drop the tag using a
DROP TAG
statement.
Your organizational structure in Snowflake (i.e. tagging strategy) can also be applied in Metaplane. So if you want to apply a tag to a collection of tables identified by absolute paths like {database}.{schema}.{table}
, Metaplane can help.
In Metaplane, you can bulk-apply tags from a custom dashboard. After navigating to the dashboard using the tag you'd like to bulk apply to objects or monitors, you'll be able to search and add additional objects. From there, you can set up alerting rules based on your tags (e.g. you can direct alerts for all data with a particular tag to a channel, email, or other alert destination). And when an incident occurs in Metaplane, you can view the tags that are affected to help your team prioritize incidents that affect critical data over ones that don't.
Want to see how Metaplane tags can improve your overall governance strategy? Talk to us or start a free trial today.
Top comments (0)