Implement data masking
-
Dynamic data masking
- Prevents unauthorized access by limiting exposure of sensitive data
- Configure masking policies on database fields to designate how much data to reveal to nonprivileged users
- Available in:
- Azure SQL Database
- Azure SQL Managed Instance
- Azure Synapse Analytics Dedicated SQL pools
- SQL Server on Azure VMs
- Masking policies
- Created in the Security section in the portal or via T-SQL
-
Components of a policy
- SQL users excluded from masking
- Masking rule
- Masking function
-
Masking functions
- Default - fully masks a field according to its data type (XXXX for strings, 0 for numeric types, 1900-01-01 for dates)
- Email - replaces portion of address
- Credit card - replaces everything but last four digits with XXXX
- Custom text - replace with custom string (consists of exposed prefix, padding string, and exposed suffix)
- Random number - replaces numeric values with a random value within a specified range (T-SQL function 'random(1, 45)', where 1 and 45 are the low and high ends of the range)
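The functions above are applied per column with T-SQL; a minimal sketch, assuming a hypothetical `dbo.Customers` table and a `ReportingUser` principal:

```sql
-- Apply masking functions to individual columns (table/user names are hypothetical)
ALTER TABLE dbo.Customers
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

ALTER TABLE dbo.Customers
    ALTER COLUMN Salary ADD MASKED WITH (FUNCTION = 'random(1, 45)');

-- Exclude a user from masking (they see the real values)
GRANT UNMASK TO ReportingUser;
```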
Encrypt data at rest and in motion
-
Encryption
- Uses a key to encrypt and decrypt data
- Disguises data so it is unreadable without the key (Azure encryption of data at rest uses symmetric encryption)
- Encryption key is stored in a secure location such as Key Vault
-
Encryption at rest
- Encrypting data as it is persisted to physical storage
- Azure Storage uses managed keys (customer or Microsoft managed)
- Azure SQL and Synapse SQL use transparent data encryption (TDE) also with service- and customer-managed keys
- TDE
- Performs real-time I/O encryption and decryption of the database, transaction log files, and backups
- Prevents a malicious party who steals a backup from restoring it on another server without the keys
- In the portal (Azure SQL) under the Security section, there is a TDE option with an on/off toggle
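Besides the portal toggle, TDE can be switched on with T-SQL; a sketch with a hypothetical database name:

```sql
-- Enable transparent data encryption for a database (name is hypothetical)
ALTER DATABASE [MyDatabase] SET ENCRYPTION ON;

-- Check encryption state (is_encrypted = 1 means encrypted)
SELECT name, is_encrypted FROM sys.databases;
```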
-
Always Encrypted
- Protects data in Azure SQL, Azure SQL Managed Instance, and SQL Server databases by encrypting it inside the client application and never revealing the encryption key to the database engine
- Allows separation of people who view data and those who manage data
- Uses two types of keys
- Column encryption keys (CEK) - used to encrypt data in a column
- Column master keys (CMK) - used to encrypt a CEK
- Supports two types of encryption
-
Randomized encryption
- less predictable
- more secure
- prevents searching, grouping, indexing, and joining on encrypted columns
-
Deterministic encryption
- always generates the same encryption value for a given plain text value
- more predictable and less secure than randomized, but supports point lookups, equality joins, grouping, and indexing on encrypted columns
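The encryption type is chosen per column in the table definition; a minimal sketch, assuming a column master key and a column encryption key named `CEK1` have already been provisioned:

```sql
-- Deterministic encryption on char columns requires a BIN2 collation
CREATE TABLE dbo.Patients (
    PatientId int IDENTITY PRIMARY KEY,
    SSN char(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (
            COLUMN_ENCRYPTION_KEY = CEK1,
            ENCRYPTION_TYPE = DETERMINISTIC,
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        )
);
```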
-
Encryption in motion
- Securing data moving from one network location to another
- Solution is transport layer security (TLS)
- Enabled by default in Azure Synapse SQL
- Can be enabled in Azure Storage via Settings and Configuration
Implement row-level and column-level security
-
Row-level security
- Restricts records in a table based on user running query
- Not permission based, but predicate based (rows are hidden based on whether predicate condition is true or false)
- Security policy defines users and predicates (inline table-valued functions)
- Policies are created in T-SQL
-
RLS best practices
- Create separate schema for security predicate function
- Avoid recursion in predicate functions to prevent performance degradation
- Components need to be dropped in a specific order if RLS is no longer used
- 1) Security policy
- 2) Table
- 3) Function
- 4) Schemas
- Avoid excessive table joins in predicate function
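A policy built along these lines (separate security schema, simple non-recursive predicate) might look like the following sketch; the `dbo.Sales` table, `SalesRep` column, and names are hypothetical:

```sql
CREATE SCHEMA Security;
GO

-- Inline table-valued function used as the filter predicate
CREATE FUNCTION Security.fn_securitypredicate(@SalesRep sysname)
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_securitypredicate_result
           WHERE @SalesRep = USER_NAME();
GO

-- Bind the predicate to the table; rows where it returns no result are hidden
CREATE SECURITY POLICY SalesFilter
    ADD FILTER PREDICATE Security.fn_securitypredicate(SalesRep)
    ON dbo.Sales
    WITH (STATE = ON);
```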
-
Column level security
- Controls access to columns based on user context
- Configured with a GRANT SELECT statement that lists the permitted columns and the user
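A sketch of the grant, with a hypothetical table and user:

```sql
-- AnalystUser can select only the listed columns; other columns raise an error
GRANT SELECT ON dbo.Employees (EmployeeId, Name, Department) TO AnalystUser;
```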
Implement Azure role-based access control (RBAC)
- Authorization for Azure Data Lake is controlled via
- Shared key authorization
- Shared access signature (SAS)
- Role-based access control (RBAC)
- Access Control Lists (ACL)
- RBAC
- Uses role assignment to apply permissions to security principals (users, groups, managed identities, etc)
- Scoped to storage accounts and containers (file- and folder-level control is handled by ACLs)
- Roles
- Storage blob data owner (full container access)
- Storage blob data contributor (read, write, delete)
- Storage blob data reader (read, list)
Implement POSIX-like access control lists (ACLs) for Data Lake Storage Gen2
- ACL
- Holds rules that grant or deny access to certain environments
- RBAC is coarse-grained (rice) vs. ACLs, which are fine-grained (sugar)
- Role assignments are evaluated before ACLs (if the user's RBAC role authorizes the operation, it succeeds and the ACLs are not checked; otherwise the ACLs are evaluated)
Implement a data retention policy
- Can be set on Azure SQL (long-term retention) or Azure Storage (Lifecycle Management)
-
Long-term retention
- Automatically retain backups in separate blob container for up to 10 years
- Can be used to recover a database through the portal, CLI, or PowerShell
- Enabled by defining policy with four parameters
- Weekly (W)
- Monthly (M)
- Yearly (Y)
- Week of the year (WeekOfYear) - which week's backup to keep for yearly retention
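From the CLI this might look like the following sketch (resource names are placeholders; retention values are ISO 8601 durations):

```shell
# Keep weekly backups 4 weeks, monthly 12 months, and the week-16 backup 5 years
az sql db ltr-policy set \
    --resource-group myResourceGroup \
    --server myserver \
    --database mydb \
    --weekly-retention "P4W" \
    --monthly-retention "P12M" \
    --yearly-retention "P5Y" \
    --week-of-year 16
```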
-
Lifecycle Management
- Automated way to tier down files to cool and archive based on modified date
- Enabled by creating a policy with one or more rules
- Rules trigger on the number of days since a blob was created, modified, or last accessed (last-access rules require access tracking to be enabled)
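A lifecycle policy is a JSON document; a sketch of one rule implementing the tier-down described above (the rule name and day thresholds are illustrative):

```json
{
  "rules": [
    {
      "name": "tierDownOldBlobs",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ] },
        "actions": {
          "baseBlob": {
            "tierToCool":    { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 90 }
          }
        }
      }
    }
  ]
}
```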
Implement secure endpoints (private and public)
- An endpoint is an address a service exposes so external entities can communicate with it
- Service Endpoint
- Secure and direct access to Azure service/resource over the Azure network
- Firewall security feature
- Virtual network rule
- Traffic originates from the VNet's private IPs, but the service is still reached on its public address
- Works on Azure SQL, Synapse, and Storage
- Private link
- Carries traffic privately, so traffic between the virtual network and the Azure service travels over the Microsoft backbone network
- Uses a private IP address on the VNet instead of the service's public address (unlike a Service Endpoint)
Implement resource tokens in Azure Databricks
- Token authentication uses a personal access token (PAT) to connect to the workspace via the REST API
- PAT
- Can be used instead of passwords
- Enabled by default
- Set expiration date or indefinite lifetime
- Disabled, monitored, and revoked by workspace admins
- Create the PAT in the Databricks portal, then use it when setting up the linked service in Azure Synapse
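For reference, a PAT is sent as a bearer token on REST calls; a minimal sketch using only the standard library (the workspace URL, endpoint, and token value are placeholders, not real credentials):

```python
import urllib.request

workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"
token = "dapiXXXXXXXXXXXX"  # hypothetical PAT; store real tokens in Key Vault

# Build an authenticated request against the Databricks REST API
req = urllib.request.Request(
    f"{workspace_url}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
# urllib.request.urlopen(req) would then execute the call against the workspace
```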
Load a DataFrame with sensitive information
- Done through encryption using Fernet
-
Fernet
- Symmetric authenticated cryptography (uses a secret key)
- from cryptography.fernet import Fernet
- encryptionKey = Fernet.generate_key()
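Putting the two lines above together, a minimal round-trip sketch (in practice the key would come from Key Vault or a Databricks secret scope, not be generated inline):

```python
from cryptography.fernet import Fernet

# Generate a symmetric key; in production, load it from a secret store instead
encryptionKey = Fernet.generate_key()
f = Fernet(encryptionKey)

# Encrypt a sensitive value before loading it into a DataFrame column
ciphertext = f.encrypt(b"123-45-6789")

# Anyone holding the same key can decrypt
plaintext = f.decrypt(ciphertext)
```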
Write encrypted data to tables or Parquet files
- Write encrypted data to a table
- df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable("Table")
- Write encrypted data to a parquet file
- encrypted.write.mode("overwrite").parquet("container_address/file_path")
Manage sensitive information
- Data discovery and classification - discovering, classifying, labeling, and reporting the sensitive data in your databases
- Capabilities
- Discovery and recommendations
- Labeling - apply sensitive classification labels to columns using metadata attributes
- Query result-set sensitivity - calculates the sensitivity of a query result in real-time
- Visibility - view DB classification state in a dashboard
- Defender for Cloud
- Cloud-native application protection platform (CNAPP)
- Set of security measures and practices to protect cloud-based apps
- Continuous monitoring, alerts, and threat mitigation
- Separate services for Storage and SQL
- Defender for SQL
- Discover and mitigate database vulnerabilities
- Alerts on anomalous activities
- Provides vulnerability assessment and Advanced Threat Protection
- Defender for Storage
- Detects potential threats to storage accounts
- Prevents three major impacts
- Malicious file uploads
- Sensitive data exfiltration
- Data corruption
- Includes
- Activity monitoring
- Sensitive data threat detection
- Malware scanning