DEV Community

Alec Dutcher
Alec Dutcher

Posted on • Edited on

DP-203 Study Guide - Implement data security

Study guide

Implement data masking

  • Dynamic data masking
    • Prevents unauthorized access by limiting exposure of sensitive data
    • Configure masking policies on database fields to designate how much data to reveal to nonprivileged users
    • Available in:
      • Azure SQL Database
      • Azure SQL Managed Instance
      • Azure Synapse Analytics Dedicated SQL pools
      • SQL Server on Azure VMs
  • Masking policies
    • Created in the Security section in the portal or via T-SQL
    • Components of a policy
      • SQL users excluded from masking
      • Masking rule
      • Masking function
  • Masking functions
    • Default - predefined, fully masks a field, replaces values with XXXX
    • Email - replaces portion of address
    • Credit card - replaces everything but last four digits with XXXX
    • Custom text - replace with custom string (consists of exposed prefix, padding string, and exposed suffix)
    • Random number - replaces values with randomly generated values of the same data type and length (T-SQL function 'random(1,45)' where 1,45 are the low and high ends of the range)

Encrypt data at rest and in motion

  • Encryption
    • Uses a key to encrypt and decrypt data
    • Disguises the data through a process of symmetric encryption
    • Encryption key is stored in a secure location such as Key Vault
  • Encryption at rest
    • Encrypting data in a physical location
    • Azure Storage uses managed keys (customer or Microsoft managed)
    • Azure SQL and Synapse SQL use transparent data encryption (TDE) also with service- and customer-managed keys
      • TDE
        • Real-time I/O encryption and decryption of a backup
        • Prevents malicious party from restoring the backup
        • In the portal (Azure SQL) under the Security section, there is a TDE option with an on/off toggle
  • Always Encrypted
    • Protects data in Azure SQL, Azure SQL Managed Instance, and SQL Server databases by encrypting it inside the client application and never revealing the encryption key to the database engine
    • Allows separation of people who view data and those who manage data
    • Uses two types of keys
      • Column encryption keys (CEK) - used to encrypt data in a column
      • Column master keys (CMK) - used to encrypt a CEK
    • Supports two types of encryption
      • Randomized encryption
        • less predictable
        • more secure
        • prevents searching, grouping, indexing, and joining on encrypted columns
      • Deterministic encryption
        • always generates the same encryption value for a given plain text value
        • has opposite qualities of randomized
  • Encryption in motion
    • Securing data moving from one network location to another
    • Solution is transport layer security (TLS)
      • Enabled by default Azure Synapse SQL
      • Can be enabled in Azure Storage via Settings and Configuration

Implement row-level and column-level security

  • Row-level security
    • Restricts records in a table based on user running query
    • Not permission based, but predicate based (rows are hidden based on whether predicate condition is true or false)
    • Security policy defines users and predicates (inline table-valued functions)
    • Policies are created in T-SQL

Image description

Image description

  • RLS best practices

    • Create separate schema for security predicate function
    • Avoid recursion in predicate functions to prevent performance degradation
    • Components need to be dropped in a specific order if RLS is no longer used
      • 1) Security policy
      • 2) Table
      • 3) Function
      • 4) Schemas
    • Avoid excessive table joins in predicate function
  • Column level security

    • Controls access to columns based on user context
    • Configured using GRANT SELECT statement and specifying columns and user

Implement Azure role-based access control (RBAC)

  • Authorization for Azure Data Lake is controlled via
    • Shared key authorization
    • Shared access signature (SAS)
    • Role-based access control (RBAC)
    • Access Control Lists (ACL)
  • RBAC
    • Uses role assignment to apply permissions to security principals (users, groups, managed identities, etc)
    • Can limit access to files, folders, containers, and accounts
    • Roles
      • Storage blob data owner (full container access)
      • Storage blob data contributor (read, write, delete)
      • Storage blob data reader (read, list)

Implement POSIX-like access control lists (ACLs) for Data Lake Storage Gen2

  • ACL
    • Holds rules that grant or deny access to certain environments
    • RBAC is course grained (rice) vs ACL which is fine-grained (sugar)
    • Roles are determined before ACL is applied (if user has RBAC the operation succeeds, if not it falls to ACL)

Implement a data retention policy

  • Can be set on Azure SQL (long-term retention) or Azure Storage (Lifecycle Management)
  • Long-term retention
    • Automatically retain backups in separate blob container for up to 10 years
    • Can be used to recover database through portal, CLI, or Powershell
    • Enabled by defining policy with four parameters
      • Weekly (W)
      • Monthly (M)
      • Yearly (Y)
      • Week of the year (WeekofYear)
  • Lifecycle Management
    • Automated way to tier down files to cool and archive based on modified date
      • Enabled by creating a policy with one or more rules
      • Choose from number of days since blob was created, modified, or accessed (can enable access tracking)

Implement secure endpoints (private and public)

  • Endpoint is an address exposed by a web app to communicate with external entities
  • Service Endpoint
    • Secure and direct access to Azure service/resource over the Azure network
    • Firewall security feature
    • Virtual network rule
    • Allows for private IPs, but still uses a public address
    • Works on Azure SQL, Synapse, and Storage
  • Private link
    • Carries traffic privately so traffic between virtual network and Azure Service travels through the Microsoft Network
    • Uses private address on VNet instead of public address like Service Endpoint

Implement resource tokens in Azure Databricks

  • Token is an authentication method that uses a personal access token (PAT) to connect via REST API
  • PAT
    • Can be used instead of passwords
    • Enabled by default
    • Set expiration date or indefinite lifetime
    • Disabled, monitored, and revoked by workspace admins
  • Create the PAT in the Databricks portal, then use it when setting up the linked service in Azure Synapse

Load a DataFrame with sensitive information

  • Done through encryption using Fernet
  • Fernet

    • Symmetric authenticated cryptography (uses a secret key)
    • from cryptography.fernet import Fernet
    • encryptionKey = Fernet.generate_key()
  • Create master key

    Image description

  • Create UDFs to encrypt/decrypt

    Image description

  • Use UDFs to encrypt/decrypt

    Image description

Write encrypted data to tables or Parquet files

  • Write encrypted data to a table
    • df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable("Table")
  • Write encrypted data to a parquet file
    • encrypted.write.mode("overwrite").parquet("container_address/file_path")

Manage sensitive information

  • Data discovery and classification - discovering, classifying, labeling, and reporting the sensitive data in your databases
  • Capabilities
    • Discovery and recommendations
    • Labeling - apply sensitive classification labels to columns using metadata attributes
    • Query result-set sensitivity - calculates the sensitivity of a query result in real-time
    • Visibility - view DB classification state in a dashboard
  • Defender for Cloud
    • Cloud-native application protection platform (CNAPP)
    • Set of security measures and practices to protect cloud-based apps
    • Continuous monitoring, alerts, and threat mitigation
    • Separate services for Storage and SQL
  • Defender for SQL
    • Discover and mitigate database vulnerabilities
    • Alerts on anomalous activities
    • Performs vulnerability assessments and Advanced Threat Protection
  • Defender for Storage
    • Detects potential threats to storage accounts
    • Prevents three major impacts
      • Malicious file uploads
      • Sensitive data exfiltration
      • Data corruption
    • Includes
      • Activity monitoring
      • Sensitive data threat detection
      • Malware scanning

Top comments (0)