Introduction
Handling sensitive data securely is critical, especially when migrating from one database to another. In this article, we'll explore how to manage and store confidential data in ClickHouse, focusing on a scenario where we migrate data from MySQL daily. We will use encryption to ensure data security and implement views to control access.
Scenario
We have a MySQL table containing users' general information, including confidential fields like phone numbers and national codes. Our goal is to migrate this data to ClickHouse every day while ensuring the sensitive fields are encrypted.
MySQL Table Setup
First, let's set up our MySQL database and table:
CREATE DATABASE users;
CREATE TABLE users.general_info (
id INT,
first_name VARCHAR(255),
last_name VARCHAR(255),
phone_number VARCHAR(255),
national_code VARCHAR(255)
);
INSERT INTO users.general_info VALUES
(1, 'Shahab', 'Ranjbary', '0912345678', '123456'),
(2, 'Sepehr', 'Ranjbary', '0935999999', '982345'),
(3, 'Mostafa', 'Fekri', '09192222222', '3244442'),
(4, 'Saber', 'Farshbaf', '0912333435', '454554');
ClickHouse Table Setup
Next, we create a similar table in ClickHouse with an additional column for the initialization vector (IV) used in encryption:
CREATE DATABASE users;
CREATE TABLE users.general_info (
id INT,
first_name VARCHAR(255),
last_name VARCHAR(255),
phone_number VARCHAR(255),
national_code VARCHAR(255),
iv VARCHAR(255)
)
ENGINE = MergeTree
PRIMARY KEY id
ORDER BY id
SETTINGS index_granularity = 8192;
Data Migration and Encryption
We'll use the encrypt function to insert data into ClickHouse. This function will encrypt the confidential fields using AES-256-GCM encryption. You can automate this process using Airflow or other scheduling tools.
INSERT INTO users.general_info
SELECT
id,
encrypt('aes-256-gcm', first_name, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS first_name,
encrypt('aes-256-gcm', last_name, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS last_name,
encrypt('aes-256-gcm', phone_number, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS phone_number,
encrypt('aes-256-gcm', national_code, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS national_code,
'iv1' AS iv
FROM mysql('mysql:3306','users', 'general_info', 'root','123456');
Real-Time Data Handling
If you use Kafka, NATS, or another streaming service, you can use a materialized view in ClickHouse to encrypt data upon ingestion.
CREATE MATERIALIZED VIEW users.general_info_mv TO users.general_info AS
SELECT
id,
encrypt('aes-256-gcm', first_name, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS first_name,
encrypt('aes-256-gcm', last_name, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS last_name,
encrypt('aes-256-gcm', phone_number, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS phone_number,
encrypt('aes-256-gcm', national_code, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS national_code,
'iv1' AS iv
FROM kafka('kafka_broker:9092', 'topic', 'group', 'format');
Decryption and Secure Access
For users who need access to decrypted data, we create a view that decrypts the confidential fields on the fly using the tryDecrypt
function.
CREATE VIEW users.general_info_decryption AS
SELECT
id,
tryDecrypt('aes-256-gcm', first_name, 'keykeykeykeykeykeykeykeykeykey01', iv) AS first_name,
tryDecrypt('aes-256-gcm', last_name, 'keykeykeykeykeykeykeykeykeykey01', iv) AS last_name,
tryDecrypt('aes-256-gcm', phone_number, 'keykeykeykeykeykeykeykeykeykey01', iv) AS phone_number,
tryDecrypt('aes-256-gcm', national_code, 'keykeykeykeykeykeykeykeykeykey01', iv) AS national_code
FROM users.general_info;
Managing Permissions
We can control data access by granting permissions selectively to users:
-- Grant access to the main table for all users
GRANT SELECT ON users.general_info TO public;
-- Grant access to the decrypted view only to specific users
GRANT SELECT ON users.general_info_decryption TO admin;
Conclusion
We can securely handle confidential data by leveraging ClickHouse's encryption functions and carefully managing user access. This approach ensures that sensitive information remains protected during storage and transit, while still being accessible to authorized users.
Additional Tips
- Key Management: Use a secure key management system (KMS) to store and rotate encryption keys. Avoid hardcoding keys in your scripts or configurations.
- Auditing: Implement logging and monitoring to audit access to encrypted and decrypted data.
- Performance: Be mindful of the performance implications of encryption and decryption. Test your setup to ensure it meets your performance requirements.
By following these best practices, you can enhance the security and compliance of your data handling processes in ClickHouse.
For more information on ClickHouse's encryption functions and best practices, refer to the ClickHouse documentation.
Top comments (2)
nice
nice