
Beatrice Akaeme for AWS Community Builders

DATA STORAGE ON AWS

In modern businesses and technology organizations, data storage plays a central role in helping organizations get full value from their information assets. With the growth of digital data, ranging from customer information and transaction records to sensor data and multimedia content, storing data effectively is more critical than ever. Let's look at why it matters.

Data Storage in Modern Business and Technology Organizations
Data storage is the foundation on which organizations build their digital infrastructure in a data-driven world. The rapid evolution of technology has led to an unprecedented volume of data in many formats and from many sources. From e-commerce transactions and social media interactions to industrial sensors and scientific research, data has become the core of business operations and decision-making.

Data storage serves as the repository for this valuable information, providing the means to capture, retain, and access data reliably and efficiently. Businesses use stored data to gain insights into customer behavior, track operational performance, develop innovative products, and refine strategies for growth.

Types of AWS Data Storage Services

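Amazon Web Services (AWS) offers a variety of data storage services to cater to different use cases and requirements. These services can be broadly categorized into four main types: object storage (Amazon S3), block storage (Amazon EBS), file storage (Amazon EFS), and archival storage (Amazon Glacier).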

Object Storage Services

Amazon S3 (Simple Storage Service): Amazon S3 is a highly scalable object storage service that lets you store and retrieve large amounts of data, such as images, videos, backups, and logs. It offers different storage classes to match data access patterns and cost requirements.
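As a rough illustration of storing and retrieving an object in a chosen storage class, here is a minimal Python sketch. It assumes a boto3 S3 client is passed in by the caller; the function names, bucket name, and key are hypothetical and not part of the original article.

```python
from typing import Any, Dict


def build_put_object_args(bucket: str, key: str, body: bytes,
                          storage_class: str = "STANDARD") -> Dict[str, Any]:
    """Build the keyword arguments for an S3 PutObject request."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Examples: "STANDARD", "STANDARD_IA", "GLACIER"
        "StorageClass": storage_class,
    }


def store_and_fetch(s3_client: Any, bucket: str, key: str, body: bytes,
                    storage_class: str = "STANDARD") -> bytes:
    """Store an object, then read it back through the same client."""
    s3_client.put_object(**build_put_object_args(bucket, key, body, storage_class))
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   data = store_and_fetch(boto3.client("s3"), "example-bucket",
#                          "logs/app.log", b"hello", "STANDARD_IA")
```

Passing the client in as a parameter keeps the sketch easy to try against a test double before pointing it at a real account.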

Components of Amazon S3:
Buckets: A bucket is the top-level container that holds objects (files) in Amazon S3 and acts as a logical unit for organizing and managing them. When you create a bucket, you give it a globally unique name and choose the AWS Region where it is stored, typically the one closest to you.
Naming: Bucket names are globally unique across all of Amazon S3.

Objects: An object is the unit of storage in Amazon S3. It can be any data file, including documents, photos, and videos, and is made up of data, a key, and metadata. Each object is identified by a unique key.

Key: The key is a unique identifier for each object within a bucket and represents the full path of the object inside the bucket. It can be broken down into a "prefix" (the folder-like pathname) and an "object name". Keys are used for organizing, managing, and retrieving objects within a bucket.

Object URLs:
Every object in Amazon S3 has a unique URL, or web address, that points to a specific object within a bucket. Whether the object can actually be fetched over the internet depends on its permissions. The URL typically follows a pattern like https://<bucket-name>.s3.<region>.amazonaws.com/<key>.
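To make the relationship between keys, prefixes, and URLs concrete, here is a small Python sketch. It assumes the virtual-hosted-style URL pattern built from bucket name, Region, and key; the helper names and the sample bucket are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote


def split_key(key: str) -> Tuple[str, str]:
    """Split an object key into (prefix, object name) at the last '/'."""
    prefix, separator, name = key.rpartition("/")
    return prefix + separator, name


def object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style URL for an object."""
    # Percent-encode the key but keep '/' so the prefix stays readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


prefix, name = split_key("photos/2024/my cat.jpg")
url = object_url("example-bucket", "eu-west-1", "photos/2024/my cat.jpg")
```

Note that prefixes only look like folders: S3 stores a flat set of keys, and the "/" separator is a naming convention.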

Amazon EBS (Elastic Block Store): Amazon EBS provides block-level storage volumes that can be attached to and used by Amazon EC2 instances. You can use it to hold file systems, host databases, and serve other workloads that need persistent storage.

Aspects of EBS:
Volumes: EBS volumes behave like virtual hard drives attached to EC2 instances, providing durable block storage. They can be attached to and detached from EC2 instances on demand, which gives data mobility and flexibility.
In general, an EBS volume is attached to a single EC2 instance at a time; Multi-Attach, available on Provisioned IOPS volumes, is the exception.

The volume types are:
1. General Purpose (SSD): Balances price and performance for a wide range of small and medium workloads, such as production and development applications.

2. Provisioned IOPS (SSD): Delivers predictable high performance with low latency and consistent I/O operations. Suited to critical production applications and databases that need the highest storage performance.

3. Throughput Optimized (HDD): Meant for frequently accessed, large, throughput-intensive workloads. Good for data warehousing and log processing.

4. Cold HDD: Good for infrequently accessed data where cost-effectiveness is the main criterion.
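The four volume families above map onto short API names when you create a volume. The following Python sketch shows one way to build the request arguments for the EC2 CreateVolume call; the family labels and the helper name are hypothetical, and picking gp3 and io2 (rather than gp2 and io1) for the SSD families is an assumption made for illustration.

```python
from typing import Any, Dict, Optional

# API volume-type names for the four families described above.
VOLUME_TYPES = {
    "general_purpose": "gp3",
    "provisioned_iops": "io2",
    "throughput_optimized": "st1",
    "cold": "sc1",
}


def build_create_volume_args(family: str, size_gib: int, availability_zone: str,
                             iops: Optional[int] = None) -> Dict[str, Any]:
    """Build keyword arguments for an EC2 CreateVolume request."""
    args: Dict[str, Any] = {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,
        "VolumeType": VOLUME_TYPES[family],
    }
    if family == "provisioned_iops":
        # Provisioned IOPS volumes need an explicit IOPS figure.
        if iops is None:
            raise ValueError("provisioned_iops volumes need an iops value")
        args["Iops"] = iops
    return args


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   boto3.client("ec2").create_volume(
#       **build_create_volume_args("cold", 500, "us-east-1a"))
```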

Features of EBS:
Snapshots: A snapshot is a point-in-time backup of an EBS volume that contains all the information needed to restore its data. Snapshots support backup, recovery, and volume duplication: you can create a new volume from a snapshot and attach it to another instance.

Encryption: AWS offers seamless encryption for EBS volumes, protecting data at rest. When you attach an encrypted EBS volume to an instance, the data on the volume, the disk I/O between the volume and the instance, and any snapshots created from the volume are all encrypted.

Availability and durability:
EBS is highly available and durable at no extra charge: volume data is replicated across multiple servers within an Availability Zone to prevent data loss.

Lifecycle Management: Lifecycle policies, managed through Amazon Data Lifecycle Manager, automate the creation, retention, and deletion of EBS snapshots.
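The snapshot and encryption features can be combined: take a snapshot, wait for it to complete, then create an encrypted copy of the volume from it. The Python sketch below assumes a boto3 EC2 client is passed in; the function names and default values are hypothetical.

```python
from typing import Any, Dict


def build_snapshot_args(volume_id: str, description: str) -> Dict[str, Any]:
    """Build keyword arguments for an EC2 CreateSnapshot request."""
    return {"VolumeId": volume_id, "Description": description}


def build_restore_args(snapshot_id: str, availability_zone: str,
                       volume_type: str = "gp3") -> Dict[str, Any]:
    """Build CreateVolume arguments that restore a snapshot as an encrypted volume."""
    return {
        "SnapshotId": snapshot_id,
        "AvailabilityZone": availability_zone,
        "VolumeType": volume_type,
        "Encrypted": True,
    }


def snapshot_and_restore(ec2_client: Any, volume_id: str, availability_zone: str) -> str:
    """Snapshot a volume, wait for completion, and return the new volume's ID."""
    snapshot = ec2_client.create_snapshot(
        **build_snapshot_args(volume_id, "point-in-time backup"))
    snapshot_id = snapshot["SnapshotId"]
    # A snapshot must finish before a volume can be created from it.
    ec2_client.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    volume = ec2_client.create_volume(
        **build_restore_args(snapshot_id, availability_zone))
    return volume["VolumeId"]
```

The restored volume can then be attached to another instance in the same Availability Zone.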

Amazon EFS (Elastic File System): Amazon EFS is a fully managed, scalable cloud file system that multiple EC2 instances can access at the same time. It's designed for scenarios that require shared access to files and data.

Features of EFS
Fully managed: Amazon EFS is a fully managed file storage solution that simplifies the process of sharing files across multiple instances. You as a developer are not required to manage file servers or storage, update hardware, configure software, or perform backups.

Highly available and durable:
EFS is designed for 99.999999999 percent durability and up to 99.99 percent availability. For file systems that use Standard storage classes, EFS stores every directory, file, and link redundantly across multiple Availability Zones.

Scalability: EFS scales automatically to accommodate your workloads and files as they grow, which minimizes the time spent on administration.

Security and Access Control:
Amazon Virtual Private Cloud (VPC) provides a secure, isolated network environment for running workloads inside AWS, and EFS file systems are reached through mount targets in your VPC.
EFS integrates with AWS Identity and Access Management (IAM) for centralized access control.
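As a sketch of how these features show up at creation time, the Python below builds request arguments for an encrypted file system and for one mount target per subnet, so that instances in several Availability Zones can share it. The helper names, IDs, and the choice of the elastic throughput mode are assumptions for illustration.

```python
from typing import Any, Dict, List


def build_file_system_args(creation_token: str, encrypted: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for an EFS CreateFileSystem request."""
    return {
        "CreationToken": creation_token,  # idempotency token chosen by the caller
        "PerformanceMode": "generalPurpose",
        "ThroughputMode": "elastic",
        "Encrypted": encrypted,
    }


def build_mount_target_args(file_system_id: str, subnet_ids: List[str],
                            security_group_ids: List[str]) -> List[Dict[str, Any]]:
    """Build one CreateMountTarget request per subnet (one subnet per AZ)."""
    return [
        {
            "FileSystemId": file_system_id,
            "SubnetId": subnet_id,
            "SecurityGroups": list(security_group_ids),
        }
        for subnet_id in subnet_ids
    ]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   efs = boto3.client("efs")
#   fs = efs.create_file_system(**build_file_system_args("shared-content"))
#   for args in build_mount_target_args(fs["FileSystemId"],
#                                       ["subnet-aaa", "subnet-bbb"], ["sg-123"]):
#       efs.create_mount_target(**args)
```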

Use Cases:
1. Content Sharing: EFS is suitable for storing and sharing multimedia assets, documents, and other files in content-rich applications in a secure, fast, and easy manner. Because every instance reads the same files, content stays consistent across the system.

2. Web Hosting: It's a great choice for web servers that require shared file storage for code, configuration files, and uploaded content.

3. Modern Application Development: Containers running on Amazon ECS or Amazon EKS, as well as serverless web applications, can share data efficiently without extra management.

4. Big Data and Analytics: EFS can be used to store data for analytical use by Amazon EMR and other big data services.

5. Machine Learning and AI Workloads: EFS suits data-heavy AI applications in which multiple instances and containers access the same data, which improves collaboration and reduces data duplication.

Integration with Services:
EFS can be integrated with various AWS services, including EC2, Lambda, and ECS, allowing these services to access the shared file storage.

Cost Considerations:
EFS pricing is based on the amount of data stored and accessed.
It's important to understand how usage patterns affect costs and to choose the appropriate storage class and throughput mode.
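The "stored plus accessed" pricing idea can be sketched as simple arithmetic. The Python function below is only an illustration: it assumes a flat per-GB rate for storage and for access, and the rates in the usage line are placeholders, not real AWS prices.

```python
def estimate_monthly_cost(gb_stored: float, gb_accessed: float,
                          rate_per_gb_stored: float,
                          rate_per_gb_accessed: float) -> float:
    """Estimate a monthly bill as storage charge plus access charge."""
    storage_charge = gb_stored * rate_per_gb_stored
    access_charge = gb_accessed * rate_per_gb_accessed
    return storage_charge + access_charge


# Placeholder rates for illustration only; look up current pricing for your Region.
estimate = estimate_monthly_cost(gb_stored=100, gb_accessed=20,
                                 rate_per_gb_stored=0.5, rate_per_gb_accessed=0.25)
```

Running the same function with different access volumes shows how a read-heavy workload shifts the balance between the two charges.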

Amazon Glacier: Amazon Glacier (now called Amazon S3 Glacier) is a low-cost backup and archival storage service for long-term, durable data retention. It is appropriate for data that doesn't need to be accessed frequently or immediately, such as backups, archives, and cold data.
Each of the services above has its own features, which makes each one a fit for a different storage use case.
Amazon Glacier is tailored for data archival, backup, and compliance use cases where data is retained for a long time, from a few months to 10 or more years.
It is very cost-effective compared to other AWS storage services, but retrieval is much slower.

Key features of Amazon Glacier

Vault Creation
Vault: The fundamental storage container in Amazon Glacier, where you store your data, much like a bucket in Amazon S3.

Access policy: Controls who can access the data in a vault and what they can do with it. You can use AWS IAM to secure access to the management console and to safeguard the S3 Glacier data.
You can create multiple IAM users and attach a separate policy to each one to allow or limit access to specific vaults and operations.

Archives: All data in Glacier is stored in archives. An archive is the base unit of storage in Glacier and can hold any type of data, including photos, video, audio, and documents. Glacier has no storage limit, so a vault can hold an unlimited number of archives.
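The vault and archive concepts can be sketched in a few lines of Python. This assumes a boto3 Glacier client is passed in by the caller; the function name, vault name, and description are hypothetical.

```python
from typing import Any


def create_vault_and_upload(glacier_client: Any, vault_name: str,
                            data: bytes, description: str) -> str:
    """Create a vault if needed, upload one archive, and return its archive ID."""
    glacier_client.create_vault(vaultName=vault_name)
    response = glacier_client.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    # Keep the archive ID: it is needed later to retrieve or delete the archive.
    return response["archiveId"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   archive_id = create_vault_and_upload(boto3.client("glacier"),
#                                        "example-vault", b"...", "2023 ledger")
```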

Data Retrieval Options:
Expedited Retrieval: The fastest of the three options, with data typically available within 1 to 5 minutes. Use it when you urgently need a subset of your archives.
Standard Retrieval: Takes several hours (3 to 5 hours) and is suitable for most use cases. This is the most common retrieval method and can be used for data of any size, whether a full or partial archive.
Bulk Retrieval: The most cost-effective option, with retrieval times of 5 to 12 hours. It is suited to retrieving large portions of infrequently accessed data, even petabytes, at low cost.
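The three tiers correspond to a `Tier` value in the retrieval job request. Here is a minimal Python sketch that builds the job parameters for the Glacier InitiateJob call; the helper name and archive ID are hypothetical, and the time ranges simply restate the figures above.

```python
from typing import Dict

# Typical retrieval times for each tier, as described above.
RETRIEVAL_TIERS = {
    "Expedited": "1-5 minutes",
    "Standard": "3-5 hours",
    "Bulk": "5-12 hours",
}


def build_retrieval_job(archive_id: str, tier: str = "Standard") -> Dict[str, str]:
    """Build the jobParameters for an archive-retrieval job."""
    if tier not in RETRIEVAL_TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(RETRIEVAL_TIERS)}")
    return {"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier}


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   boto3.client("glacier").initiate_job(
#       vaultName="example-vault",
#       jobParameters=build_retrieval_job("example-archive-id", "Bulk"))
```

Retrieval is asynchronous: the job is started first, and the data is downloaded once the job completes.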

Data Lifecycle Management: Data archived in Glacier storage classes can be managed with S3 lifecycle policies that move objects from one storage class to another, which optimizes storage costs as the data's access patterns change over time.
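A lifecycle policy of this kind is expressed as a set of rules on an S3 bucket. The Python sketch below builds one such rule that moves objects under a prefix to the GLACIER storage class after a number of days; the helper name, rule ID, prefix, and day counts are hypothetical.

```python
from typing import Any, Dict, Optional


def build_archive_rule(rule_id: str, prefix: str, days_until_archive: int,
                       days_until_expiry: Optional[int] = None) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration with a single archive rule."""
    rule: Dict[str, Any] = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days_until_archive, "StorageClass": "GLACIER"}],
    }
    if days_until_expiry is not None:
        # Optionally delete objects once they are no longer needed.
        rule["Expiration"] = {"Days": days_until_expiry}
    return {"Rules": [rule]}


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=build_archive_rule("archive-logs", "logs/", 90, 3650))
```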

Vault Lock:
Vault Lock lets developers set a retention policy on an individual vault to enforce data retention and immutability for compliance purposes.
Once the policy is locked, it can no longer be changed, and the data it covers cannot be deleted until the specified retention period expires. For example, a WORM (Write Once Read Many) policy can be used to prevent changes after upload.
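A retention rule of this kind is written as a policy document attached to the vault. The Python sketch below builds a policy that denies archive deletion until archives reach a given age; the helper name, statement ID, and vault ARN are hypothetical, and the structure should be checked against the current Vault Lock documentation before use.

```python
import json


def build_vault_lock_policy(vault_arn: str, retention_days: int) -> str:
    """Build a Vault Lock policy that blocks deletes during the retention period."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {
                    # Deny while the archive is younger than the retention period.
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            }
        ],
    }
    return json.dumps(policy)


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   glacier = boto3.client("glacier")
#   lock = glacier.initiate_vault_lock(
#       vaultName="example-vault",
#       policy={"Policy": build_vault_lock_policy(vault_arn, 365)})
#   glacier.complete_vault_lock(vaultName="example-vault", lockId=lock["lockId"])
```

The two-step initiate and complete flow leaves a window to test the policy before it becomes permanent.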

Data Security:
Amazon Glacier encrypts data at rest using the AES-256 encryption algorithm and protects data in transit with SSL/TLS.
Developers can choose to manage the encryption keys themselves or use AWS Key Management Service (KMS) for added security.
Archives stored in Amazon Glacier are immutable: after an archive is created, it cannot be updated.
Access to Glacier requires credentials that AWS can use to authenticate your requests, and those credentials must carry permissions to access the Glacier vaults or S3 buckets involved.
Glacier requires every request to be signed for authentication.

Use Cases:
Data Archival: Amazon Glacier is suitable for storing historical records, financial data, medical records, and legal documents.
Backup and Recovery: Organizations can use Glacier to store backups of critical systems and data.

Digital Preservation: Government agencies and libraries, for example, can use Glacier for digital preservation efforts.

Multimedia Archives: Applications that take in large amounts of multimedia data, such as social networks like Facebook and Instagram, could use Amazon Glacier to archive that data.

Compliance: Glacier helps organizations meet regulatory requirements for data retention. Financial services, healthcare institutions, and other industries can keep regulatory and compliance archives over extended periods of time.

Integration with Other Services:
Amazon S3 Integration: Lifecycle policies can be set up to automatically transition objects from S3 to Glacier as they age and are accessed less often.
AWS Data Transfer Services: Snowball and Snowmobile can be used to transfer large amounts of data to Glacier.
