When developing a new service, we realized that a core model would be well-suited for storage in DynamoDB, Amazon’s fast and scalable NoSQL solution.
Why DynamoDB?
In my experience, when developing apps, the data shown to users is often kept in a different structure from the underlying database. Users rarely need to see raw data; they prefer summaries, denormalized representations, and charts. For our existing service, Zonmaster, we store user-facing data in an Elasticsearch database, which has served us well. However, for a new project, we wanted to explore alternative options, and that’s where DynamoDB comes into play.
What is DynamoDB?
To quote Amazon:
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. DynamoDB lets you offload the administrative burdens of operating and scaling a distributed database so that you don’t have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling. DynamoDB also offers encryption at rest, which eliminates the operational burden and complexity involved in protecting sensitive data.
In essence, DynamoDB is an optimized NoSQL database that is fully managed.
Getting Started
The first step, of course, is to set up DynamoDB for development.
Personally, I prefer using Docker Compose to manage my database elements, such as MySQL, Redis, Elasticsearch, etc. Fortunately, there’s a Docker image available for a local version of DynamoDB, making it easy to run in a development environment. Here’s what my docker-compose.yml looks like:
version: "3.7"
services:
  dynamodb:
    image: amazon/dynamodb-local
    ports:
      - ${DYNAMODB_PORT}:8000
    command: ["-jar", "DynamoDBLocal.jar", "-sharedDb"]
  mysqldb:
    image: mysql:8.0
    container_name: zmrm_mysql
    volumes:
      - zmrm_mysqldb-store:/var/lib/mysql
      - ./log:/var/log/mysql
      - ./docker/mysql/user-setup.sql:/docker-entrypoint-initdb.d/user-setup.sql:ro
    environment:
      - MYSQL_DATABASE=${MYSQL_DBNAME}_development
      - MYSQL_USER=${MYSQL_DBUSER}
      - MYSQL_PASSWORD=${MYSQL_DBPASS}
      - MYSQL_ROOT_PASSWORD=${MYSQL_DBPASS}
      - TZ=${TZ}
    ports:
      - ${MYSQL_DBPORT}:3306
  redis:
    image: redis:7.0.12
    command: redis-server --appendonly yes
    ports:
      - target: 6379
        published: ${REDIS_PORT}
        protocol: tcp
        mode: host
    volumes:
      - redis_data:/data
    restart: always
    environment:
      - REDIS_REPLICATION_MODE=master
volumes:
  redis_data:
  zmrm_mysqldb-store:
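Only the “dynamodb” service matters here; the other services are shown for context.
Compose pulls the amazon/dynamodb-local image and maps the container’s port 8000 to the host port set in DYNAMODB_PORT. The -sharedDb flag makes DynamoDB Local use a single database file instead of a separate one per credential and region. Done!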
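Before moving on, it can help to confirm that the container is actually listening. The following is a minimal sketch, not part of my original setup: `port_open?` is a hypothetical helper name, and it assumes DynamoDB Local is published on localhost at the port held in DYNAMODB_PORT.

```ruby
require 'socket'

# Hypothetical helper: returns true if something accepts TCP connections
# on host:port within the timeout, false otherwise.
def port_open?(host, port, timeout: 1)
  Socket.tcp(host, Integer(port), connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, unreachable host, unresolvable name, or timeout.
  false
end

# Example usage (the fallback port '8000' is an assumption):
#   port_open?('localhost', ENV.fetch('DYNAMODB_PORT', '8000'))
```

This only proves that a process is listening on the port, not that it is DynamoDB Local, but it catches the most common mistake: forgetting to start the container.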
The Gem
Next, we need to install the dynamoid gem. You can do this using the following command:
> bundle add dynamoid
The Class
Instead of writing the typical Rails migration, we declare the fields directly in the model class:
# app/models/reaction.rb
class Reaction
  include Dynamoid::Document

  field :marketplace, :string
  field :review_count, :integer
  field :rating, :number
  field :content, :string
  field :status, :string
end
Initializer
For development, we can use the following initializer file:
# config/initializers/dynamoid.rb
require 'dynamoid'

Dynamoid.configure do |config|
  # [Optional]. If provided, it communicates with the DB listening at the endpoint.
  # This is useful for testing with DynamoDB Local
  # (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.DynamoDBLocal.html).
  config.endpoint = ENV['DYNAMODB_URL']
  config.logger.level = :debug
end
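Note that the initializer reads DYNAMODB_URL, while the docker-compose file only knows about DYNAMODB_PORT. One way to keep the two in step is sketched below. `resolve_dynamodb_endpoint` is a hypothetical helper, and the fallback to localhost is my assumption about a typical development machine.

```ruby
# Hypothetical helper: picks a value for Dynamoid's config.endpoint.
# Prefers DYNAMODB_URL (read by the initializer), falls back to
# DYNAMODB_PORT (used by docker-compose), and returns nil if neither is set.
def resolve_dynamodb_endpoint(env = ENV)
  url = env['DYNAMODB_URL'].to_s.strip
  return url unless url.empty?

  port = env['DYNAMODB_PORT'].to_s.strip
  return nil if port.empty?

  # Integer(..., 10) raises ArgumentError on a malformed port value.
  "http://localhost:#{Integer(port, 10)}"
end

# Example usage inside the initializer:
#   config.endpoint = resolve_dynamodb_endpoint
```

Passing the environment in as an argument keeps the helper easy to exercise with a plain hash.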
In production, Dynamoid needs AWS credentials and a region rather than a local endpoint, something like this:
require 'dynamoid'

Dynamoid.configure do |config|
  config.access_key = 'REPLACE_WITH_ACCESS_KEY_ID'
  config.secret_key = 'REPLACE_WITH_SECRET_ACCESS_KEY'
  config.region = 'us-west-2'
end
Please refer to the Dynamoid gem documentation for more configuration options.
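Rather than pasting real keys into the initializer, you may prefer to read them from the environment. This is a sketch under assumptions: `dynamoid_production_settings` is a hypothetical helper, and the variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REGION are my choice (they follow the usual AWS conventions), not something the gem requires.

```ruby
# Hypothetical helper: collects production settings from environment variables.
# fetch raises KeyError when a required variable is missing, so a
# misconfigured deploy fails at boot instead of on the first query.
def dynamoid_production_settings(env = ENV)
  {
    access_key: env.fetch('AWS_ACCESS_KEY_ID'),
    secret_key: env.fetch('AWS_SECRET_ACCESS_KEY'),
    region: env.fetch('AWS_REGION', 'us-west-2') # default is an assumption
  }
end

# Example usage in config/initializers/dynamoid.rb:
#   settings = dynamoid_production_settings
#   Dynamoid.configure do |config|
#     config.access_key = settings[:access_key]
#     config.secret_key = settings[:secret_key]
#     config.region     = settings[:region]
#   end
```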
Usage
Working with DynamoDB-backed objects is not much different from working with traditional ActiveRecord models. Here’s an example:
irb(main):001:0> Reaction.new
=> #<Reaction id: nil, marketplace: nil, review_count: nil, rating: nil, content: nil, status: nil, created_at: nil, updated_at: nil>
irb(main):002:0> reaction = Reaction.new(marketplace: 'mystore.shopify.com', review_count: 25, rating: 3.76, content: 'Generally positive', status: 'processed')
=> #<Reaction id: nil, marketplace: "mystore.shopify.com", review_count: 25, rating: 0.376e1, content: "Generally positive", status: "processed", created_at: nil, updated...
irb(main):003:0> reaction.save!
[Aws::DynamoDB::Client 200 0.009863 0 retries] list_tables(exclusive_start_table_name:nil)
(14.39 ms) LIST TABLES
(14.48 ms) CACHE TABLES
Creating dynamoid__development_reactions table. This could take a while.
[Aws::DynamoDB::Client 200 0.062638 0 retries] create_table(table_name:"dynamoid__development_reactions",key_schema:[{attribute_name:"id",key_type:"HASH"}],attribute_definitions:[{attribute_name:"id",attribute_type:"S"}],billing_mode:"PROVISIONED",provisioned_throughput:{read_capacity_units:100,write_capacity_units:20})
(66.08 ms) CREATE TABLE
[Aws::DynamoDB::Client 200 0.016588 0 retries] put_item(table_name:"dynamoid__development_reactions",item:{"marketplace"=>{s:"mystore.shopify.com"},"review_count"=>{n:"25"},"rating"=>{n:"3.76"},"content"=>{s:"Generally positive"},"status"=>{s:"processed"},"id"=>{s:"449b77e9-6d7e-4578-bf04-a5b911ee68c2"},"created_at"=>{n:"1690597034.022256"},"updated_at"=>{n:"1690597034.022475"}},expected:{"id"=>{exists:false}})
(17.26 ms) PUT ITEM - ["dynamoid__development_reactions", {:marketplace=>"mystore.shopify.com", :review_count=>25, :rating=>0.376e1, :content=>"Generally positive", :status=>"processed", :id=>"449b77e9-6d7e-4578-bf04-a5b911ee68c2", :created_at=>0.1690597034022256e10, :updated_at=>0.1690597034022475e10}, {}]
=> #<Reaction id: "449b77e9-6d7e-4578-bf04-a5b911ee68c2", marketplace: "mystore.shopify.com", review_count: 25, rating: 0.376e1, content: "Generally positive", status: "processed", created_at: Sat, 29 Jul 2023 02:17:14 +0000, updated_at: Sat, 29 Jul 2023 02:17:14 +0000>
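Two details in that log are worth decoding: the rating prints as 0.376e1, and the timestamps are sent as plain numbers. The sketch below reproduces both in plain Ruby using the values from the put_item line. The helper names are hypothetical, and this only illustrates the representations; it is not Dynamoid's own conversion code.

```ruby
require 'bigdecimal'

# Hypothetical helper: a DynamoDB number arrives as a string; BigDecimal
# keeps it exact, which is why the console shows 0.376e1 rather than 3.76.
def decode_number(raw)
  BigDecimal(raw)
end

# Hypothetical helper: timestamps in the log are epoch seconds with a
# fractional part. Going through Rational avoids float rounding.
def decode_timestamp(raw)
  Time.at(BigDecimal(raw).to_r).utc
end

rating = decode_number('3.76')
created_at = decode_timestamp('1690597034.022256')

puts rating.to_s                                 # 0.376e1
puts created_at.strftime('%Y-%m-%d %H:%M:%S %z') # 2023-07-29 02:17:14 +0000
```

The decoded time matches the created_at shown on the saved record.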
Finding the data is just as easy:
irb(main):015:0> Reaction.where(marketplace: 'mystore.shopify.com')
=>
#<Dynamoid::Criteria::Chain:0x000000010e806718
@consistent_read=false,
@key_fields_detector=
#<Dynamoid::Criteria::KeyFieldsDetector:0x000000010e805548
@forced_index_name=nil,
@query=
#<Dynamoid::Criteria::KeyFieldsDetector::Query:0x000000010e805408
@fields=["marketplace"],
@fields_with_operator=["marketplace"],
@query_hash={:marketplace=>"mystore.shopify.com"}>,
@result=nil,
@source=Reaction>,
@query={:marketplace=>"mystore.shopify.com"},
@scan_index_forward=true,
@source=Reaction>
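This returns a lazy criteria chain rather than the records themselves; nothing is sent to DynamoDB until you enumerate it. When you do (here the chain has been assigned to reactions), you’ll see this warning: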
irb(main):016:0> reactions.each {|r| p r.rating}
Queries without an index are forced to use scan and are generally much slower than indexed queries!
You can index this query by adding index declaration to reaction.rb:
* global_secondary_index hash_key: 'some-name', range_key: 'some-another-name'
* local_secondary_index range_key: 'some-name'
Not indexed attributes: :marketplace
(0.01 ms) SCAN - ["dynamoid__development_reactions", {:marketplace=>{:eq=>"mystore.shopify.com"}}]
[Aws::DynamoDB::Client 200 0.00822 0 retries] scan(table_name:"dynamoid__development_reactions",scan_filter:{"marketplace"=>{comparison_operator:"EQ",attribute_value_list:[{s:"mystore.shopify.com"}]}},attributes_to_get:nil)
0.376e1
=> nil
We haven’t indexed our data! Let’s fix that.
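To see why the warning matters, here is a plain-Ruby analogy of the difference between a scan and an indexed lookup. It is only an analogy with made-up sample data and hypothetical method names, not how DynamoDB is implemented: the point is simply how many items have to be examined.

```ruby
# Made-up sample data for illustration.
SAMPLE_ITEMS = [
  { id: 'a', marketplace: 'mystore.shopify.com', rating: 3.76 },
  { id: 'b', marketplace: 'other.example.com',   rating: 4.10 },
  { id: 'c', marketplace: 'third.example.com',   rating: 2.50 }
].freeze

# Scan: every item in the table is examined and then filtered.
def scan_items(items, marketplace)
  examined = 0
  matches = items.select do |item|
    examined += 1
    item[:marketplace] == marketplace
  end
  [matches, examined]
end

# Index: items are grouped by key once, up front.
def build_marketplace_index(items)
  items.group_by { |item| item[:marketplace] }
end

# Query: only the items stored under the requested key are touched.
def query_index(index, marketplace)
  matches = index.fetch(marketplace, [])
  [matches, matches.size]
end
```

With three items the difference is trivial, but the scan's cost grows with the whole table while the indexed lookup's cost grows only with the number of matches.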
Indexing
If you want to improve query performance, you can add an index. In this example, we’ll add a global secondary index with “marketplace” as its hash key:
class Reaction
  include Dynamoid::Document

  field :marketplace, :string
  field :review_count, :integer
  field :rating, :number
  field :content, :string
  field :status, :string

  global_secondary_index hash_key: :marketplace, projected_attributes: :all
end
Then we rebuild the table so that the new index exists. With the index in place, the same lookup becomes a query against the index instead of a scan:
irb(main):003:0> reactions = Reaction.where(marketplace: 'mystore.shopify.com')
=>
#<Dynamoid::Criteria::Chain:0x000000010e9a0178
...
irb(main):004:0> reactions.each {|r| p r.rating}
[Aws::DynamoDB::Client 200 0.020613 0 retries] describe_table(table_name:"dynamoid__development_reactions")
[Aws::DynamoDB::Client 200 0.025561 0 retries] query(consistent_read:false,scan_index_forward:true,index_name:"dynamoid__development_reactions_index_marketplace",table_name:"dynamoid__development_reactions",key_conditions:{"marketplace"=>{comparison_operator:"EQ",attribute_value_list:[{s:"mystore.shopify.com"}]}},query_filter:{},attributes_to_get:nil)
0.376e1
=> nil
Filtering
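You can also filter records with ActiveRecord-style where conditions. Appending an operator such as .gt to the attribute name turns an equality test into a comparison: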
irb(main):007:0> Reaction.where(marketplace: 'mystore.shopify.com', 'rating.gt': 3).each {|r| p r.marketplace}
[Aws::DynamoDB::Client 200 0.011604 0 retries] query(consistent_read:false,scan_index_forward:true,index_name:"dynamoid__development_reactions_index_marketplace",table_name:"dynamoid__development_reactions",key_conditions:{"marketplace"=>{comparison_operator:"EQ",attribute_value_list:[{s:"mystore.shopify.com"}]}},query_filter:{"rating"=>{comparison_operator:"GT",attribute_value_list:[{n:"3.0"}]}},attributes_to_get:nil)
"mystore.shopify.com"
=> nil
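Notice in the log that the marketplace condition ends up in key_conditions while the rating condition ends up in query_filter. The sketch below mimics that split in plain Ruby. It is an illustration with hypothetical names and a deliberately small operator table, not Dynamoid's actual parser.

```ruby
# Illustrative subset of operator suffixes and the comparison operators
# that appear in the DynamoDB request log.
COMPARISON_OPERATORS = {
  'gt' => 'GT', 'gte' => 'GE', 'lt' => 'LT', 'lte' => 'LE', 'ne' => 'NE'
}.freeze

# Splits a key such as :'rating.gt' into ["rating", "GT"].
# A key without a suffix means equality.
def parse_condition(key)
  attribute, operator = key.to_s.split('.', 2)
  comparison = operator.nil? ? 'EQ' : COMPARISON_OPERATORS.fetch(operator)
  [attribute, comparison]
end

# Equality on the index hash key becomes a key condition; everything
# else is applied afterwards as a filter.
def split_conditions(conditions, hash_key:)
  result = { key_conditions: {}, query_filter: {} }
  conditions.each do |key, value|
    attribute, comparison = parse_condition(key)
    on_key = (attribute == hash_key.to_s) && (comparison == 'EQ')
    bucket = on_key ? :key_conditions : :query_filter
    result[bucket][attribute] = { comparison_operator: comparison, value: value }
  end
  result
end
```

The practical consequence is that the filter narrows what comes back, but the index still has to read every item for that marketplace first.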
Conclusion
For us, using DynamoDB to store user-facing data, especially the summary data behind our charts, has significantly improved website speed. The site is new and does not hold much data yet, but we are optimistic about how it will perform as it grows. DynamoDB has also been easier for us to manage than Elasticsearch running on Amazon OpenSearch Service.
You can find me on Twitter, where I talk about Ruby on Rails, my company Zonmaster, and life in general. If you’re looking for help with your Rails project, drop me a note on Twitter or LinkedIn.