Introduction
In the past few weeks, and especially the last few days with the recommendation to stay at home due to the pandemic we are fighting (stay at home, guys! this recommendation is never an exaggeration), I've been reading a lot of articles and experimenting with new technologies. On one of those occasions I came across this great article by Sunil P V about how a URL shortening application works, and after reading it and seeing the architecture I thought that developing it would be a great way to pass the time. I wrote before about how I struggle to come up with ideas for practice projects here and here, so it sounded reasonable.
Explaining the Architecture
Borrowing the diagram that Sunil P V uses in his article, I will explain the technologies I used to develop the project and why. First, let's see the diagram:
Every URL shortening application I've seen so far works similarly. When the user sends a long URL, the API generates a short hash that identifies the long URL and saves both to a database. Then, when the user requests sho.rt/hash, the API looks up the long URL represented by that hash and redirects to it. Easy enough.
The most important part of the architecture is the API itself, represented by the green box in the diagram. This API is supposed to have many instances behind a load balancer, which means it needs to be distributed. We have the database where the URL information is persisted, Redis for caching purposes, and Zookeeper for coordination. Let's go into more detail:
URL Shortening API: The API needs to be distributed and resilient. Since this was a project made to practice new things, I mixed something I know (Spring) with a subject I had wanted to experiment with for some time (WebFlux), so the application is built on Spring WebFlux. A small load test done with Apache JMeter suggests that, thanks to its non-blocking approach, it is much more resilient than plain old Spring MVC under high load. In addition, the architecture suggests using the Hashids library to generate the hashes for the URLs; I followed this recommendation, more on that later.
Database: The database described by Sunil P V in his article is DynamoDB, but it also mentions relational databases. I decided to use a relational database for the easy setup (no need for cloud configuration, etc.) and because it offered me another opportunity to experiment. Since the application itself is reactive and non-blocking, the remaining components must be as well to take full advantage of its benefits. As such, I decided to use PostgreSQL with its reactive driver, powered by Spring Data R2DBC. Spring Data R2DBC makes using its repositories smooth sailing for anybody who has used Spring Data JPA before, so it was not that new, but it is great that both the controller and data layers talk the same language, with Mono and Flux all over.
Redis: This is the easiest part to explain, not only because it is recommended in the architecture, but also because it is almost the standard for caching nowadays. It is used to cache the short URL to long URL mapping so that numerous GETs for the same URL don't burden the database; basic caching. Spring also offers a reactive implementation for Redis, so it was a great fit for the architecture.
Zookeeper: Maybe this is overkill; although the article mentions Zookeeper specifically, it also says that it could be replaced by Redis. Zookeeper is just used to store a counter that is shared by all instances of the API, which is used to generate the range of ids a given instance will be responsible for. Let me know what you think about this, since it is something new for me; it is the first time I have ever used Zookeeper.
The code can be found here, and it also has some optional features that I was experimenting with, such as tracing with Zipkin and metrics export to Elasticsearch. Since they are not part of the core, I will not talk about them here, but feel free to enable them if you want.
Code Details
Let's start the code discussion with the specifics of the configuration. Two very important beans are configured, plus one for convenience. For convenience, I declare a Hashids bean with the salt already set so that I don't need to recreate it every time I use it:
@Bean
public Hashids hashids() {
    return new Hashids(hashIdsSalt);
}
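Just to illustrate how the library is used (the salt here is a made-up value; the real one comes from configuration):

// Encoding produces a short, URL-safe string; decoding returns the original numbers
Hashids hashids = new Hashids("my-made-up-salt");
String hash = hashids.encode(12345L); // something like "j0gW", depending on the salt
long[] ids = hashids.decode(hash);    // [12345]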
Then there is a pretty straightforward Redis configuration; notice that it is reactive. Spring Data Redis already has a connection factory ready to use, so I just need a ReactiveRedisTemplate:
@Bean
public ReactiveRedisOperations<String, Object> redisOperations(ReactiveRedisConnectionFactory connectionFactory) {
    GenericJackson2JsonRedisSerializer serializer = new GenericJackson2JsonRedisSerializer();
    RedisSerializationContext<String, Object> context =
            RedisSerializationContext.<String, Object>newSerializationContext(new StringRedisSerializer())
                    .value(serializer).build();
    return new ReactiveRedisTemplate<>(connectionFactory, context);
}
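With this bean in place, reads and writes become non-blocking and return reactive types. A quick hypothetical example (key and value made up for illustration):

// Nothing is executed until someone subscribes to these publishers
Mono<Boolean> write = redisOperations.opsForValue().set("some-key", "some-value");
Mono<Object> read = redisOperations.opsForValue().get("some-key");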
Then there is a bean called UrlIdRange that is used to control the range of ids the instance of the application is responsible for. As I said before, the shared counter lives on Zookeeper, but every instance has its own range of ids to assign, and only when it is exhausted does it go to the Zookeeper server again to get a new range. It contacts Zookeeper to create the range on bean creation:
@Bean
public UrlIdRange urlIdRange(SharedConfigurationService sharedConfigurationService) {
    Integer counter = sharedConfigurationService.getSharedCounter(urlRangeKey);
    return new UrlIdRange(counter);
}
With that, it is a good time to explain the SharedConfigurationService and UrlIdRange classes.
The implementation of SharedConfigurationService uses the Apache Curator Framework to connect to the Zookeeper server and perform the actions on the shared counter. I will spend a little more time on this because it is something I did for the first time, and suggestions on how to improve it (or to point out that something is wrong) are welcome. The class only has one method, getSharedCounter, plus a property that gets the Zookeeper server base URL from the application properties yml file. It receives a key of type String and then starts its processing:
1 - Start the client:
// newClient comes from CuratorFrameworkFactory (static import)
final CuratorFramework client = newClient(baseUrl, new RetryNTimes(3, 100));
client.start();
2 - Create the SharedCount:
SharedCount sharedCounter = new SharedCount(client, key, 0);
3 - In a try-catch block, everything else is done:
try {
    sharedCounter.start();
    // Optimistic update: trySetCount only succeeds if the counter hasn't changed meanwhile
    VersionedValue<Integer> counter = sharedCounter.getVersionedValue();
    while (!sharedCounter.trySetCount(counter, counter.getValue() + 1)) {
        counter = sharedCounter.getVersionedValue();
    }
    sharedCounter.close();
    client.close();
    return counter.getValue();
} catch (Exception e) {
    log.error("Error while starting shared counter, impossible to update counter.", e);
    throw new NotAbleToUpdateCounterException();
}
The SharedCount is started, then the application gets its versioned value and tries to set the new value for the counter; the set only succeeds if the value hasn't changed in the meantime, otherwise it keeps trying. I did it this way because I was thinking about the distributed nature of the application: in a high-load environment collisions could happen, and that would be trouble for the management of the ids. I don't know if this is good practice or even really necessary, so give me a heads up about what could go wrong. Then I close the counter and the client and return the value.
The UrlIdRange constructor is called on bean creation with the current counter, and then it does the following:
public UrlIdRange(Integer counter) {
    this.calculateRange(counter);
    this.hasNext = true;
}

public void calculateRange(Integer counter) {
    this.initialValue = counter * 100_000;
    this.currentValue = new AtomicInteger(initialValue);
    this.finalValue = initialValue + 99_999;
    this.hasNext = true;
}
The constructor calls calculateRange, where the calculation is done. The range is 100,000 ids: if the counter is 0, the range will be 0 to 99,999, and so on. Maybe I exaggerated a little bit on the number; it could be configurable through the .yml file.
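The snippet above omits hasNext() and getCurrentValue(); a minimal sketch of how they could work, assuming ids are handed out with getAndIncrement (the actual implementation is in the repository):

// Sketch only: serve the next id and flip hasNext once the range is exhausted
public Integer getCurrentValue() {
    int value = currentValue.getAndIncrement();
    if (value >= finalValue) {
        this.hasNext = false;
    }
    return value;
}

public boolean hasNext() {
    return hasNext;
}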
With the initial configuration out of the way, it is time to dive into WebFlux. The API has two endpoints, as I've said before: a GET to find the URL and redirect, and a POST to generate new short URLs. The router is very simple:
@Bean
public RouterFunction<ServerResponse> route(UrlInfoHandler handler, ErrorHandler errorHandler) {
    return RouterFunctions.route()
            .onError(Exception.class, errorHandler::handleError)
            .GET("/{url}", handler::findLongUrlAndRedirect)
            .POST("/", accept(MediaType.APPLICATION_JSON), handler::generateAndSaveShortUrl)
            .build();
}
People may find the @RequestMapping annotation from Spring MVC convenient, but I think a router function like this, where all routes are defined in one place, also has its advantages: it is much easier to see what routes are available, and Spring provides features to keep it organized in large applications. Since Spring Boot is made for microservices, which shouldn't have many routes per application anyway, it is not a problem.
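For comparison, here is a rough sketch of what the GET route could look like in the annotation style (this is not the project's code; WebFlux also supports annotated controllers returning reactive types):

// Annotation-style equivalent, sketched for comparison only
@RestController
public class UrlInfoAnnotatedController {

    private final UrlInfoService urlInfoService;

    public UrlInfoAnnotatedController(UrlInfoService urlInfoService) {
        this.urlInfoService = urlInfoService;
    }

    @GetMapping("/{url}")
    public Mono<ResponseEntity<Void>> findLongUrlAndRedirect(@PathVariable String url) {
        // 404 handling omitted to keep the sketch short
        return urlInfoService.findByShortUrl(url)
                .map(info -> ResponseEntity.status(HttpStatus.PERMANENT_REDIRECT)
                        .location(URI.create(info.getLongUrl()))
                        .<Void>build());
    }
}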
Then we have the UrlInfoHandler, where the fun happens. It can be interesting, and maybe a bit overwhelming, to face declarative programming for the first time, but since I already had some experience with Javascript and its declarative, non-blocking style, it was not so bad. I certainly need to show it to some colleagues later to see if they understand what is happening. Let's start with the findLongUrlAndRedirect method:
public Mono<ServerResponse> findLongUrlAndRedirect(ServerRequest serverRequest) {
    String shortUrl = serverRequest.pathVariable("url");
    return cacheService.getFromCacheOrSupplier(shortUrl, UrlInfo.class, () -> urlInfoService.findByShortUrl(shortUrl))
            .doOnNext(urlInfo -> log.info("UrlInfo found: {}", urlInfo))
            .flatMap(urlInfo -> permanentRedirect(URI.create(urlInfo.getLongUrl())).build())
            .switchIfEmpty(status(HttpStatus.NOT_FOUND)
                    .bodyValue(ErrorResponse.notFound(shortUrl, ErrorCode.URL_INFO_NOT_FOUND)));
}
I get the shortUrl from the request and the magic begins. First I go to the cacheService to see if the UrlInfo is already there; if not, the given Supplier is called. Since, as far as I know, the reactive Redis cache in Spring is not supported by @Cacheable and its conveniences, I decided to use this approach. Maybe I could have used AOP, but I thought this way matched well with the rest of the code, full of lambdas, or functional interfaces, as they are called in Java. Then I call doOnNext() just for logging purposes, followed by a flatMap that does the redirect. If the result is empty, the application responds with a 404 status. The method that generates a short URL works similarly in the reactive sense, although the logic is different; you can take a look at the code in the repository.
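For reference, a rough sketch of what generateAndSaveShortUrl could look like, given the saveIfNotExist service method shown further down (the real handler is in the repository):

// Sketch: read the request body, delegate to the service, wrap the result in a 200
// (ok() is statically imported from ServerResponse)
public Mono<ServerResponse> generateAndSaveShortUrl(ServerRequest serverRequest) {
    return serverRequest.bodyToMono(UrlGenerateRequest.class)
            .flatMap(urlInfoService::saveIfNotExist)
            .flatMap(urlInfo -> ok().bodyValue(urlInfo));
}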
The implementation of CacheService is interesting because it leverages Reactor's features dedicated to caching:
public <T> Mono<T> getFromCacheOrSupplier(String key, Class<T> clazz, Supplier<Mono<T>> orElseGet) {
    return CacheMono.lookup(k -> redisOps.opsForValue().get(buildKey(key, clazz))
                    .map(w -> Signal.next(clazz.cast(w))), key)
            .onCacheMissResume(orElseGet)
            .andWriteWith((k, v) -> Mono.fromRunnable(() ->
                    redisOps.opsForValue().setIfAbsent(buildKey(k, clazz), v.get()).subscribe()));
}
Using CacheMono I get reactive control flow dedicated to caching, and it also made it possible to use my supplier in a smooth way via onCacheMissResume. First, the lookup method calls Redis with the given key; this is very straightforward, since the Redis reactive interfaces have the same usage as the blocking ones but deliver a Mono<T>. Finally, in case of a miss, the value is cached using andWriteWith. Either way, the object is returned to us, continuing the flow. Very interesting logic, and please tell me if there is anything to improve!
Despite not getting into details about the POST route, it is interesting to see the implementation of saveIfNotExist in the UrlInfoService, which is called by the handler to generate the short URL and save it, but only if it does not exist. A chain of three methods is called:
// Tries to find the long url, if empty, save the request
public Mono<UrlInfo> saveIfNotExist(UrlGenerateRequest request) {
    return findByLongUrl(request.getLongUrl()).switchIfEmpty(save(request));
}

// This is where hashids comes into play, encoding the id given by url range
private Mono<UrlInfo> save(UrlGenerateRequest request) {
    Integer id = getUrlId();
    String hash = hashids.encode(id);
    return urlInfoRepository.save(
            UrlInfo.builder()
                    .longUrl(request.getLongUrl())
                    .shortUrl(hash)
                    .expiryAt(request.getExpiryAt())
                    .build()
    );
}

// Get the current value of the range and deals with range shortage as well
private Integer getUrlId() {
    if (!urlIdRange.hasNext()) {
        Integer counter = sharedConfigurationService.getSharedCounter(urlRangeKey);
        urlIdRange.calculateRange(counter);
    }
    return urlIdRange.getCurrentValue();
}
The last important part left to mention is the data layer using R2DBC. This is also very straightforward, since Spring Data R2DBC repositories don't differ much from the standard ones, differing only by returning Mono or Flux. One thing I noticed, though, is that I didn't find a way to map table relationships with R2DBC; there are no javax or Hibernate annotations, of course, for that kind of thing, and I didn't find any reference to it in the documentation. However, I may be mistaken. Let me know if you know anything about that.
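To give an idea, the repository could be as simple as the sketch below; the method name and the query (table and column names included) are assumptions for illustration, the real interface is in the repository:

// Sketch of a Spring Data R2DBC repository; it returns Mono/Flux instead of entities
public interface UrlInfoRepository extends ReactiveCrudRepository<UrlInfo, Long> {

    @Query("SELECT * FROM url_info WHERE short_url = :shortUrl")
    Mono<UrlInfo> findByShortUrl(String shortUrl);
}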
Conclusion
It was a really enjoyable experience to take a diagram of an application architecture, add my own ideas, and materialize it in code. I hope I can find more articles of this type in the future. There were other experiments I didn't mention here, such as documentation and testing; take a look at the repository to see what I did about those, and I will probably write another post about them. Thank you to all who reached the end of the post (it was a long one), and let me know what you think!