Hi there my name is Daniel and I'm a passionate Gopher and Dart developer. About three weeks ago, i was going through the web i stumbled upon the Google Hashcode competition read about it and decided to join in this year.
It would seem proper to check the problems released in previous years, and that's exactly what i did. I chose the 2019 final round problem. Of Course, I got stuck at some point in solving the problem(at this point it was really frustrating), and so i decided to just look up the solution. Fortunately i found none. Hence, after solving the problem, i decided to write my first article on dev.to to give back to community, even in this small way. The solution is written in Go, and here's the link to the repo.
Okay, let's get to work, oh and, the question can be found here, i'll only explain the basic part of the problem in the article so take time to read it.
The problem states that:
Google has a large codebase, containing billions of lines of code across millions of source files. From these source files, many more compiled files are produced, and some compiled files are then, used to produce further compiled files, and so on.ย Given the huge number of files, compiling them on a single server would take a long time. To speed it up, Google distributes the compilation steps across multiple servers. In this problem, we will explore how to effectively use multiple compilation servers to optimise compilation time.
This basically means that multiple files will be compiled across multiple servers.It also says:
Given a description of dependencies between files in a massive codebase, a number of build servers, and deadlines for the compilation targets, schedule the compilation steps on the available servers.ย
Now we see that certain files have dependencies, dependencies are analogous to mandatory ingredients in a dish, without them, the dish cannot be called the name it would normally be called. For instance,consider Pepperoni Pizza without Pepperoni. Since files could have dependencies,waiting for each dependency to compile is imminent.
It also says:
Each compiled file is described by: name, dependencies (other compiled files that need to be ready before this file can be compiled),the time it take to replicate the file to all other servers, and the time it takes to compile it.
Okay that's a full description of each file. We now have all information we need.
Our main objective is to schedule compilation steps on each server to minimize the time needed to compile the target files as defined in the PDF.
Firstly we need to fully represent or model each file in code. So i use a struct named file:
//The file struct fully represents a file and all its provided properties
type file struct {
Locker sync.RWMutex
Name string
CompileTime int
ReplicationTime int
Dependencies []string
CompiledOnServers []string //holds the distinct servers this file was compiled on
IsCompiled bool
IsReplicated bool
}
We do not have the number of servers that Google has and so we model this requirement using goroutines, we'll create as many goroutines as there are servers and compile files on each one of them, but we do not also have the luxury of time as specified in the input files so,we model using something like this in compile.go(in the repo):
time.Sleep((f.CompileTime)/1000000 * time.Duration)
//this sleeps the caller goroutine for the amount of time specified
Here's a complete breakdown of the solution:
At birth, each goroutine picks a file from the allFiles slice and begins compiling it. If the the file has dependencies it'll check if each dependency is compiled on the current server(goroutine) or is replicated, that is, copied to all servers, if both conditions hold true, that particular dependecy isn't recompiled, else, it will be compiled on the current goroutine.
After going through all of it's dependencies, the chosen file is then compiled.
It will replicated to all servers only if the compilation time is greater than the replication time or if at least the ratio of replication time to compile time os less than or equal to 1.8:
(f.ReplicationTime / f.CompileTime) <= 1.8 //if this is true the file will be replicated
The reason for this is that, it isn't that much of a burden to replicate the file if the replication time isn't up to double the compilation time, this idea was spotted in the example output given in the PDF, file c1 is replicated even though
the Replication time is 18s and the Compile time is 10s.
After compiling it's chosen file each goroutine will come back to check if all files are compiled and if true execute the code below:
waitGroup.Done() //this will decrement the waitGroup counter for the main goroutine
runtime.Goexit() //the goroutine terminates itself
This is my first blog post. If you have any contributions or corrections or suggestions please leave them in the comments section, they'd be highly appreciated.
Cheers!
Top comments (0)