This is one of many posts of mine about Docker, and that's for a reason.
I recently had the opportunity to work on multiple Docker-based projects and faced many challenges with them, so as usual, I want to share those challenges and their possible solutions, in the hope of leading you to faster results.
The Dockerfile, and the way we use it, is the basis for building, pushing, and pulling images from our Docker image repositories, and the commands inside it are essential to how the image behaves in our production and testing environments.
In some cases, people prefer not to use cached layers and play it safe by re-creating everything from scratch. That's OK, but it can be improved by understanding Docker layers and utilizing them better.
I also understand the motivation behind it, because writing a good Dockerfile isn't as easy as it seems at first, especially for complicated images. It has implications for multiple flows around the final Docker image, so if we add time-to-market pressure to the mix, we might aim first for a working flow and only later improve it.
If you aren’t familiar with Docker then I highly suggest you start here
In case you only want the bullet points out of this post then feel free to read the first chapter or else go to the next one.
Show me The Money First
- Each command in a Dockerfile creates a layer that is identified by a hash value.
- If you update a file (meaning its final binary content changes), Docker detects that and recreates the layer that copies it.
- When a command in the Dockerfile changes, the Docker daemon recreates layers as well.
- The Docker daemon re-creates (or creates, for newly added commands) layers starting from the command, or copied file, that changed, and so on until the last command of the Dockerfile.
- Put the commands that change most frequently at the bottom of the Dockerfile.
- For Git-related commands that clone from a repo that keeps receiving new commits, inject (in your own way) a version value that will invalidate the command's cached layer when the command itself didn't change on the surface but the Git repo did (new commits).
Last bullet example:
RUN echo $APP_VERSION && git clone https://github.com....
A Few Basics
Each command in a Dockerfile (RUN, ADD, CMD, ENTRYPOINT, COPY, etc.) represents a layer of a Docker image, and the combination of those layers gives us the resulting Docker image.
Each command is represented by a SHA256 hash value, which is generated when the layer is created during the Dockerfile build process.
If you wish to see the layers of your Docker image, execute the command docker inspect {docker-image-name}:tag
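For example, with a hypothetical image name (docker history also gives a nice per-command view of the layers):

docker inspect --format '{{json .RootFS.Layers}}' my-image:latest
docker history my-image:latest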
Docker knows how to look at your Dockerfile, figure out whether commands have changed, and even detect whether a file copied by a COPY command has changed at the binary level.
This allows the Docker daemon to know when a new layer is needed for an updated file, a new command, or an updated command, and with that knowledge it can improve its performance in multiple aspects.
When building an image, it simply uses an already-built layer instead of rebuilding it. In the push process to an image repository, it won't push all the layers, because some of them already exist there. In the pull process, it again won't pull all the layers, but only the ones it needs to have the new image in its grasp.
As you may have understood already, this alone shortens our build times, as well as the time it takes to push an image to a Docker repository and to pull one. If our CI/CD pipeline utilizes layers well, the entire pipeline will simply be shorter and closer to error-free.
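A quick way to see this in action (my-image is just a placeholder name): build the same Dockerfile twice, and the second build will reuse the cached layers and finish almost instantly; a following push only uploads the layers the registry doesn't already have.

docker build -t my-image:latest .
docker build -t my-image:latest .   # reuses cached layers, near-instant
docker push my-image:latest         # only missing layers get uploaded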
Let’s Talk About CI/CD A Little
I guess it’s pretty clear to say that the wish for a quick response time in deployments, or in failover and redeployment of them, it’s wished for a fast response time so we could achieve the best high availability as possible.
But what about the pull and push process?
When a deployment occurs, in most cases, we would also pull the image.
But what if the image weighs, let's say, not 100MB but ~3GB, or anything else that might take a little longer to download?
It would also affect the total deploy time, but then again, some might say that internet connections are getting better by the day, so we don't need to worry about a 3GB image.
I honestly agree with you, but in some cases the deploy time is crucial down to the second, and in some places the network connectivity isn't that great, so in those areas the layers can be of huge help in decreasing the overall amount of data to download.
As for the push, it can be done from personal laptops, but it should mainly happen on our CIs, and the CI servers still need to push the image to some container repository.
If the push time is improved as well, it also allows for faster response times in your deployments and development cycles.
Just to note, today the Docker daemon uses only 5 concurrent threads for the push operation, so configuring the daemon to use more threads can make pushing the image faster. A further description can be found here, and the specific flag is max-concurrent-uploads.
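For example, on Linux this setting usually goes into /etc/docker/daemon.json (restart the daemon afterward for it to take effect); the value of 10 here is just an illustration:

{
  "max-concurrent-uploads": 10
}

There's a matching max-concurrent-downloads flag for the pull side as well.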
Writing a sophisticated Dockerfile might seem like too much work, or even an error-prone suggestion, but it simply allows for faster response times and makes it easier to utilize our Docker containers for production deployments and testing purposes.
But after talking about shortening our development and deployment times, I want to ask a question.
How many times have you seen your CI/CD pipeline, or even a local build, hit an issue compiling a project, cloning a repo without authorization, dedicated CI machines with local configuration issues, and many more…
This simply leads to a false-positive deployment that appears to be functioning as expected, but in the end the final product doesn't work that well. And why? Simply because of a small configuration or flow that went wrong.
This, of course, means more work at deployment time in the form of investigating what happened, maybe even requesting additional help to investigate the core problem, and many more time-consuming processes.
But in addition to the time-consuming aspects, another important one is not rebuilding layers that could fail and thereby block the deployment of a version for production or development purposes.
Just imagine that happening to you at 11 PM, when a deployment crashes and most people are asleep. I hope it never happens to you, of course, but I think you get the picture 🙂
So in terms of errors in our build flow, when using already-built cached layers, we only recreate the layers that need to be rebuilt, preventing new errors in the layers that were already built.
Of course, errors can always occur, but using the cached layers minimizes the territory in which we can experience any kind of issue.
Most Frequent Commands
When writing a Dockerfile, you mostly have some kind of binary installations, compilations, project setups, or other flows that allow the Docker container, at the end, to run properly as expected.
Some of the commands in that Dockerfile don't change as frequently as, for example, your entrypoint script, other scripts that install your application, or some long RUN command that compiles and builds a huge missile that goes to the moon. Been there, done that 😉
Seriously, built a missile…
Not really, but I hope it made you smile 🙂
The reason I'm talking about this subject is that we saw the Docker daemon re-creates layers from the command that was updated down to the last command, so why not place the most frequently updated commands at the bottom and save the time of re-creating the rarely updated layers at the top?
There isn't much more to say about that, but a small sketch below should make the point concrete 🙂
“Hacking” the Dockerfile
When you work on a Git repo and keep updating it with newly pushed commits, a Dockerfile command that clones that repo has no idea the repo, or even the specific branch, has changed.
So wait… if we try to rebuild that image, will it simply create a new image based on all the cached layers?
That’s a big Y-E-S
So what do we do?…
Simple: in the Dockerfile command that tries to clone the repo, we can involve some value that represents the version of the image, or more specifically, the version of the branch that is destined to run in the future-built image.
This allows re-creating the layer according to the passed value of the version.
I simply handled this by adding an echo command like this:
RUN echo $APP_VERSION && git clone https://github.com....
It might not seem that elegant, but of course, you can take this approach and adapt it in your own way.
Another option is to take the last commit ID and pass it as an argument to the build phase, so I'll leave you with that and let you come up with your own artistic way to meet this challenge.
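As a minimal sketch of that approach (the argument name and repo URL are placeholders), declare a build argument in the Dockerfile and reference it in the cloning command:

ARG GIT_COMMIT=unknown
RUN echo $GIT_COMMIT && git clone https://github.com/your-org/your-repo.git

Then pass the current commit ID at build time:

docker build --build-arg GIT_COMMIT=$(git rev-parse HEAD) -t my-image:latest .

The cloning layer is now invalidated exactly when the commit ID changes and reused otherwise. Note that the same applies to the $APP_VERSION example above: it has to be declared as an ARG (or ENV) for the cache to notice changes in it.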
In Conclusion
Managing the layers of a Docker image might seem very troublesome at first, and for good reason: no technology is well understood at the start, so there's always a learning curve. I hope the information here helped ease that curve in any way possible 🙂
We saw a short description of how the Docker daemon manages the creation of image layers, looked at a few CI/CD aspects of the image building/pulling/pushing phases, and covered a way to make the Docker daemon re-create layers that look unchanged on the surface but did change Git-wise.
I hope this post helped you out and was informative. I tried to make it shorter than my usual posts, so if you think something is missing or not described deeply enough, please let me know.
I hope you had a great time reading this piece, and if you have any further questions I would be delighted to answer them.
Also, if you have any opinions or suggestions for improving this piece, I would love to hear them 🙂
Thank you all for your time and I wish you a great journey!