Contents of This Article
- Cache invalidation at one instruction invalidates the cache of all subsequent instructions
- Cache is invalidated even when you add commands that don’t do anything
- Cache is invalidated when you add spaces between a command and its arguments inside an instruction
- Cache is used when you add spaces around commands
- Cache is used for non-idempotent instructions
- Instructions after ADD are never cached (only versions prior to 0.7.3)
Why do we need to use Dockerfile?
Dockerfile is not yet another shell. It has a special mission: automating Docker image creation.
Once you write the build instructions into a Dockerfile, you can build the same image with just the docker build command.
A Dockerfile is also useful for telling others what job the container does. Your teammates can tell what the container is supposed to do just by reading the Dockerfile. They don’t need to log in to the container and figure out what it is doing with the ps command.
For these reasons, you must use a Dockerfile when you build images. However, writing a Dockerfile is sometimes painful. In this post, I will share a few tips and gotchas in writing Dockerfiles so that you come to love the tool.
ADD and understanding context in Dockerfile
ADD is the instruction that adds local files to a Docker image. The basic usage is very simple. Suppose you want to add a local file called myfile.txt to /myfile.txt in the image.
Then your Dockerfile looks like this.
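A minimal sketch of such a Dockerfile, assuming an ubuntu base image (the base image is an assumption for illustration) and a myfile.txt sitting next to the Dockerfile:

```dockerfile
# Assumed base image; any base image works the same way.
FROM ubuntu
# Copy myfile.txt from the build context to /myfile.txt in the image.
ADD myfile.txt /
```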
Very simple. However, you can’t add /home/vagrant/myfile.txt this way. The build fails with a no such file or directory error even though the file exists. Why? Because /home/vagrant/myfile.txt is not part of the context of the Dockerfile. The context is the set of files and directories available to the Dockerfile instructions. Only files and directories in the context can be added during a build.
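The failing case can be sketched like this, again assuming an ubuntu base image. The absolute host path is not part of the build context, so the build is expected to stop at the ADD step:

```dockerfile
FROM ubuntu
# Fails: /home/vagrant/myfile.txt on the host lies outside the build context,
# so Docker cannot find it even though the file exists.
ADD /home/vagrant/myfile.txt /
```

One way around it is to copy the file next to the Dockerfile and ADD it by its relative path.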
Files and subdirectories under the current directory are added to the context. You can see this in the output when you run the build command.
What’s happening here is that the Docker client makes a tarball of the entries under the current directory and sends it to the Docker daemon. This is required because your Docker daemon may be running on a remote machine. That’s why the build output says Uploading.
There is a pitfall, though. Since everything under the current directory is automatically added to the context, Docker tries to upload huge files and the build takes longer, even if you never add those files to the image.
So the best practice is to place only the files and directories that you need to add to the image under the current directory.
Treat your container like a binary with CMD
By using the CMD instruction in a Dockerfile, you make your container act like a single executable binary. Suppose you have these instructions in your Dockerfile.
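A minimal sketch of such instructions, assuming an ubuntu base image and a run.sh script in the build context (the base image and the ADD step are assumptions; /usr/local/bin/run.sh is the path this section uses):

```dockerfile
FROM ubuntu
# Copy the script into the image; it must be executable and start with a shebang.
ADD run.sh /usr/local/bin/run.sh
# Default command: run the script when the container starts with no arguments.
CMD ["/usr/local/bin/run.sh"]
```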
When you build an image from this Dockerfile and run it with docker run -i run_image, it runs the /usr/local/bin/run.sh script and exits.
If you don’t use CMD, you always have to pass the command as an argument: docker run -i run_image /usr/local/bin/run.sh.
This is not just cumbersome, but also considered bad practice from an operations perspective.
If you have a CMD instruction, the purpose of the container becomes explicit: all the container wants to do is run that command.
But if you don’t have the instruction, anybody other than the person who made the container needs to rely on external documentation to know how to run it properly.
So, in general, you should have a CMD instruction in your Dockerfile.
Difference between CMD and ENTRYPOINT
CMD and ENTRYPOINT are confusing.
Every command, whether passed as an argument to docker run or specified by the CMD instruction, is passed as an argument to the binary specified in ENTRYPOINT.
/bin/sh -c is the default entrypoint. So if you specify CMD date without specifying an entrypoint, Docker executes it as /bin/sh -c date.
By using an entrypoint, you can change the behaviour of your container at run time, which makes container operation a bit more flexible.
With an entrypoint such as /bin/date, for example, the container prints the current date in whatever format you pass as an argument.
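As a sketch of this idea, assuming an ubuntu base image with date available at /bin/date (the format strings are illustrative, not taken from the original post):

```dockerfile
FROM ubuntu
# Every argument given to docker run is appended to this binary.
ENTRYPOINT ["/bin/date"]
# Default argument, used only when docker run gets no command.
CMD ["+%Y-%m-%d"]
# docker run <image>            executes: /bin/date +%Y-%m-%d
# docker run <image> +%H:%M:%S  executes: /bin/date +%H:%M:%S
```

Keeping the binary in ENTRYPOINT and the default argument in CMD lets callers change only the format without repeating the command name.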
exec format error
There is one caveat with the default entrypoint. Suppose you want to execute a shell script that simply prints a greeting.
When you run the container, you expect it to print hello, world. However, what you get is an exec format error, which doesn’t seem to make sense.
You see this message when you didn’t put a shebang in your script. Because of that, the default entrypoint /bin/sh -c does not know how to run the script.
To fix this, you can either add a shebang to the script or specify the interpreter on the command line.
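A sketch of the shebang fix, assuming a hypothetical script named hello.sh in the build context and an ubuntu base image:

```dockerfile
# hello.sh (hypothetical) begins with a shebang so the kernel knows the interpreter:
#   #!/bin/sh
#   echo "hello, world"
FROM ubuntu
ADD hello.sh /usr/local/bin/hello.sh
# Make sure the script is executable inside the image.
RUN chmod +x /usr/local/bin/hello.sh
CMD ["/usr/local/bin/hello.sh"]
```

Alternatively, naming the interpreter explicitly, for example docker run -i <image> /bin/sh /usr/local/bin/hello.sh, runs the script even without a shebang.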
Build caching: what invalidates the cache and what doesn’t?
Docker creates a commit for each instruction line in a Dockerfile. As long as you don’t change an instruction, Docker assumes it doesn’t need to rebuild that step, so it uses the cached image, which then serves as the parent image for the next instruction.
This is why docker build takes a long time the first time but finishes almost immediately the second time.
However, when the cache is used and what invalidates it are sometimes not very clear. Here are a few cases that I found worth noting.
Cache invalidation at one instruction invalidates the cache of all subsequent instructions
This is the basic rule of caching. If you cause cache invalidation at one instruction, subsequent instructions don’t use the cache.
For example, if you insert a RUN apt-get update instruction, all instructions after it have to be redone from scratch even if they have not changed. This is inevitable because Docker uses the image built by the previous instruction as the parent image for the next instruction. So, if you insert an instruction that creates a new parent image, all subsequent instructions cannot use the cache because their parent image is now different.
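For illustration, suppose a new instruction is inserted into a Dockerfile that was already built once. The ubuntu base image and the steps after the inserted line are assumptions chosen for the sketch:

```dockerfile
FROM ubuntu
# Newly inserted line: produces a new parent image, so the cache miss starts here.
RUN apt-get update
# Unchanged lines, but they are rebuilt because their parent image is now different.
RUN apt-get install -y openssh-server
RUN mkdir -p /var/run/sshd
```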
Cache is invalidated even when you add commands that don’t do anything
Adding a command that does nothing still invalidates the cache. For example, even though the true command doesn’t change anything in the image, adding it to an instruction makes Docker invalidate the cache.
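A small sketch, assuming the no-op is appended to an existing RUN instruction on an ubuntu base image:

```dockerfile
FROM ubuntu
# Previously cached as: RUN apt-get update
# Appending a no-op changes the instruction text, so Docker rebuilds this step.
RUN apt-get update && true
```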
Cache is invalidated when you add spaces between a command and its arguments inside an instruction
Adding spaces between a command and its arguments invalidates the cache.
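As a sketch of this case, assuming the step was previously built with a single space between apt-get and update:

```dockerfile
FROM ubuntu
# Previously cached as: RUN apt-get update
# The extra space between apt-get and update changes the command string: cache miss.
RUN apt-get  update
```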
Cache is used when you add spaces around commands inside an instruction
The cache remains valid even if you add spaces around the command.
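By contrast, a sketch of whitespace that leaves the cache intact, assuming the behavior described here, where spaces surrounding the command are ignored:

```dockerfile
FROM ubuntu
# Previously cached as: RUN apt-get update
# Extra spaces before the command are trimmed, so the cached step is reused.
RUN    apt-get update
```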
Cache is used for non-idempotent instructions
This is a pitfall of build caching. By non-idempotent instructions, I mean commands that may return a different result each time they run.
For example, apt-get update is not idempotent because the content of the updates changes as time goes by.
Suppose you wrote a Dockerfile that runs apt-get update and built an image from it. Three months later, Ubuntu publishes some security updates to its repository, so you rebuild the image from the same Dockerfile, hoping your new image includes the security updates.
However, this doesn’t pick up the updates. Since no instructions or files have changed, Docker uses the cache and skips running apt-get update.
If you don’t want to use the cache, just pass the --no-cache option to docker build.
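To make the pitfall concrete, here is a sketch assuming an ubuntu base image and a hypothetical image tag myimage:

```dockerfile
FROM ubuntu
# Cached after the first build: later builds reuse this step and
# do not contact the package repositories again.
RUN apt-get update
RUN apt-get upgrade -y
# To force every step to run again:
#   docker build --no-cache -t myimage .
```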
Instructions after ADD are never cached (only versions prior to 0.7.3)
If you use a Docker version before v0.7.3, watch out!
If your Dockerfile has an ADD instruction followed by RUN apt-get update and RUN apt-get install openssh-server, those RUN instructions will never be cached.
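A sketch of such a Dockerfile, assuming an ubuntu base image and using the rock.you file this section refers to (the destination path is an assumption):

```dockerfile
FROM ubuntu
# Before v0.7.3, every step after this ADD was rebuilt on each build.
ADD rock.you /rock.you
RUN apt-get update
RUN apt-get install -y openssh-server
```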
The behavior changed in v0.7.3. Docker now caches instructions even after an ADD instruction, but it invalidates the cache if the content of the added file has changed.
For example, if you change the rock.you file, the instructions after ADD don’t use the cache.
Hack to run container in the background
If you want to simplify the way you run containers, you should run your container in the background with docker run -d image your-command.
Compared with docker run -i -t image your-command, using -d is recommended because you can run your container with just one command and you don’t need to detach from the container’s terminal by hitting Ctrl + P, Ctrl + Q.
However, there is a problem with the -d option. Your container stops immediately unless the command keeps running in the foreground.
Let me explain this with the case where you want to run the apache service in a container. The intuitive way of doing this is to start apache with apachectl.
However, the container stops immediately after it is started. This is because apachectl exits once it detaches the apache daemon.
Docker doesn’t like this. Docker requires your command to keep running in the foreground. Otherwise, it thinks that your application has stopped and shuts down the container.
You can solve this by running the apache executable directly with the foreground option.
Here we manually do what apachectl does for us and run the apache executable ourselves. With this approach, apache keeps running in the foreground.
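A sketch of this approach, assuming the apache2 package on an ubuntu base image. The variable names and paths follow the usual Debian/Ubuntu layout and are assumptions that may differ on other distributions:

```dockerfile
FROM ubuntu
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y apache2
# Environment that apachectl would normally export for us.
ENV APACHE_RUN_USER=www-data
ENV APACHE_RUN_GROUP=www-data
ENV APACHE_LOG_DIR=/var/log/apache2
ENV APACHE_PID_FILE=/var/run/apache2.pid
ENV APACHE_RUN_DIR=/var/run/apache2
ENV APACHE_LOCK_DIR=/var/lock/apache2
# Create the runtime directories the server expects to find.
RUN mkdir -p $APACHE_RUN_DIR $APACHE_LOCK_DIR $APACHE_LOG_DIR
# Run the server binary directly and keep it in the foreground.
CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]
```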
The problem is that some applications do not run in the foreground. Also, we need to do extra work ourselves, such as exporting environment variables. How can we make it easier?
In this situation, you can add tail -f /dev/null to your command. By doing this, even if your main command runs in the background, your container doesn’t stop because tail keeps running in the foreground. We can use this technique in the apache case.
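A sketch of the hack applied to apache, assuming the apache2 package on an ubuntu base image:

```dockerfile
FROM ubuntu
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y apache2
# apachectl daemonizes and returns; tail then stays in the foreground
# so the container keeps running.
CMD apachectl start && tail -f /dev/null
```

The same command can also be passed at run time, for example docker run -d <image> /bin/sh -c "apachectl start && tail -f /dev/null".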
Much better, right? Since tail -f /dev/null doesn’t do any harm, you can apply this hack to any application.