An introduction

Why should you care?

Docker is a way of packaging code and everything that is needed to run it

It's kind of like Git (Remember that?)

but for a whole operating system and everything running in it

This means...

Common software can be super easy to install and run
If software works on your machine, it works in the same way on everyone else's
It can act as installation and execution documentation for your future self (or others)

Installation

Follow Instructions

Some terminology

Image: A blueprint for running your code
Container: An instance of an image. You can run many containers based on one image
Volume: A place to put data that containers can access

Images

Images are a blueprint for your code and its execution environment. Each one is a snapshot of an operating system, along with any code, data and configuration you specify.

Running Images

Images are usually specified as [name]:[version]


$ docker run ubuntu:16.04

If an image you specify isn't found on your computer, Docker looks for it in a registry and tries to download it from there
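To make the [name]:[version] convention concrete, here is a small sketch in plain shell that splits an image reference into its two parts. The function name split_image_ref is made up for this example, and it only handles the simple name:tag form (not digests). When the version is left out, Docker falls back to the tag "latest".

```shell
# Hypothetical helper: split an image reference into name and tag.
# Assumes the simple [name]:[version] form; digests (name@sha256:...) are not handled.
split_image_ref() {
    ref="$1"
    last="${ref##*/}"    # part after the final slash, e.g. "ubuntu:16.04"
    case "$last" in
        *:*) echo "name=${ref%:*} tag=${last##*:}" ;;
        *)   echo "name=$ref tag=latest" ;;    # no tag given: Docker uses "latest"
    esac
}

split_image_ref ubuntu:16.04
split_image_ref simonalpha/ncbi-blast-docker
```

The first call prints name=ubuntu tag=16.04, the second prints name=simonalpha/ncbi-blast-docker tag=latest.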

Managing Images

You can see all the images you have on your system


$ docker images

and delete them if you don't need them any more


$ docker rmi ubuntu:16.04

Containers

When you run an image, you get a container.


$ docker run --name ubuntu-test --rm -it ubuntu:16.04 bash

Managing Containers

You can see all the containers you have running on your system (add -a to also list stopped ones)


$ docker ps

and delete stopped ones if you don't need them any more (a container started with --rm is removed automatically when it exits)


$ docker rm ubuntu-test

Volumes

Act like mounts or shared drives

As a matter of principle, don't expect data to persist in a container unless it is in a volume

There are two types of volumes you should care about

Docker managed data volumes
Host directory mounts

Docker managed data volumes

Start an ubuntu container with a volume called ubuntu-volume mounted at /tmp


$ docker run --name ubuntu-test --rm -it -v ubuntu-volume:/tmp ubuntu:16.04 bash

Display all volumes


$ docker volume ls
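As a rough illustration of why data belongs in a volume, the sketch below writes a file into a volume from one container and reads it back from a second, brand-new container. The function name volume_demo, the volume name demo-volume and the path /data/greeting are invented for this example; it assumes the ubuntu:16.04 image used earlier.

```shell
# Hypothetical demo: data written to a named volume outlives the container that wrote it.
volume_demo() {
    docker volume create demo-volume
    # The first container writes a file into the volume, then exits and is removed (--rm)
    docker run --rm -v demo-volume:/data ubuntu:16.04 sh -c 'echo hello > /data/greeting'
    # A second container mounts the same volume and can still read the file
    docker run --rm -v demo-volume:/data ubuntu:16.04 cat /data/greeting
    # Remove the volume once you are finished with it
    docker volume rm demo-volume
}
```

Run volume_demo on a machine with Docker installed. The second container should be able to print the file even though the container that created it is gone.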

Getting data in and out of containers

Data in a volume is only available to containers that mount it

Get data in and out of a running container with docker cp


$ docker cp ubuntu-test:/tmp/blarg .

Host directory mount

Start an ubuntu container with host directory /tmp mounted in the container at /tmp


$ docker run --rm -it -v /tmp:/tmp ubuntu:16.04 bash

Can be convenient but has some issues we'll come back to

Fish Blast

A "practical" example

We wanted to know what kind of fish this presentation was most like.

We extracted all the A T G and Cs using this terrifying bash script:

echo ">X" > /tmp/dna && grep -E -oIRh '[atcg]' . | tr -d '\n' >> /tmp/dna

(Don't try and memorise this)

We found a docker image online with blast installed

docker pull simonalpha/ncbi-blast-docker

And a set of mitochondrial fish fasta files

wget "http://mitofish.aori.u-tokyo.ac.jp/files/complete_partial_mitogenomes.zip"

We built the database

docker run --rm -v ~/blast/project:/blast -v ~/blast/databases:/db \
-w /blast simonalpha/ncbi-blast-docker \
makeblastdb -in /blast/fish.fa -parse_seqids -dbtype nucl -out /db/fish.fa

Then ran our fasta file against it

docker run --rm -v ~/blast/project:/blast -v ~/blast/databases:/db \
-w /blast simonalpha/ncbi-blast-docker blastn -query /blast/test.faa \
-db /db/fish.fa -task blastn -out q_seq_V_pres_db

But...

This is no better than just downloading blast.

And there are far too many options.

And making the database each time is a pain.

Solution:

We can make our own docker image!

A Docker image is built from a Dockerfile

which contains, at a minimum, a parent image and a command that the container runs

As we're pretty happy with the image from the previous command, we're going to use that as our parent

FROM simonalpha/ncbi-blast-docker:2.2.30plus

We want to create a directory for the database, as we'll use it later

RUN mkdir /db

Next, we want to add the contents of the database folder to our image

ADD database /db

There's no getting round the fact that blast has a load of setup options, so we've hidden them in a bash script which we need to add


ADD blast.sh /blast/

RUN chmod +x /blast/blast.sh

We need some volumes for the input and output files to go into

VOLUME /input

VOLUME /output

Finally, we need the image to run that script when it starts

ENTRYPOINT ["/blast/blast.sh"]
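The contents of blast.sh are not shown in this presentation. As a rough idea of what such a wrapper could look like, the sketch below writes a hypothetical version to disk. The blastn options and the database name fish.fa are taken from the earlier command; the output file naming, the environment variable overrides and the handling of a file versus a directory at /input are all assumptions.

```shell
# Write a hypothetical blast.sh; the real script is not shown in this presentation.
cat > blast.sh <<'EOF'
#!/bin/bash
set -e

# Defaults match the paths used in the Dockerfile; the overrides are for testing.
INPUT="${INPUT:-/input}"
OUTPUT_DIR="${OUTPUT_DIR:-/output}"
DB="${DB:-/db/fish.fa}"

# Run blastn on one query file with the options from the earlier command.
run_blast() {
    blastn -query "$1" -db "$DB" -task blastn \
        -out "$OUTPUT_DIR/$(basename "$1").blast.txt"
}

if [ -d "$INPUT" ]; then
    # /input is a directory: search every regular file in it
    for query in "$INPUT"/*; do
        if [ -f "$query" ]; then
            run_blast "$query"
        fi
    done
else
    # /input is a single file
    run_blast "$INPUT"
fi
EOF
chmod +x blast.sh
```

Keeping the options in one script means the docker run command only has to say where the input and output live.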

We have a Dockerfile!


FROM simonalpha/ncbi-blast-docker:2.2.30plus

RUN mkdir /db

ADD database /db

ADD blast.sh /blast/

RUN chmod +x /blast/blast.sh

VOLUME /input
VOLUME /output

ENTRYPOINT ["/blast/blast.sh"]

Now what?

We need to create an image (called blast-fish)

docker build -t blast-fish .

And run it!

docker run --rm -v *input_directory or file*:/input -v *output_directory*:/output blast-fish
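To save typing the mounts every time, the command can be wrapped in a small shell function. This is only a sketch: the names abs_path and blast_fish are invented here, and it assumes the blast-fish image built above. Docker treats a bare name such as "output" in -v as a named volume rather than a host directory, so the function converts both arguments to absolute paths first.

```shell
# Hypothetical helper: print the absolute form of a file or directory path.
abs_path() {
    if [ -d "$1" ]; then
        (cd "$1" && pwd)
    else
        echo "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
    fi
}

# Hypothetical wrapper around the docker run command above.
# Usage: blast_fish <input file or directory> <output directory>
blast_fish() {
    mkdir -p "$2"    # make sure the output directory exists on the host
    docker run --rm \
        -v "$(abs_path "$1")":/input \
        -v "$(abs_path "$2")":/output \
        blast-fish
}
```

With this defined, something like blast_fish my_query.fa results would run the image against a file in the current directory.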

The result:

Samaris  cristatus (["Pleuronectiformes;"] ["Samarida...  86.0    1e-12

Cockatoo righteye flounder

Caveats

Command line noise

Docker run commands can get pretty ugly. Docker Compose is a good way of taming this.
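As a sketch of how Compose can absorb those options, the snippet below writes a minimal docker-compose.yml for the blast-fish image from the example. The service name blast and the host directories ./input and ./output are placeholders chosen for illustration, and it assumes a recent Docker Compose (the "docker compose" plugin).

```shell
# Write a minimal, hypothetical docker-compose.yml for the blast-fish image.
cat > docker-compose.yml <<'EOF'
services:
  blast:
    image: blast-fish
    volumes:
      - ./input:/input
      - ./output:/output
EOF

# With the file in place, the long docker run command becomes:
#   docker compose run --rm blast
```

The mounts now live in a file that can be checked in next to the Dockerfile instead of being retyped.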

Communicating between containers

Best practice is to run one process per container. Docker Compose also makes managing multiple containers easier.

Data management

If you mount volumes from the host machine, managing permissions can be tricky. If you use Docker data volumes, getting the data in and out is annoying. Docker Compose can also make managing volumes easier.
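For the host-mount permission problem, one common workaround is to run the container as your own user instead of root, so that files written to the mount belong to you. The sketch below assumes the image still works when run as a non-root user; the function name run_as_me and the example paths are hypothetical.

```shell
# Hypothetical helper: run a container with the calling user's UID and GID,
# so files created in host directory mounts are owned by that user.
run_as_me() {
    docker run --rm --user "$(id -u):$(id -g)" "$@"
}

# Example (not executed here):
#   run_as_me -v "$PWD/output":/output ubuntu:16.04 touch /output/result.txt
```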