I tried Docker swarm mode
TL;DR
Docker swarm is very easy to use, but you should try it out properly before actually operating it.
Foreword
Swarm mode has been integrated into the Docker Engine since Docker 1.12.0, making it easy to use Docker in a cluster environment. This article is a record of what I examined about this Docker swarm mode as a free-study project over the summer vacation.
It is long and rough. Various prerequisites are skipped. I do not recommend reading all of it.
I would appreciate it if you pointed out anything that is wrong.
Repository
- Repository for this article
- https://github.com/shirou/docker-swarm-test
- docker: the Docker Engine and CLI itself
- https://github.com/docker/docker
- swarmkit: contains the actual implementation of swarm mode
- https://github.com/docker/swarmkit
- swarm: the standalone Swarm implementation used up to 1.12. Not covered this time
- https://github.com/docker/swarm
Docker swarm mode
Docker swarm mode is a mode in which multiple Docker hosts can be bundled together and used as one. It used to be a separate product called Docker Swarm, but it was integrated into the Docker Engine as of 1.12.
I will skip over the overview quickly. The following are memos on terms and commands; a sketch of the typical setup flow follows the command list.
Cast of characters
- Manager node
- Management node. The optimal number of manager nodes is 3 to 7
- Worker node
- Task execution node
- Node
- A server running one docker engine
- Service
- Multiple Docker Engines cooperate to provide a service on a single port. One swarm cluster can have multiple services
How to use
- Swarm
- docker swarm init
- Initialize a swarm cluster. In other words, the docker host that executes this command becomes the first manager node.
- docker swarm join
- Join the swarm cluster managed by the specified manager node. The cluster token is specified with --token: if you specify the manager token you join as a manager, and if you specify the worker token you join as a worker. Alternatively, you can join as a manager explicitly with --manager.
- docker swarm leave
- Leave cluster
- Node
- docker node ls
- Show the state of nodes
- docker node ps
- Show the state of a node's tasks
- docker node update
- Update node
- docker node demote / docker node promote
- Demote to worker (demote) / Promote to manager (promote)
- Service
- docker service create
- Create service
- docker service ls
- Show the state of services
- docker service ps
- Show the state of a service's tasks
- docker service update
- Perform a rolling update
- Network
- docker network create
- Create an overlay network
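Putting those commands together, a typical flow for standing up a cluster looks roughly like this (a sketch; the IP address 192.0.2.10 and the token are placeholders):

# On the first manager: initialize the cluster
$ docker swarm init --advertise-addr 192.0.2.10
# Print the join command (including the worker token) for workers
$ docker swarm join-token worker
# On each worker: join with that token
$ docker swarm join --token SWMTKN-1-<worker token> 192.0.2.10:2377
# Back on the manager: confirm that the nodes joined
$ docker node ls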
Process used this time
The process to run this time is as follows. The code is here.
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

var wait = time.Duration(1 * time.Second)

// handler blocks for one second, then returns the request and reply
// times as a single CSV line.
func handler(w http.ResponseWriter, r *http.Request) {
	rec := time.Now()
	time.Sleep(wait)
	rep := time.Now()
	s := []string{
		rec.Format(time.RFC3339Nano),
		rep.Format(time.RFC3339Nano),
	}
	fmt.Fprint(w, strings.Join(s, ","))
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil) // fixed port num
}
It just waits one second and returns the request and reply times as CSV on port 8080. It is the worst kind of process: it blocks for a full second per request.
The build is left to CircleCI this time, and since it produces a tar.gz, it can be imported on each node as follows. The tag is arbitrary.
$ docker import https://<circleci artifact URL>/docker-swarm-test.tar.gz docker-swarm-test:1
Hint
Go runs as a single binary with no need for glibc or the like, so you do not need a Dockerfile; a tar.gz is enough. For the issue where using net on Linux results in a dynamically linked binary, see 「Go 1.4でstatic binaryを作成する」 or 「golangで書いたアプリケーションのstatic link化」.
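As a concrete sketch of that idea (the file names and tag are just examples), the build-and-import steps amount to:

# Build a statically linked binary (cgo disabled) and pack it into a tarball
$ CGO_ENABLED=0 go build -o docker-swarm-test .
$ tar czf docker-swarm-test.tar.gz docker-swarm-test
# Import the tarball as an image on a node
$ docker import docker-swarm-test.tar.gz docker-swarm-test:1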
$ docker service create --name web --replicas 3 --publish 8080:8080 docker-swarm-test:1 "/docker-swarm-test"
$ docker service ps web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
18c1hxqoy3gkaavwun43hyczw web.1 docker-swarm-test:1 worker-2 Running Running 3 minutes ago
827sjn1t4nrj7r4c0eujix2it web.2 docker-swarm-test:1 manager-1 Running Running 3 minutes ago
2xqzzwf2bte3ibj2ekccrt6nv web.3 docker-swarm-test:1 worker-3 Running Running 3 minutes ago
In this state, curl against worker-2, worker-3, or manager-1, where the containers are running, gets a response. What is more, even if you curl worker-1, where no container is running, it answers properly. This is because the request is forwarded internally.
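For example (the hostnames are placeholders for the node addresses):

# worker-2 runs a web container and answers directly
$ curl http://worker-2:8080/
# worker-1 runs no web container, but the request is forwarded internally
$ curl http://worker-1:8080/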
Rolling update
When you provide a service with --publish in a swarm cluster, a single port load-balances requests across multiple nodes. Until now there were issues such as ports changing dynamically in Docker, or conflicts when using the same port number on one node, but those problems do not arise. Also, because load balancing is performed within the swarm cluster, rolling updates are easy.
% sudo docker service update --image "docker-swarm-test:1" web
So I tried it. Since the process blocks for one second, requests should get dropped if the update is not handled well.
I used ab as the tool. This is not a throughput test but a test of whether requests get lost, so I judged ab to be sufficient.
% ab -rid -c 10 -n 500 http://45.76.98.219:8080/
Concurrency Level: 10
Time taken for tests: 50.146 seconds
Complete requests: 500
Failed requests: 8
(Connect: 0, Receive: 4, Length: 0, Exceptions: 4)
So requests were dropped. Too bad. --update-delay is probably unrelated, since it is just the delay before the next container starts. Trying it in combination with --restart-delay did not help either. Manually setting the node's availability to drain first might work, but I did not try it, as it takes time and effort.
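For reference, the drain approach would presumably look like this (a sketch I did not try; docker-swarm-test:2 is a made-up new tag):

# Drain the node so its tasks are rescheduled elsewhere
$ docker node update --availability drain worker-2
# Update while the node is drained
$ docker service update --image docker-swarm-test:2 web
# Return the node to service
$ docker node update --availability active worker-2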
Looking into it, this area seems to be related. It will supposedly be fixed in the next patch release. I have not investigated as far as libnetwork, so I do not know whether that really fixes it, but it seems a little too early to use this in a production environment.
Rather, use nginx
Or rather, it seems the ingress overlay network is intended for internal use in the first place, not for publishing to external services. When publishing to the outside, it seems you should put nginx in front and pick the target containers via the DNS service discovery described below.
I feel I need to look into this area a little more.
Network
Create a network with docker network create. If you later try to add a network to a running service with docker service update --network-add, you get
Error response from daemon: rpc error: code = 2 desc = changing network in service is not supported
and get rejected, so I rebuilt the service instead:
docker service create --replicas 3 --name web --network webnet ...
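For completeness, the webnet overlay network referenced above has to exist first; it is created with something like:

$ docker network create --driver overlay webnet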
Then, launch alpine to use as a shell.
$ docker service create --name shell --network webnet alpine sleep 3000
$ sudo docker service ls
ID NAME REPLICAS IMAGE COMMAND
1f9jj2izi9gr web 3/3 docker-swarm-test:1 /docker-swarm-test
expgfyb6yadu my-busybox 1/1 busybox sleep 3000
Exec into a container that belongs to the same network and look up the web service in DNS with nslookup.
$ docker exec -it shell.1.3x69i44r6elwtu02f1nukdm2v /bin/sh
/ # nslookup web
Name: web
Address 1: 10.0.0.2
/ # nslookup tasks.web
Name: tasks.web
Address 1: 10.0.0.5 web.3.8y9qbba8eknegorxxpqchve76.webnet
Address 2: 10.0.0.4 web.2.ccia90n3f4d2sr96m2mqa27v3.webnet
Address 3: 10.0.0.3 web.1.44s7lqtil2mk4g47ls5974iwp.webnet
In other words, querying the service name web returns the VIP, while querying tasks.web returns each task's address directly in DNS round-robin fashion.
In this way, as long as containers belong to the same network, they can reach other services by name, so I think cooperation between containers is easy.
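Incidentally, if you want the tasks.web-style behavior as the default, a service can be created with DNS round-robin instead of a VIP via --endpoint-mode; a minimal sketch (the name web-rr is made up):

# Task addresses are returned directly by DNS instead of through a VIP
$ docker service create --name web-rr --network webnet --endpoint-mode dnsrr docker-swarm-test:1 "/docker-swarm-test"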
Protocol
Raft
Docker swarm uses Raft consensus for leader election among multiple manager nodes. The Raft implementation is etcd's raft library. With docker node ls you can see which manager is the Leader.
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
5g8it81ysdb3lu4d9jhghyay3 worker-3 Ready Active
6td07yz5uioon7jycd15wf0e8 * manager-1 Ready Active Leader
91t9bc366rrne19j37d0uto0x worker-1 Ready Active
b6em4f475u884jpoyrbeubm45 worker-2 Ready Active
Since it is Raft, at least 3 manager nodes are needed for proper fault tolerance, and if 2 out of 3 go down, a Leader can no longer be elected. In that case, docker swarm can no longer accept new tasks.
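With the four-node cluster above, that means promoting two more nodes; a sketch:

# Promote two workers so that three managers exist and quorum survives one failure
$ docker node promote worker-1 worker-2
$ docker node ls   # the new managers should show up as Reachable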
Heartbeat
Liveness monitoring between swarm nodes is done via heartbeat. The heartbeat is normally at 5-second intervals, but it can also be specified with --dispatcher-heartbeat at docker swarm init. The results of liveness monitoring are distributed via gossip.
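For example, the interval can be stretched at cluster creation time like this (10s is an arbitrary value):

$ docker swarm init --dispatcher-heartbeat 10s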
Questions
What happens to the containers after deleting a service?
If you remove the service with docker service rm, the containers also disappear entirely. It takes some time for them to disappear, so be careful.
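A quick way to see this (a sketch):

$ docker service rm web
$ docker ps   # the containers may linger for a moment before disappearing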
What if there are more tasks than worker nodes?
What happens if you run docker service scale web=10 when there are only three nodes?
The answer is that multiple containers run on a single node.
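For example (a sketch):

$ docker service scale web=10
$ docker service ps web   # several nodes now run more than one web task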
Is there a pod concept?
Apparently there is none. When creating a service, you are presumably meant to use --constraint placement restrictions or affinities instead.
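A sketch of a placement constraint (node.role is one of the built-in attributes):

# Restrict the service to worker nodes only
$ docker service create --name web --constraint 'node.role == worker' docker-swarm-test:1 "/docker-swarm-test"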
Afterword
The container technology itself no longer interests me much; personally, I think managing multiple nodes, as Kubernetes does, is what matters. Docker swarm itself existed before, but integrating it into the Docker Engine conveys their enthusiasm for becoming the de facto standard not just for containers but for the management layer above them. Moreover, while Kubernetes is hard to start using, swarm takes very little effort, which I feel is an advantage.
Forming a swarm cluster is easy, and the cluster itself seemed very stable. Of course, I have not done nasty tests such as split brain, so I cannot say anything definite, but I suspect the Raft side is solid because it uses etcd's library.
However, the network still seems unstable: after repeatedly creating and deleting services and networks, names sometimes stopped resolving (I did not pursue the details).
There are still rough spots, such as the network and graceful updates, but I think Docker swarm will catch on from here.