Swarms
Understanding Swarm clusters
A swarm is a group of machines that are running Docker and joined into a cluster. After that has happened, you continue to run the Docker commands you're used to, but now they are executed on a cluster by a swarm manager. The machines in a swarm can be physical or virtual. After joining a swarm, they are referred to as nodes.
In short, a swarm is Docker's cluster mode. Once swarm mode is enabled, commands can only be executed on a swarm manager. The machines that join the cluster can be physical or virtual; after joining, they are referred to as nodes.
Swarm managers can use several strategies to run containers, such as "emptiest node" – which fills the least utilized machines with containers – or "global", which ensures that each machine gets exactly one instance of the specified container.
You instruct the swarm manager to use these strategies in the Compose file, just like the one you have already been using.
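In a version 3 Compose file these strategies surface in the service's deploy: section. Here is a minimal sketch (a hypothetical file, not the one from part 3) showing the two modes:
# write a minimal compose file that spells out the scheduling mode
cat > docker-compose.sketch.yml <<'EOF'
version: "3"
services:
  web:
    image: octowhale/friendlyhello:latest
    deploy:
      replicas: 5        # replicated mode (the default): spread tasks across the least-loaded nodes
      # mode: global     # alternative: exactly one task on every node (drop the replicas line)
EOF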
Swarm managers are the only machines in a swarm that can execute your commands, or authorize other machines to join the swarm as workers. Workers are just there to provide capacity and do not have the authority to tell any other machine what it can and cannot do.
Up until now, you have been using Docker in a single-host mode on your local machine. But Docker also can be switched into swarm mode, and that's what enables the use of swarms. Enabling swarm mode instantly makes the current machine a swarm manager. From then on, Docker will run the commands you execute on the swarm you're managing, rather than just on the current machine.
Set up your swarm
A swarm is made up of multiple nodes, which can be either physical or virtual machines. The basic concept is simple enough: run docker swarm init to enable swarm mode and make your current machine a swarm manager, then run docker swarm join on other machines to have them join the swarm as workers.
docker swarm init: create a swarm, with the current machine becoming the swarm manager (the boss).
docker swarm join: join an existing swarm as a worker (the hired hand).
Create a cluster
The original tutorial uses docker-machine to create two virtual machines. The Ubuntu1604Server host running docker-ce here is already a virtual machine, so VirtualBox cannot be installed inside it. Instead, two Ubuntu1604Server machines were brought up, each with docker-ce installed:
| Node | Role | IP address | OS version | Docker version |
|---|---|---|---|---|
| S12 | Manager | 192.168.56.212 | Ubuntu 1604.03 | Docker-ce 17.06 |
| S13 | Worker | 192.168.56.213 | Ubuntu 1604.03 | Docker-ce 17.06 |
Initialize the swarm
Run docker swarm init on S12. Once the command completes, S12 becomes the swarm manager of this cluster, and all commands that control the cluster are issued from S12.
Because S12 has multiple IPs, --advertise-addr 192.168.56.212 must be passed when initializing the swarm to specify which IP the swarm binds to.
[user@S012 04.swarm_sample]$ docker swarm init --advertise-addr 192.168.56.212
Swarm initialized: current node (z2yzvrbh0mv2w2yzhoc9ryzmb) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-4750ptov1hfil71va7ofrwvkihvwupn4gs8akpz4hqis13y7u5-59w992ywlsrxme1glcr6axc9a 192.168.56.212:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Once initialization finishes, the output confirms that the current host is now a manager (the boss), and prints a docker swarm join command showing other hosts how to join the cluster.
To add a worker to this swarm, run the following command:
docker swarm join \
--token <token> \
<ip>:<port>
Run the docker swarm join command on S13 to join the swarm that was just created.
user@S013:~$ docker swarm join --token SWMTKN-1-4750ptov1hfil71va7ofrwvkihvwupn4gs8akpz4hqis13y7u5-59w992ywlsrxme1glcr6axc9a 192.168.56.212:2377
This node joined a swarm as a worker.
# This node has successfully joined the swarm as a worker (the hired hand).
Note: when running docker swarm join, if the node also has multiple IPs, the --advertise-addr ipaddr option must be used as well. Otherwise Docker picks a network interface based on routing or other criteria, which may not be the one you expect.
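A sketch of such a join from S13, pinning the advertised address (the token and manager address are the ones printed by docker swarm init above):
# join the swarm and advertise a specific local IP
docker swarm join \
  --advertise-addr 192.168.56.213 \
  --token <token> \
  192.168.56.212:2377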
Now go back to S12 and use docker node ls to check the state of the nodes.
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
re0g5dw09cwd7h4ciab68uaw1 S013 Ready Active
z2yzvrbh0mv2w2yzhoc9ryzmb * S012 Ready Active Leader
Both S12 and S13 are now nodes of the cluster, and in the MANAGER STATUS column S12 is marked as Leader.
Deploy your app on a cluster
Staying on S12, deploy the application to the cluster from the swarm manager.
Take the docker-compose.yml from the 03.service section and deploy the application with docker stack deploy.
[user@S012 03.service_sample]$ docker stack deploy -c docker-compose.yml getstartedlab
Creating network getstartedlab_webnet
Creating service getstartedlab_web
Note: because the octowhale/friendlyhello:tag image did not exist on S13, every container started on the S13 node failed with "No such image: ...", and all 5 containers ended up running on S12. (The real cause was a typo in the image name: it should have been latest but was written as tag.)
[user@S012 03.service_sample]$ docker stack ps getstartedlab
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
tqxgpkg81j3d getstartedlab_web.1 octowhale/friendlyhello:tag S013 Ready Preparing 1 second ago
lylttxrahjsm \_ getstartedlab_web.1 octowhale/friendlyhello:tag S012 Shutdown Rejected 1 second ago "No such image: octowhale/friend…"
... (output truncated) ...
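The later stack ps output shows the image as octowhale/friendlyhello:latest, so the compose file was presumably corrected at some point; a hypothetical sketch of that one-line fix:
# in docker-compose.yml, point the service at the tag that actually exists
#     image: octowhale/friendlyhello:latest   # was octowhale/friendlyhello:tag
sed -i 's|octowhale/friendlyhello:tag|octowhale/friendlyhello:latest|' docker-compose.yml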
First, on S12, run docker stack rm to take the service down.
[user@S012 03.service_sample]$ docker stack rm getstartedlab
Removing service getstartedlab_web
Removing network getstartedlab_webnet
Then switch to S13 and use docker pull to pull the octowhale/friendlyhello:latest image.
$ docker pull octowhale/friendlyhello:latest
latest: Pulling from octowhale/friendlyhello
ad74af05f5a2: Pull complete
a36a1c51ab4d: Pull complete
be169522399f: Pull complete
286703095347: Pull complete
1c1fda0fa4c6: Pull complete
0951c86a6675: Pull complete
0302b40f6cd6: Pull complete
Finally, go back to S12 and deploy the application again.
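This is the same command that was used for the first deployment:
# redeploy from S12
docker stack deploy -c docker-compose.yml getstartedlab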
Check the result with docker stack ps getstartedlab.
[user@S012 03.service_sample]$ docker stack ps getstartedlab
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
tcw0rpmg3jmh getstartedlab_web.1 octowhale/friendlyhello:latest S013 Running Running 8 seconds ago
v2ww0v172uch getstartedlab_web.2 octowhale/friendlyhello:latest S013 Running Running 8 seconds ago
yhlyjw4quyol getstartedlab_web.3 octowhale/friendlyhello:latest S012 Running Running 9 seconds ago
5yyyq98y7ad4 getstartedlab_web.4 octowhale/friendlyhello:latest S013 Running Running 8 seconds ago
qpbonq3g97sv getstartedlab_web.5 octowhale/friendlyhello:latest S012 Running Running 9 seconds ago
The containers are now running on both nodes.
Note: on the second deployment S13 still failed at first, with "starting container failed: Ad…". The likely cause was the port mapping published by docker-compose.yml, "80:80": running as an ordinary (non-root) user, port 80 could not be opened. After changing the published port to 8080, redeploying worked normally.
# Error message
j9hnjmfghnmb \_ getstartedlab_web.2 octowhale/friendlyhello:latest S013 Shutdown Failed 31 seconds ago "starting container failed: Ad…"
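A hypothetical sketch of that port change, assuming the container itself still listens on port 80 and only the published host-side port moves to 8080:
# in docker-compose.yml, publish 8080 on the host instead of 80
#     ports:
#       - "8080:80"
sed -i 's/"80:80"/"8080:80"/' docker-compose.yml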
Accessing your cluster
You can access your app from the IP address of either S12 or S13. The network you created is shared between them and load-balanced.
The reason both IP addresses work is that nodes in a swarm participate in an ingress routing mesh. This ensures that a service deployed at a certain port within your swarm always has that port reserved to itself, no matter what node is actually running the container. (The original guide includes a diagram of how a routing mesh for a service called my-web, published at port 8080 on a three-node swarm, would look.)
Having connectivity trouble?
Keep in mind that in order to use the ingress network in the swarm, you need to have the following ports open between the swarm nodes before you enable swarm mode:
Port 7946 TCP/UDP for container network discovery.
Port 4789 UDP for the container ingress network.
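For example, a sketch for Ubuntu hosts using ufw (assuming ufw is the active firewall; 2377/tcp is the cluster-management port already visible in the join address above):
# allow swarm traffic between the nodes
sudo ufw allow 2377/tcp   # cluster management (the port in the join address)
sudo ufw allow 7946/tcp   # container network discovery
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp   # ingress overlay network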
Iterating and scaling your app
From here you can do everything you learned about in part 3.
Scale the app by changing the docker-compose.yml file.
Change the app behavior by editing code.
In either case, simply run docker stack deploy again to deploy these changes, as sketched below.
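For example, a sketch of scaling, assuming the replicas: 5 value that produced the five containers above:
# bump the replica count in docker-compose.yml (e.g. 5 -> 7) and redeploy
sed -i 's/replicas: 5/replicas: 7/' docker-compose.yml
docker stack deploy -c docker-compose.yml getstartedlab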
You can join any machine, physical or virtual, to this swarm, using the same docker swarm join command you used on S13, and capacity will be added to your cluster. Just run docker stack deploy afterwards, and your app will take advantage of the new resources.
Cleanup
You can tear down the stack with docker stack rm. For example:
# run stack rm on S12 to remove the service
docker stack rm getstartedlab
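The stack rm above removes the service but leaves the swarm itself in place. If you also want to dismantle the swarm, a sketch:
# on S13: the worker leaves the swarm
docker swarm leave
# on S12: the (last) manager leaves, dissolving the swarm
docker swarm leave --force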