WordPress / EasyEngine failed to start after server reboot

Imagine the following scenario: you use EasyEngine to manage your WordPress sites on a VPS. Your server recently had a status change: maybe it got rebooted, or some services had a blip and went offline, then online again. Suddenly all your WordPress sites are down, and you go into panic mode.

I faced this nasty situation last week. It took me the whole night to figure it out. A big part of the difficulty was that I could not find any concrete info online describing all parts of this problem, so I had to stitch the pieces together.

I hope this article helps people who run into similar problems in the future. Let’s dive in.

Context

My VPS was restarted as part of the service provider’s routine maintenance. I’ve also encountered this exact issue when I migrated my VPS to a different node / datacenter. In short, this issue almost always follows a change in your server’s status, such as a reboot or a migration.

Problem symptoms

Before getting to the solution, let’s go over the symptoms. Let’s say yoursite.net is the name of the WordPress site in EasyEngine. Your first instinct may be to make sure the site is enabled:

root@li1984-106:~# ee site enable yoursite.net
Error: yoursite.net is already enabled!

root@li1984-106:~# ee site enable yoursite.net --refresh
Enabling site yoursite.net.
Error: There was error in enabling yoursite.net. Please check logs.

Now let’s try restarting the site via EasyEngine’s CLI. You may encounter this error:

root@li1984-106:~# ee site restart yoursite.net
No container found for nginx_1

Error: Nginx test failed

That’s strange. No container for nginx? Let’s take a look at all the Docker containers, including the stopped ones:

root@li1984-106:~# docker ps -a
CONTAINER ID        IMAGE                               COMMAND                  CREATED             STATUS                           PORTS               NAMES
e11e3494df25        easyengine/nginx:v4.1.4             "/usr/bin/openresty …"   3 months ago        Exited (128) About an hour ago   80/tcp              yoursitenet_nginx_1
9db46d17e382        easyengine/php7.2:v4.1.6            "docker-entrypoint.s…"   3 months ago        Up About an hour                 9000/tcp            yoursitenet_php_1
fc98027815e9        easyengine/postfix:v4.1.5           "postfix start-fg"       3 months ago        Exited (128) About an hour ago   25/tcp              yoursitenet_postfix_1
d9102ce8619d        easyengine/nginx-proxy:v4.1.4       "/app/docker-entrypo…"   3 months ago        Exited (2) About an hour ago                         services_global-nginx-proxy_1
1a934a1b06ae        easyengine/newrelic-daemon:v4.0.0   "sh -c '/usr/local/b…"   3 months ago        Up About an hour                                     services_global-newrelic-daemon_1
845e81b2e13d        easyengine/mariadb:v4.1.3           "docker-entrypoint.s…"   3 months ago        Exited (1) About an hour ago     3306/tcp            services_global-db_1
ebc58a590bd4        easyengine/redis:v4.1.4             "docker-entrypoint.s…"   3 months ago        Up About an hour                 6379/tcp            services_global-redis_1
de2358afdf25        easyengine/mariadb:v4.0.0           "docker-entrypoint.s…"   3 months ago        Restarting (1) 8 seconds ago                         ee_global-db_1
f5b161945cd0        easyengine/redis:v4.0.0             "docker-entrypoint.s…"   3 months ago        Up About an hour                 6379/tcp            ee_global-redis_1
385f9bddd415        easyengine/cron:v4.0.0              "/usr/bin/ofelia dae…"   22 months ago       Up About an hour                                     ee-cron-scheduler

The formatting may be messed up, but the important thing is that the yoursitenet_nginx_1 container has exited and lists port 80, and more importantly, the services_global-nginx-proxy_1 container has also exited and lists no port at all.

Wait a minute… no port? That’s when I noticed something’s fishy.
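If the table above is hard to read on your terminal, a small filter can pull out just the containers that are not running. This is only a minimal sketch: the function name not_running is made up for illustration, and it assumes you ask docker ps for a "name|status" format as shown in the comments.

```shell
# List containers that are not currently running.
# Reads "name|status" lines on stdin, as produced by:
#   docker ps -a --format '{{.Names}}|{{.Status}}'
# (not_running is a hypothetical helper, not part of Docker or EasyEngine.)
not_running() {
    # Keep every line whose status does not start with "Up".
    awk -F '|' '$2 !~ /^Up/ { print $1 ": " $2 }'
}

# Typical use on the server:
#   docker ps -a --format '{{.Names}}|{{.Status}}' | not_running
```

Anything this prints (Exited, Restarting, and so on) is a candidate for a closer look.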

Solution

There are several parts to this problem. First, we need to disable the WordPress sites. This gives us not only a safer environment for debugging, but also a clean slate when we eventually bring the sites back up:

root@li1984-106:~# ee site disable yoursite.net
Disabling site yoursite.net.
Success: Site yoursite.net disabled.

The first and most likely problem is that something is occupying port 80, which prevents EasyEngine’s global nginx proxy container from binding to that port.

To find out what’s occupying port 80 on your server, run the following command:

root@li1984-106:~# lsof -i :80 | grep LISTEN
nginx   623     root    6u  IPv4  17941      0t0  TCP *:http (LISTEN)
nginx   623     root    7u  IPv6  17942      0t0  TCP *:http (LISTEN)
nginx   624 www-data    6u  IPv4  17941      0t0  TCP *:http (LISTEN)
nginx   624 www-data    7u  IPv6  17942      0t0  TCP *:http (LISTEN)

As you can see, in my case, the server’s default nginx service is causing a conflict with EasyEngine.
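When the lsof output is long, it can help to boil it down to one line per program. Here is a rough sketch that does so; the function name listeners is my own invention, and it assumes the usual lsof column order (COMMAND first, USER third).

```shell
# Summarise which programs are listening, given `lsof -i :80` output on stdin.
# Prints one "command (user)" line per distinct listener.
# (listeners is a hypothetical helper name, not an lsof feature.)
listeners() {
    # Only lines ending in "(LISTEN)" matter; column 1 is COMMAND, column 3 is USER.
    awk '/\(LISTEN\)/ { print $1 " (" $3 ")" }' | sort -u
}

# Typical use on the server:
#   lsof -i :80 | listeners
```

For the output above, this would collapse the four lines into two entries, one for nginx running as root and one for nginx running as www-data.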

To be specific, the server’s default nginx service runs directly on the host as a native process, whereas EasyEngine’s global nginx proxy runs inside a Docker container that needs to publish port 80 on that same host. Only one process can listen on a given address and port at a time, so once the host’s nginx has claimed port 80, EasyEngine’s nginx proxy cannot bind to it and fails to start.

I did some digging, and it turns out one of the maintainers of EasyEngine also advises against running the server’s own nginx alongside EasyEngine. Well, that surely is a vote of confidence.

Let’s stop the server’s nginx process and check on the usage of port 80 again:

root@li1984-106:~# service nginx stop
root@li1984-106:~# lsof -i :80 | grep LISTEN
# now it prints nothing, meaning nothing's using port 80
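Stopping nginx only lasts until the next reboot if the service is set to start at boot, which would bring the whole problem back. Below is a hedged sketch of how it could be disabled for good. It assumes a systemd-based server where the unit is called nginx, which may not match your setup; the function name and the DRY_RUN switch are made up for illustration.

```shell
# Stop the host's nginx and keep it from starting at boot.
# Assumption: a systemd-based server where the unit is named "nginx".
# With DRY_RUN=1 the command is only printed, not executed.
# (disable_host_nginx and DRY_RUN are hypothetical names for this sketch.)
disable_host_nginx() {
    unit="${1:-nginx}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "systemctl disable --now $unit"
    else
        # --now stops the unit in addition to disabling it.
        systemctl disable --now "$unit"
    fi
}

# Preview first, then run for real:
#   DRY_RUN=1; disable_host_nginx
#   DRY_RUN=0; disable_host_nginx
```

Only do this if nothing else on the server depends on the host’s nginx.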

Now with the port sorted out, we need to restart the EasyEngine global services that are stuck in a problematic state. We do this from EasyEngine’s services directory by stopping and removing the global service containers, then creating them again:

root@li1984-106:~# cd /opt/easyengine/services/

root@li1984-106:/opt/easyengine/services# docker-compose down
Stopping services_global-newrelic-daemon_1 ... done
Stopping services_global-nginx-proxy_1     ... done
Stopping services_global-redis_1           ... done
Stopping services_global-db_1              ... done
Removing services_global-newrelic-daemon_1 ... done
Removing services_global-nginx-proxy_1     ... done
Removing services_global-redis_1           ... done
Removing services_global-db_1              ... done
Network ee-global-frontend-network is external, skipping
Network ee-global-backend-network is external, skipping

root@li1984-106:/opt/easyengine/services# docker-compose up -d 
Creating services_global-db_1              ... done
Creating services_global-redis_1           ... done
Creating services_global-nginx-proxy_1     ... done
Creating services_global-newrelic-daemon_1 ... done

And just for your reference, this is what happens when you run this step while port 80 is still occupied:

root@li1984-106:~# cd /opt/easyengine/services/

root@li1984-106:/opt/easyengine/services# docker-compose down
Stopping services_global-newrelic-daemon_1 ... done
Stopping services_global-redis_1           ... done
Removing services_global-nginx-proxy_1     ... done
Removing services_global-newrelic-daemon_1 ... done
Removing services_global-db_1              ... done
Removing services_global-redis_1           ... done
Network ee-global-frontend-network is external, skipping
Network ee-global-backend-network is external, skipping

root@li1984-106:/opt/easyengine/services# docker-compose up -d
Creating services_global-newrelic-daemon_1 ...
Creating services_global-nginx-proxy_1     ... error
Creating services_global-redis_1           ...
Creating services_global-newrelic-daemon_1 ... done

Creating services_global-redis_1           ... done
Creating services_global-db_1              ... done

ERROR: for global-nginx-proxy  Cannot start service global-nginx-proxy: driver failed programming external connectivity on endpoint services_global-nginx-proxy_1 (2e44b6924f68439bfe437381d90ee5b38d55d91b8c367ff6710eae72e2df51bb): Error starting userland proxy: listen tcp 0.0.0.0:80: bind: address already in use
ERROR: Encountered errors while bringing up the project.

Now let’s verify the containers’ status, specifically the previously problematic services_global-nginx-proxy_1. We can see that it’s now up and publishing ports 80 and 443 on the host, as expected:

root@li1984-106:~# docker ps -a
CONTAINER ID        IMAGE                               COMMAND                  CREATED             STATUS              PORTS                                      NAMES
def4bc5316aa        easyengine/nginx-proxy:v4.1.4       "/app/docker-entrypo…"   2 minutes ago       Up 2 minutes        0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   services_global-nginx-proxy_1
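
If you’d rather not eyeball the PORTS column, the same check can be scripted. The following is a minimal sketch under my own assumptions: publishes_port is a made-up helper name, and it expects "name|ports" lines in the format shown in the comments.

```shell
# Succeeds (exit status 0) if the named container publishes the given host port.
# Reads "name|ports" lines on stdin, as produced by:
#   docker ps --format '{{.Names}}|{{.Ports}}'
# (publishes_port is a hypothetical helper, not part of Docker or EasyEngine.)
publishes_port() {
    awk -F '|' -v name="$1" -v port="$2" '
        # A published port looks like "0.0.0.0:80->80/tcp"; an unpublished one is just "80/tcp".
        $1 == name && index($2, ":" port "->") { found = 1 }
        END { exit !found }
    '
}

# Typical use on the server:
#   docker ps --format '{{.Names}}|{{.Ports}}' \
#       | publishes_port services_global-nginx-proxy_1 80 && echo "proxy is on port 80"
```

A container that only shows 80/tcp, like yoursitenet_nginx_1 earlier, does not count as published, which is what we want here.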

The last step is to re-enable your WordPress sites:

root@li1984-106:~# ee site enable yoursite.net
Enabling site yoursite.net.
Success: Site yoursite.net enabled.
Running post enable configurations.
Starting site's services.
Global auth exists on admin-tools. Use `ee auth list global` to view credentials.
Success: admin-tools enabled for yoursite.net site.

You should see your WordPress sites up and running again.

Congrats if that’s the case. If not, please leave a comment below to share the specific problem you encountered.