WordPress / EasyEngine failed to start after server reboot
Imagine the following scenario: you use EasyEngine to manage your WordPress sites on a VPS. Your server recently had a status change: maybe it was rebooted, or some services had a blip and went offline, then online again. Suddenly all your WordPress sites are down, and you go into panic mode.
I faced this nasty situation last week. It took me the whole night to figure it out. A big part of the difficulty was that I could not find any concrete info online describing all parts of this problem, so I had to stitch the pieces together myself.
I hope this article helps people who run into similar problems in the future. Let’s dive in.
Context
My VPS was restarted as part of the service provider’s routine maintenance. I’ve also encountered this exact issue when I migrated my VPS to a different node / datacenter. In short, this issue almost always follows a change in your server’s state.
Problem symptoms
Before getting to the solutions, let’s talk about the problem’s symptoms. Let’s say yoursite.net is the WordPress instance’s ID, in EasyEngine’s terms. Your first instinct may be to make sure the site is enabled, and then to force a refresh:
root@li1984-106:~# ee site enable yoursite.net
Error: yoursite.net is already enabled!
root@li1984-106:~# ee site enable yoursite.net --refresh
Enabling site yoursite.net.
Error: There was error in enabling yoursite.net. Please check logs.
Now let’s try restarting the site via EasyEngine’s CLI. You may encounter this error:
root@li1984-106:~# ee site restart yoursite.net
No container found for nginx_1
Error: Nginx test failed
That’s strange. No container for nginx? Let’s take a look at all the running Docker instances:
root@li1984-106:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e11e3494df25 easyengine/nginx:v4.1.4 "/usr/bin/openresty …" 3 months ago Exited (128) About an hour ago 80/tcp yoursitenet_nginx_1
9db46d17e382 easyengine/php7.2:v4.1.6 "docker-entrypoint.s…" 3 months ago Up About an hour 9000/tcp yoursitenet_php_1
fc98027815e9 easyengine/postfix:v4.1.5 "postfix start-fg" 3 months ago Exited (128) About an hour ago 25/tcp yoursitenet_postfix_1
d9102ce8619d easyengine/nginx-proxy:v4.1.4 "/app/docker-entrypo…" 3 months ago Exited (2) About an hour ago services_global-nginx-proxy_1
1a934a1b06ae easyengine/newrelic-daemon:v4.0.0 "sh -c '/usr/local/b…" 3 months ago Up About an hour services_global-newrelic-daemon_1
845e81b2e13d easyengine/mariadb:v4.1.3 "docker-entrypoint.s…" 3 months ago Exited (1) About an hour ago 3306/tcp services_global-db_1
ebc58a590bd4 easyengine/redis:v4.1.4 "docker-entrypoint.s…" 3 months ago Up About an hour 6379/tcp services_global-redis_1
de2358afdf25 easyengine/mariadb:v4.0.0 "docker-entrypoint.s…" 3 months ago Restarting (1) 8 seconds ago ee_global-db_1
f5b161945cd0 easyengine/redis:v4.0.0 "docker-entrypoint.s…" 3 months ago Up About an hour 6379/tcp ee_global-redis_1
385f9bddd415 easyengine/cron:v4.0.0 "/usr/bin/ofelia dae…" 22 months ago Up About an hour ee-cron-scheduler
The column alignment is lost here, but the important part is that the yoursitenet_nginx_1 container lists port 80, whereas the services_global-nginx-proxy_1 container lists no port at all.
Wait a minute… no port? That’s when I noticed something was fishy.
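To make the stopped containers easier to spot than in the full listing above, the output can be narrowed down. This is only a sketch: not_running is a hypothetical helper name, and the "|" separator is my own choice for the --format template, not something EasyEngine provides.

```shell
# Hypothetical helper: reads "name|status|ports" lines on stdin and prints
# every container whose status does not start with "Up".
not_running() {
    awk -F '|' '$2 !~ /^Up/ { print $1 " (" $2 ")" }'
}

# Usage on a Docker host (not executed here):
#   docker ps -a --format '{{.Names}}|{{.Status}}|{{.Ports}}' | not_running
```

Containers stuck in a restart loop show up here as well, because their status starts with "Restarting" rather than "Up".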
Solution
There are several aspects to this problem. First, we need to disable the WordPress sites. This provides not only a safer debugging environment, but also a clean slate when we eventually bring the sites back up:
root@li1984-106:~# ee site disable yoursite.net
Disabling site yoursite.net.
Success: Site yoursite.net disabled.
The first and most likely problem is that something is occupying port 80, which prevents EasyEngine from binding its global nginx proxy container to that port.
The solution would be to find out what’s occupying port 80 of your server with the following command:
root@li1984-106:~# lsof -i :80 | grep LISTEN
nginx 623 root 6u IPv4 17941 0t0 TCP *:http (LISTEN)
nginx 623 root 7u IPv6 17942 0t0 TCP *:http (LISTEN)
nginx 624 www-data 6u IPv4 17941 0t0 TCP *:http (LISTEN)
nginx 624 www-data 7u IPv6 17942 0t0 TCP *:http (LISTEN)
As you can see, in my case, the server’s default nginx service is causing a conflict with EasyEngine.
To be specific, the server’s default nginx service runs as a native process on the host, whereas EasyEngine’s global nginx proxy runs inside a Docker container. Only one process can listen on a given address and port at a time, and the native nginx had already bound port 80, so Docker could not publish port 80 for EasyEngine’s proxy container.
I did some digging, and it turns out one of the maintainers of EasyEngine also advised against running the server’s nginx alongside EasyEngine. Well, that surely is a vote of confidence.
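If you only want the names of the programs holding the port, the lsof output can be condensed. This is a minimal sketch: listening_commands is a hypothetical helper name, and it assumes the lsof output layout shown above, with the command name in the first column.

```shell
# Hypothetical helper: reads `lsof -i :PORT` output on stdin and prints each
# distinct command name that holds a LISTEN socket, one per line.
listening_commands() {
    awk '/\(LISTEN\)/ { print $1 }' | sort -u
}

# Usage (not executed here):
#   lsof -i :80 | listening_commands
```

With the four nginx lines above as input, this prints a single line: nginx.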
Let’s stop the server’s nginx process and check on the usage of port 80 again:
root@li1984-106:~# service nginx stop
root@li1984-106:~# lsof -i :80 | grep LISTEN
# now it prints nothing, meaning nothing's using port 80
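Note that stopping the service does not keep it from starting again at the next boot. If you don’t need the native nginx at all, you may also want to disable it. The sketch below assumes a systemd-based distribution where the unit is called nginx; disable_native_service and the DRY_RUN switch are hypothetical names of my own, not part of any tool.

```shell
# Hypothetical helper: stop a native service and prevent it from starting
# at boot. Assumes systemd; set DRY_RUN=1 to only print the commands.
disable_native_service() {
    svc="$1"
    for action in stop disable; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "systemctl $action $svc"
        else
            systemctl "$action" "$svc" || return 1
        fi
    done
}

# Usage (as root, not executed here):
#   disable_native_service nginx
```

Running it with DRY_RUN=1 first shows exactly what would be executed before anything is changed.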
Now with the ports sorted out, we need to restart the EasyEngine global services that are stuck in a problematic state. This is done by stopping and removing the global service containers, then recreating them, from inside EasyEngine’s services directory:
root@li1984-106:~# cd /opt/easyengine/services/
root@li1984-106:/opt/easyengine/services# docker-compose down
Stopping services_global-newrelic-daemon_1 ... done
Stopping services_global-nginx-proxy_1 ... done
Stopping services_global-redis_1 ... done
Stopping services_global-db_1 ... done
Removing services_global-newrelic-daemon_1 ... done
Removing services_global-nginx-proxy_1 ... done
Removing services_global-redis_1 ... done
Removing services_global-db_1 ... done
Network ee-global-frontend-network is external, skipping
Network ee-global-backend-network is external, skipping
root@li1984-106:/opt/easyengine/services# docker-compose up -d
Creating services_global-db_1 ... done
Creating services_global-redis_1 ... done
Creating services_global-nginx-proxy_1 ... done
Creating services_global-newrelic-daemon_1 ... done
And just for your reference, this is what happens when you try to execute this step while port 80 is still occupied:
root@li1984-106:~# cd /opt/easyengine/services/
root@li1984-106:/opt/easyengine/services# docker-compose down
Stopping services_global-newrelic-daemon_1 ... done
Stopping services_global-redis_1 ... done
Removing services_global-nginx-proxy_1 ... done
Removing services_global-newrelic-daemon_1 ... done
Removing services_global-db_1 ... done
Removing services_global-redis_1 ... done
Network ee-global-frontend-network is external, skipping
Network ee-global-backend-network is external, skipping
root@li1984-106:/opt/easyengine/services# docker-compose up -d
Creating services_global-newrelic-daemon_1 ...
Creating services_global-nginx-proxy_1 ... error
Creating services_global-redis_1 ...
Creating services_global-newrelic-daemon_1 ... done
Creating services_global-redis_1 ... done
Creating services_global-db_1 ... done
ERROR: for global-nginx-proxy Cannot start service global-nginx-proxy: driver failed programming external connectivity on endpoint services_global-nginx-proxy_1 (2e44b6924f68439bfe437381d90ee5b38d55d91b8c367ff6710eae72e2df51bb): Error starting userland proxy: listen tcp 0.0.0.0:80: bind: address already in use
ERROR: Encountered errors while bringing up the project.
Now let’s verify the containers’ status, specifically the previously problematic services_global-nginx-proxy_1. It now publishes ports 80 and 443 on the host, as expected:
root@li1984-106:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
def4bc5316aa easyengine/nginx-proxy:v4.1.4 "/app/docker-entrypo…" 2 minutes ago Up 2 minutes 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp services_global-nginx-proxy_1
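The same check can be scripted by looking for a host-port mapping in the PORTS column. This is a rough sketch: publishes_port is a hypothetical helper, and it assumes the "host:port->container/proto" notation that docker ps prints above.

```shell
# Hypothetical helper: succeeds when the given PORTS string contains a
# mapping from the given host port, e.g. "0.0.0.0:80->80/tcp".
publishes_port() {
    ports="$1"
    port="$2"
    case "$ports" in
        *":$port->"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a Docker host (not executed here):
#   ports="$(docker ps --filter name=services_global-nginx-proxy_1 --format '{{.Ports}}')"
#   publishes_port "$ports" 80 && echo "proxy is bound to port 80"
```

A container that only exposes a port internally, such as "80/tcp", does not count as published, which is exactly the difference between the broken and the working state here.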
The last step would be to re-enable your WordPress sites:
root@li1984-106:~# ee site enable yoursite.net
Enabling site yoursite.net.
Success: Site yoursite.net enabled.
Running post enable configurations.
Starting site's services.
Global auth exists on admin-tools. Use `ee auth list global` to view credentials.
Success: admin-tools enabled for yoursite.net site.
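For reference, the whole sequence from this post can be strung together roughly as follows. Treat it as a sketch under my setup’s assumptions: the native nginx is the process holding port 80, and EasyEngine’s services live in /opt/easyengine/services. The names run, run_in, recover_sites and DRY_RUN are hypothetical helpers, not EasyEngine features.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same as run, but inside the given directory (in a subshell).
run_in() {
    dir="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "(cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

# Replays the manual steps from this post for the given site names.
recover_sites() {
    services_dir=/opt/easyengine/services
    for site in "$@"; do
        run ee site disable "$site" || return 1
    done
    # Assumption: the native nginx service is what occupies port 80.
    run service nginx stop || return 1
    run_in "$services_dir" docker-compose down || return 1
    run_in "$services_dir" docker-compose up -d || return 1
    for site in "$@"; do
        run ee site enable "$site" || return 1
    done
}

# Usage (as root, not executed here):
#   DRY_RUN=1 recover_sites yoursite.net   # preview the commands
#   recover_sites yoursite.net             # actually run them
```

The function stops at the first failing step, so a port conflict during docker-compose up -d does not lead to the sites being re-enabled against a broken proxy.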
You should see your WordPress sites up and running again.
Congrats if that’s the case. If not, please leave a comment below to share the specific problem you encountered.