I don’t normally post to this blog, as angelahighland.info’s primary purpose is to be a backup site in case my main site at angelahighland.com goes down.
But that presupposes that angelahighland.com is reliably working, and right now, it’s kinda not. It’s been prone to crashing ever since my wife and I upgraded our web server to Debian 10.
And by crash, I mean the following behavior:
- Timing of the crash is sometimes but not always at midnight, when the Apache logs rotate. I have seen this crash happen during the middle of the day as well.
- Result of the crash is that Apache doesn’t actually die, but any site dependent upon PHP falls over. Primarily, this means any site we’re running on WordPress. Other sites we have up that aren’t PHP-dependent remain accessible.
- Behavior when the impacted sites die is that if I try to hit any of the sites in my browser, I get an error from Varnish, the backend system we use to handle caching. The error says (I don’t have a screenshot, so I’m going by memory and what I can find on Google): “Error 503 Backend fetch failed”.
- In our Apache logs, I get a lot of lines that look like this.
error.log.1:[Sat Jan 11 23:50:28.982648 2020] [core:notice] [pid 24524] AH00052: child pid 29076 exit signal Segmentation fault (11)
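Since the timing is one of the few clues available, here is a minimal sketch for tallying these segfault notices by hour of day, to see whether they really cluster around the midnight log rotation. It assumes the log lines look like the sample above; the function name `segfaults_by_hour` is made up for illustration.

```python
import re
from collections import Counter

# Matches the AH00052 segfault notice shown above. Using search() rather than
# match() means a "error.log.1:" prefix left behind by grep is simply skipped.
SEGFAULT = re.compile(
    r"\[\w{3} \w{3} +\d+ (?P<hour>\d{2}):\d{2}:\d{2}\.\d+ \d{4}\]"
    r".*AH00052: child pid \d+ exit signal Segmentation fault"
)


def segfaults_by_hour(lines):
    """Return a Counter mapping hour of day (0-23) to the number of segfault notices."""
    counts = Counter()
    for line in lines:
        found = SEGFAULT.search(line)
        if found:
            counts[int(found.group("hour"))] += 1
    return counts
```

To use it, open a rotated log (for example `/var/log/apache2/error.log.1`, which is an assumed path), pass the file object to `segfaults_by_hour`, and print the sorted items of the result.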
Once I see this crash occur (or have one of our small number of site users contact me about it happening), then restarting Apache restores functionality. However, I’ve had to restart Apache a lot lately, and I’d really like to find a proper solution to the problem.
But I’ve been stymied by not actually knowing what’s causing the problem! Hence this call for help.
What I’ve tried so far:
- As part of the general server upgrade to Debian 10, we also upgraded PHP to version 7. I think when we did the original PHP upgrade, that may have taken us to 7.1? It was after the initial upgrade that we saw this crash behavior most frequently (though again, it has NOT stopped happening, it’s just slacked off some). On the theory that I may have confused something considerably in the PHP upgrade, I tried both updating to the current PHP available on Debian (7.3.11), and completely uninstalling and reinstalling PHP.
- I also tried upgrading from the current Apache available on Debian 10 to the latest version ported to buster-backports (2.4.41), which only bumped me ahead a couple of minor versions and did not fix the problem.
- Since moving ahead to Apache 2.4.41 didn’t help, and because I needed to do so in order to try to install the Apache debug symbols package, I returned to 2.4.38.
- WordPress has been through at least a couple of upgrades since the server upgrade occurred. Upgrading WordPress has not fixed the problem either.
- I went googling for any sign of bug reports on Debian.org with similar behavior, and the closest I got was this Debian.org bug from 2018 (which accordingly doesn’t seem exactly current enough to be pertinent), this Debian.org bug from 2015 (again, not exactly current, but it has commentary dating into 2019), and a few other pages from Stack Overflow and other types of general question sites. Nothing seemed like a good enough or current enough match to what I have going on.
- I followed these instructions to install gdb and do other configuration necessary to generate Apache coredumps. I also followed these Debian.org instructions on how to install Apache’s debug symbols package.
However, and here’s the critical part:
I do not know how to reproduce the error.
Now, in my day job as an SDET, this is arguably one of the most important things I need to be able to tell a dev team about a bug (short of, of course, the fact that a bug exists and what it is). If a dev can’t reproduce a bug, they can’t analyze and fix it.
I have the same problem here. I do not have clue one what’s actually causing this problem, and until I figure that out, I can’t fix it.
And I am not familiar enough with debugging in a Linux environment to begin to get the data I need. I have gdb in place as well as the Apache debug symbols package, but I do not have experience with these, and don’t know how I can use them constructively. I’m not even a hundred percent sure this is a direct Apache bug, vs. maybe a PHP bug.
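For what it’s worth, the usual first step once a core file exists is to ask gdb for a backtrace, since the top frames often show whether the crash is inside Apache itself or inside the PHP module. The following is only a sketch under assumptions: `/usr/sbin/apache2` is where Debian normally puts the Apache binary, the core file path is whatever the coredump setup produces, and the helper names are hypothetical.

```python
import subprocess

# Assumption: Debian's usual location for the Apache binary.
APACHE_BINARY = "/usr/sbin/apache2"


def gdb_backtrace_command(core_path, binary=APACHE_BINARY):
    """Build a non-interactive gdb command that prints a full backtrace."""
    # -batch makes gdb exit after running the commands given with -ex.
    return ["gdb", "-batch", "-ex", "bt full", binary, core_path]


def backtrace(core_path, binary=APACHE_BINARY):
    """Run gdb against a core file and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_backtrace_command(core_path, binary),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero; the output is still worth reading
    )
    return result.stdout
```

Calling `backtrace` with the path to a core file returns the text to paste into a bug report. With the debug symbols installed, frames naming PHP functions would point toward PHP rather than Apache, though that reading is a rule of thumb and not a guarantee.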
I can however say what environment we’re running: Debian 10 (straight up 10, we’re not on 10.2 yet), PHP 7.3, Apache 2.4.38, Varnish 6.1.1.
So, any Debian gurus out there able to advise me on how to proceed with information gathering? Alternately, if anyone has experience with a similar bug, were you able to find a solution?
Talk to me in the comments, and thanks in advance!
Editing to add 1/20/2020: I’m wrong, we are actually on Debian 10.2. We’ve kept up with regular updates since upgrading to 10, and I hadn’t realized that also meant we’d advance in minor versions.
Also, after talking with the folks on the Debian IRC channel, I’ve installed the systemd-coredump package on our server, to see if that’ll actually generate a coredump when Apache falls over. The issue IS still happening, as of 1/20!