The Day "/proc" Died

#solaris #sysadmin #story

Solaris'd...

This morning I got an email from our group manager asking about a UNIX server out at one of the districts:

Date: Thu, 27 Jan 2011 14:14:49
From: "UNIX Manager"
To: UNIX_SA_GROUP
Subject: Milky Way District UNIX Server

Someone take a look into whats going on with Milky Way District UNIX Server today.
The server is up but is acting really strange, I was not able to get a
response from ps or ptree commands.  Please note, there was not any server
consolidation work at this site last night.

--
A. Manager
Enterprise Unix & Storage Systems Manager
Spacely Sprockets, Inc.

So I logged in expecting a mundane failure:

workstation$ ssh milkyway1
milkyway1$ ps -eaf
     UID   PID  PPID   C    STIME TTY         TIME CMD
    root     0     0   0   Dec 08 ?           0:45 sched
    root     1     0   0   Dec 08 ?          72:15 /sbin/init
    root     2     0   0   Dec 08 ?           0:00 pageout
    root     3     0   0   Dec 08 ?        5514:24 fsflush
    root    59     1   0   Dec 08 ?           0:00 /lib/svc/method/iscsid
    root     7     1   0   Dec 08 ?           1:48 /lib/svc/bin/svc.startd
    root     9     1   0   Dec 08 ?           3:13 /lib/svc/bin/svc.configd
    root   149     1   0   Dec 08 ?           0:04 devfsadmd
  daemon   168     1   0   Dec 08 ?          73:17 /usr/lib/crypto/kcfd
^C
^C

~.

So far, so good. It does indeed appear to be broken. It probably can't talk to its LDAP server or something silly. I check "nsswitch.conf" and the only thing configured for "ldap" is "sudoers" and "netgroup". Suspiciously normal.

workstation$ ssh milkyway1
milkyway1$ sudo -i
milkyway1# cd /proc
milkyway1# echo *
0 1 10045 10116 10162 10163 10169 10176 10240 10243 10244 10246 10263 10318 10386
10483 10489...
milkyway1# ls -ln
^C
^C

~.

Hmm. Curiouser and curiouser. Why is my "ls" hanging ? What HAS science done ? Let's take a look at what the kernel thinks about what is going on with that process while it is "hung" (being able to attach a debugger to the running kernel is one of the MOST USEFUL things about Solaris. However it sometimes gets abused, which can be tragic.

workstation$ ssh milkyway1
milkyway1$ sudo -i
milkyway1# ls -ln /proc & echo $!
20023
milkyway1# mdb -k
> 0t20023::pid2proc
302765d6078
> 302765d6078::walk thread
300e3d5e100
> 300e3d5e100::findstack -v
stack pointer for thread 300e3d5e100: 2a10d548d71
[ 000002a10d548d71 cv_wait+0x38() ]
  000002a10d548e21 pr_p_lock+0x80(0, 60032d48030, 60032d58600, 300b84af968, ff000000, 18f8218)
  000002a10d548ed1 prgetattr+0x2d4(30025ce8240, 2a10d549998, 149, ffffffffffffffef, 300bd1a8da8, 0)
  000002a10d548f91 fop_getattr+0x18(30025ce8240, 2a10d549998, 0, 3025fb17d00, 2a10d549ad8, 1362e00)
  000002a10d549041 cstat64_32+0x1c(30025ce8240, ffbffb20, 0, 3025fb17d00, 3fff, 3c00)
  000002a10d549221 cstatat64_32+0x5c(ffffffffffd19553, 26578, 1000, ffbffb20, 1000, 0)
  000002a10d5492e1 syscall_trap32+0xcc(26578, ffbffb20, ffffffffffffffff, 27f68, 6c, 1b)
>

Simple enough, it seems to be waiting for a conditional variable (cv_wait()) as part of a mutex lock after a 32-bit process has made a system call (syscall_trap32()) for stat() with the file whose name is stored at 0x26578.

This pointer will probably point to a string that is a filename under "/proc" since that's what I was running "ls" on after all. I'll have to switch to the context of that process to interpret that memory.

> 302765d6078 $p
debugger context set to proc 302765d6078
Segmentation Fault
milkyway1#

Oh. Well, that wasn't nice. Let's try it another way...

milkyway1# mdb -p 20023 
mdb: failed to initialize /lib/libc_db.so.1: libthread_db call failed unexpectedly
mdb: warning: debugger will only be able to examine raw LWPs
Loading modules: [ ld.so.1 libc.so.1 libavl.so.1 ]
> 26578::print -i char*
0x26578 "/proc/26936"
>

So my "ls" is hanging because "stat()" is being called on "/proc/26936". What is running with the PID of 26936 ? Hmm... Well, we can't just run "ps" since that will "stat()" that file in "/proc" and hang. The modular debugger to the rescue, again:

milkyway1# echo '::ps -fz' | mdb -k | grep 26936
S    PID   PPID   PGID    SID  ZONE    UID      FLAGS             ADDR NAME
R  26936  26935  26935  26935     3   1000 0x4a014000 00000300b84af968

So that process corresponds to a command with no name (what ?) run by the UID 1000 in the zone 3. Now we know the "who", what about the "what in the world is it doing?" ?

milkyway1# mdb -k
Loading modules: [ unix genunix specfs dtrace ufs sd mpt px md ldc ip hook neti sctp
arp usba fcp fctl emlxs qlc lofs zfs ssd random crypto fcip logindmux ptm nfs ipc ]
> 00000300b84af968::walk thread
3000f654d40
> 3000f654d40::findstack -v
stack pointer for thread 3000f654d40: 2a109f38a61
[ 000002a109f38a61 cv_wait+0x38() ]
  000002a109f38b11 txg_wait_open+0x54(6003f983d20, 16533a, 0, 6003f983d64, 6003f983d66, 6003f983d18)
  000002a109f38bc1 zfs_putapage+0x1e0(6004d6fae40, 5b, 2a109f39570, 2a109f39568, 400, 10a6ac0)
  000002a109f38cb1 zfs_putpage+0x1b8(3002836dd40, b7c1a000, 0, 400, 30114c15318, 7000f9c9d80)
  000002a109f38d81 fop_putpage+0x1c(3002836dd40, 0, b7c1a000, 400, 30114c15318, 7b262c7c)
  000002a109f38e31 zfs_delmap+0x6c(3002836dd40, 0, 10a6800, 30114c15318, b7c1a000, b)
  000002a109f38ef1 zfs_shim_delmap+0x3c(3002836dd40, 30114c15318, 1, 1, f, b)
  000002a109f38fc1 fop_delmap+0x40(3002836dd40, 30114c15318, 1, f, b7c1a000, b)
  000002a109f39091 segvn_unmap+0x180(3017e8a0bd0, fffffffec0000000, b7c1a000, b7c1a000, f, 600439d6b78)
  000002a109f39181 as_unmap+0xe4(300ee744d20, 3017e8a0bd0, 300de51bd88, b7c1a000, 1, 1)
  000002a109f39231 munmap+0x78(1fff, b7c1a000, 10a6800, 300b84af968, 300b84af968, fffffffec0000000)
  000002a109f392e1 syscall_trap+0xac(fffffffec0000000, b7c1a000, 0, 8, 10012c4d0, 10012c460)
>

Hmm, well that process ALSO seems to be waiting on a conditional variable in a mutex lock as part of the ZFS kernel module. Maybe they're related ?

I look at the storage available to the zone with the ID of 3 (obtained via "zoneadm list -v") and noticed that the zpool that provides most of the storage to that zone (including it's "/") is 100% full.

I send out an email detailing my findings and someone logs into that zone and cleans a few files up and things magically start working again.

Hooray.

Top comments (6)

Franco Traversaro • Jul 3 '19

As a rule of thumb, whenever I see that a server is acting strangely I just ‘df -h‘: I learned the hard way that a full filesystem can make unpredictable and mysterious things! 😣

𒎏Wii 🏳️‍⚧️ • Jul 3 '19 • Edited

Same. I don't know that much about Linux, but when something is acting up, the things I check are usually 1. Have I tried turning it off and on again? 2. Is there enough RAM? 3. Is there enough disk space? 4. Is there someone smarter than me who could fix this?

Franco Traversaro • Jul 7 '19

I agree with you, I just leave the option one as last resource because the smarter people could gain advantages from information lost in reboot, and I think that understand the origin of probles is more important that simply fix it :-)

Ben Sinclair • Nov 9 '19

If that looks ok, follow it up with df -i because you can have available space without having any blocks left. That caught me out a couple of times.

Shawn McElroy • Jul 3 '19

Other than randomly seeing if the disk is full like Franco Mentioned, as I've had that happen before, there is no way I would have been able to find the cause like this. It makes me glad there's always smarter developers than I.