DTrace - you really should you know

Posted on Mar 11, 2004

One of the nicer new features in Solaris 10 is DTrace. For background, see the introduction that Bryan Cantrill posted to comp.unix.solaris.

As a very quick example, DTrace made it easy to debug a problem with link state notifications in GLD a couple of days ago (it's bug 5008363 if you care). If an interface was plumbed but not marked up, IP received no link state indications. Because only state changes are sent, marking the interface up later didn't cause any of the missed indications to be delivered.

Anyway, a quick check of the GLD code indicated that link state notifications are sent in a couple of places, meaning that it was easy to watch them with DTrace:

#!/usr/sbin/dtrace -s

/* Thread-local variables start at zero, so no BEGIN clause is needed. */
gld_notify_ind:entry { self->tracing = 1; }
gld_notify_ind:return { self->tracing = 0; }

/* qreply(q, mp): arg1 is the message holding the dl_notify_ind_t. */
qreply:entry
/self->tracing == 1/
{
    printf("\ntime: %d note: 0x%x", timestamp,
        ((dl_notify_ind_t *)((mblk_t *)arg1)->b_rptr)->dl_notification);
    stack(5);
}

/* arg2 is the value reported as the notification. */
gld_notify_qs:entry
{
    printf("\ntime: %d note: 0x%x", timestamp, arg2);
    stack(5);
}

From this script you can probably intuit that notifications are sent using qreply() from gld_notify_ind() and by gld_notify_qs().

Running the script showed that GLD doesn't send notifications if a stream is not bound to a particular SAP. That's the case when IP has plumbed an interface but not marked it up.
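If the individual events and stack traces are more detail than you need, the same probe can feed an aggregation instead. What follows is only a minimal sketch, not the script I actually used: it assumes, as the script above does, that arg2 of gld_notify_qs() carries the notification value, and the aggregation name @sent is made up for illustration.

```dtrace
#!/usr/sbin/dtrace -s

/*
 * Sketch: count calls to gld_notify_qs(), keyed by the notification
 * value. Assumes arg2 is that value, as in the script above.
 * "@sent" is an illustrative name, not anything defined by GLD.
 */
gld_notify_qs:entry
{
    @sent[arg2] = count();
}
```

Run it, bring the interface up and down a few times, then press Ctrl-C; dtrace prints the aggregation on exit, one line per notification value with its count.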

The fix was simple - two lines of code. What is interesting is that DTrace allowed debugging of the problem without changing any code, rebooting any systems or disturbing them in any significant way (it was necessary to bring interfaces up and down a few times).

If you are a kernel developer on Solaris you need to look into DTrace. For those with Solaris source it may not enable you to do anything that you couldn't do before, but it makes some things so easy that you actually bother to do them. In this case I could run my script on half a dozen machines in 20 minutes. Previously I'd have had to either build a new gld module for those machines and install it, or wrestle with mdb. If you don't have Solaris source it may be even more helpful - now you can trace what is happening in a particular part of the kernel without resorting to mdb at all!

Even user-level application developers should take a look. DTrace can help there too; it's just not my own area of interest.

To get hold of Solaris 10 now, check out Solaris Express.