Michael Stapelberg

Debug Go core dumps with delve: export byte slices

published 2024-10-22
tagged golang

Not all bugs can easily be reproduced — sometimes, all you have is a core dump from a crashing program, but no idea about the triggering conditions of the bug yet.

When using Go, we can use the delve debugger for core dump debugging. However, I had trouble figuring out how to save the contents of a byte slice (for example: the incoming request causing the crash) from memory into a file for further analysis, so this article walks you through how to do it.

Simple Example

Let’s imagine the following scenario: You are working on a performance optimization in Go Protobuf and have accidentally broken the proto.Marshal function. The function now returns an error, so let’s run one of the failing tests with delve:

~/protobuf/proto master % dlv test
(dlv) b ExampleMarshal
(dlv) c
> [Breakpoint 1] google.golang.org/protobuf/proto_test.ExampleMarshal() ./encode_test.go:293 (hits goroutine(1):1 total:1) (PC: 0x9d6c96)
(dlv) next 4
> google.golang.org/protobuf/proto_test.ExampleMarshal() ./encode_test.go:297 (PC: 0xb54495)
   292:	// [google.golang.org/protobuf/types/known/durationpb.New].
   293:	func ExampleMarshal() {
   294:		b, err := proto.Marshal(&durationpb.Duration{
   295:			Nanos: 125,
   296:		})
=> 297:		if err != nil {
   298:			panic(err)
   299:		}
   300:
   301:		fmt.Printf("125ns encoded into %d bytes of Protobuf wire format:\n% x\n", len(b), b)
   302:

Go Protobuf happens to return the already encoded bytes even when returning an error, so we can inspect the b byte slice to see how far the encoding got before the error happened:

(dlv) print b
[]uint8 len: 2, cap: 2, [16,125]

In this case, we can see that the entire (trivial) message was encoded, so our error must happen at a later stage — this allows us to rule out a large chunk of code in our search for the bug.

But what would we do if a longer part of the message were displayed and we wanted to load it into a different tool for further analysis, e.g. the excellent protoscope?

The low-tech approach is to print the contents and copy & paste them from the delve output into an editor or another tool. This stops working as soon as your data contains non-printable characters.

We have multiple options to export the byte slice to a file:

  1. We could add os.WriteFile("/tmp/b.raw", b, 0644) to the source code and re-run the test. This is definitely the simplest option, as it works with or without a debugger.

  2. As long as delve is connected to a running program, we can use delve’s call command to just execute the same code without having to add it to our source:

    (dlv) call os.WriteFile("/tmp/b.raw", b, 0644)
    (dlv)
    

Notably, both options only work when you can debug interactively. For the first option, you need to be able to change the source. The second option requires that delve is attached to a running process that you can afford to pause and control interactively.

These are trivial requirements when running a unit test on your local machine, but they become much harder to meet when debugging an RPC service that crashes on specific requests: your debugging code must run only for the troublesome requests, while the unproblematic requests are still handled normally.

Core dump debugging with Go

So let’s switch examples: we are no longer working on Go Protobuf. Instead, we now need to debug an RPC service where certain requests crash the service. We’ll use core dump debugging!

In case you’re wondering: the name “core dump” comes from magnetic-core memory; these days we should probably say “memory dump” instead. (The original post shows a photo here: Core Memory Unit, Bank C from Project Whirlwind, 1953-1959, an exhibit from the MIT Museum, a core memory unit with 4 KB of capacity.)

To make Go write a core dump when panicking, run your program with the environment variable GOTRACEBACK=crash set (all possible values are documented in the runtime package).

You also need to ensure your system is set up to collect core dumps, as they are typically discarded by default:

  • On Linux, the easiest way is to install systemd-coredump(8), after which core dumps will automatically be collected. You can use coredumpctl(1) to list and work with them.
  • On macOS, you can enable core dump collection, but delve cannot open macOS core dumps. Luckily, macOS is rarely used for production servers.
  • I don’t know about Windows and other systems.

You can find more details and options in the CoreDumpDebugging page of the Go wiki. For this article, we will stick to the coredumpctl route:

We’ll use the gRPC Go Quick start example, a greeter client/server program, and add a panic() call to the server SayHello handler:
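For reference, here is the modified handler (it matches the source listing delve will show us later):

// SayHello implements helloworld.GreeterServer
func (s *server) SayHello(_ context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
	log.Printf("Received: %v", in.GetName())
	panic("oh no!") // deliberately crash the handler
	return &pb.HelloReply{Message: "Hello " + in.GetName()}, nil
}

Now we build the server with optimizations disabled (so that delve can inspect all variables) and run it with core dumps enabled: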

% cd greeter_server
% go build -gcflags=all="-N -l"  # disable optimizations
% GOTRACEBACK=crash ./greeter_server
2024/10/19 21:48:01 server listening at [::]:50051
2024/10/19 21:48:03 Received: world
panic: oh no!

goroutine 5 gp=0xc000007c00 m=5 mp=0xc000100008 [running]:
panic({0x83ca60?, 0x9a3710?})
	/home/michael/sdk/go1.23.0/src/runtime/panic.go:804 +0x168 fp=0xc000169850 sp=0xc0001697a0 pc=0x46fe88
main.(*server).SayHello(0xcbb840?, {0x877200?, 0xc000094900?}, 0x4a6f25?)
	/home/michael/go/src/github.com/grpc/grpc-go/examples/helloworld/greeter_server/main.go:45 +0xbf fp=0xc0001698c0 sp=0xc000169850 pc=0x8037ff
[…]
signal: aborted (core dumped)

The last line is what we want to see: it should say “core dumped”.

We can now use coredumpctl(1) to launch delve for this program + core dump:

% coredumpctl debug --debugger=dlv --debugger-arguments=core
           PID: 1729467 (greeter_server)
           UID: 1000 (michael)
           GID: 1000 (michael)
        Signal: 6 (ABRT)
     Timestamp: Sat 2024-10-19 21:50:12 CEST (1min 49s ago)
  Command Line: ./greeter_server
    Executable: /home/michael/go/src/github.com/grpc/grpc-go/examples/helloworld/greeter_server/greeter_server
 Control Group: /user.slice/user-1000.slice/session-1.scope
          Unit: session-1.scope
         Slice: user-1000.slice
       Session: 1
     Owner UID: 1000 (michael)
       Storage: /var/lib/systemd/coredump/core.greeter_server.1000.zst (present)
  Size on Disk: 204.7K
       Message: Process 1729467 (greeter_server) of user 1000 dumped core.
                
                Module /home/michael/go/src/github.com/grpc/grpc-go/examples/helloworld/greeter_server/greeter_server without build-id.
                Stack trace of thread 1729470:
                #0  0x0000000000479461 n/a (greeter_server + 0x79461)
[…]
                ELF object binary architecture: AMD x86-64

Type 'help' for list of commands.
(dlv) bt
 0  0x0000000000479461 in runtime.raise
    at /home/michael/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:154
 1  0x0000000000451a85 in runtime.dieFromSignal
    at /home/michael/sdk/go1.23.0/src/runtime/signal_unix.go:942
 2  0x00000000004520e6 in runtime.sigfwdgo
    at /home/michael/sdk/go1.23.0/src/runtime/signal_unix.go:1154
 3  0x0000000000450a85 in runtime.sigtrampgo
    at /home/michael/sdk/go1.23.0/src/runtime/signal_unix.go:432
 4  0x0000000000479461 in runtime.raise
    at /home/michael/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:153
 5  0x0000000000451a85 in runtime.dieFromSignal
    at /home/michael/sdk/go1.23.0/src/runtime/signal_unix.go:942
 6  0x0000000000439551 in runtime.crash
    at /home/michael/sdk/go1.23.0/src/runtime/signal_unix.go:1031
 7  0x0000000000439551 in runtime.fatalpanic
    at /home/michael/sdk/go1.23.0/src/runtime/panic.go:1290
 8  0x000000000046fe88 in runtime.gopanic
    at /home/michael/sdk/go1.23.0/src/runtime/panic.go:804
 9  0x00000000008037ff in main.(*server).SayHello
    at ./main.go:45
10  0x00000000008033a6 in google.golang.org/grpc/examples/helloworld/helloworld._Greeter_SayHello_Handler
    at /home/michael/go/src/github.com/grpc/grpc-go/examples/helloworld/helloworld/helloworld_grpc.pb.go:115
11  0x00000000007edeeb in google.golang.org/grpc.(*Server).processUnaryRPC
    at /home/michael/go/src/github.com/grpc/grpc-go/server.go:1394
12  0x00000000007f2eab in google.golang.org/grpc.(*Server).handleStream
    at /home/michael/go/src/github.com/grpc/grpc-go/server.go:1805
13  0x00000000007ebbff in google.golang.org/grpc.(*Server).serveStreams.func2.1
    at /home/michael/go/src/github.com/grpc/grpc-go/server.go:1029
14  0x0000000000477c21 in runtime.goexit
    at /home/michael/sdk/go1.23.0/src/runtime/asm_amd64.s:1700
(dlv) 

Alright! Now let’s switch to frame 9 (our server’s SayHello handler) and inspect the Name field of the incoming RPC request:

(dlv) frame 9
> runtime.raise() /home/michael/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:154 (PC: 0x482681)
Warning: debugging optimized function
Frame 9: ./main.go:45 (PC: aaabf8)
    40:	}
    41:	
    42:	// SayHello implements helloworld.GreeterServer
    43:	func (s *server) SayHello(_ context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
    44:		log.Printf("Received: %v", in.GetName())
=>  45:		panic("oh no!")
    46:		return &pb.HelloReply{Message: "Hello " + in.GetName()}, nil
    47:	}
    48:	
    49:	func main() {
    50:		flag.Parse()
(dlv) p in
("*google.golang.org/grpc/examples/helloworld/helloworld.HelloRequest")(0xc000120100)
*google.golang.org/grpc/examples/helloworld/helloworld.HelloRequest {
[…]
	unknownFields: []uint8 len: 0, cap: 0, nil,
	Name: "world",}

In this case, it’s easy to see that the Name field was set to world in the incoming request, but let’s assume the request contained lots of binary data that was not as easy to read or copy.

How do we write the byte slice contents to a file? In this scenario, we cannot modify the source code and delve’s call command does not work on core dumps (only when delve is attached to a running process):

(dlv) call os.WriteFile("/tmp/name.raw", in.Name, 0644)
> runtime.raise() /home/michael/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:154 (PC: 0x482681)
Warning: debugging optimized function
Command failed: can not continue execution of core process

Luckily, we can extend delve with a custom Starlark function to write byte slice contents to a file.

Exporting byte slices with writebytestofile

You need a version of dlv that contains commit https://github.com/go-delve/delve/commit/52405ba86bd9e14a2e643db391cbdebdcbdb3368. Until the commit is part of a released version, you can install the latest dlv directly from git:

% go install github.com/go-delve/delve/cmd/dlv@master

Save the following Starlark code to a file, for example ~/dlv_writebytestofile.star:

# Syntax: writebytestofile <byte slice var> <output file path>
def command_writebytestofile(args):
	var_name, filename = args.split(" ")
	# Evaluate the variable expression in the current scope.
	s = eval(None, var_name).Variable
	# Read Len bytes from the slice’s backing array, starting at Base.
	mem = examine_memory(s.Base, s.Len).Mem
	write_file(filename, mem)

Then, in delve, load the Starlark code and run the function to export the byte slice contents of in.Name to /tmp/name.raw:

% coredumpctl debug --debugger=dlv --debugger-arguments=core
(dlv) frame 9
(dlv) source ~/dlv_writebytestofile.star
(dlv) writebytestofile in.Name /tmp/name.raw

Let’s verify that we got the right contents:

% hexdump -C /tmp/name.raw
00000000  77 6f 72 6c 64                                    |world|
00000005
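Now that the bytes are in a file, we can load them into other tools. For example, for the /tmp/b.raw file from the Go Protobuf example at the beginning, a protoscope session might look like this (a sketch; I am assuming the install path from the protoscope README, and the exact output may differ between versions):

% go install github.com/protocolbuffers/protoscope/cmd/protoscope@latest
% protoscope /tmp/b.raw
2: 125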

Core dump debugging with net/http servers

When you want to apply the core dump debugging technique on a net/http server (instead of a gRPC server, as above), you will notice that panics in your HTTP handlers do not actually result in a core dump! This code in go/src/net/http/server.go recovers panics and logs a stack trace:

defer func() {
    if err := recover(); err != nil && err != ErrAbortHandler {
        const size = 64 << 10
        buf := make([]byte, size)
        buf = buf[:runtime.Stack(buf, false)]
        c.server.logf("http: panic serving %v: %v\n%s", c.remoteAddr, err, buf)
    }
}()

Or, in other words: the GOTRACEBACK=crash environment variable only takes effect for panics that go unrecovered (or for fatal signals), but this panic is handled by the recover() call, so no core is dumped.
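Here is a minimal stand-alone sketch (not from the original post) that demonstrates the problem: even when run with GOTRACEBACK=crash, this server logs “http: panic serving …”, keeps running, and no core dump is written:

package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// net/http recovers this panic before it reaches the runtime,
		// so GOTRACEBACK=crash never comes into play.
		panic("no core dump here")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}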

This default behavior of net/http servers is now considered regrettable, but cannot be changed for compatibility reasons. (We can probably add a struct field to optionally not recover panics, though. I’ll update this paragraph once there is a proposal.)

So, what options do we have in the meantime?

We could recover panics in our own code (before net/http’s panic handler is called), but then how do we produce a core dump from our own handler?

A closer look reveals that the Go runtime’s crash function is defined in signal_unix.go and sends signal SIGABRT with the dieFromSignal function to the current thread:

//go:nosplit
func crash() {
        dieFromSignal(_SIGABRT)
}

The default action for SIGABRT is to “terminate the process and dump core”, see signal(7).

We can follow the same strategy and send SIGABRT to our process:

package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"syscall"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if err := recover(); err != nil {
				proc, err := os.FindProcess(syscall.Getpid())
				if err != nil {
					panic(fmt.Sprintf("could not find own process (pid %d): %v", syscall.Getpid(), err))
				}
				proc.Signal(syscall.SIGABRT)
				// Ensure the stack triggering the core dump sticks around
				proc.Wait()
			}
		}()
		// …buggy handler code goes here; for illustration we panic
		panic("this should result in a core dump")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}

There is one caveat: If you have any non-Go threads running in your program, e.g. by using cgo, they might pick up the signal, so ensure they do not install a SIGABRT handler (see also: cgo-related documentation in os/signal).

If this is a concern, you can make the above code more platform-specific and use the tgkill(2) syscall to direct the signal to the current thread, as the Go runtime does.
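A minimal Linux-only sketch of that idea, assuming the golang.org/x/sys/unix package (abortThisThread is a hypothetical helper name, not from the original post):

//go:build linux

package main

import (
	"runtime"

	"golang.org/x/sys/unix"
)

// abortThisThread directs SIGABRT at the current OS thread via tgkill(2),
// similar to what the Go runtime’s dieFromSignal does, so that non-Go
// threads cannot intercept the signal.
func abortThisThread() {
	runtime.LockOSThread() // pin this goroutine to its OS thread
	unix.Tgkill(unix.Getpid(), unix.Gettid(), unix.SIGABRT)
}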

Conclusion

Core dump debugging can be a very useful technique to quickly make progress on otherwise hard-to-debug problems. In small environments (a single server to a few Linux servers), core dumps are easy enough to turn on and work with, but in larger environments you might need to invest in central core dump collection.

I hope the technique shown above comes in handy when you need to work with core dumps.
