Hey,
I was reviewing a PR where it seemed like something odd could go on with a context whose timeout could be reached earlier than expected, and that got me thinking about how Go implements context cancellation for HTTP requests under the hood.
In this post I go through that exploration, getting to the point where we end up with C code that looks similar to what Go does, understanding which mechanisms are involved along the way.
a regular tcp client
With a typical implementation of a TCP client that we usually learn when getting started with sockets, one would end up with something like this:
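#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// example values, matching the traces further down
#define HOST "127.0.0.1"
#define PORT 1337
#define BUFSIZE 4096

int do_read(int sock_fd);
int do_write(int sock_fd);

int
main(int argc, char** argv)
{
    struct sockaddr_in addr = { 0 };
    int fd;

    if (!~(fd = socket(AF_INET, SOCK_STREAM, IPPROTO_IP))) {
        perror("socket");
        return 1;
    }
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    inet_pton(AF_INET, HOST, &addr.sin_addr);
    // blocks until the connection is established (or fails)
    if (!~connect(fd, (struct sockaddr*)&addr, sizeof(addr))) {
        perror("fd: connect");
        return 1;
    }
    // send the request first, then read the response
    do_write(fd);
    do_read(fd);
    // ..
}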
with the subsequent reads and writes looking like the following:
int
do_read(int sock_fd)
{
    char buf[BUFSIZE] = { 0 };

    // leave room for the terminating NUL that printf's %s relies on
    if (!~read(sock_fd, buf, BUFSIZE - 1)) {
        perror("read");
        return -1;
    }
    printf("read: '%s'\n", buf);
    return 0;
}
int
do_write(int sock_fd)
{
    const char* out_msg = "GET / HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n";

    if (!~write(sock_fd, out_msg, strlen(out_msg))) {
        perror("write");
        return -1;
    }
    return 0;
}
the problem there is that, being blocking calls, they’ll essentially block for as long as other internal (very long) timeouts hold.
in the case of a language like Go, where one is supposed to perform IO operations (especially networking ones) without having to worry too much about how those will perform, going with blocking calls wouldn’t really work - that’d be too costly, as it’d require at least one thread for each of those calls when doing them concurrently.
making it non-blocking
making those non-blocking takes quite a bit more code, but the idea is not that complicated. others have talked way more in depth about this than I plan to do here, so I’ll keep it short.
the idea is that rather than asking the kernel for something and waiting until what we want is available, we could instead let the kernel know that at some point we’d like to have it, but not necessarily right now - when ready, just let us know.
on Linux, epoll is a common mechanism to do pretty much that: add the socket to an epoll instance, and when it’s ready to have data read from it, just do it without blocking.
create the socket in non-blocking mode
socket(AF_INET, SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
partially start connection - as this is non-blocking, it'll happen in the
background, and we'll be notified later
connect(3, {sa_family=AF_INET, sin_port=htons(1337), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
create the epoll device
epoll_create1(0) = 4
add the socket to the epoll device, expecting only notifications for the
ability to `write` on that socket fd
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLOUT, {u32=3, u64=3}}) = 0
wait for that
epoll_wait(4, [{EPOLLOUT, {u32=3, u64=3}}], 32, -1) = 1
now that we're ready for a write (`connect` finished), check if there
were any errors (remember, this is non-blocking, so, if there were any
errors, we'd see them stored in the socket)
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
write to the socket (here we send our HTTP request)
write(3, "GET / HTTP/1.1\r\nHost: 127.0.0.1\r"..., 35) = 35
with the request sent, now only wait for EPOLLIN (as we want to `read`)
(ps.: I could've used `EPOLL_CTL_MOD` rather than trying an ADD followed by a
DEL and another ADD)
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN, {u32=3, u64=3}}) = -1 EEXIST (File exists)
epoll_ctl(4, EPOLL_CTL_DEL, 3, NULL) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN, {u32=3, u64=3}}) = 0
wait for the ability to read
epoll_wait(4, [{EPOLLIN, {u32=3, u64=3}}], 32, -1) = 1
read from the socket (without blocking)
read(3, "HTTP/1.1 200 OK\r\nDate: Sat, 28 D"..., 4096) = 120
If you’re curious about how this looks codewise, I’ve implemented an example client that sends a GET / HTTP/1.1 to an HTTP server and then reads its response using that mechanism here: cirocosta/http-ctx-cancellation#http-client.c.
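The connection part of the trace above (non-blocking socket, EINPROGRESS, epoll on EPOLLOUT, SO_ERROR check) could be sketched roughly as follows. This is a minimal sketch and not the code from the repository mentioned above: connect_nonblock and its timeout_ms parameter are hypothetical names picked for illustration, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// connect_nonblock (hypothetical helper): returns a connected, non-blocking
// socket fd, or -1 on error or if timeout_ms elapsed first.
int
connect_nonblock(const char* host, int port, int timeout_ms)
{
    struct sockaddr_in addr = { 0 };
    struct epoll_event ev = { 0 };
    struct epoll_event ready = { 0 };
    socklen_t len = sizeof(int);
    int err = 0;
    int fd, epfd, n;

    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, host, &addr.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd == -1)
        return -1;

    // on a non-blocking socket, connect(2) typically returns right away
    // with EINPROGRESS while the handshake goes on in the background
    if (connect(fd, (struct sockaddr*)&addr, sizeof(addr)) == 0)
        return fd;
    if (errno != EINPROGRESS) {
        close(fd);
        return -1;
    }

    epfd = epoll_create1(0);
    if (epfd == -1) {
        close(fd);
        return -1;
    }

    // writability is how the kernel signals that connect has finished
    ev.events = EPOLLOUT;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        close(fd);
        return -1;
    }

    n = epoll_wait(epfd, &ready, 1, timeout_ms);
    close(epfd);
    if (n != 1) { // timed out (0) or failed (-1, including EINTR)
        close(fd);
        return -1;
    }

    // the outcome of the background connect is stored in the socket
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1 || err != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Something like connect_nonblock("127.0.0.1", 1337, 1000) would then hand back a socket that's ready for the write and read steps of the trace. Passing -1 as timeout_ms waits indefinitely, as in the trace.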
cancelling a long read
As you can tell from the syscalls traced above, there’s plenty of opportunity between one potentially blocking action and the next to stop the whole thing.
For instance, consider that we’re communicating with a server that’s very far away, over a terrible internet connection. Trying to read(2) from it before any data has arrived would not block: with the socket in non-blocking mode, read(2) would return -1 with errno set to EAGAIN (or EWOULDBLOCK) right away.
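To see that EAGAIN behavior in isolation, here's a small sketch. It assumes nothing about sockets in particular (any file descriptor that supports O_NONBLOCK will do), and set_nonblocking, try_read and TRY_READ_WOULD_BLOCK are hypothetical names made up for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// hypothetical sentinel: "no data yet, come back later"
#define TRY_READ_WOULD_BLOCK (-2)

// set_nonblocking (hypothetical helper): switches an already open fd to
// non-blocking mode, which is what SOCK_NONBLOCK does at creation time.
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

// try_read (hypothetical helper): returns the number of bytes read
// (0 meaning EOF), TRY_READ_WOULD_BLOCK if nothing is available yet,
// or -1 on any other error.
ssize_t
try_read(int fd, char* buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return TRY_READ_WOULD_BLOCK;
    return n;
}
```

With an empty pipe whose read end went through set_nonblocking, try_read returns TRY_READ_WOULD_BLOCK immediately instead of hanging. Once the other end writes something, the same call returns the number of bytes that arrived.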
At this point, we could either decide to wait for a little longer (either through epoll or nanosleep), or perhaps decide not to wait, because we’ve already waited too long. In the latter case, we’d be doing pretty much what Go does in terms of request cancellation - give up on waiting and destroy that connection [1] (close(2) it).
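ret = read(sock, buf, 4096);
if (!~ret) {
    if (errno == EAGAIN && time_elapsed > threshold) {
        // we've waited for too long: give up and destroy the connection
        close(sock);
        return ERR_CONTEXT_DEADLINE_EXCEEDED;
    }
    // otherwise: either wait a little longer and retry (EAGAIN), or
    // report the actual error (anything else) - omitted here
}
printf("%s\n", buf);
return OK;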
observing Go’s behavior on context cancellation
We can see that Go does indeed issue the close(2) that we expect by looking at the syscalls that it performs (filtering some stuff out):
strace -f -e 'trace=!futex,nanosleep' ./http
1. non-blocking socket gets created
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(1337), sin_addr=inet_addr("127.0.0.1")}, 16) = -1
EINPROGRESS (Operation now in progress)
2. socket gets added to epoll facility
epoll_ctl(4, EPOLL_CTL_ADD, 3, {
EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=812239112, u64=139655968835848}}) = 0
3. wait on the fds that were added - once `connect` finishes, we
should get an event for that fd
epoll_pwait(4, [{
EPOLLOUT, {u32=812239112, u64=139655968835848}}], 128, 0, NULL, 824634156712) = 1
4. check if there were any errors in the conn
getsockopt(3, SOL_SOCKET, SO_ERROR, <unfinished ...>
5. write to the socket
write(3, "GET / HTTP/1.1\r\nHost: localhost:"..., 95
6. try to read from it
read(3, <unfinished ...>
<... read resumed> 0xc00011e000, 4096)
-1 EAGAIN (Resource temporarily unavailable)
// ...
7. would block, so let's continue waiting ..
8. deadline reached, all we gotta do is remove from the set and
close it
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc000132984) = 0
close(3) = 0
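Steps 2 and 8 of that trace differ a bit from my C client: the fd is registered once, edge-triggered and for both directions, and it's explicitly removed from the set right before being closed. A rough C equivalent of just those two steps could look like this. watch_fd and abandon_fd are hypothetical names, and where the trace shows what looks like a pointer in the event's data field, this sketch simply stores the fd.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

// watch_fd (hypothetical helper): registers fd a single time for both
// reads and writes, using the same flags seen in step 2 of the trace.
int
watch_fd(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    // EPOLLET makes notifications edge-triggered: an event is reported
    // when readiness changes, not for as long as the fd stays ready
    ev.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

// abandon_fd (hypothetical helper): what step 8 does once the deadline
// is reached, i.e. remove fd from the interest set and then close it.
int
abandon_fd(int epfd, int fd)
{
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) == -1)
        return -1;
    return close(fd);
}
```

Registering once for everything avoids the ADD / DEL / ADD dance from my earlier trace, at the cost of having to keep track of readiness in userspace, since an edge-triggered event isn't repeated while the state stays the same.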
And to be absolutely sure about that, we can even trace with bpftrace how we get from userspace down to the sys_enter_close tracepoint (where we can observe the close(2) syscall from the kernel perspective), and verify that it was indeed after a read that took too long:
bpftrace -e 'tracepoint:syscalls:sys_enter_close / comm == "http" / { printf("%s", ustack); }'
syscall.Syscall+48
internal/poll.(*FD).destroy+67
internal/poll.(*FD).readUnlock+81
internal/poll.(*FD).Read+519
net.(*netFD).Read+79
net.(*conn).Read+104
net/http.(*persistConn).Read+117
bufio.(*Reader).fill+259
bufio.(*Reader).Peek+79
net/http.(*persistConn).readLoop+470
[1] in the case of HTTP/2, where there’s a single connection for multiple requests, that’s not really a connection destroy, but more of a “stream cancellation” AFAIK (please let me know if I’m wrong)