Friday, November 23, 2007

Why do sockets die?

I'm testing a network lib that uses select to poll sockets.

I'm running a stress test, which connects 100 sockets to a server socket (all in the same process), then echoes data back and forth as quickly as possible. If a socket dies, it gets removed. Each loop iteration I print time passed, and the number of active sockets left.
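A minimal sketch of this kind of test, assuming Python and the standard select and socket modules. The function name run_echo_test and its parameters are hypothetical, not taken from the lib being tested, and the payload is kept tiny so blocking sends are safe:

```python
import select
import socket
import time


def run_echo_test(n_clients=10, rounds=20, payload=b"ping"):
    """Ping-pong `payload` over n_clients localhost connections.

    Returns the number of sockets (both ends counted) still alive at the end.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(n_clients)
    address = server.getsockname()

    alive = set()
    clients = []
    for _ in range(n_clients):
        client = socket.create_connection(address)
        accepted, _ = server.accept()
        clients.append(client)
        alive.update((client, accepted))

    # Kick off the ping-pong: every client sends one message.
    for client in clients:
        client.sendall(payload)

    start = time.monotonic()
    for _ in range(rounds):
        if not alive:
            break
        readable, _, _ = select.select(list(alive), [], [], 1.0)
        for sock in readable:
            try:
                data = sock.recv(4096)
                if not data:
                    raise ConnectionResetError("peer closed the connection")
                sock.sendall(data)  # echo back whatever arrived
            except OSError as exc:
                # A dead socket is reported and removed from the poll set.
                print("socket died:", exc)
                alive.discard(sock)
                sock.close()
        print(f"{time.monotonic() - start:.3f}s elapsed, {len(alive)} sockets alive")

    survivors = len(alive)
    for sock in alive:
        sock.close()
    server.close()
    return survivors
```

Calling run_echo_test(n_clients=100, rounds=1000) approximates the setup described above. With a payload this small the send buffers never fill, so this sketch alone may not reproduce the dying sockets.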

As it runs, I watch the number of sockets slowly decline until I have a set of 15 sockets left, which seem to keep running happily. Why do the other 85 sockets die? They raise either ECONNRESET, EPIPE or ETIMEDOUT. I imagined sockets connected via localhost would be quite reliable...

Update: The same test between two different machines does _not_ show this same problem. So what's up with localhost?

6 comments:

Marius Gedminas said...

What about the same test but between two different _processes_ on the same machine?

Simon Wittber said...

I get the same issue when running two different processes.

Maximum sustained socket count is 15-20 sockets.

Anonymous said...

Why, I don't know, but I have observed the same behavior. I suspected that when some queues are full, the localhost network just drops the socket.

I imagine that the same would be possible for different machines, but with many more sockets. So: catch those errors and retry until the OS is happy. I imagine that's what we were supposed to do anyway, and the kernel relies on that, so it feels free to kill the sockets.
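The catch-and-retry idea could be sketched roughly as follows, assuming Python 3, where ECONNRESET, EPIPE and ETIMEDOUT surface as ConnectionResetError, BrokenPipeError and TimeoutError. The helper name retry_on_socket_error and its parameters are hypothetical, and whether retrying is the right fix here is only this comment's guess:

```python
import time

# Python 3 maps ECONNRESET, EPIPE and ETIMEDOUT to these OSError subclasses.
RETRYABLE = (ConnectionResetError, BrokenPipeError, TimeoutError)


def retry_on_socket_error(operation, attempts=3, delay=0.05):
    """Call operation() and retry it when it fails with a retryable error.

    `operation` is any zero-argument callable, for example one that
    reconnects and resends. The last error is re-raised if every
    attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except RETRYABLE:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # back off a little more each time
```

The operation passed in would normally open a fresh connection, since a socket that raised one of these errors is usually no longer usable.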

garylinux said...

What OS? Is it operating-system dependent? Have you tried it on more than one OS?

Anonymous said...

Could be bandwidth. If you can overload the listening sockets on localhost due to unlimited bandwidth, that could be a problem. Could it be that a real network actually hides the problem by limiting data transfer speeds?

Unknown said...

Perhaps you are using an event loop which is timing out because the writer has filled the outgoing queue before the receiver can empty it?
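If a full outgoing queue is the cause, one way to avoid it is to write only when select reports the socket as writable and to keep draining the receiving side in the same loop. A minimal sketch, assuming Python and a connected pair of sockets; the function name pump and its parameters are hypothetical:

```python
import select
import socket


def pump(writer, reader, payload, chunk=65536):
    """Send payload from writer to reader without blocking either side.

    Returns the number of bytes the reader received.
    """
    writer.setblocking(False)
    reader.setblocking(False)
    pending = memoryview(payload)
    received = 0
    while received < len(payload):
        # Only ask about writability while there is something left to send.
        want_write = [writer] if len(pending) else []
        readable, writable, _ = select.select([reader], want_write, [], 1.0)
        if not readable and not writable:
            break  # nothing happened within the timeout; give up
        if writable:
            try:
                sent = writer.send(pending[:chunk])
                pending = pending[sent:]
            except BlockingIOError:
                pass  # send buffer is full; wait for the reader to drain it
        if readable:
            data = reader.recv(chunk)
            if not data:
                break  # peer closed the connection
            received += len(data)
    return received
```

For example, a, b = socket.socketpair() followed by pump(a, b, b"x" * 1000000) moves a megabyte through the pair even though it is far larger than a typical socket buffer.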
