Day 3: Keynote: A Zero Copy Delayed Defragmentation Infrastructure for Linux



Dave Miller, who is well known for many things in the Linux kernel, is one of the few people with authority over IP networking in the kernel.

David started looking into zero copy now that more Ethernet hardware has the necessary support for it. While some people only talked about how it might be done, Dave strongly favors actually doing the work, and a couple of people (Mingo and Alexey) did just that and came back a few days later with a sample implementation.

The changes are on the transmit side: data is sent by calling sendfile on a TCP socket. There is little protection; for instance, a user can write to the packet buffer before or while it is being sent, but that's OK because he or she is only screwing up his or her own data.
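To make the transmit side concrete, here is a minimal user-space sketch of an application serving a file with sendfile. It assumes a Unix system and uses Python's `os.sendfile(out_fd, in_fd, offset, count)`, a thin wrapper around sendfile(2); the function names `send_file_zero_copy` and `loopback_roundtrip` are hypothetical, and the loopback setup exists only so the example is self-contained. The point is that the file's bytes never pass through a user-space buffer: the kernel hands page-cache pages to the socket.

```python
import os
import socket
import tempfile
import threading


def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send an entire file over a connected TCP socket using sendfile(2).

    The data goes from the page cache to the socket without being copied
    through a user-space buffer. Returns the number of bytes sent.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # sendfile may send fewer bytes than requested, so loop
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # file shrank while we were sending; stop
                break
            sent += n
    return sent


def loopback_roundtrip(payload: bytes) -> bytes:
    """Hypothetical demo: sendfile a temp file over 127.0.0.1, return what arrived."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        # port 0 asks the kernel for any free port
        with socket.create_server(("127.0.0.1", 0)) as listener:
            client = socket.create_connection(listener.getsockname())
            conn, _ = listener.accept()

        def sender() -> None:
            try:
                send_file_zero_copy(client, path)
            finally:
                client.close()  # closing signals EOF to the receiver

        t = threading.Thread(target=sender)
        t.start()
        received = bytearray()
        with conn:
            while chunk := conn.recv(65536):
                received.extend(chunk)
        t.join()
        return bytes(received)
    finally:
        os.unlink(path)
```

For example, `loopback_roundtrip(b"zero copy " * 100000)` returns the same bytes it was given. The lack of protection mentioned above shows up here too: because the socket is fed directly from page-cache pages, a process that rewrites the file while it is still in flight may change what actually goes out on the wire.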

Ingo Molnar then worked on Tux 1.0, which is geared toward HTTP serving while minimizing data copying.

BSD has always had better NFS performance than Linux because, among other things, it has chained buffers and handles fragments better.
Linux, by contrast, would wait for all the fragments and, once they were received, copy them into a new buffer, which adds overhead (in fact, there are 3 copies). David worked on removing those copies to bring performance up.
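The difference between copying fragments into a fresh buffer and chaining them can be sketched as follows. This is a Python analogy, not kernel code: `Fragment`, `ChainedDatagram` and both reassembly functions are hypothetical names, and `memoryview` slices stand in for references to receive buffers. The eager version copies every fragment once at reassembly time; the chained version only validates and links the fragments, so the single copy is delayed until a consumer walks the chain.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Fragment:
    offset: int        # byte offset of this piece within the original datagram
    data: memoryview   # view into the receive buffer; creating it copies nothing
    more: bool         # the IP "more fragments" flag


def reassemble_by_copy(frags: List[Fragment]) -> bytearray:
    """Eager reassembly: allocate a new buffer and copy every fragment into it."""
    total = max(f.offset + len(f.data) for f in frags)
    out = bytearray(total)
    for f in frags:
        out[f.offset:f.offset + len(f.data)] = f.data  # one copy per fragment
    return out


@dataclass
class ChainedDatagram:
    frags: List[Fragment]  # ordered by offset; payload bytes are never moved

    def __len__(self) -> int:
        return sum(len(f.data) for f in self.frags)

    def chunks(self) -> Iterator[memoryview]:
        """Walk the chain, e.g. to copy straight into the reader's buffer."""
        for f in self.frags:
            yield f.data


def reassemble_by_chaining(frags: List[Fragment]) -> ChainedDatagram:
    """Delayed reassembly: check the fragments are complete, link them, copy nothing."""
    ordered = sorted(frags, key=lambda f: f.offset)
    if not ordered or ordered[-1].more:
        raise ValueError("final fragment not received yet")
    expected = 0
    for f in ordered:
        if f.offset != expected:
            raise ValueError("hole or overlap between fragments")
        expected += len(f.data)
    return ChainedDatagram(ordered)
```

With the chained form, the one unavoidable copy happens when the data reaches its destination, for example `b"".join(chain.chunks())` or a copy into the application's buffer, instead of once at reassembly and again on delivery.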

The NFS client side has now been fixed, but the server side (knfsd) still needs work.

Among several fun stories, Dave told us that they finally got Alexey to come to a conference (Ottawa), and he is indeed a single guy, not a roomful of Russians working under the same email address :-)

For hardware checksum support, most cards only handle IPv4, some handle IPv6 too, and the good cards can actually checksum any kind of data.
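What the card offloads is the 16-bit ones' complement Internet checksum from RFC 1071, which the TCP/IP stack would otherwise compute in software. Roughly speaking, an IPv4-only card knows how to locate the TCP or UDP header inside an IPv4 packet, while a generic card simply sums whatever byte range the driver points it at, which is why it also copes with IPv6 and other protocols. A minimal software version of the computation, with hypothetical function names:

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit ones' complement sum of data, as used by IP, TCP and UDP (RFC 1071)."""
    if len(data) % 2:
        data = data + b"\x00"  # odd length: pad with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return total


def internet_checksum(data: bytes) -> int:
    """Value stored in the header: the complement of the ones' complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF
```

For instance, the 20-byte IPv4 header `45000073 00004000 40110000 c0a80001 c0a800c7` (checksum field zeroed) yields `0xB861`, and recomputing over the header with that value filled in gives 0, which is how a receiver verifies it.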

Right now, the 3c59x, acenic, SunHME and loopback drivers are working :-). Eepro100 might work, but Intel hasn't been forthcoming with documentation, and given the number of hardware bugs and the 7 different chips actually in circulation, no one knows for sure yet.
SysKonnect and tulip are known not to work.
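Whether a driver can take part depends on what its card advertises. As I understand the 2.4 zero-copy TCP path, sendfile only skips the copy when the outgoing device supports both scatter-gather DMA and some form of checksum offload; otherwise the kernel falls back to copying. The sketch below models that decision under that assumption. The flag names echo the kernel's `NETIF_F_*` capability bits, but `DevFeatures`, `can_zero_copy`, `transmit_path` and the numeric values are hypothetical.

```python
from enum import IntFlag


class DevFeatures(IntFlag):
    # Hypothetical stand-ins for the kernel's NETIF_F_* capability bits;
    # the numeric values are chosen for this sketch only.
    SG = 1       # scatter-gather DMA: a frame can be built from scattered pages
    IP_CSUM = 2  # checksum offload for TCP/UDP over IPv4 only
    NO_CSUM = 4  # no checksum needed at all (a loopback-style device)
    HW_CSUM = 8  # generic checksum offload over any data


ANY_CSUM = DevFeatures.IP_CSUM | DevFeatures.NO_CSUM | DevFeatures.HW_CSUM


def can_zero_copy(features: DevFeatures) -> bool:
    """True if page-cache pages could be handed straight to this device."""
    return bool(features & DevFeatures.SG) and bool(features & ANY_CSUM)


def transmit_path(features: DevFeatures) -> str:
    # Without SG the pages must be gathered into one buffer, and without
    # checksum offload the CPU has to read every byte anyway, so copying
    # (and checksumming during the copy) costs little extra.
    return "zero-copy" if can_zero_copy(features) else "copy"
```

Under this model, a card that offers IPv4 checksum offload but no scatter-gather (`DevFeatures.IP_CSUM`) still takes the copying path.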

Ingo deserves all the credit for Tux and the SPECweb99 results that came from it: the web accelerator he designed gives really good numbers. Right now, Tux holds the SPECweb99 record for 2, 4 and 8 CPUs, while IBM holds the record for higher CPU counts.

In his credits, Dave thanked Mindcraft for prompting the Linux developers to write all this code.

You can look at pictures of his slides in the picture library.

2001/01/28 (15:01): Version 1.0