Vectored AIO for Linux
This patch adds a vectored form of asynchronous I/O to Linux's libaio:
IO_CMD_PREADV and IO_CMD_PWRITEV. Here is a
test program.
I tested only on ext3, a raw O_DIRECT SCSI block device, and remote
NFS, but it should also work with ext2, jfs, and reiserfs. XFS is not
yet supported. This patch also passes all the ltp-aio* stress tests
in the Linux Test Project suite.
- The main patch is against linux-2.6.9-rc3-mm3. It's split
into two parts, plus two optional bonus patches. You need to
apply them in order.
  - Bug fixes to the existing aio code.
  - The main patch. This patch adds vectored libaio support to ext2, ext3,
    nfs, jfs, socket, and reiserfs. Pipe and xfs are not yet supported.
  - This patch allows passing an
    explicit address to io_setup context.
  - This patch is a port of
    Feng Zhou's epoll + libaio patch to -mm. It allows receiving epoll events
    via io_getevents.
- This is a slightly older patch, but
it applies against linux-2.6.8.1 +
Suparna Bhattacharya's aio patch.
- To use this new feature, you need to add two constants, IO_CMD_PREADV (7)
and IO_CMD_PWRITEV (8), to libaio.h. The struct iocb already contains the
definitions needed to support vectored I/O.
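As a rough illustration of what a vectored request looks like, the sketch below fills in a control block by hand. It uses the kernel's struct iocb from <linux/aio_abi.h> rather than libaio.h, and it assumes the field convention that mainline kernels later adopted: aio_buf points at the iovec array and aio_nbytes holds the number of segments. The patch described here may lay the fields out differently. The helper name prep_vectored is hypothetical.

```c
#include <assert.h>
#include <linux/aio_abi.h>   /* kernel ABI definition of struct iocb */
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>         /* struct iovec */

/* Opcode values quoted in the text above (libaio.h names). */
#define IO_CMD_PREADV  7
#define IO_CMD_PWRITEV 8

/*
 * Hypothetical helper: describe a vectored read or write in *cb.
 * Assumption: aio_buf carries the iovec array and aio_nbytes the
 * segment count, as on mainline kernels.
 */
static void prep_vectored(struct iocb *cb, int fd, int opcode,
                          const struct iovec *iov, int nr_segs,
                          long long offset)
{
    assert(iov != NULL && nr_segs > 0);
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = fd;
    cb->aio_lio_opcode = opcode;
    cb->aio_buf = (uint64_t)(uintptr_t)iov;  /* pointer to the segments */
    cb->aio_nbytes = nr_segs;                /* number of segments */
    cb->aio_offset = offset;                 /* file offset of the transfer */
}
```

A caller would build an array of struct iovec, pass it to prep_vectored together with IO_CMD_PREADV or IO_CMD_PWRITEV, and hand the resulting iocb to io_submit.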
Here is a simple test program.
Caution: this program writes to /dev/sdb.
You need a fairly recent glibc to run this program. I tested only on
Fedora Core 2 running on VMware.
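For readers who want something safer to experiment with, here is a separate minimal sketch (not the test program above). It assumes a mainline kernel that understands opcodes 7 and 8 with the iovec array in aio_buf and the segment count in aio_nbytes, calls the AIO system calls directly instead of going through libaio, and writes to an unlinked temporary file under /tmp rather than to /dev/sdb. All function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#define IO_CMD_PREADV  7   /* values quoted in the text above */
#define IO_CMD_PWRITEV 8

/* Thin wrappers over the raw AIO system calls (no libaio needed). */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{ return syscall(SYS_io_setup, nr, ctx); }
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **cbs)
{ return syscall(SYS_io_submit, ctx, n, cbs); }
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev)
{ return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, NULL); }
static long sys_io_destroy(aio_context_t ctx)
{ return syscall(SYS_io_destroy, ctx); }

/* Submit one vectored request at offset 0 and wait for its completion.
 * Returns the number of bytes transferred, or -1 on failure. */
static long run_vectored(aio_context_t ctx, int fd, int opcode,
                         struct iovec *iov, int nr_segs)
{
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;

    assert(nr_segs > 0);
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = opcode;
    cb.aio_buf = (uint64_t)(uintptr_t)iov;  /* assumed: iovec array */
    cb.aio_nbytes = nr_segs;                /* assumed: segment count */
    cb.aio_offset = 0;
    if (sys_io_submit(ctx, 1, list) != 1)
        return -1;
    if (sys_io_getevents(ctx, 1, 1, &ev) != 1)
        return -1;
    return (long)ev.res;
}

/* Write two segments with one request, read them back with another.
 * Returns 0 if the data survives the round trip, -1 otherwise. */
static int vectored_roundtrip(void)
{
    char path[] = "/tmp/vaio-XXXXXX";
    char out_a[] = "hello ", out_b[] = "world";
    char in_a[6] = {0}, in_b[5] = {0};
    struct iovec wv[2] = { { out_a, 6 }, { out_b, 5 } };
    struct iovec rv[2] = { { in_a, 6 }, { in_b, 5 } };
    aio_context_t ctx = 0;
    int fd, ok = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* file vanishes on close */
    if (sys_io_setup(4, &ctx) == 0) {
        if (run_vectored(ctx, fd, IO_CMD_PWRITEV, wv, 2) == 11 &&
            run_vectored(ctx, fd, IO_CMD_PREADV, rv, 2) == 11 &&
            memcmp(in_a, "hello ", 6) == 0 &&
            memcmp(in_b, "world", 5) == 0)
            ok = 0;
        sys_io_destroy(ctx);
    }
    close(fd);
    return ok;
}
```

Calling vectored_roundtrip() from main and checking for a zero return is enough to confirm that both opcodes are accepted by the running kernel.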
How it works
Roughly, this patch contains the following:
- Add "aio_readv" and "aio_writev" to struct file_operations.
- aio_abi.h: add IOCB_CMD_PREADV and IOCB_CMD_PWRITEV. Also make struct iocb compatible with the latest glibc libaio.h.
- aio.c: change kiocb to support vectored operations. IOCB_CMD_PREAD and IOCB_CMD_PWRITE are now simply implemented as degenerate cases of IOCB_CMD_PREADV and IOCB_CMD_PWRITEV.
- block_dev.c, file.c, {ext2,ext3,jfs,reiserfs}/file.c, pipe.c, and others: add implementations of the aio_readv and aio_writev methods.
They are straightforward since the low-level code already supports vectored I/O.
- mm/filemap.c, __generic_file_aio_read: there appears to be a bug in this function when nr_segs > 0. When the data is
not immediately ready, the function reads and stores data from the wrong offset.
I fixed it in this patch by aborting and retrying the read when the
data is not ready.
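The idea behind the aio.c change, treating a single-buffer request as a one-segment vectored request, can be shown with a user-space analogy. This is not code from the patch: it uses the ordinary preadv and pwritev calls from glibc, and the helper names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* A single-buffer read expressed as a vectored read with one segment. */
static ssize_t pread_via_preadv(int fd, void *buf, size_t len, off_t off)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    assert(buf != NULL);
    return preadv(fd, &iov, 1, off);
}

/* A single-buffer write expressed as a vectored write with one segment. */
static ssize_t pwrite_via_pwritev(int fd, const void *buf, size_t len,
                                  off_t off)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

    assert(buf != NULL);
    return pwritev(fd, &iov, 1, off);
}

/* Round trip through an unlinked temporary file.
 * Returns 0 when the bytes read back match what was written. */
static int degenerate_demo(void)
{
    char path[] = "/tmp/degen-XXXXXX";
    char back[5] = {0};
    int fd = mkstemp(path);
    int ok;

    if (fd < 0)
        return -1;
    unlink(path);
    ok = pwrite_via_pwritev(fd, "hello world", 11, 0) == 11 &&
         pread_via_preadv(fd, back, 5, 6) == 5 &&
         memcmp(back, "world", 5) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

With this shape only the vectored path needs a real implementation; the single-buffer entry points become thin wrappers around it.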
Related links
Last updated: 10/14/2004
Yaz Saito