August 2000
The disk is finite,
so the files move to tape,
and return like magic.
William Studenmund of Veridian MRJ Technology Solutions and NASA/Ames Research Center presented a Data Migration File System (DMFS) for NetBSD, which he developed with collaborators at NASA/Ames. The research was presented at the FREENIX track of the USENIX Technical Conference in San Diego on 21 June 2000.
The system, called NAStore 3, was designed to provide convenient mass storage for the Numerical Aerospace Simulation facility at NASA/Ames. Its purpose is to automatically migrate files from disk to tape as they cease to be accessed, and to transparently restore them to disk when a program tries to open them. It consists of three main components:
The filesystem is implemented in NetBSD as a layer atop FFS or other filesystems. Its general purpose is to detect when a file access targets a non-resident file and, if so, to ask dmfsd to retrieve the file from tape. The requesting process blocks until the restoration is complete, but the block is interruptible (e.g. by Control-C).
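The open path described above can be modelled in a few lines of user-space Python. This is only a sketch of the control flow, not the NetBSD kernel code: `RestoreRequest`, `open_file`, and the `request_restore` callback that stands in for dmfsd are hypothetical names chosen for illustration.

```python
import threading


class RestoreRequest:
    """Hypothetical handle for one pending recall from tape."""

    def __init__(self):
        self._done = threading.Event()

    def complete(self):
        # Called by the daemon stand-in once the data is back on disk.
        self._done.set()

    def wait(self, poll_interval=0.1):
        # Wait in short slices so the blocked caller stays responsive to
        # an interrupt (KeyboardInterrupt in the main thread).
        while not self._done.wait(poll_interval):
            pass


def open_file(name, resident, request_restore):
    """Return once `name` is resident, recalling it first if necessary.

    `resident` is a set of file names currently on disk (an assumption
    of this sketch); `request_restore` returns a RestoreRequest.
    """
    if name not in resident:
        request = request_restore(name)  # ask the daemon stand-in
        request.wait()                   # the caller blocks here
        resident.add(name)
    return name
```

A resident file is returned immediately without contacting the daemon stand-in; only a non-resident file triggers a recall and a wait.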
The filesystem is also responsible for maintaining a database of metadata, to ensure that the file stays in a consistent state. There are metadata flags indicating whether the file is resident and whether it is archived, a generation number for the data (for when an on-tape copy is superseded by disk modifications), and the usual metadata such as file size, atime, and mtime. Storing the metadata on disk allows operations like ls(1) and find(1) not to recall data from tape.
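To make the role of these fields concrete, here is a minimal sketch of a per-file metadata record. The field and function names are hypothetical, and the `archived_generation` field (the generation that was written to tape) is an assumption of this sketch, not something the presentation specified.

```python
from dataclasses import dataclass


@dataclass
class FileMetadata:
    size: int
    atime: float
    mtime: float
    resident: bool = True           # data is currently on disk
    archived: bool = False          # a copy exists on tape
    generation: int = 0             # bumped whenever the disk data changes
    archived_generation: int = -1   # assumption: generation held on tape


def record_archive(meta):
    """Note that the current disk data has been copied to tape."""
    meta.archived = True
    meta.archived_generation = meta.generation


def record_write(meta, new_size, now):
    """Note a modification on disk; any tape copy is now out of date."""
    meta.generation += 1
    meta.size = new_size
    meta.mtime = now


def tape_copy_is_current(meta):
    """True only if the tape copy matches the data generation on disk."""
    return meta.archived and meta.archived_generation == meta.generation


def stat_without_recall(meta):
    """Answer an ls(1)-style query from metadata alone, without tape access."""
    return (meta.size, meta.atime, meta.mtime)
```

In this model, a write after archiving leaves the `archived` flag set but makes the generations differ, so the stale tape copy can be detected before the disk copy is released.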
Making DMFS work on NetBSD required several changes to the NetBSD kernel. Filesystem layering had to be made workable, and the developers implemented a new OVERLAY filesystem, which is similar to NULLFS. But whereas NULLFS replicates a filesystem elsewhere, OVERLAY does its work in place, so that users (including administrators) do not have access to a non-overlaid filesystem where they could introduce inconsistencies behind DMFS's back. They also expanded file locking support to more filesystems, and added an fcntl(2) interface to the filesystem. In userland, they modified the long listings of ls(1) to report whether files were archived and/or resident.
Studenmund noted that the migration policy can be extremely flexible, as it is implemented by highly configurable user-space code. He also noted that certain applications store metadata within data files; for example, a simulation program might write files with a header that describes the simulation parameters. These applications might benefit from always having a certain amount of header information resident on disk even if the rest of the data has been migrated.
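As an illustration of how small such a user-space policy could be, the sketch below picks migration candidates by idle time and computes how much of a file to leave on disk. The function names, the idle-time threshold, and the largest-first ordering are all assumptions made for this example; the talk did not describe a specific policy.

```python
def select_for_migration(files, now, min_idle_seconds):
    """Return names of files idle long enough to migrate, largest first.

    `files` is a list of (name, size, atime) tuples; this layout is a
    hypothetical stand-in for whatever the real policy code consults.
    """
    idle = [
        (size, name)
        for name, size, atime in files
        if now - atime >= min_idle_seconds
    ]
    # Largest files first, so each migration frees the most disk space.
    idle.sort(key=lambda item: (-item[0], item[1]))
    return [name for _size, name in idle]


def bytes_to_keep_resident(size, header_bytes):
    """Keep up to `header_bytes` of a migrated file on disk."""
    return min(size, header_bytes)
```

A site could change the threshold, the ordering, or the header size without touching the kernel, which is the flexibility the user-space design is meant to provide.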
Their work is described in greater detail in the conference proceedings, and the author can be reached by email at wrstuden@zembu.com. My impression is that the DMFS is an improvement to NetBSD that may provide a useful solution for sites with large quantities of data; transparent migration may be more convenient and effective than moving data to tapes manually. I also expect that the group's filesystem work on NetBSD will benefit an even larger community of users.