![]() |
Notes on File Systems |
Dr. Rick Smith
|
|
Dr. Smith Home | Research | Classes | Blackboard | Cryptosmith | QMCS Home | UST A-Z | UST Home
last update: |
||
This is a collection of notes intended to introduce the fundamentals of file systems. The notes are organized into the following sections; the first is an overview and the rest review some simple file systems:
When faced with a large expanse of hard drive space, one has three problems:
Over the decades, different file systems have produced different solutions to these problems. Usually the differences can be traced back to the following, sometimes mutually exclusive, objectives:
As hard drives have grown in capacity, file systems have grown in complexity. Still, the systems' weird features usually trace their origins back to the problems being solved or the particular objectives being pursued.
If we look back into ancient history, when semi-trailor-sized beheamoths were being out-evolved by refrigerator-sized creatures in university computer labs, we find many comprehensible file systems.
The Forth programming system was developed in the late 1960s by Chuck Moore. It provided a very powerful, text based mechanism for controlling a computer and writing programs when RAM and hard drive space were extremely tight. Early implementations were routinely restricted to 8KB of RAM. Some early implementations relied exclusively on diskette drives that stored less than a half a megabyte of data.
Starting in the 1970s, typical Forth systems treated hard drives as consisting of a linear set of numbered blocks, each 1KB in size. The first block on the drive (block 0) contained the bootstrap program to get Forth started, and a small number of subsequent blocks might also contain binary executable code that was loaded into RAM when Forth started.
Following the blocks of executable code, the remaining hard drive blocks generally contained ASCII text and were referred to by number. If a programmer needed to modify part of a Forth program, he would edit the hard drive block that contained that program, and refer to the block by its number.
Here is an assessment of Forth's file system in the context of the eight concepts noted above:
Boston University (BU) developed its own timesharing system in the 1970s for its IBM 360 and 370 mainframes. The system was based on the batch-oriented Remote Access Computing System (RACS) developed by IBM. McGill University also participated in RAX development, but their version was renamed "McGill University System for Interactive Computing" (MUSIC). Although many of the details are lost in the mists of time, both systems used some text processing tools developed at BU.
At the time, IBM had developed a few timesharing systems, but they were generally expensive and slow. IBM's standard operating systems for the 360 series had a file system; files were referred to as data sets. To put matters as charitably as possible, IBM's data set support was not suited to the dynamic nature of file access in timesharing environments. Frankly, it was a beast. So RAX really needed its own file system.
In accordance with the traditions of IBM data processing, a RAX file looked more-or-less like a deck of punched cards. Files consisted of "records" that carried individual lines of text. Unlike punched cards, trailing blanks were omitted and the individual records (lines) could vary in length. More significantly, files were either read sequentially in a single pass, or written sequentially in a single pass. There wasn't any notion of random access or of modifying the middle of a file without rewriting the whole thing. While RAX did support random access to hard drive files, the function was limited to specially allocated files (standard IBM data sets, actually) and used special operations that were only avaliable to assembly language programmers.
Each file had a unique name and was 'owned' by the user that created it. Users could modify the permissions on files to share them with other users.
The RAX system's timesharing hours were generally limited to daytime and evenings. Overnight, the CPU was rebooted with IBM's OS/360 or OS/VS1 to run batch jobs. Thus, the RAX hard drives had to be compatible with IBM's native file system, such as it was. The RAX library was implemented inside a collection of IBM data sets, each data set serving as a pool of disk blocks to use in library files. These disk blocks were called space sets and contained 512 bytes each.
A complete RAX library file name contained two parts: an 8-character index name and an 8-character file name. While this gave the illusion of there being a hierarchical file system, there was no true 'root' directory. All files not used by the RAX system programming staff resided in the "userlib" index; if no index name was given, RAX searched in userlib. The directory arrangement apparently worked as follows:
There were a small number of IBM data sets that served as library directories (indexes). A file's index name selected the appropriate data set to search for that file's directory entry. These index files were apparently set up using IBM's Indexed Sequential Access Method (ISAM). Such files were specially formatted to use a feature of the IBM disk hardware. Each data block in the file contained a key field along with space for a library file's directory entry. The "key" part contained the file name. The IBM disk hardware could be told to scan the data set until it found the record whose key contained that name, and then it would retrieve the corresponding data. This put the burden of directory searching on the hard drive, and freed up the CPU to work on other tasks.
The directory entry contained the usual timestamps (date created, accessed, modified, etc.), ownership information, access permissions, size, and a pointer to the first space set in the file.
Once the system knew the location of the file's first space set, it could retrieve the file's contents sequentially. A space set address was a 32-bit number formatted in 2 fields:
|
8 bits
Lib number |
24 bits
Space set number |
Remember that the library consisted of numerous data sets that served as pools of data blocks These pools were called lib files, and were numbered sequentially. The data blocks, or space sets, were numbered sequentially inside each lib file.
Files within the RAX library were implemented as a list of linked space sets. The first four bytes of each space set carried the pointer to the next one in the file. The pointer bytes were managed automatically by the system's read and write operations; they were invisible to user programs. The net result was that user programs perceived space sets as containing only 508 bytes, since 4 bytes were used for the link pointer.
A single library file could contain space sets from many different lib files. Since each lib file tended to represent a contiguous set of disk space, file retrieval was most efficient when all space sets came from the same lib file. In practice, however, a file would incorporate space sets from whichever lib file had the most available.
Free space was managed within individual lib files. Each lib file kept a linked list of free space sets. Space sets from deleted files were added back to the free list in the appropriate lib file.
Here is a review of the eight issues listed above:
This work by Dr. Rick Smith is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License. |