Table of Contents
INTRODUCTION TO FILE STRUCTURES. The Heart of File Structure Design. A Short History of File Structure Design. A Conceptual Toolkit: File Structure Literacy. Object-Oriented Programming in C++.
FUNDAMENTAL FILE PROCESSING OPERATIONS. Physical Files and Logical Files. Opening Files. Closing Files. Reading and Writing. Seeking. Special Characters in Files. The UNIX Directory Structure. Physical and Logical Files in UNIX. File-related Header Files. UNIX File System Commands.
SECONDARY STORAGE AND SYSTEM SOFTWARE. Disks. Magnetic Tape. Disk versus Tape. Storage as a Hierarchy. A Journey of a Byte. Buffer Management. I/O in UNIX.
FUNDAMENTAL FILE STRUCTURE CONCEPTS. Field and Record Organization. Using Classes to Manipulate Buffers. Using Inheritance for Record Buffer Classes. Managing Fixed Length, Fixed Field Buffers. An Object-Oriented Class for Record Files.
MANAGING FILES OF RECORDS. Record Access. More about Record Structures. Encapsulating Record Operations in a Single Class. File Access and File Organization. Object-Oriented Approach to File Access. Portability and Standardization.
ORGANIZING FILES FOR PERFORMANCE. Data Compression. Reclaiming Space in Files. Finding Things Quickly: An Introduction to Internal Sorting and Binary Searching. Keysorting.
INDEXING. What Is an Index? A Simple Index for Entry-Sequenced File. Template Classes in C++. Object-Oriented support for Indexed, Entry-Sequenced Files of Data Objects. Indexes That Are Too Large to Hold in Memory. Indexing to Provide Access by Multiple Keys. Retrieval Using Combinations of Secondary Keys. Improving the Secondary Index Structure: Inverted Lists. Selective Indexes. Binding.
COSEQUENTIAL PROCESSING AND THE SORTING OF LARGE FILES. A Model for Implementing Cosequential Processes. Application of the Model to a General Ledger Program. Extension of the Model to Include Multiway Merging. A Second Look at Sorting in Memory. Merging as a Way of Sorting Large Files on Disk. Sorting Files on Tape. Sort-Merge Packages. Sorting and Cosequential Processing in UNIX.
MULTI-LEVEL INDEXING AND B-TREES. Introduction: The Invention of the B-Tree. Statement of the Problem. Binary Search Trees are not a Solution. Multi-level Indexing, A Better Approach to Tree Indexes. B-Trees: Working up from the Bottom. Example of Creating a B-Tree. An Object-Oriented Representation of B-Trees. B-Tree Methods Search, Insert, and Others. B-Tree Nomenclature. Formal Definition of B-Tree Properties. Worst-case Search Depth. Deletion, Merging, and Redistribution. Redistribution during Insertion: A Way to Improve Storage Utilization. B* Trees. Buffering of Pages: Virtual B-Trees. Variable-length Records and Keys.
INDEXED SEQUENTIAL FILE ACCESS AND PREFIX B+ TREES. Indexed Sequential Access. Maintaining a Sequence Set. Adding a Simple Index to the Sequence Set. The Content of the Index: Separators Instead of Keys. The Simple Prefix B+ Tree. Simple Prefix B+ Tree Maintenance. Index Set Block Size. Internal Structure of Index Set Blocks: A Variable-order B-Tree. Loading a Simple Prefix B+ Tree. B+ Trees. B-Trees, B+ Trees, and Simple Prefix B+ Trees in Perspective.
HASHING. Introduction. A Simple Hashing Algorithm. Hashing Functions and Record Distributions. How Much Extra Memory Should Be Used? Collision Resolution by Progressive Overflow. Storing More Than One Record per Address: Buckets. Making Deletions. Other Collision Resolution Techniques. Patterns of Record Access.
EXTENDIBLE HASHING. Introduction. How Extendible Hashing Works. Implementation. Deletion. Extendible Hashing Performance. Alternative Approaches.
APPENDIX A. FILE STRUCTURES ON CD-ROM. Using this Appendix. Introduction to CD-ROM. Physical Organization of CD-ROM. CD-ROM Strengths and Weaknesses. Tree Structures on CD-ROM. Hashed Files on CD-ROM. The CD-ROM File System.
APPENDIX B. ASCII TABLE.
APPENDIX C. SIMPLE FILE INPUT/OUTPUT EXAMPLES. List.c. C program to read and display the contents of a file. List.cpp. C++ program to read and display the contents of a file. Person.h. Definition for class Person, including code for constructor. Writestr.cpp. Write Person objects into a stream file. Readdel.cpp. Read Person objects with fields delimited by ''. Readvar.cpp. Read variable length records and break up into Person objects. Writeper.cpp. Function to write a person to a text file. Readper.cpp. Function to prompt user and read fields of a Person.
APPENDIX D. CLASSES FOR BUFFER MANIPULATION. Person.h. Definition for class Person. Person.cpp. Code for class Person. Deltext.h. Definition for class DelimitedTextBuffer. Deltext.cpp. Code for class DelimitedTextBuffer. Lentext.h. Definition for class LengthTextBuffer. Lentext.cpp. Code for class LengthTextBuffer. Fixtext.h. Definition for class FixedTextBuffer. Fixtext.cpp. Code for class FixedTextBuffer. Test.cpp. Test program for all buffer classes.
APPENDIX E. A CLASS HIERARCHY FOR BUFFER INPUT/OUTPUT. Person.h. Definition for class Person. Person.cpp. Code for class Person. Iobuffer.h. Definition for class IOBuffer. Iobuffer.cpp. Code for class IOBuffer. Varlen.h. Definition for class VariableLengthBuffer. Varlen.cpp. Code for class VariableLengthBuffer. Delim.h. Definition for class DelimFieldBuffer. Delim.cpp. Code for class DelimFieldBuffer. Length.h. Definition for class LengthFieldBuffer. Length.cpp. Code for class LengthFieldBuffer. Fixlen.h. Definition for class FixedLengthBuffer. Fixlen.cpp. Code for class FixedLengthBuffer. Fixfld.h. Definition for class FixedFieldBuffer. Fixfld.cpp. Code for class FixedFieldBuffer. Buffile.h. Definition for class BufferFile. Buffile.cpp. Code for class BufferFile. Recfile.h. Template class RecordFile. Test.cpp. Test program for buffer classes and RecordFile including template function.
APPENDIX F. SIMPLE INDEXING AND TEMPLATE CLASSES.
APPENDIX G. MULTI-LEVEL INDEXING: B+ TREE CLASSES.
APPENDIX H. CLASSES TO SUPPORT HASHING.
Book Info
Presents file structure techniques, including direct access I/O, buffer packing and unpacking, indexing, cosequential processing, B-trees, and external hashing. Covers secondary storage devices such as disk, tape, and CD-ROM. DLC: C++ (Computer program language)
From the Inside Flap
The first and second editions of File Structures by Michael Folk and Bill Zoellick established a standard for teaching and learning about file structures. The authors helped many students and computing professionals gain familiarity with the tools used to organize files.
This book extends the presentation of file structure design that has been so successful for twelve years with an object-oriented approach to implementing file structures using C++. It demonstrates how the object-oriented approach can be successfully applied to complex implementation problems. It is intended for students in computing classes who have had at least one programming course and for computing professionals who want to improve their skills in using files.
This book shows you how to design and implement efficient file structures that are easy for application programmers to use. All you need is a compiler for C++ or other object-oriented programming language and an operating system. This book provides the conceptual tools that enable you to think through alternative file structure designs that apply to the task at hand. It also develops the programming skills necessary to produce quality implementations.
The coverage of the C++ language in this book is suitable for readers with a basic knowledge of the language. Readers who have a working familiarity with C++ should have no problem understanding the programming examples. Those who have not programmed in C++ will benefit from access to an introductory textbook.
The first programming examples in the book use very simple C++ classes to develop implementations of fundamental file structure tools. One by one, advanced features of C++ appear in the context of implementations of more complex file structure tools. Each feature is fully explained when it is introduced. Readers gain familiarity with inheritance, overloading, virtual methods, and templates and see examples of why these features are so useful to object-oriented programming.
Organization of the Book
The first six chapters of this book give you the tools to design and implement simple file structures from the ground up: simple I/O, methods for transferring objects between memory and files, sequential and direct access, and the characteristics of secondary storage. The last six chapters build on this foundation and introduce you to the most important high-level file structure tools, including indexing, cosequential processing, B-trees, B+ trees, hashing, and extendible hashing.
The book includes extensive discussion of the object-oriented approach to representing information and algorithms and the features of C++ that support this approach. Each of the topics in the text is accompanied by object-oriented representations. The full C++ class definitions and code are included as appendices and are available on the Internet. This code has been developed and tested using Microsoft Visual C++ and the Gnu C++ compilers on a variety of operating systems including Windows 95, Windows NT, Linux, Sun Solaris, and IBM AIX.
Object-Oriented File Structures
There are two reasons we have added the strong object-oriented programming component to this book. First, it allows us to be more specific, and more helpful, in illustrating the tools of file structure design. For each tool, we give very specific algorithms and explain the options that are available to implementers. We are also able to build full implementations of complex file structure tools that are suitable for solving file design problems. By the time we get to B-tree indexing, for instance, we are able to use previous tools for defining object types, moving data between memory and files, and simple indexing. This makes it possible for the B-tree classes to have simple implementations and for the book to explain the features of B-trees as enhancements of previous tools.
The second purpose of the programming component of the book is to illustrate the proper use of object-oriented methods. Students are often exposed to object-oriented techniques through simple examples. However, it is only in complex systems that the advantages of object-oriented techniques become clear. In this book, we have taken advantage of the orderly presentation of file structure tools to build a complex software system as a sequence of relatively simple design and implementation steps. Through this approach, students get specific examples of the advantages of object-oriented methods and are able to improve their own programming skills.
A Progressive Presentation of C++
We cover the principles of design and implementation in a progressive fashion. Simple concepts come first and form the foundation for more complex concepts. Simple classes are designed and implemented in the early chapters, then are used extensively for the implementation topics of the later chapters. The most complex file structure tools have simple implementations because they extend the solid foundation of the early chapters.
We also present the features of C++ and the techniques of object-oriented programming in a progressive fashion. The use of C++ begins with the simplest class definitions. Next comes the use of stream classes for input and output. Further examples introduce inheritance, then virtual functions, and finally templates.
Each new feature is introduced and explained in the context of a useful file structure application. Readers see how to apply object-oriented techniques to programming problems and learn firsthand how object-oriented techniques can make complex programming tasks simpler.
Exercises and Programming Problems
The book includes a wealth of new analytical and programming exercises. The programming exercises include extensions and enhancements to the file structure tools and the application of those tools. The tools in the book are working software, but some operations have been left as programming problems. The deletion of records from files, for instance, is discussed in the text but not implemented. Specific programming problems fill in the gaps in the implementations and investigate some of the alternatives that are presented in the text.
An application of information processing is included as a series of programming projects in the exercise sets of appropriate chapters. This application begins in Chapter 1 with the representation of students and course registrations as objects of C++ classes. In Chapter 2, the project asks for simple input and output of these objects. Later projects include implementing files of objects (Chapter 4), indexes to files (Chapter 7), grade reports and transcripts (Chapter 8), B-tree indexes (Chapter 9), and hashed indexes (Chapter 12).
Using the Book as a College Text
The first two editions of File Structures have been used extensively as a text in many colleges and universities. Because the book is quite readable, students typically are expected to read the entire book over the course of a semester. The text covers the basics; class lectures can expand and supplement the material. The professor is free to explore more complex topics and applications, relying on the text to supply the fundamentals.
A word of caution: It is easy to spend too much time on the low-level issues presented in the first seven chapters. Move quickly through this material. The relatively large number of pages devoted to these matters is not a reflection of the percentage of the course that should be spent on them. The intent is to provide thorough coverage in the text so the instructor can assign these chapters as background reading, saving precious lecture time for more important topics.
It is important to get students involved in the development of file processing software early in the course. Instructors may choose some combination of file tool implementation problems from the programming exercises and applications of the tools from the programming projects. Each of the programming problems and projects included in the exercises is intended to be of short duration with specific deliverables. Students can be assigned programming problems of one to three weeks in duration. It is typical for one assignment to depend on previous assignments. By conducting a sequence of related software developments, the students finish the semester with extensive experience in object-oriented software development.
A Book for Computing Professionals
We wrote and revised this book with our professional colleagues in mind. The style is conversational; the intent is to provide a book that you can read over a number of evenings, coming away with a good sense of how to approach file structure design problems. Some computing professionals may choose to skip the extensive programming examples and concentrate on the conceptual tools of file structure design. Others may want to use the C++ class definitions and code as the basis for their own implementations of file structure tools.
If you are already familiar with basic file structure design concepts and programming in C++, skim through the first six chapters and begin reading about indexing in Chapter 7. Subsequent chapters
From the Back Cover
This book teaches design by putting the hands-on work of constructing and running programs at the center of the learning process. By following the many programming examples included in the book and in the exercise sets, readers will gain a significant understanding of object-oriented techniques and will see how C++ can be an effective software development tool.
HIGHLIGHTS
* Presents file structure techniques, including direct access I/O, buffer packing and unpacking, indexing, cosequential processing, B-trees, and external hashing.
* Includes extensive coverage of secondary storage devices, including disk, tape, and CD-ROM.
* Covers the practice of object-oriented design and programming with complete implementations in C++. Every line of code in the book has been tested on a variety of C++ systems and is available on the Internet.
* Develops a collection of C++ classes that provide a framework for solving file structure problems.
* Includes class definitions, sample applications, and programming problems and exercises, making this book a valuable learning and reference tool.
** Instructor's materials are available from your sales rep. If you do not know your local sales representative, please call 1-800-552-2499 for assistance, or use the Addison Wesley Longman rep-locator at hepg.awl/rep-locator.
About the Authors
Michael J. Folk manages the Scientific Data Technologies Group at the National Center for Supercomputing Applications at the University of Illinois in Urbana. He has been responsible for developing a general purpose scientific data file format called HDF and software for managing data in high-performance, high-volume computing environments. Prior to his work at Illinois, Dr. Folk was a professor of computer science for fifteen years at Oklahoma State and Drake Universities.
Bill Zoellick is currently a partner in and founder of Fastwater LLP, a consultancy focusing on helping companies build effective web businesses. He frequently writes about the issues addressed in Web Engagement and speaks on them at user conferences such as Seybold and Internet World and at various user associations and seminars. He has been a software developer, business owner, executive in a $100 million software company, and, most recently, a management consultant and business analyst.
Greg Riccardi is a professor of computer science at Florida State University and an associate of the Supercomputer Computations Research Institute. Professor Riccardi's research interests include scientific databases, object-oriented databases, and parallel computation. He is also affiliated with the Thomas Jefferson National Accelerator Facility, where he works on the acquisition, management, and analysis of data for experimental physics. He received a University Teaching Award in 1997 from Florida State University.