Large Data Sets

David F. Wiley

Large data sets, too big to fit into the main memory of a workstation, pose several problems that limit one’s ability to view, analyze, and work with them. Methods for compression, partitioning, organization, and levels of detail yield smaller, more manageable representations, and they enable parallelization across several processors and even sequential access via a single processor.

Real-time exploration is achieved by tuning the combination of these methods. Analysis methods can be previewed on a smaller version of the data, allowing one to tune method parameters quickly rather than wasting valuable time on long, fruitless computations over the entire data set.
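As a rough illustration of previewing an analysis on a smaller version of the data, the sketch below builds a coarse level of detail by keeping every stride-th value and runs a toy analysis on it. The names `subsample`, `fraction_above`, and `make_demo_data` are hypothetical, the data is synthetic, and strided subsampling is only one simple way to form a reduced representation; it is not taken from the text.

```python
def subsample(values, stride):
    """Return every stride-th value: a crude, lower level of detail."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return values[::stride]


def fraction_above(values, threshold):
    """Toy analysis: fraction of values strictly greater than threshold."""
    if not values:
        return 0.0
    return sum(1 for v in values if v > threshold) / len(values)


def make_demo_data(n):
    """Synthetic stand-in for a large data set (hypothetical values)."""
    return [(i * 37) % 101 for i in range(n)]


if __name__ == "__main__":
    full = make_demo_data(100_000)
    preview = subsample(full, 100)  # 100x fewer values to process

    # Tune the threshold parameter cheaply on the preview ...
    for threshold in (25, 50, 75):
        print(threshold, fraction_above(preview, threshold))

    # ... then run the chosen setting once on the full data.
    print(fraction_above(full, 50))
```

In this toy the full data still fits in memory; in practice the preview would be precomputed and stored, so that parameter tuning never touches the full data.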

These methods are out-of-core, meaning that only portions small enough to fit into main memory are loaded at any one time. Methods that operate on such data must therefore use access patterns that suit the out-of-core framework in order to achieve the best possible access time.
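A minimal sketch of the out-of-core idea, assuming the data is stored as a flat file of little-endian 64-bit floats (an assumption made only for this example): the file is scanned sequentially in fixed-size chunks, so no more than `chunk_values` values are held in memory at once. The function names and the file layout are hypothetical.

```python
import os
import struct
import tempfile

ITEM_SIZE = struct.calcsize("<d")  # bytes per little-endian float64


def write_demo_file(path, n):
    """Write n synthetic float64 values (a stand-in for a large data set)."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(struct.pack("<d", float(i % 100)))


def chunked_min_max(path, chunk_values=1024):
    """Return (min, max, count), reading at most chunk_values values at a time."""
    lo, hi, count = float("inf"), float("-inf"), 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_values * ITEM_SIZE)
            n = len(buf) // ITEM_SIZE
            if n == 0:  # end of file (or a trailing partial value)
                break
            values = struct.unpack("<%dd" % n, buf[: n * ITEM_SIZE])
            lo = min(lo, min(values))
            hi = max(hi, max(values))
            count += n
    return lo, hi, count


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_demo_file(demo_path, 5000)
        print(chunked_min_max(demo_path, chunk_values=256))
    finally:
        os.remove(demo_path)
```

The scan reads the file front to back exactly once, which is the kind of access pattern that tends to suit disk-resident data; an analysis needing random access would call for a different on-disk organization.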