Many files are very compressible, so NTFS has an optional built-in compression routine. This invisibly compresses files when writing to the hard disk drive and expands them on reading. The routine is based on LZ77 and is very fast to compress, and fast to expand. It is not as efficient as dedicated programs such as PKZIP, but it has to work in real time without affecting the performance of the PC. Typically, a compressed disk will hold 50% more than an uncompressed disk, but this ratio is very data dependent. A disk full of JPEGs or zipped files will see no gain, while a disk full of text documents will see a much better gain.
Compression is applied on a file-by-file basis, though it is possible to set a complete drive to compress everything, or to apply compression to just a few subdirectories.
Although the compression implementation may look very complex, it is actually quite straightforward, and care is taken to ensure that compression does not expand files on the disk. A deliberate design choice favours speed, and fast seeking to a location in the file, over the highest possible compression.
Compression is always done on a data length of 16 clusters. This is typically 16K, 32K or 64K. The operating system keeps track of where each run of clusters starts as part of the run tables, and so can seek to a location within a file very quickly. Within a compression run of 16 clusters, data is then compressed on a 4K basis. Two important rules ensure that data does not expand. First, if a 4K block does not compress, it is left uncompressed, which adds just 2 bytes of overhead. It often happens with a JPEG file that the headers compress well, but the image data is already compressed and so cannot be compressed any further. In this case the first few 4K blocks may be compressed, but the image data will be left in its original format. Second, if a 16-cluster block does not compress by at least one cluster, the whole 16 clusters are written to the disk uncompressed. The data run table in the MFT indicates which data runs are compressed by following such a run with blank sectors, similar to sparse sectors.
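The two "do not expand" rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `zlib` stands in for the real NTFS compressor (it is a different LZ77-family algorithm and will give different sizes), and the names `pack_block` and `pack_unit` are made up for this example.

```python
import os
import zlib

BLOCK_SIZE = 4096        # data is compressed on a 4K basis
CLUSTERS_PER_UNIT = 16   # compression is always done on 16 clusters
BLOCK_OVERHEAD = 2       # bytes added in front of every 4K block


def pack_block(block: bytes) -> tuple[bool, bytes]:
    """Return (is_compressed, payload) for one 4K block."""
    packed = zlib.compress(block)  # stand-in only, NOT the NTFS algorithm
    if len(packed) < len(block):
        return True, packed
    return False, block            # rule 1: keep the original bytes


def pack_unit(unit: bytes, cluster_size: int) -> tuple[bool, int]:
    """Return (stored_compressed, clusters_written) for one 16-cluster unit."""
    assert len(unit) == CLUSTERS_PER_UNIT * cluster_size
    total = 0
    for start in range(0, len(unit), BLOCK_SIZE):
        _, payload = pack_block(unit[start:start + BLOCK_SIZE])
        total += BLOCK_OVERHEAD + len(payload)
    clusters = -(-total // cluster_size)   # round up to whole clusters
    if clusters < CLUSTERS_PER_UNIT:
        return True, clusters
    return False, CLUSTERS_PER_UNIT        # rule 2: write all 16 uncompressed


if __name__ == "__main__":
    text = (b"NTFS compression " * 4000)[:CLUSTERS_PER_UNIT * 4096]
    print("text-like unit:", pack_unit(text, 4096))
    print("random unit:   ", pack_unit(os.urandom(CLUSTERS_PER_UNIT * 4096), 4096))
```

With 4K clusters, a repetitive text-like unit should need far fewer than 16 clusters, while a unit of random bytes (a rough stand-in for JPEG image data) gains nothing and is written out in full.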
Data recovery on a compressed disk can have certain problems. If the MFT entries are still intact, then decoding and expanding compressed data is straightforward. If, however, it is necessary to scan for compressed data that has no associated MFT entry, it becomes rather harder. It takes an element of computer guesswork to determine whether a cluster is compressed or not, though there are some good pointers. It is then necessary to try to determine whether each successive cluster run is also compressed. If a disk with a large number of files on it has just been compressed, it tends to become rather fragmented, as the compressed data is written to the original location, followed by free clusters.
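One possible pointer is the 2-byte header in front of each 4K block. The sketch below assumes the header layout Microsoft documents for the LZNT1 format (little-endian, top bit set for a compressed block, the next three bits holding the value 3, and the low 12 bits holding the payload length minus one). That layout is not stated in this article, so treat it as an assumption, and the function names are hypothetical. A single plausible header proves little, so the code counts how many headers chain together from the start of a cluster.

```python
import struct


def decode_chunk_header(data: bytes, offset: int = 0):
    """Return (is_compressed, payload_length), or None if the header is implausible."""
    if offset + 2 > len(data):
        return None
    (header,) = struct.unpack_from("<H", data, offset)
    if header == 0:
        return None                      # a zero header is taken as end of data
    if (header >> 12) & 0x7 != 3:        # assumed signature bits
        return None
    is_compressed = bool(header & 0x8000)
    payload_length = (header & 0x0FFF) + 1
    return is_compressed, payload_length


def count_chained_chunks(cluster: bytes, max_chunks: int = 8) -> int:
    """Count consecutive plausible block headers from the start of a cluster."""
    offset = 0
    found = 0
    while found < max_chunks:
        info = decode_chunk_header(cluster, offset)
        if info is None:
            break
        offset += 2 + info[1]            # skip this header and its payload
        found += 1
    return found
```

A cluster beginning with a JPEG marker or with all zeros gives a count of 0, while a longer chain makes it more likely that the cluster starts a compressed run. This is guesswork, as noted above: ordinary data can match one header by chance, so a recovery tool would still need to try expanding the data before trusting the result.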
CnW Recovery Lewes East Sussex, UK