These classnotes are deprecated. As of 2005, I no longer teach the classes. The notes will remain online for legacy purposes.

UNIX01/Dealing With Tar Files


Many UNIX applications are distributed as source code. The most widely used archive format under UNIX for distributing source code is the tar-file.

tar actually dates back to the early days of UNIX. It literally means "Tape Archive" and was originally used to archive file systems into a single file which could, in turn, be backed up to a tape.

tar should not be confused with ZIP or SIT in the Windows and Macintosh worlds, as it only archives and does not compress. For compression, we use other utilities as specified below.

To create a tar file from a sub-directory, you simply use the 'tar c' command. For example, if I had a sub-directory called 'backups',

 $ ls -la
 total 0
 drwxr-xr-x    3 sam      sam         20 May  9 16:51 .
 drwxr-xr-x    4 sam      sam         85 May  9 16:51 ..
 drwxr-xr-x    2 sam      sam          0 May  9 16:51 backups

I could archive it with the following command:

 $ tar c backups/ > backups.tar

If I then wanted to extract the files from this archive, I would use the 'x' (extract) and 'f' (read the archive from the named file rather than stdin) options thusly:

 $ tar xf backups.tar
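As a slightly fuller sketch of the same round trip, the commands below build a throwaway 'backups' directory, archive it, list the archive with the 't' option, and extract it somewhere else. The scratch directory, the file names notes.txt and todo.txt, and the 'restore' directory are made up for illustration. The 'cf' form names the archive file directly instead of redirecting stdout.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample content for the archive.
mkdir backups
echo "first file"  > backups/notes.txt
echo "second file" > backups/todo.txt

# c = create, f = write the archive to the named file.
tar cf backups.tar backups/

# t = list the contents without extracting anything.
tar tf backups.tar

# x = extract; run it inside another directory to leave the original alone.
mkdir restore
(cd restore && tar xf ../backups.tar)

# No output from diff means the extracted copy matches the original.
diff -r backups restore/backups
```

Listing with 't' before extracting is a cheap way to check what an unfamiliar archive will write, and where, before it touches the disk.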

Compression with gzip

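Gzip reduces the size of the named files using Lempel-Ziv coding (LZ77). Whenever possible, each file is replaced by one with the extension .gz, while keeping the same ownership modes, access and modification times. Gzip and ZIP under Windows both use the DEFLATE algorithm, so their compression ratios are similar; a gzipped tar file tends to come out slightly smaller than a ZIP of the same files, because ZIP compresses each file separately.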

If we wanted to Gzip the backups.tar file we created above, we would type

 $ gzip backups.tar

If we then wanted to decompress it, we could type

 $ gunzip backups.tar.gz

However, tar can handle gzipped files natively with the 'z' option, so we could just use the following to extract everything:

 $ tar xzf backups.tar.gz
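To see that tar alone does not shrink anything while gzip does, here is a small sketch. The file name zeros.dat and its 100000-byte size are arbitrary choices for illustration. The 'czf' form creates the gzipped archive in one step, and it assumes a tar that understands 'z' (GNU tar and BSD tar both do).

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical, highly compressible sample data: 100000 zero bytes.
mkdir backups
head -c 100000 /dev/zero > backups/zeros.dat

# Plain tar: the archive is slightly larger than the data it holds,
# because tar adds headers and padding but does no compression.
tar cf backups.tar backups/

# Create a gzipped archive directly with the 'z' option.
tar czf backups.tar.gz backups/

# Compare the sizes in bytes.
plain_size=$(wc -c < backups.tar)
gzip_size=$(wc -c < backups.tar.gz)
echo "plain tar:  $plain_size bytes"
echo "tar + gzip: $gzip_size bytes"

# gzip -dc writes the decompressed archive to stdout and leaves the .gz
# file in place; 'tar tf -' lists an archive read from stdin.
gzip -dc backups.tar.gz | tar tf -
```

The final pipeline is the long-hand equivalent of 'tar tzf backups.tar.gz', which is handy on a tar that lacks the 'z' option.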

Compression with bzip2

bzip2 compresses files using the Burrows-Wheeler block sorting text compression algorithm, and Huffman coding. Compression is generally considerably better than that achieved by more conventional LZ77/LZ78-based compressors. Of bzip2, gzip, Windows ZIP, and Mac SIT, bzip2 usually produces the best compression; however, it can also be the slowest. A file that has been compressed with bzip2 usually has the .bz2 extension.

If we wanted to compress the backups.tar file from above using bzip2, we would type

 $ bzip2 backups.tar

Then, to decompress it, we could type

 $ bunzip2 backups.tar.bz2

We could also decompress it to stdout thusly

 $ bzcat backups.tar.bz2

which we could pipe through tar for extraction (the '-' tells tar to read the archive from stdin):

 $ bzcat backups.tar.bz2 | tar xf -
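The same pattern can be tried end to end with bzip2. The sketch below assumes bzip2 is installed and uses made-up sample data (log.txt and the 'restore' directory are hypothetical names). The '-k' flag keeps the original .tar so the two sizes can be compared. GNU tar and BSD tar also accept a 'j' option that plays the role 'z' plays for gzip, though older or more minimal tar implementations may lack it.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: the same line repeated 5000 times.
mkdir backups
awk 'BEGIN { for (i = 0; i < 5000; i++) print "all work and no play" }' \
    > backups/log.txt

tar cf backups.tar backups/

# -k keeps backups.tar; without it, bzip2 replaces the file
# with backups.tar.bz2.
bzip2 -k backups.tar

# Show both files side by side to compare their sizes.
ls -l backups.tar backups.tar.bz2

# List the archive through a pipe without writing a decompressed copy.
bzcat backups.tar.bz2 | tar tf -

# Extract in one step with the 'j' option (assumes GNU tar or BSD tar).
mkdir restore
(cd restore && tar xjf ../backups.tar.bz2)
```

If the local tar rejects 'j', the bzcat pipeline with 'tar xf -' does the same job.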

Compression with 'compress'

The traditional UNIX utility for compression is called 'compress'. It has been increasingly deprecated since gzip first came on the scene, and you will rarely find archives compressed with it anymore. 'compress' files end with the .Z extension.

If you do have a need to use compress, see its man page. You will find that it is very straightforward to use. Also, bear in mind that with the 'Z' option, tar can compress or uncompress files using 'compress' automatically.



Last edited May 10, 2003 12:04 am
(C) Copyright 2003 Samuel Hart
This work is licensed under a Creative Commons License.