Unix for Advanced Users

6. Manipulating Files

6.5. How do I compare files?

Unix systems provide an automatic way to compare files and find their differences. Given the differences between an old file and a new file, it is possible to take copies of the old file and update only those parts that need changing. This technique is extremely useful for updating large collections of files over the network.

6.5.1. diff

diff is the command to find the differences between two files. diff is line-oriented: it finds each line that differs between two files and prints the version from each file. To indicate which file is the source of a line, diff uses the less-than and greater-than symbols: lines beginning with < come from the first file, and lines beginning with > come from the second. For instance, diff file1 file2 might produce the following output:

70,72c70
< #The next line had to be changed:
<
<   if ($sum1 != $sum2) {
---
>   if (sum1 != $sum2) {
The three lines marked with < come from file1, since the arrow points to the left. The line marked with > comes from file2; it is the changed version. The --- line separates the two versions. The line at the very top indicates where the lines originated: 70,72c70 means that lines 70 through 72 of file1 were changed (c) into line 70 of file2.
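The output format can be tried on a small case. The following is a minimal sketch, assuming a POSIX shell and a mktemp that supports -d; the file contents and the variable names workdir, diff_output, and diff_status are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical three-line files that differ only on line 2.
printf 'alpha\nbeta\ngamma\n' > file1
printf 'alpha\nBETA\ngamma\n' > file2

# diff exits with status 1 when the files differ, so record the
# status instead of treating it as a failure.
diff_status=0
diff_output=$(diff file1 file2) || diff_status=$?

printf '%s\n' "$diff_output"
echo "diff exit status: $diff_status"
```

Here diff reports 2c2, followed by "< beta", the --- separator, and "> BETA": line 2 of file1 was changed into line 2 of file2. The exit status of 1 is how a script can tell that differences were found without reading the output.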

6.5.2. patch

patch takes the output of diff and applies it to a file, making only the changes the diff describes. Patches are usually distributed as context diffs (produced with diff -c), which include a few unchanged lines around each change so that patch can find the right place. A typical use of patch is to update your copy of a program's source code. When the code has changed very little, it makes more sense to download only the context diffs, or patches, than to download the entire distribution. In this case, the diffs cover entire directories, not just individual files.

Almost all context diffs are applied from within a directory of the source tree. The syntax is then patch -pnumber < patch_name. (That is, the patch file is used as redirected input to the command.) The number tells patch how many leading directory components to strip from the file names recorded in the patch, so it depends on where the current directory lies relative to the directory in which the diff was made. A patch will generally include instructions for where to place it and what number to use.
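As a rough illustration of the whole cycle, the sketch below builds a context diff of two small directory trees and applies it with patch -p1. It assumes a POSIX shell with diff, patch, and mktemp -d available; the directory names old, new, and copy and the file notes.txt are hypothetical.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical "old" tree and a "new" tree with one changed line.
mkdir old new
printf 'one\ntwo\nthree\n' > old/notes.txt
printf 'one\n2\nthree\n'   > new/notes.txt

# -c asks for a context diff, -r recurses into directories.
# diff exits with status 1 when it finds differences, hence "|| true".
diff -rc old new > changes.patch || true

# Stand-in for someone else's unmodified copy of the old tree.
cp -R old copy
cd copy || exit 1

# The patch names the file as old/notes.txt and new/notes.txt.
# -p1 strips the first directory component, leaving notes.txt,
# which is the name of the file relative to the current directory.
patch -p1 < ../changes.patch

cat notes.txt
```

After the patch is applied, copy/notes.txt has the same contents as new/notes.txt. Running the same command from the directory above copy would need a different number, because the file names in the patch would then have to match a different relative path.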

6.5.3. cmp

It is also possible to find out if two files differ at all, without looking at the differences: cmp file1 file2 exits silently if the two files are the same, but prints out a message giving the position of the first difference if they are different.

Because it exits as soon as it finds a difference, cmp is more efficient than diff. It is useful for telling if two binary files--for instance, two versions of a program--are the same.
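In scripts, cmp is usually tested by its exit status rather than by its message. The following is a small sketch, assuming a POSIX shell; the -s option tells cmp to print nothing, and the file names and the variables result_ab and result_ac are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three hypothetical files: a.bin and b.bin match, c.bin differs by one byte.
printf 'same bytes\n' > a.bin
printf 'same bytes\n' > b.bin
printf 'same bytez\n' > c.bin

# cmp exits with status 0 when the files are identical and 1 when they differ.
if cmp -s a.bin b.bin; then
    result_ab="identical"
else
    result_ab="different"
fi

if cmp -s a.bin c.bin; then
    result_ac="identical"
else
    result_ac="different"
fi

echo "a.bin vs b.bin: $result_ab"
echo "a.bin vs c.bin: $result_ac"
```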
