Lab 12: Checksums

Due by the end of class

Introduction

In network communication and other situations, it's useful to know if a file has been corrupted during transmission. To solve this problem, it is common to generate a checksum, a short code computed by combining all the bytes of the file. The checksum can be generated before and after file transmission. If the two checksums match, there's a good chance that there were no transmission errors. Of course, the checksum itself can be corrupted in transit, but it is much smaller than the file and therefore less likely to be damaged.

Because of their properties, cryptographic hash functions are often used for checksums. MD5, SHA-1, and SHA-2 have been popular choices, although MD5 is no longer secure and practical collision attacks have been demonstrated against SHA-1. In other words, the broken hash functions can still be used as checksums if you are trying to guard against accidental file corruption, but not against intentional file corruption that is crafty enough to keep the checksum the same.

Lab Exercise

The cryptographic hash functions mentioned above are too difficult to implement in a lab. Even the cksum command, which produces a standard checksum on Linux systems, is too involved. However, an older program, sum, is about our speed. We want to reproduce most of the functionality of the sum command called with the -s option (which selects one of the checksum algorithms it supports). Below is an example of how it could be used to find the checksum for a file called wombat.dat.

sum -s wombat.dat
6892 213 wombat.dat

The first number output is the checksum. The second number is the size of the file in blocks. For historical reasons, the size of a block is 512 bytes. The last piece of output is the name of the file itself.

Your goal is to imitate this behavior with a program called checksum. Note that you should not include the -s option for your program. Its output should look as follows.

./checksum wombat.dat
6892 213 wombat.dat

Implementation Details

For this lab, all file access should use low-level functions. You will need to use the following functions.

  • open()
  • read()
  • close()

Please consult man pages or the lecture notes for how to use these. Note that you will also need to include appropriate header files.

The file name will be given as a command-line argument which you should read from argv. Open the file and begin reading data from it, one byte at a time.
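The reading step can be sketched as follows. This is only one possible shape, not the required design: the helper name sum_bytes and its out-parameters are hypothetical, and treating a failed read() as an error is an assumption the lab does not spell out.

```c
#include <fcntl.h>   /* open(), O_RDONLY */
#include <unistd.h>  /* read(), close(), ssize_t */

/* Hypothetical helper (name and signature are assumptions).
 * Reads the file at path one byte at a time, adding each byte to *sum
 * and counting the bytes in *count.
 * Returns 0 on success, -1 if the file cannot be opened or a read fails. */
int sum_bytes(const char *path, unsigned int *sum, unsigned long *count)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;              /* open() failed, e.g. file does not exist */

    unsigned char byte;         /* unsigned so values range over 0..255 */
    ssize_t n;

    *sum = 0;
    *count = 0;

    /* read() returns 1 while a byte is available, 0 at end of file,
     * and -1 on error. */
    while ((n = read(fd, &byte, 1)) == 1) {
        *sum += byte;
        (*count)++;
    }

    close(fd);
    return n < 0 ? -1 : 0;
}
```

Storing each byte in an unsigned char matters here: a plain char may be signed, in which case bytes above 127 would be added as negative values.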

The algorithm for the checksum is as follows.

  1. Add up the values of all the bytes, storing this sum in an unsigned int variable
  2. Let r = sum mod 2^16 + sum / 2^16
  3. Let s = r mod 2^16 + r / 2^16
  4. The final checksum is s

Note that 2^16 is easy to generate using bitwise shifts.
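Steps 2 through 4 can be sketched in C as below. The function name fold_checksum is hypothetical, and the sketch assumes unsigned int is at least 32 bits wide, which holds on typical Linux systems but is not guaranteed by the C standard.

```c
/* Hypothetical helper (name is an assumption).
 * Folds the running byte sum into the final checksum using the two
 * "mod plus divide" steps from the algorithm above. */
unsigned int fold_checksum(unsigned int sum)
{
    const unsigned int base = 1u << 16;        /* 2^16 = 65536, via a shift */

    unsigned int r = sum % base + sum / base;  /* step 2 */
    unsigned int s = r % base + r / base;      /* step 3 */

    return s;                                  /* step 4 */
}
```

For example, a byte sum of 70000 gives r = 4464 + 1 = 4465, and since 4465 is already below 2^16, s = 4465. A sum smaller than 2^16 passes through unchanged.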

Keep a count of all the bytes you read from the file. The final output is the checksum, the number of blocks, and the name of the file, all separated by spaces. You can use the sum -s command to find the checksum for arbitrary files on your system and see if your output matches. Note that the number of blocks is the number of bytes divided by 512, rounded up if the number of bytes is not evenly divisible by 512.
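Rounding up can be done with integer arithmetic alone. The sketch below is one common way to do it; the name block_count is hypothetical.

```c
/* Hypothetical helper (name is an assumption).
 * Returns the number of 512-byte blocks needed to hold the given number
 * of bytes, rounding up when bytes is not a multiple of 512. */
unsigned long block_count(unsigned long bytes)
{
    /* Adding 511 before dividing pushes any partial block up to the
     * next whole block, while exact multiples are left unchanged. */
    return (bytes + 511) / 512;
}
```

With this formula, 512 bytes is 1 block and 513 bytes is 2 blocks.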

Error Cases

There are two error cases that you should handle. The first error case is when there are not exactly two command-line arguments. There is always at least one, the name of the program being run. The second should be the name of the file you're trying to find a checksum for. If there are more or fewer than two arguments, print an error reading Usage: executable <filename> where executable is the name of the command invoked by the user (the first element in argv). An example follows in which a superfluous -s has been added.

./checksum -s wombat.dat 
Usage: ./checksum <filename>

The second error case is when the user specifies a file that doesn't exist. In that case, the file descriptor returned by open() will be negative. The error message should read executable: file: No such file or directory where executable is the name of the command invoked by the user (the first element in argv), and file is the name of the file that does not exist (the second element in argv). An example follows in which the file combat.dat is not present.

./checksum combat.dat 
./checksum: combat.dat: No such file or directory

In each error case, print the error message and quit. If there aren't exactly two command-line arguments, don't even try to open the file. If there are two command-line arguments but the file doesn't exist, you should obviously not try to find the checksum of a non-existent file.
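The two error messages could be produced along these lines. The helper names args_ok and report_missing are hypothetical, and passing the output stream as a parameter is an assumption made for illustration, since the lab does not say whether errors go to standard output or standard error.

```c
#include <stdio.h>

/* Hypothetical helper (name is an assumption).
 * Returns 1 if there are exactly two command-line arguments.
 * Otherwise prints the usage message to out and returns 0. */
int args_ok(int argc, char *argv[], FILE *out)
{
    if (argc != 2) {
        fprintf(out, "Usage: %s <filename>\n", argv[0]);
        return 0;
    }
    return 1;
}

/* Hypothetical helper (name is an assumption).
 * Prints the message for a file that could not be opened. */
void report_missing(const char *executable, const char *file, FILE *out)
{
    fprintf(out, "%s: %s: No such file or directory\n", executable, file);
}
```

A main function would call args_ok first and quit if it returns 0, and would call report_missing with argv[0] and argv[1] only after open() has returned a negative file descriptor.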

Turn In

Zip the contents of your lab directory, including the makefile and the source C file. Upload this zip file to Brightspace. Do not include any object files or executables. Running the make command must compile the required C source code file and generate an executable named checksum.

All work must be done individually. Never look at someone else's code. Please refer to the course policies if you have any questions about academic integrity. If you have trouble with the assignment, I am always available for assistance.