Project 2: One Time, One Time

Due by: Monday, February 10, 2025 at 11:59 p.m.

He has pointed out to Avi, in an encrypted e-mail message, that if every particle of matter in the universe could be used to construct one single cosmic supercomputer, and this computer was put to work trying to break a 4096-bit encryption key, it would take longer than the lifespan of the universe.

"Using today’s technology," Avi shot back, "that is true. But what about quantum computers? And what if new mathematical techniques are developed that can simplify the factoring of large numbers?"

"How long do you want these messages to remain secret?" Randy asked, in his last message before leaving San Francisco. "Five years? Ten years? Twenty-five years?"

After he got to the hotel this afternoon, Randy decrypted and read Avi’s answer. It is still hanging in front of his eyes, like the afterimage of a strobe:

"I want them to remain secret for as long as men are capable of evil."

-Neal Stephenson, Cryptonomicon

Introduction

Barry Wittman hates Wyclef Jean

Did you ever need to send a message to someone in code so hard to crack that major governments couldn't read your message? All the time, right? Well, the science of cryptography was invented for precisely that purpose.

But there are so many different ways to encrypt a message. Which way is best? One method that has a few theoretical advantages over others is the One Time Pad. The One Time Pad is a surprisingly simple way to encrypt a message. You take the message, take another completely random message, and exclusive-or (XOR) their values together. To decrypt, you XOR the random message with the ciphertext. Below is standard cryptographic notation for taking a message M and XORing it with a random message R (that circle with an X in it is often used to refer to XOR) and producing a ciphertext C. The next line shows the opposite operation: Taking a ciphertext C and decrypting it by XORing it with the original random message R to reconstruct the original message M.

If you use the One Time Pad correctly, it preserves a property called perfect secrecy. Since you could use any random message for encrypting, anyone who gets their hands on the ciphertext learns absolutely nothing about the original message.

Drawbacks

Unfortunately, the One Time Pad has many drawbacks. Some include:

  • The random message needed for encryption and decryption must be as long as the message you want to encrypt.
  • Some secure way of transferring the random message must be found. If you have a secure way of transferring long messages, why not send your original message that way?
  • Truly random messages are hard to get a hold of. If your randomness is predictable, attacks on the One Time Pad become possible.
  • If a single One Time Pad is used a second time, very powerful attacks exist which can recover the two messages and the random message used as a key. "One time" pad, indeed.

How do you fit in?

Well, you've got to code up a One Time Pad encryption system, of course! But, we are going to add a few extra features because XORing two messages together is far too easy.

Tiling

The first feature we are going to add is something I call "tiling." If the random key supplied is too short, repeat it as many times as necessary (including a final partial repetition, if needed) to be exactly as long as the message. If the key is longer than the message, just use characters from the key up to the length of the message. This change allows our program to compute a form of the Vigenère cipher, in which a short key phrase is repeated over and over to encrypt the message.

Chaining

The problem with tiling is the problem with the Vigenère cipher: the shorter the key is with respect to the message, the easier the code is to crack. So, we are going to take a key composed of a short repeating phrase and turn it into a full-length pseudorandom key.

This is how we do it. Let's call the i th character of the key keyi. Iterate through the whole key in order doing the following:

keyi = rotate( keyi XOR keysum , bits( keyi - 1 ))

I've already mentioned the XOR operation (implemented with the ^ operator in C), but a lot of other things require explanation. Here is the list of the expressions used above and what they mean.

  • rotate( character, count ) - The rotate() function takes a value character of type unsigned char and rotates its 7 least significant digits to the right by count steps.
    For example, the value of the character 'a' is 97, which is 01100001 in binary. Below shows what happens if we rotate its 7 lower bits to the right by 2.
    01100001 = 'a'
    00111000 = '8'
    

    Note that we rotate only within the 7 least significant bits. It is very important that we don't touch the most significant bit because we are working only within the ASCII set and not the extended ASCII set. For a rotation, unlike a shift, the values wrap around and are not lost.


  • sum - As you loop through all the characters in the key, keep a running sum of their values. After you process keyi , add its value to sum. So that you don't access memory outside of the array, set sum to be sum mod length after adding to it, where length is the length of the message (which is also the length of the key now) and mod is, of course, the modulus operator %. Before you start processing the first character in the key, set sum to be the value of the last character in the key (mod length, of course).

  • bits( character ) - The bits() function counts the number of 1 bits in the binary representation of the value of character. Using our example above, bits( 'a' ) would return 3. The bits() function always takes the previous character as input. For the first character, use the very last character in the key as input, since there is no previous character.

Encryption and Decryption

Finally, once you have tiled the key and then chained its values together to make them more random looking, it's time to encrypt. Since tiling has made the key and the message the same length, all you have to do is run through the two arrays, XOR each pair of characters together, and output the result.

As an added bonus, the XOR operation is its own inverse. So, given message M and random key R , M  XOR R  XOR R  = M . This means that the encryption and decryption operations are exactly the same. So, once you encrypt a message with a given random key, encrypting the output with the same random key will give you the original message (unless of course the original message was longer than 2048 characters, in which case the decrypted message will be truncated).

Implementation Details

File Input

It's tedious to type in long messages to encrypt. It's tedious to type in long keys as well. Naturally, we want to use Linux's built-in file redirection to send files to standard input. Unfortunately, file redirection can only redirect one file to your program. We need to send two: the message and the key. Here's how we are going to get around it. Linux has utility called cat which concatenates files together. So, you can send several files to your program all stuck together as one file. Now, you've got another problem! Where does one file end and the next begin?

Well, we will have to limit the messages and keys we use, but we can do it. If you refer to the ASCII table, you'll notice that the standard typable characters are only in the range from 0-127. If we restrict our messages and keys to those values (still pretty practical: any message you can type is a legal message), we can use one of the other keys as a marker. Use the character value 255 as a delimiter to separate your message and your key. For your convenience, we are providing a file called delimiter.txt which contains only a single character with the value 255. Your program should be called onetimepad. If you have a message file called message.txt and a key file called key.txt, you can use the Linux cat utility to pipe input to your program as follows.

cat message.txt delimiter.txt key.txt | ./onetimepad > cipher.txt

Message Length

You should allocate storage for messages and keys of lengths up to 2048 characters. If a message is more than 2048 character, ignore the rest until you get to the delimiter character (255). If a key is more than 2048 characters, ignore the rest until you get to the end of file marker (EOF).

Data Representation

Make sure that you store your characters with the type unsigned char. Doing so will make your bitwise operations behave as you expect. You should be making arrays with these unsigned char values, but these arrays are not strings! A string has a null character ('\0') to mark the end, but your arrays should not use null characters. After all, in the process of encrypting characters, some might become the null character by accident. In order to keep track of where the ends of your arrays are, you need a separate variable to record their lengths.

Again, make sure that your message doesn't contain any extended ASCII characters (those with values between 128 and 255). Otherwise, you might get a value of 255 in your characters, which will confuse the part of your code looking for the marker.

Note: Sometimes students ask, "How do I convert characters to and from binary?" This question doesn't make sense because all characters (and everything else) in a computer are stored in binary. Other representations are visual devices to help us humans know what is going on. You do not need to convert anything, just perform bitwise operations directly on the characters.

Forbidden Fruit

You are only allowed to use two library functions in this program: getchar() and putchar(). The use of any other library functions may earn you a zero on this assignment. Don't worry, you won't need printf(). It's only good for strings, and you aren't going to have any.

Testing

Make sure you test your program heavily for a wide number of cases. We are providing you with a number of files to assist your testing. Please use the diff utility to compare your output with the output of the sample executable.

Test your program as you work. For example, first get your code to just output the first 2048 characters of the message. Then do a simple XOR between your message and a key that is the same length as the message. (If you can do that, you should be able to run the program again on the output and get the original message again.) After that, get tiling working. Finally, when everything else is working perfectly, add the chaining step to your program.

If you want to know what the output of the key processing is, fill your message file with characters that have the value zero. Then, the key output XORed with all zeroes will give only the key output.


Provided Files

To help your testing, we are giving you the following files:

  • message1.txt
    Short test message file.

  • key1.txt
    Short test key file the same length as message1.txt.

  • cipher1.txt
    The result of encrypting message1.txt with key key1.txt.

  • message2.txt
    Long test message file. This file is the short story "An Occurrence at Owl Creek Bridge" by Ambrose Bierce. Excellent read if you have the time.

  • key2.txt
    Long test key file.

  • cipher2.txt - The result of encrypting message2.txt with key key2.txt.

  • delimiter.txt
    The delimiter file containing a single character whose value is 255. Use it for marking the end of your message text and the beginning of your key txt.

  • onetimepad.exe
    The sample executable which your code is supposed to mimic, provided for your convenience.

Turn In

Zip the contents of your project directory, including the makefile, the source C file(s), and header files. Upload this zip file to Brightspace. Do not include any object files or executables. Running the make command must compile all the required C source code files and generate an executable named onetimepad. Running the executable onetimepad with piped input as explained above must output the required encrypted (or decrypted) message.

All work must be submitted before Monday, February 10, 2025 at 11:59 p.m. unless you are going to use a grace day.

Grading

Your grade will be determined by the following categories.

Category Description Weight
Program Compiles Free points from the point store. 10%
Correct Makefile Makefile makes executable called onetimepad when make is invoked. Command make clean should also remove all object and executable files. 10%
Correct Output for Standard Cases These points will be awarded for correct output cases where the key and the message are both shorter than the maximum. Testing will cover cases when the key is the same length as the message and when it must be tiled. 45%
Error Checking Your solution must deal with messages and keys which are too long. Note: you don't need to worry about zero length files. 25%
Programming Style The usual: points for comments and good style. Don't throw them away. Refer to the coding standards. 10%

Under no circumstances should any member of one team look at the code written by another team. Tools will be used to detect code similarity automatically.

Code that does not compile will automatically score zero points.