Lab 2: Windows, Unix, and Input Redirection

Due by the end of class

One rather irritating issue that a programmer may run into when developing software is discrepancies between character encodings for different operating systems. One of these issues has to do with the characters used to indicate the beginning of a new line in ASCII text files.

  • Windows and DOS machines use the two characters '\r' followed by '\n' (carriage return + linefeed) to indicate a new line.
  • Linux and other Unix-based machines use only the character '\n' to indicate a new line.

Opening a text file that was created on a Windows machine in an editor on a Unix machine will often show the character ^M at the end of each line. The ^M is how the extra '\r' is displayed.

Note: some editors do not display this character. To view the hidden characters, type the following.

cat -v textfile.txt

To deal with these differences in encoding, we can use the programs dos2unix and unix2dos, which by default overwrite a file with a copy that uses the other line-ending convention. For instance, the following command converts a Unix-based text file to DOS encoding.

unix2dos textfile.txt

On the other hand, the following command converts a DOS-based text file to a Unix encoding.

dos2unix textfile.txt

Specification

Your task is to write a program to convert text files from Windows-style encoding to Unix style encoding, and create a makefile to build the program using make. You should include the header file convert.h in your program and makefile and use the symbolic constants defined in the header file in your conversion program. Do not simply copy the contents of the header file into your program. One of the requirements of this lab is that you become familiar with using make with multiple input files.

You should compile your program using the GNU C compiler gcc, creating an executable named convert. Please include this compilation task in a makefile called makefile whose default rule is convert. The dependencies of convert will be your .c file and convert.h. As always, add a clean rule that removes convert using the rm -f command.

Read every character from the input file. If it is a '\r', ignore it. Otherwise, simply output each character you read directly. Stop when you get to the end of the file. (A more sophisticated algorithm would only ignore a '\r' if it was followed by a '\n', but the approach above is good enough for most cases.)

Everything you need to write the program is described in Chapter 1 of Kernighan and Ritchie. The file copying program on page 16 is of particular relevance to this lab. To read an individual character, use the getchar() function. To output an individual character, use the putchar() function. These are the only functions you should use for input and output in this program.

Note: getchar() returns an int value, even though its job is to give back a char. Why? Well, we're going to be reading from a file. When the file is over, the getchar() function returns the special value EOF which is a constant defined in stdio.h. Usually, the value of EOF is the int value -1. By giving back an int, we can return the entire range of char values as well as this special message. Ideally, you should store the return value as an int and check to see if it is EOF. If it isn't, then store it into a char variable.

Input Redirection

The command line is less convenient than a GUI for some tasks, but it's also much more efficient for others. The command line gives a smooth way to transition between input that you type directly into the program and input read from a file.

You should use this input redirection feature to pass input to your program. If you run a program and use the less-than symbol (<) followed by the name of an input text file, the contents of that file will be used exactly as if a user had typed the data in. For instance, after compiling a program such as the aforementioned file copying program from the textbook into an executable named convert, you can redirect the input from a file to the program by running the following command.

./convert < textfile.txt

Unsurprisingly, you can also redirect the output of any program or command to an output file instead of having it printed on the screen. To do so, use the greater-than symbol (>) followed by the name of an output text file.

./convert < textfile.txt > copiedfile.txt

Debugging

Your program will be tested using automated scripts. This means that the output of your program should match the output of our implementation exactly. In order to make sure your implementation matches ours, you can use the unix2dos command to create text files that need to be converted to Unix encoding, and then use your program to convert to Unix format. Note that the unix2dos command does not use input or output redirection because it opens the file directly, which we have not yet covered in class.

Compare the results of your program to the results of the dos2unix program with the diff command. The diff command compares two files to see if they have any differences. If there are no differences, there is no output. If there are differences, it will print the differing lines from the two files. The following is an example of how you might use diff to test your code.

diff dos2unix_output.txt your_output.txt

Type man diff for more information about the diff utility.

If you're too lazy to create your own text files, below is the Declaration of Independence, in Unix and DOS formats, respectively. If you want to use these files for testing, be sure to right-click on the links and save them to your local directory. Do not copy and paste the text in these files into new files. Doing so will likely destroy precisely the special characters that we're trying to manipulate in this lab.

Turn In

Zip the contents of your lab directory, including the makefile and the source C file. Upload this zip file to Brightspace. Do not include any object files or executables. It is not necessary to turn in convert.h. Running the make command must compile the required C source code file and generate an executable named convert.

All work must be done individually. Never look at someone else's code. Please refer to the course policies if you have any questions about academic integrity. If you have trouble with the assignment, I am always available for assistance.