Lab 2: Windows, Unix, and Input Redirection
Due by the end of class
One rather irritating issue that a programmer may run into when developing software is discrepancies between character encodings for different operating systems. One of these issues has to do with the characters used to indicate the beginning of a new line in ASCII text files.
- Windows and DOS machines use the two characters '\r' followed by '\n' (carriage return + linefeed) to indicate a new line.
- Linux and other Unix-based machines use only the character '\n' (linefeed) to indicate a new line.
If you open a text file created on a Windows machine in an editor on a Unix machine, you will often see the character ^M (the usual way of displaying '\r') at the end of each line.
Note: some editors do not display this character. To view the hidden characters, run the following command.
cat -v textfile.txt
To deal with these differences in encoding, we can use the programs dos2unix and unix2dos, which rewrite a file in place with its line endings converted. For instance, the following command converts a Unix-based text file to DOS encoding.
unix2dos textfile.txt
On the other hand, the following command converts a DOS-based text file to a Unix encoding.
dos2unix textfile.txt
Specification
Your task is to write a program that converts text files from Windows-style encoding to Unix-style encoding, and to create a makefile that builds the program using make. You should include the header file convert.h in your program, list it as a dependency in your makefile, and use the symbolic constants defined in the header file in your conversion program. Do not simply copy the contents of the header file into your program. One of the requirements of this lab is that you become familiar with using make with multiple input files.
You should compile your program using the GNU C compiler gcc, creating an executable named convert. Please include this compilation task in a makefile called makefile whose default rule is convert. The dependencies of convert will be your .c file and convert.h. As always, add a clean rule that removes convert using the rm -f command.
Read every character from the input file. If it is a '\r', ignore it. Otherwise, simply output each character you read directly. Stop when you get to the end of the file. (A more sophisticated algorithm would only ignore a '\r' if it was followed by a '\n', but the approach above is good enough for most cases.)
Everything you need to write the program is described in Chapter 1 of Kernighan and Ritchie. The file copying program on page 16 is of particular relevance to this lab. To read an individual character, use the getchar() function. To output an individual character, use the putchar() function. These are the only functions you should use for input and output in this program.
Note: getchar() returns an int value, even though its job is to give back a char. Why? Well, we're going to be reading from a file. When the end of the file is reached, the getchar() function returns the special value EOF, which is a constant defined in stdio.h. Usually, the value of EOF is the int value -1. By giving back an int, getchar() can return the entire range of char values as well as this special value. Ideally, you should store the return value as an int and check to see if it is EOF. If it isn't, then store it into a char variable.
Input Redirection
The command line is less convenient than a GUI for some tasks, but it's also much more efficient for others. The command line gives a smooth way to transition between input that you type directly into the program and input read from a file.
You should use this input redirection feature to pass input to your program. If you run a program and use the less-than symbol (<) followed by the name of an input text file, the contents of that file will be used exactly as if a user had typed the data in. For instance, if you compile the aforementioned file copying program from the textbook into an executable named convert, you can redirect the input from a file to the program by running the following command.
./convert < textfile.txt
Unsurprisingly, you can also redirect the output of any program or command so that it is written to a file instead of being printed on the screen. To do so, use the greater-than symbol (>) followed by the name of an output text file.
./convert < textfile.txt > copiedfile.txt
Debugging
Your program will be tested using automated scripts. This means that the output of your program should match the output of our implementation exactly. To make sure your implementation matches ours, you can use the unix2dos command to create text files that need to be converted to Unix encoding, and then use your program to convert them back to Unix format. Note that the unix2dos command does not use input or output redirection because it opens the file directly, a technique we have not yet covered in class.
Compare the results of your program to the results of the dos2unix program with the diff command. The diff command compares two files to see if they have any differences. If there are no differences, there is no output. If there are differences, it will print the differing lines from the two files. The following is an example of how you might use diff to test your code.
diff dos2unix_output.txt your_output.txt
Type man diff for more information about the diff utility.
If you're too lazy to create your own text files, below is the Declaration of Independence, in Unix and DOS formats, respectively. If you want to use these files for testing, be sure to right-click on the links and save them to your local directory. Do not copy and paste the text in these files into new files. Doing so will likely destroy precisely the special characters that we're trying to manipulate in this lab.
Turn In
Zip the contents of your lab directory, including the makefile and the source C file. Upload this zip file to Brightspace. Do not include any object files or executables. It is not necessary to turn in convert.h. Running the make command must compile the required C source code file and generate an executable named convert.
All work must be done individually. Never look at someone else's code. Please refer to the course policies if you have any questions about academic integrity. If you have trouble with the assignment, I am always available for assistance.