C SC 340 Project 4: The Hybrid Sort-Merger
Spring 2009
Due: Tuesday 9 June 2009
No late submissions accepted.
Worth 15 points of your 100 project points
This project uses the C language on Linux. "Individual with a partner"

Introduction

The "bounded buffer" process synchronization problem is embodied in Unix by the pipe construct, which manages the synchronization for you. You will utilize pipes to solve the following problem.

This program will sort and merge two ASCII files, sending the result either to a file or to standard output. It is called a hybrid because the sorting will be performed by shell commands activated within your C program, and the merging will be handled by your code.

Specifications

Your program runs with the command:

     smerge file1 file2 [ file3 ]

Your program must produce sorted versions of file1 and file2 contents, then merge their lines so the result is also sorted, then output the result either to file3 if specified, or to standard output (the screen). The [ and ] indicate that file3 is optional; they are not part of the command. Each file consists of one or more lines of text; each line is terminated with the end-of-line character. Detailed specifications are given here.

The program must:

  1. Verify the command arguments. Notice that main has this header: int main(int argc, char *argv[])
    Inside main, argc contains the count of command line arguments, including the command name. It should have the value 3 or 4, depending on whether or not a destination file is specified. The second parameter argv is an array of strings, each of which contains one word from the command line. argv[0] contains "smerge". The files whose names are in argv[1] and argv[2] must exist or else the program will terminate with an appropriate error message. If a destination file is given in argv[3] and it already exists, it will be overwritten with no message to the user.
  2. Call pipe() twice to create two pipes. This function requires one argument, an array of two int. The array values do not matter because they will be produced by the function. It returns an int that indicates the success or failure of the operation. After a successful call, the argument array elements [0] and [1] will contain integer file descriptors that represent the read and write pipe channel, respectively.
  3. Call fork() twice to create two child processes. The parent and children proceed "in parallel". Since the first fork creates a child, structure your logic so that the second fork is performed only by the original parent.
  4. The first child:
    1. closes the read file descriptor of the first pipe
    2. closes both the read and write file descriptors of the second pipe
    3. creates a temporary file using mkstemp(). This is a function that creates and opens a file whose name is randomly generated based on a template that you provide.
    4. calls system(). This is a function that takes a string parameter where the string is a shell command, and carries out the command. You will give it the command to read the first file (file1) and write the sorted version of it contents to the temporary file. This is done using the shell command sort. The shell command needs to be something like "sort file > tempfile", where file is the input file, and tempfile, which will receive the sorted contents, is the name of the file returned by the call to mkstemp(). The ">" symbol is essential; it redirects output to the file rather than the screen. Your program will have to build the argument string before calling system(). You can build formatted strings pretty nicely using sprintf(), which has the formatting features of printf() but puts the results in a string instead to the screen.
    5. opens the temporary file for reading, then writes its contents to the first pipe (use a loop that reads then writes 20 bytes at a time).
    6. When finished writing, the child should close the pipe, then close and delete the temporary file.
  5. The second child similarly sorts and outputs the second file (file2) via the second pipe in a similar manner (described in step 4).
  6. Let's get back to the parent! After the two forks, the parent will read lines from the two pipes and write the sorted results to the output file or screen. Specifically, it:
    1. closes the write end of both pipes
    2. reads a line from each pipe. I am providing you a function to read and return one line of text from a given file descriptor.
    3. performs a "while" loop that terminates when both pipes are empty. In each iteration of this loop, it
      • compares the two lines using the strcmp() function,
      • writes the "smaller" to file3 or the screen using the write() function. When your program runs, standard output (the screen) is automatically opened for output at file descriptor 1. So all you need is an int variable to contain either the standard output file descriptor (which is 1) or the file3 file descriptor (which is the return value from the open() or creat() function), then write to that file descriptor.
      • reads the next line from the smaller one's pipe
      This is just like the merge operation in the merge sort you studied in the olden days. I will give you a quick demo if you have forgotten it or have not studied it before.
    4. closes the read end of the pipes and
    5. terminates gracefully.

Additional Specifications

The functions and system calls below are more fully described and illustrated with example programs in a supplementary page.

Suggested Approach

It would be wise to adopt a top-down approach to this project. Write your program to fork off a couple of child processes which simply announce their existence then croak. Then you know the fork part is working. Then add the pipes. Have each child process write a single string to its pipe, and have the parent read them then shut down the pipes. At this point, you know that both the fork and pipe parts are working. Then work on getting the command line arguments into your program (if you don't already know how to do this). Then have the children read from their respective files and echo file contents to the parent . . . and so forth. The idea is to start with something simple, get it working, then refine one step at a time, testing and debugging as you go.

Scoring

The maximum 15 points are allocated as follows:

TO TURN IN

Submit your smerge.c file as an email attachment to psanderson@otterbein.edu. No late submissions will be accepted; June 9 is the day before our (8 am!) final exam.


[ C SC 340 | Peter Sanderson | Math Sciences server  | Math Sciences home page | Otterbein ]

Last updated:
Peter Sanderson (PSanderson@otterbein.edu)