COMP 3400 Project 3: The Hybrid Sort-Merger
Spring 2017
Due: Thursday 23 March 2017
Worth 50 points of your 200 project points
This project uses the C language on Linux. "Individual with a partner"

Introduction

The "bounded buffer" process synchronization problem is embodied in Unix by the pipe construct, which manages the synchronization for you. You will utilize pipes to solve the following problem.

This program will sort and merge the contents of two ASCII files, sending the result either to a file or to standard output. It is called a hybrid because the sorting will be performed by shell commands activated within your C program, and the merging will be handled by your code.

Basics of composing, compiling and running C programs from a Linux command interpreter

You will develop this project using Cygwin. I am giving you a flash drive that has Cygwin pre-installed. The Cygwin home page describes it as "a large collection of GNU and Open Source tools which provide functionality similary to a Linux distribution on Windows."

Although Linux has a graphical user interface (several, actually), you will use a command interface for this project. By working with the textual command-based interface you are immersed into a development environment used by many operating system developers. If you have used DOS command interfaces on Windows systems, you will be familiar with the concepts. The interface issues a prompt, you type in a command, the command is carried out, its results (if any) are displayed, and another prompt is issued. Unix/Linux command interfaces are called shells, and several are available. The default for our installation is bash, the Bourne Again SHell. This is a pun on the Bourne shell developed for Unix by Steven Bourne.

Here's how to get to the Linux command interface.

Once there, there are a few basic things you need to do to compose, compile and run C programs.

Specifications

Your program will be run with the command:

     smerge file1 file2 [ file3 ]

Your program must produce sorted versions of file1 and file2 contents, then merge their lines so the result is also sorted, then output the result either to file3 if specified, or to standard output (the screen). The square brackets [ and ] indicate that file3 is optional; they are not part of the command. Each file consists of one or more lines of text; each line is terminated with the end-of-line character. Detailed specifications are given here.

The program must:

  1. Verify the command arguments. Notice that main has this header: int main(int argc, char *argv[])
    Inside main, argc contains the count of command line arguments, including the command name. It should have the value 3 or 4, depending on whether or not a destination file is specified. The second parameter argv is an array of strings, each of which contains one word from the command line. argv[0] contains "smerge". The files whose names are in argv[1] and argv[2] must exist or else the program will terminate with an appropriate error message. If a destination file is given in argv[3] and it already exists, it will be overwritten with no message to the user.
  2. Call pipe() twice to create two pipes. The pipe() function requires one argument, an array of two int. The two arrays are provided for you (pipe1 and pipe2). You do not need to initialize the array elements because their values will be produced by the function. The pipe() function returns an int that indicates the success or failure of the operation. After a successful call, the argument array elements [0] and [1] will contain integer file descriptors that represent the read and write pipe channel, respectively.
  3. Call fork() twice to create two child processes. The parent and children proceed "in parallel". Since the first fork creates a child, structure your logic so that the second fork is performed only by the original parent.
  4. The first child:
    1. closes the read file descriptor of the first pipe
    2. closes both the read and write file descriptors of the second pipe
    3. creates a temporary file using mkstemp(). This is a function that creates and opens a file whose name is randomly generated based on a template that you provide.
    4. calls system(). This is a function that takes a string parameter where the string is a shell command, and carries out the command. You will give it the command to read the first file (file1) and write the sorted version of its contents to the temporary file. This is done using the shell command sort. The shell command needs to be something like "sort file > tempfile", where file is the input file, and tempfile, which will receive the sorted contents, is the name of the file returned by the call to mkstemp(). The ">" symbol is essential; it redirects output to the file rather than the screen. Your program will have to build the command string before calling system(). You can build formatted strings pretty nicely using sprintf(), which has the formatting features of printf() but puts the results in a string instead to the screen.
    5. opens the temporary file for reading, then writes its contents to the first pipe (use a loop that reads then writes 20 bytes at a time).
    6. When finished writing, the child should close the pipe, then close and delete the temporary file.
  5. The second child similarly sorts and outputs the second file (file2) via the second pipe in a similar manner (described in step 4).
  6. Let's get back to the parent! After the two forks, the parent will read lines from the two pipes and write the sorted results to the output file or screen. Specifically, it:
    1. closes the write end of both pipes
    2. reads a line from each pipe. I am providing you a function to read and return one line of text from a given file descriptor.
    3. performs a "while" loop that terminates when both pipes are empty. In each iteration of this loop, it
      • compares the two lines using the strcmp() function,
      • writes the "smaller" to file3 or the screen using the write() function. When your program runs, standard output (the screen) is automatically opened for output at file descriptor 1. So all you need is an int variable to contain either the standard output file descriptor (which is 1) or the file3 file descriptor (which is the return value from the open() or creat() function), then write to that file descriptor.
      • reads the next line from the smaller one's pipe
      This is just like the merge operation in the merge sort you studied in the olden days. I will give you a quick demo if you have forgotten it or have not studied it before.
    4. closes the read end of both pipes and
    5. terminates gracefully.

Additional Specifications

The functions and system calls below are more fully described and illustrated with example programs in a supplementary page. Note that the example programs, and smerge.c itself, will not compile or run on a Windows system.

Recommended Approach

It would be wise to adopt a top-down approach to this project. Write your program to fork off a couple of child processes which simply announce their existence then exit. Then you know the fork part is working. Then add the pipes. Have each child process write a single string to its pipe, and have the parent read them then shut down the pipes. At this point, you know that both the fork and pipe parts are working. Then work on getting the command line arguments into your program (if you don't already know how to do this). Then have the children read from their respective files and echo file contents to the parent . . . and so forth. The idea is to start with something simple, get it working, then refine one step at a time, testing and debugging as you go.

Scoring

The maximum 50 points are allocated as follows:

To Turn In

Send me your completed smerge.c file as an email attachment to psanderson@otterbein.edu.


[ COMP 3400 | Peter Sanderson | Math Sciences home page | Otterbein ]

Last updated:
Peter Sanderson (PSanderson@otterbein.edu)