C SC 340 Project 4: The Hybrid Sort-Merger
Spring 2009
Due: Tuesday 9 June 2009
No late submissions accepted.
Worth 15 points of your 100 project points
This project uses the C language on Linux.  "Individual with a partner" 
 
  
  Introduction
The "bounded buffer" process synchronization problem
is embodied in Unix by the pipe construct, which manages
the synchronization for you.  You will utilize pipes to solve
the following problem.
This program will sort and merge two ASCII files, sending the
result either to a file or to standard output.  It is called a
hybrid because the sorting will be performed by shell commands
activated within your C program, and the merging will be handled
by your code.  
Specifications
Your program runs with the command:
     smerge   file1   file2  [ file3 ]
 Your program must produce sorted versions of file1 and file2 
  contents, then merge their lines so the result is also sorted, then output the 
  result either to file3 if specified, or to standard output (the screen). 
  The [ and ] indicate that file3 is optional; they are not part of the 
  command. Each file consists of one or more lines of text; each line is terminated 
  with the end-of-line character. Detailed specifications are given here. 
The program must: 
- Verify the command arguments. Notice that main has this header:
  int main(int argc, char *argv[])
 Inside main, argc contains the count of command line arguments, including 
    the command name. It should have the value 3 or 4, depending on whether 
    or not a destination file is specified. The second parameter argv 
    is an array of strings, each of which contains one word from the command line. 
    argv[0] contains "smerge". The files whose names are in argv[1] 
    and argv[2] must exist or else the program will terminate with an 
    appropriate error message. If a destination file is given in argv[3] 
    and it already exists, it will be overwritten with no message to the user.
- Call pipe() twice to create two 
  pipes. This function requires one argument, an array of two int.  The array values do not matter because
  they will be produced by the function. 
  It returns an int that indicates the success or failure of the operation. After 
  a successful call, the argument array elements [0] and [1] will contain integer file 
  descriptors that represent the read and write pipe channel, respectively. 
- Call fork() twice
to create two child processes.  The parent and children proceed
"in parallel".  Since the first fork creates a child, structure your logic so that the second fork
is performed only by the original parent.
- The first child:
   
   - closes the read file descriptor of the first pipe
- closes both the read and write file descriptors of the second pipe
- creates a temporary file using mkstemp().
    This is a function that creates and opens a file whose name is randomly generated based on a template that you provide.
   
- calls system().  This is a function that takes a string parameter
    where the string is a shell command, and carries out the command.  You will give it the command to read the first
    file (file1) and write the sorted version of it contents to the temporary file.  This is done using the shell 
    command sort.
	The shell command needs to be something like "sort file 
  > tempfile", where file is the input file, 
  and tempfile, which will receive the sorted contents, is the name of the file returned by the call 
  to mkstemp(). The ">" symbol is essential; it redirects 
  output to the file rather than the screen. Your program will have to build the argument string before 
  calling system(). You can build formatted strings pretty nicely using 
  sprintf(), which has the formatting features of printf() but 
  puts the results in a string instead to the screen.
- opens the temporary file for reading, then writes its contents to the first pipe (use a loop that reads then writes 20 bytes at a time). 
- When finished writing, the child should close the pipe, then 
  close and delete the temporary file.
 
- The second child similarly sorts and outputs the second file (file2)
via the second pipe in a similar manner (described in step 4).
- Let's get back to the parent! After the two forks, the parent will read 
    lines from the two pipes and write the sorted results to the output file or 
    screen. Specifically, it: 
    
  - closes the write end of both pipes 
- reads a line from each pipe.  I am providing you a function to read and return one line of
  text from a given file descriptor.  
- performs a "while" loop that terminates when both pipes are empty.  In each iteration of this loop, it
     
	 - compares the two lines using the strcmp() function,
     
- writes the "smaller" to file3 or the screen using the write() function.
	 When your program runs, standard output (the screen) is automatically opened for output at file 
     descriptor 1. So all you need is an int variable to contain either 
     the standard output file descriptor (which is 1) or the file3 file descriptor (which is the 
     return value from the open() or creat() function), then write to that 
     file descriptor.
	 
- reads the next line from the smaller one's pipe
	 
 This is just like the merge operation in the merge sort you studied in the olden days.  I will give you a 
  quick demo if you have forgotten it or have not studied it before.
- closes the read end of the pipes and
- terminates gracefully. 
 
Additional Specifications
The functions and system calls below are more fully described and illustrated 
  with example programs in a supplementary page.
  
  
  - Use only creat(), open(), 
    read(), write() 
    and close() for I/O, including output 
    to the screen. Use open() to open 
    an existing file and creat() to create 
    and open a file that does not exist. Each of them returns a file descriptor 
    of type int, which is subsequently used as the first argument for 
    read(), write(), 
    and close(). Pipe() 
    also returns file descriptors, so you use read() 
    and write() with pipes as well as 
    with files. When finished with a file descriptor, call close(). 
    Delete a file using unlink(). When 
    creating a file with creat(), use 
    the file name as the first argument and 0755 as the second argument (must 
    have the leading zero to be interpreted as octal). 
 
- Your C program should use system calls in this list. The header file is 
    given for each one. To use open(), for example, your program needs 
    to do: #include <fcntl.h> at the top. The perror, system 
    and mkstemp system calls are described in manual section 3 (to see 
    documentation, type for example: man 3 system); the others are in 
    section 2 (for example: man 2 fork). 
 
       
        | pipe() | <unistd.h> | close() | <unistd.h> |   
        | fork() | <unistd.h> | read() | <unistd.h> |   
        | perror() | <stdio.h> | write() | <unistd.h> |   
        | system() | <stdlib.h> | mkstemp() | <stdlib.h> |   
        | open() | <fcntl.h> | unlink() | <unistd.h> |   
        | creat() | <fcntl.h> |  |  |  
 
 The help text produced by man is displayed one screen at a time.  Notice the cursor 
	will be next to a colon at the bottom of the screen.  Press the Enter or 'f' key to see
	the next screen of text (forward), the 'b' key to see the previous screen (backward), and 'q' to quit.
 
- Make the program as robust as you can. Check for erroneous return codes 
    from functions and output appropriate messages to standard error. The perror() 
    function can be used for this. 
 
- When a pipe is created, file descriptors are created for both a reader and 
    a writer. Proper pipe etiquette requires that the writer close its read pipe 
    before proceeding, and the reader first closes its write pipe. Pipes must 
    be created before the forks, so the parent and child will share the 
    file descriptors. 
 
- Your solution must be well-structured.  To help you with this, I have defined functions having the
  following specifications.  Right-click here while running Firefox within your Linux guest OS, to download smerge.c.  The getLine() function is provided for you and your assignment is to fill in details of the others.
  
    |  |  | // Read one line of text into char array, excluding newline character. Array terminated with '\0'. // Return: 0 if end of file, 1 otherwise.
 // Parameter: fd is file descriptor from which to read,
 // Parameter: line is char array to contain
		         the text being read.
 |  | int getLine(int fd, char *line) |  |  |  | // Merge contents of two input streams and write results to output stream. // Parameter: fd1 is file descriptor for one input stream,
 // Parameter: fd2 is file descriptor for the other input stream,
 // Parameter: out is file descriptor for output stream.
 |  | void merge(int fd1, int fd2, int out) |  |  |  | // Sort contents of specified file and write to specified pipe. // Parameter: file is char array containing file name,
 // Parameter: outPipe is file descriptor for the write pipe
 |  | void sortAndPipe(char *file, int outPipe) |  |  |  | // Perform a child's task: do pipe maintenance, call sortAndPipe() function. // Parameter: myPipe array containing my pipe's file descriptors
 // Parameter: otherPipe array containing other pipe's file descriptors
 // Parameter: fileName char array containing my input file name
 |  | void child(int *myPipe, int *otherPipe, char *fileName) |  |  |  | // Perform the parent's task: create output file if necessary, do pipe
	maintenance, call merge() function. // Parameter: argc argument count received by main()
 // Parameter: argv argument list received by main()
 // Parameter: pipe1 array containing first pipe's file descriptors
 // Parameter: pipe2 array containing second pipe's file descriptors
 |  | void parent(int argc, char *argv[], int *pipe1, int *pipe2) |  |  |  | // Check smerge command line for validity. // Return:  0 if command line valid else -1
 // Parameter: argc argument count received by main()
 // Parameter: argv argument list received by main()
 |  | int checkArgs(int argc, char *argv[]) |  |  |  | // Perform main task: validate arguments, create two pipes, fork two children, parent calls parent(),
	  both children call child(). // Return: 0 if command worked else 1
 // Parameter: argc number of command arguments, including command name
 // Parameter: argv array containing command words
 |  | int main(int argc, char *argv[]) |  |  |  
 
Suggested  Approach
It would be wise to adopt a top-down approach to this project.
 Write your program to fork off a couple of child processes which
simply announce their existence then croak.  Then you know the
fork part is working.  Then add the pipes.  Have each child process
write a single string to its pipe, and have the parent read them
then shut down the pipes.  At this point, you know that both the
fork and pipe parts are working.  Then work on getting the command
line arguments into your program (if you don't already
know how to do this).  Then have the children read from their
respective files and echo file contents to the parent . . . and
so forth.  The idea is to start with something simple, get it
working, then refine one step at a time, testing and debugging
as you go.
Scoring
The maximum 15 points are allocated as follows:
  - 6 points Program structure (structured code, follows described algorithm, correct 
    use of fork, pipe and I/O system calls)
- 6 points Functionality (works correctly given valid command arguments)
- 2 points Robustness (handles invalid command arguments, non-existing 
    files, and error return values gracefully)
- 1 point Documentation, including comments and self-documenting identifiers
TO TURN IN
 Submit your smerge.c file as an email attachment to psanderson@otterbein.edu. 
  No late submissions will be accepted; June 9 is the day before our (8 am!) final exam. 
  
[ C
SC 340 | Peter
Sanderson | Math Sciences server 
| Math Sciences home page
| Otterbein ]
Last updated: 
Peter Sanderson (PSanderson@otterbein.edu)