C SC 340 Project 4: The Hybrid Sort-Merger
Spring 2009
Due: Tuesday 9 June 2009
No late submissions accepted.
Worth 15 points of your 100 project points
This project uses the C language on Linux. "Individual with a partner"
Introduction
The "bounded buffer" process synchronization problem
is embodied in Unix by the pipe construct, which manages
the synchronization for you. You will utilize pipes to solve
the following problem.
This program will sort and merge two ASCII files, sending the
result either to a file or to standard output. It is called a
hybrid because the sorting will be performed by shell commands
activated within your C program, and the merging will be handled
by your code.
Specifications
Your program runs with the command:
smerge file1 file2 [ file3 ]
Your program must produce sorted versions of file1 and file2
contents, then merge their lines so the result is also sorted, then output the
result either to file3 if specified, or to standard output (the screen).
The [ and ] indicate that file3 is optional; they are not part of the
command. Each file consists of one or more lines of text; each line is terminated
with the end-of-line character. Detailed specifications are given here.
The program must:
- Verify the command arguments. Notice that main has this header:
int main(int argc, char *argv[])
Inside main, argc contains the count of command line arguments, including
the command name. It should have the value 3 or 4, depending on whether
or not a destination file is specified. The second parameter argv
is an array of strings, each of which contains one word from the command line.
argv[0] contains "smerge". The files whose names are in argv[1]
and argv[2] must exist or else the program will terminate with an
appropriate error message. If a destination file is given in argv[3]
and it already exists, it will be overwritten with no message to the user.
- Call pipe() twice to create two
pipes. This function requires one argument, an array of two int. The array values do not matter because
they will be produced by the function.
It returns an int that indicates the success or failure of the operation. After
a successful call, the argument array elements [0] and [1] will contain integer file
descriptors that represent the read and write pipe channel, respectively.
- Call fork() twice
to create two child processes. The parent and children proceed
"in parallel". Since the first fork creates a child, structure your logic so that the second fork
is performed only by the original parent.
- The first child:
- closes the read file descriptor of the first pipe
- closes both the read and write file descriptors of the second pipe
- creates a temporary file using mkstemp().
This is a function that creates and opens a file whose name is randomly generated based on a template that you provide.
- calls system(). This is a function that takes a string parameter
where the string is a shell command, and carries out the command. You will give it the command to read the first
file (file1) and write the sorted version of it contents to the temporary file. This is done using the shell
command sort.
The shell command needs to be something like "sort file
> tempfile", where file is the input file,
and tempfile, which will receive the sorted contents, is the name of the file returned by the call
to mkstemp(). The ">" symbol is essential; it redirects
output to the file rather than the screen. Your program will have to build the argument string before
calling system(). You can build formatted strings pretty nicely using
sprintf(), which has the formatting features of printf() but
puts the results in a string instead to the screen.
- opens the temporary file for reading, then writes its contents to the first pipe (use a loop that reads then writes 20 bytes at a time).
- When finished writing, the child should close the pipe, then
close and delete the temporary file.
- The second child similarly sorts and outputs the second file (file2)
via the second pipe in a similar manner (described in step 4).
- Let's get back to the parent! After the two forks, the parent will read
lines from the two pipes and write the sorted results to the output file or
screen. Specifically, it:
- closes the write end of both pipes
- reads a line from each pipe. I am providing you a function to read and return one line of
text from a given file descriptor.
- performs a "while" loop that terminates when both pipes are empty. In each iteration of this loop, it
- compares the two lines using the strcmp() function,
- writes the "smaller" to file3 or the screen using the write() function.
When your program runs, standard output (the screen) is automatically opened for output at file
descriptor 1. So all you need is an int variable to contain either
the standard output file descriptor (which is 1) or the file3 file descriptor (which is the
return value from the open() or creat() function), then write to that
file descriptor.
- reads the next line from the smaller one's pipe
This is just like the merge operation in the merge sort you studied in the olden days. I will give you a
quick demo if you have forgotten it or have not studied it before.
- closes the read end of the pipes and
- terminates gracefully.
Additional Specifications
The functions and system calls below are more fully described and illustrated
with example programs in a supplementary page.
- Use only creat(), open(),
read(), write()
and close() for I/O, including output
to the screen. Use open() to open
an existing file and creat() to create
and open a file that does not exist. Each of them returns a file descriptor
of type int, which is subsequently used as the first argument for
read(), write(),
and close(). Pipe()
also returns file descriptors, so you use read()
and write() with pipes as well as
with files. When finished with a file descriptor, call close().
Delete a file using unlink(). When
creating a file with creat(), use
the file name as the first argument and 0755 as the second argument (must
have the leading zero to be interpreted as octal).
- Your C program should use system calls in this list. The header file is
given for each one. To use open(), for example, your program needs
to do: #include <fcntl.h> at the top. The perror, system
and mkstemp system calls are described in manual section 3 (to see
documentation, type for example: man 3 system); the others are in
section 2 (for example: man 2 fork).
pipe() |
<unistd.h> |
close() |
<unistd.h> |
fork() |
<unistd.h> |
read() |
<unistd.h> |
perror() |
<stdio.h> |
write() |
<unistd.h> |
system() |
<stdlib.h> |
mkstemp() |
<stdlib.h> |
open() |
<fcntl.h> |
unlink() |
<unistd.h> |
creat() |
<fcntl.h> |
|
|
The help text produced by man is displayed one screen at a time. Notice the cursor
will be next to a colon at the bottom of the screen. Press the Enter or 'f' key to see
the next screen of text (forward), the 'b' key to see the previous screen (backward), and 'q' to quit.
- Make the program as robust as you can. Check for erroneous return codes
from functions and output appropriate messages to standard error. The perror()
function can be used for this.
- When a pipe is created, file descriptors are created for both a reader and
a writer. Proper pipe etiquette requires that the writer close its read pipe
before proceeding, and the reader first closes its write pipe. Pipes must
be created before the forks, so the parent and child will share the
file descriptors.
- Your solution must be well-structured. To help you with this, I have defined functions having the
following specifications. Right-click here while running Firefox within your Linux guest OS, to download smerge.c. The getLine() function is provided for you and your assignment is to fill in details of the others.
|
// Read one line of text into char array, excluding newline character. Array terminated with '\0'.
// Return: 0 if end of file, 1 otherwise.
// Parameter: fd is file descriptor from which to read,
// Parameter: line is char array to contain
the text being read. |
int getLine(int fd, char *line) |
|
// Merge contents of two input streams and write results to output stream.
// Parameter: fd1 is file descriptor for one input stream,
// Parameter: fd2 is file descriptor for the other input stream,
// Parameter: out is file descriptor for output stream. |
void merge(int fd1, int fd2, int out) |
|
// Sort contents of specified file and write to specified pipe.
// Parameter: file is char array containing file name,
// Parameter: outPipe is file descriptor for the write pipe |
void sortAndPipe(char *file, int outPipe) |
|
// Perform a child's task: do pipe maintenance, call sortAndPipe() function.
// Parameter: myPipe array containing my pipe's file descriptors
// Parameter: otherPipe array containing other pipe's file descriptors
// Parameter: fileName char array containing my input file name |
void child(int *myPipe, int *otherPipe, char *fileName) |
|
// Perform the parent's task: create output file if necessary, do pipe
maintenance, call merge() function.
// Parameter: argc argument count received by main()
// Parameter: argv argument list received by main()
// Parameter: pipe1 array containing first pipe's file descriptors
// Parameter: pipe2 array containing second pipe's file descriptors |
void parent(int argc, char *argv[], int *pipe1, int *pipe2) |
|
// Check smerge command line for validity.
// Return: 0 if command line valid else -1
// Parameter: argc argument count received by main()
// Parameter: argv argument list received by main() |
int checkArgs(int argc, char *argv[]) |
|
// Perform main task: validate arguments, create two pipes, fork two children, parent calls parent(),
both children call child().
// Return: 0 if command worked else 1
// Parameter: argc number of command arguments, including command name
// Parameter: argv array containing command words |
int main(int argc, char *argv[]) |
|
Suggested Approach
It would be wise to adopt a top-down approach to this project.
Write your program to fork off a couple of child processes which
simply announce their existence then croak. Then you know the
fork part is working. Then add the pipes. Have each child process
write a single string to its pipe, and have the parent read them
then shut down the pipes. At this point, you know that both the
fork and pipe parts are working. Then work on getting the command
line arguments into your program (if you don't already
know how to do this). Then have the children read from their
respective files and echo file contents to the parent . . . and
so forth. The idea is to start with something simple, get it
working, then refine one step at a time, testing and debugging
as you go.
Scoring
The maximum 15 points are allocated as follows:
- 6 points Program structure (structured code, follows described algorithm, correct
use of fork, pipe and I/O system calls)
- 6 points Functionality (works correctly given valid command arguments)
- 2 points Robustness (handles invalid command arguments, non-existing
files, and error return values gracefully)
- 1 point Documentation, including comments and self-documenting identifiers
TO TURN IN
Submit your smerge.c file as an email attachment to psanderson@otterbein.edu.
No late submissions will be accepted; June 9 is the day before our (8 am!) final exam.
[ C
SC 340 | Peter
Sanderson | Math Sciences server
| Math Sciences home page
| Otterbein ]
Last updated:
Peter Sanderson (PSanderson@otterbein.edu)