Project 3: CGI

Due by: Friday, April 11, 2025 at 11:59 p.m.

Introduction

You might not be familiar with CGI, but it forms the basis of dynamic web programming. CGI is based on the idea that HTML (the language of the web) is just text. As such, it does not have to be stored in a persistent file. Instead, you can create a program that produces the desired HTML when it runs. Take the following program for example:

#include <stdio.h>

int
main (void)
{
  printf ("<html>\n");
  printf ("<head>\n  <title>Hello world demo</title>\n</head>\n\n");
  printf ("<body>\n");
  printf ("<h2>Hello world!</h2>\n");
  printf ("</body>\n");
  printf ("</html>\n");
  return 0;
}

When this program runs, it produces the following HTML code:

<html>
<head>
  <title>Hello world demo</title>
</head>

<body>
<h2>Hello world!</h2>
</body>
</html>

As with any C program, the CGI source code has to be compiled. These compiled executables are then placed in a directory (called cgi-bin in our case), and they often have a .cgi file name extension added to them. Then, the following HTTP request could be used to request the execution of the CGI program:

GET /cgi-bin/hello.cgi HTTP/1.1\r\n
Host: localhost\r\n
Connection: close\r\n
\r\n

Project Structure

Download the archive given below and unzip it into an appropriate directory.

Note that this project requires a considerable amount of string processing. Getting to know the following functions will make this work considerably more manageable:

realloc()
Useful for concatenating dynamically allocated strings. First, use realloc() to increase the space allocated for the first string, and then use strncat() to append the second string to the first.
strdup()
Creates a dynamically allocated copy of a string on the heap. This is useful when you are starting with a pointer to read-only memory like a string constant or to a string on the stack. Using strdup() to copy it to the heap will allow you to increase its size, change characters (converting a '\r' character into '\0', for example), split it into multiple strings, and so on.
strncat() and strncpy()
Standard functions for concatenating two strings or copying one into a buffer.
strchr() and strstr()
Can be used to find a pointer to the first occurrence of a character or substring within a string.
strtok() and strtok_r()
These functions split strings into tokens based on a delimiter, returning a pointer to the token within the original string. Caution: These functions both require that the original string is modifiable because they change characters to null bytes. For example, using either function on the string "hi bye now" with the delimiter " " will return a pointer to "hi" but also change the first space to '\0', making the string "hi\0bye now". The difference between the two functions is that strtok() uses a static variable to know where to continue in the original string, but strtok_r() requires you to pass in the address of a pointer that says where the function should continue.
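As a rough illustration of how these functions fit together, the sketch below appends to a heap string with realloc() and strncat(), and counts tokens with strdup() and strtok_r() so the caller's string is left untouched. The helper names append_string() and count_tokens() are hypothetical and are not part of the project framework.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append suffix to the heap-allocated string *dest,
   growing the allocation with realloc(). Returns 0 on success, or -1 if
   the allocation fails (in which case *dest is left unchanged). */
int
append_string (char **dest, const char *suffix)
{
  size_t old_len = strlen (*dest);
  size_t add_len = strlen (suffix);
  char *grown = realloc (*dest, old_len + add_len + 1);
  if (grown == NULL)
    return -1;
  strncat (grown, suffix, add_len);
  *dest = grown;
  return 0;
}

/* Hypothetical helper: count the space-separated tokens in text.
   strtok_r() writes null bytes into its input, so we tokenize a
   strdup() copy instead of the (possibly read-only) original. */
size_t
count_tokens (const char *text)
{
  char *copy = strdup (text);
  if (copy == NULL)
    return 0;

  size_t count = 0;
  char *saveptr = NULL;
  for (char *token = strtok_r (copy, " ", &saveptr); token != NULL;
       token = strtok_r (NULL, " ", &saveptr))
    count++;

  free (copy);
  return count;
}
```

Assigning the result of realloc() to a temporary pointer first means the original string is not lost if the allocation fails.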

CGI Source Files

Your first task is to create a couple of small CGI programs that perform dynamic computation. Doing so requires you to work in a new directory structure.

Within the project structure, you will find a cgi-src directory that contains framework C source files. To compile these programs, you can either run make from the cgi-src directory, or you can run make cgibin from the main project directory. Note that the CGI programs have their own integration tests. You can run just these tests with make test in the cgi-src directory, or you can run make cgitest from the main project directory.

You will also need to become familiar with the concept of environment variables. Recall from Section 2.6 of the textbook that there are variants of exec() that allow you to set environment variables in a third parameter. These variables are traditionally set as KEY=value, where value is the rest of that string (which could include additional equal signs). The last element of these environment arrays is always NULL, as shown in the following example:

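char *args[] = { myexec, NULL };           // argv[0] is conventionally the program's path
char *env[] = { "EVAR=x=1&y=txt", NULL };  // the environment array ends with NULL
execve (myexec, args, env);                // myexec holds the path to the executable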

After you set the environment variables this way, you can access them from inside the CGI program as a string as follows:

char *value = getenv ("EVAR");
printf ("%s\n", value); // Prints "x=1&y=txt"
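Note that getenv() returns NULL when the variable is not set, so passing its result straight to printf() is only safe if the variable is known to exist. One possible way to guard against this is a small wrapper that substitutes a default; the name env_or_default() is hypothetical and not part of the framework.

```c
#include <stdlib.h>

/* Hypothetical helper: return the value of the environment variable name,
   or fallback if the variable is not set. */
const char *
env_or_default (const char *name, const char *fallback)
{
  const char *value = getenv (name);
  return (value != NULL) ? value : fallback;
}
```

For instance, env_or_default ("EVAR", "") could be passed to printf() without risking a NULL argument.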

Stage 1: CGI Programs

Begin by working on the following CGI programs:

  • Implement cgi-src/shutdown.cgi to print the specified HTML code to stdout. In server.c, before returning from serve_request(), raise the SIGUSR1 signal on the server (the current process) when this specific URI is received as a request.
  • Implement cgi-src/show.cgi to read in the contents of a database file (data/data.txt) containing hash values and file names. After reading in this file, produce the HTML as specified. You will add more functionality to this program in later stages.

Next, you will need to modify server.c and cgi_response.c to execute the CGI programs as needed. Specifically, serve_request() calls retrieve_request(), which retrieves the HTTP request and parses its headers for you, extracting the URI and other details. You will use the data returned from that call to craft the CGI response. Recall that you can use dup2() to redirect a program's stdout, for example into a pipe that the server reads. Once you have the program's output, you can use strlen() on it to compute the value of the HTTP Content-Length header.
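The redirection described above can be sketched as follows. This is only one possible shape, assuming a pipe between the server and a forked child; the function name capture_output() is hypothetical, and the framework's own interfaces in cgi_response.c may look different.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at path with the given argument and
   environment arrays, and return everything it wrote to stdout as a
   heap-allocated, null-terminated string. Returns NULL on failure. */
char *
capture_output (const char *path, char *const argv[], char *const envp[])
{
  int fds[2];
  if (pipe (fds) < 0)
    return NULL;

  pid_t child = fork ();
  if (child < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return NULL;
    }

  if (child == 0)
    {
      /* Child: make stdout refer to the write end of the pipe, then exec. */
      close (fds[0]);
      dup2 (fds[1], STDOUT_FILENO);
      close (fds[1]);
      execve (path, argv, envp);
      _exit (127); /* Reached only if execve() fails. */
    }

  /* Parent: close the write end so read() sees EOF when the child exits. */
  close (fds[1]);
  size_t len = 0;
  char *out = malloc (1);
  char buffer[512];
  ssize_t n;
  while (out != NULL && (n = read (fds[0], buffer, sizeof (buffer))) > 0)
    {
      char *grown = realloc (out, len + (size_t) n + 1);
      if (grown == NULL)
        {
          free (out);
          out = NULL;
          break;
        }
      out = grown;
      memcpy (out + len, buffer, (size_t) n);
      len += (size_t) n;
    }
  close (fds[0]);
  waitpid (child, NULL, 0);

  if (out != NULL)
    out[len] = '\0';
  return out;
}
```

With the output in hand, strlen() on the returned string gives the number to place in the Content-Length header.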

Stage 2: Dynamically Generating HTML

For this next stage, you will need to modify cgi-src/show.cgi to read parameters based on environment variables and display records from the data files differently, based on those environment variables. Here are examples of how to react to environment variables being set in various ways:

db=foo.txt
Use data/foo.txt as the database.
record=2
Show only the second record. Our implementation uses a 1-based system where 1 = the first row of the file.
hash=f1f4b8705111a70c176a942d26f765fb548e2be9
Check to see if the record's hash matches the specified value.
QUERY_STRING=db=foo.txt&record=2
A special case where all environment variables are passed in a single string.

Refer to the earlier section on CGI Source Files for information about how to read environment variables.
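For the QUERY_STRING case, one possible approach is to split the string on '&' and then split each pair at its first '='. The sketch below assumes that format and nothing more (it does no URL decoding); the function name query_value() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look up key in a query string such as
   "db=foo.txt&record=2". Returns a heap-allocated copy of the value
   (the caller must free it), or NULL if the key is not present. */
char *
query_value (const char *query, const char *key)
{
  /* strtok_r() modifies its input, so work on a private copy. */
  char *copy = strdup (query);
  if (copy == NULL)
    return NULL;

  char *result = NULL;
  char *saveptr = NULL;
  for (char *pair = strtok_r (copy, "&", &saveptr);
       pair != NULL && result == NULL;
       pair = strtok_r (NULL, "&", &saveptr))
    {
      char *equals = strchr (pair, '=');
      if (equals == NULL)
        continue; /* Skip malformed pairs that have no '='. */
      *equals = '\0';
      if (strcmp (pair, key) == 0)
        result = strdup (equals + 1);
    }

  free (copy);
  return result;
}
```

Calling query_value (getenv ("QUERY_STRING"), "db") would then yield "foo.txt" for the example above, provided QUERY_STRING is actually set.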

Note: All tests for this stage are in the cgi-src directory, not the main project directory. You can run make test there to test only the CGI programs. You can also run make test from the base directory, but it will run other tests as well.

Stage 3: Setting Environment Variables

Stage 2 focuses on writing CGI programs that use environment variables. This final stage focuses on setting the variables. How you do this depends on the incoming HTTP request. One way is through an HTTP GET request, which uses the query string structure:

GET /cgi-bin/show.cgi?db=foo.txt&record=2 HTTP/1.1\r\n
Host: localhost\r\n
Connection: close\r\n
\r\n

Recall from earlier that the call to retrieve_request() sets several variables. In this case, the query variable would contain the string "db=foo.txt&record=2". You can use this information to set the QUERY_STRING environment variable when you run the appropriate CGI file.
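Building the environment entry amounts to joining the variable name and the query with an equal sign. A minimal sketch, assuming the query is available as a null-terminated string, might look like this; make_env_entry() is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a heap-allocated "NAME=value" string suitable
   for use as one element of an execve() environment array. Returns NULL if
   the allocation fails; the caller must free the result. */
char *
make_env_entry (const char *name, const char *value)
{
  /* Room for the name, the '=', the value, and the null terminator. */
  size_t size = strlen (name) + 1 + strlen (value) + 1;
  char *entry = malloc (size);
  if (entry == NULL)
    return NULL;
  snprintf (entry, size, "%s=%s", name, value);
  return entry;
}
```

The result of make_env_entry ("QUERY_STRING", query) could then be placed in an array such as { entry, NULL } and passed as the third argument to execve().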

The other (more complicated) way to get information about how to set environment variables is through a POST request, which might look like this:

POST /cgi-bin/show.cgi HTTP/1.1\r\n
Host: localhost\r\n
Connection: close\r\n
Content-Type: multipart/form-data; boundary=------WebKitFormBoundary4XdOKY1sHBOLMWEE\r\n
[other HTTP headers we are ignoring...]\r\n
\r\n
--------WebKitFormBoundary4XdOKY1sHBOLMWEE\r\n
Content-Disposition: form-data; name="db"\r\n
\r\n
foo.txt\r\n
--------WebKitFormBoundary4XdOKY1sHBOLMWEE\r\n
Content-Disposition: form-data; name="hash"\r\n
\r\n
f1f4b8705111a70c176a942d26f765fb548e2be9\r\n
--------WebKitFormBoundary4XdOKY1sHBOLMWEE\r\n
Content-Disposition: form-data; name="record"\r\n
\r\n
2\r\n
--------WebKitFormBoundary4XdOKY1sHBOLMWEE--\r\n

With POST requests, the variables are passed in the HTTP body, not the header. The body is provided as the body variable from retrieve_request(). Each environment variable corresponds to a separate Content-Disposition entry separated by boundaries. The boundary is declared in one of the HTTP headers and is provided for you in the boundary variable from retrieve_request(). Between boundaries, the name field specifies the name of the environment variable to set, and its value appears by itself after a blank line.

There are two small things to note about the boundaries. First, each boundary line in the body begins with two additional hyphens ("--"). Second, the last line has two additional hyphens after the boundary. Use this information when parsing this data.
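Two small pieces of this parsing can be sketched in isolation: recognizing boundary lines, and pulling the field name out of a Content-Disposition line. Both helpers below are hypothetical, assume each line has already had its trailing "\r\n" removed, and are only a starting point rather than a complete parser.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: classify one body line against the boundary string
   taken from the Content-Type header.
   Returns 0 if the line is not a boundary line, 1 if it is a separator
   ("--" + boundary), and 2 if it is the closing line ("--" + boundary + "--"). */
int
classify_boundary_line (const char *line, const char *boundary)
{
  size_t boundary_len = strlen (boundary);
  if (strncmp (line, "--", 2) != 0)
    return 0;
  if (strncmp (line + 2, boundary, boundary_len) != 0)
    return 0;

  const char *rest = line + 2 + boundary_len;
  if (*rest == '\0')
    return 1;
  if (strcmp (rest, "--") == 0)
    return 2;
  return 0;
}

/* Hypothetical helper: return a heap-allocated copy of the quoted name in a
   line such as: Content-Disposition: form-data; name="db"
   Returns NULL if no name="..." field is found. This takes the first
   occurrence of name=" and so assumes no earlier field ends in that text. */
char *
field_name (const char *line)
{
  const char *start = strstr (line, "name=\"");
  if (start == NULL)
    return NULL;
  start += strlen ("name=\"");

  const char *end = strchr (start, '"');
  if (end == NULL)
    return NULL;
  return strndup (start, (size_t) (end - start));
}
```

The value for each field is then the text that follows the blank line after its Content-Disposition line, up to the next boundary line.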

The forms used in the test cases can be found in tests/forms. We strongly recommend that you write a small program to read in one of these forms and parse it correctly. Parsing this data is one of the trickiest parts of this project. Once you have a working algorithm for parsing, you can copy and paste from your sample program into your main CGI processing file.

Turn In

First, run make clean to remove all of the executable and object files from your directory structure. Then, zip up the contents of your project directory in either a .zip or a .tar.xz file. Upload this archive to Brightspace. Your project must be submitted by Friday, April 11, 2025 at 11:59 p.m. unless you're going to use a grace day.

All work must be done within assigned teams, though you may discuss general concepts with your classmates. Please refer to the course policies if you have any questions about academic integrity. If you have trouble with the assignment, I am always available for assistance.

Under no circumstances should any member of one team look at the code written by another team. Tools will be used to detect code similarity automatically.

Code that does not compile will automatically score zero points.

Grading

Your grade will be determined by the following categories.

Category                  Description                                                                                       Weight
CGI Integration Tests     Your solution passes the 11 integration tests for the CGI programs.                               33%
Unit Tests                Your solution passes the two unit tests.                                                          14%
Server Integration Tests  Your solution passes the 11 integration tests for the server.                                     33%
Memory Leaks              Your code has no memory leaks.                                                                    10%
Style                     Your solution conforms to GNU style, passes the style checks, and most importantly, is readable.  10%

Credits

This assignment is based on material from CS 361 by Michael S. Kirkpatrick. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.