Project 5: LMDb: Local Movie Database
Due by: Wednesday, April 9, 2025 at 11:59 p.m.
Overview
You're going to write a program that stores the name, year, running length (in minutes), genre, and revenue earned (if available) of a huge number of movies. This program will allow you to search for movies with a particular name as well as display the highest earning movie in each genre and give overall statistics for each genre. Of course, it will also let you add and delete movies from the database. The movies will be stored in 25 binary search trees, one for each of the 25 genres.
The commands for this program can be entered from the keyboard (or, of course, redirected from a file). Two of these commands will involve file I/O so that the database can be saved between executions of the program.
The roster program must implement the following commands:
add
Adds a movie to the current list. This command will read in the name, year, length, genre, and revenue (if any) on separate lines. Follow the formatting in the sample executable. Use the genre information to decide which of the 25 binary search trees to add the movie to. The movie name will be no longer than 1024 charaters, including a null character. The genre name will be no longer than 100 characters, including a null character.clear
Clears the current list of movies. This command will free all the memory allocated in all 25 binary search trees and set each of those pointers toNULL
.delete
Deletes a movie from the current list. This command will read in the name, year, and genre of a movie and attempt to remove it from the binary search corresponding to its genre. It should also free the memory associated with the movie.find
Searches for a movie. This command will read in a name and print all the movies that match, starting with the first genre and moving to the last. In each binary search tree, use an inorder traversal so that matching movies are printed in sorted order by year. It should also print the total number of movies whose names match.help
Displays help information. Use the sample executable to determine the output of this command.highest
Prints the highest revenue movie for each genre. The movie name (if known) will take up 20 spaces, followed by a space. For more details, follow the formatting in the sample executable.load
Adds the contents of a file to the list of movies. This command will prompt for the name of a file. The fields of each movie will be found in the file in this order: name, year, length, genre, and revenue. These fields will be separated by tabs, followed by a newline after all the data in each movie. The data is guaranteed to be correctly formatted, but some of the values might not be legal. The file name will be no longer than 100 characters, including a null character. The movie name will be no longer than 1024 charaters, including a null character. The genre name will be no longer than 100 characters, including a null character.quit
Quits the program. All memory should be freed.revenue
Changes the revenue of a movie to the specified amount. This command will read in the name, year, genre, and new revenue value of a movie. Follow the formatting in the sample executable.save
Saves the current list of movies to a file. This command will prompt for the name of a file. The fields of each movie should be printed in the same format as they are read in the load command. Print the tree for each genre in order, and then use a preorder traversal within each tree.statistics
Prints out statistics for each genre. The genre name will take up 12 spaces, followed by a tab. The count number will take up 10 spaces, followed by a tab. For more details, follow the formatting in the sample executable.
The ultimate authority for the behavior of the commands and the output format will be the sample executable.
Implementation Details
The Forest for the Trees
You must store the databases of movies in binary search trees. If you're rusty on binary search trees, here is a decent Wikipedia article to refresh you. Likewise, a lab will focus on trees.
The nodes you will be using to build your trees with are of type Movie
, defined in movie.h. Note that any string without a tab or a newline is a legal movie name. However, a movie's year must be between 1900 and 2050, inclusive. Likewise, both a movie's running time in minutes and its revenue must be non-negative. Finally, the genre of a movie must be specified by one of the 25 names given in the GENRE_NAMES
array provided in names.c. Note that this array is a global variable and should be declared as extern
in any other source file that uses it.
Binary search trees depend on the ability to order nodes. "Lesser" nodes go to the left. "Greater" nodes go to the right. In order to compare two Movie
nodes, use strcmp()
to compare names (stored in the name
field). If the two names are equal, use the years (stored in the year
field) as a tie-breaker, with movies made in earlier years ordered before movies with the same name but a later year.
You should have a binary search tree for each of the 25 genres, stored in an array holding Movie*
values as the root of each tree. In order to implement your trees, I recommend that you implement some version of the following functions, at a minimum. Note that these functions are prototyped in movie.h, which you are allowed to change (but probably shouldn't). The functions are grouped into three categories below purely for organizational purposes.
Tree Structure
Movie* insert (Movie* root, char* name, int year, int minutes, int genre, long long revenue);
Insert a movie into a tree, obeying ordering constraints. Note: If a movie already exists in the given genre tree with the same name and the same year, simply return without changing the tree.Movie* search (Movie* root, char* name, int year);
Search in a tree for a movie with a given name and year. Return a pointer to the movie if found,NULL
otherwise. This function is useful for updating the revenue.Movie* delete (Movie* root, char* name, int year);
Delete a movie in the tree with a given name and year, freeing the memory. If the tree doesn't contain a matching movie, this function has no effect. The deletion procedure I'm asking you to implement is one of those most difficult parts of this assignment, even though it is one of the simpler tree deletion algorithms. When deleting a node, if it has no children, simply remove it. If it only has one child, move that child to where the deleted node was. Finally, if both children of the node are present, find the leftmost child of the right subtree under the deleted node and move that child to where the deleted node was. If the leftmost child of the right subtree has a right child, move it up to where the leftmost child of the right subtree was.void clear (Movie* root);
Free all the nodes in a tree.
Information
void printMovie(Movie* movie);
Print information about the given movie to stdout. This information includes name, year, length, genre, and revenue. Each element should be labeled and printed on a separate line. Refer to the sample executable for exact output formatting.int printMatches (Movie* root, char* name);
Print all movies (using the previous function) whose name matches the given name. Use an inorder traversal so that movies with an earlier year will come first.void printAll (FILE* file, Movie* root);
Print all movies to the given open file. The fields of each movie should be printed in the order: name, year, length, genre, and revenue. These fields should be separated by tabs, followed by a newline after all the data in each movie. Use a preorder traversal to maintain the shape of the tree.
Statistics
int count (Movie* root);
Return the number of nodes in the tree.long long totalRevenue (Movie* root);
Return the total revenue in the tree.Movie* highestGrossing (Movie* root);
Find the movie with the highest revenue in the tree orNULL
if no movie has a revenue greater than 0. If there are ties (which is extraordinarily unlikely), take the highest revenue movie that comes earliest in an inorder traversal.
Errors
There are many possible error cases in this assignment. You can assume that there are no formatting errors. In other words, movie data will always contain a string, two integers, another string, and a long long integer (representing name, year, minutes of length, genre, and revenue, respectively). In a file, these elements will be separated by tabs. When entered directly, each element will be entered on a separate line. The input will always be correctly formatted; however, you cannot assume that the year will be in a legal range, the minutes will be non-negative, the genre will be one of the 25 legal genres, or the revenue will be non-negative.
Here is a list of all errors you need to report.
Load failed: File <filename> not found
Printed if file to load cannot be opened for readingSave failed: File <filename> not writable
Printed if file to save cannot be opened for writingAdd failed: Invalid year <year>
Printed if year is invalid for a movieAdd failed: Invalid length of <minutes> minutes
Printed if length is negativeAdd failed: Invalid genre <genre>
Printed if genre is not one of the 25 supported genresAdd failed: Invalid revenue <revenue>
Printed if revenue is negativeDelete failed: Invalid year <year>
Printed if year is invalid for a movieDelete failed: Invalid genre <minutes>
Printed if genre is not one of the 25 supported genresDelete failed: Movie <name> (<year>) not found in genre <genre>
Printed if movie cannot be foundRevenue change failed: Invalid year <year>
Printed if year is invalid for a movieRevenue change failed: Invalid genre <genre>
Printed if genre is not one of the 25 supported genresRevenue change failed: Invalid revenue <revenue>
Printed if revenue is negativeRevenue change failed: Movie <name> (<year>) not found in genre <genre>
Printed if movie cannot be foundUnknown command: <command>
Printed if command is not recognized
Miscellaneous Tricks
You should use the scanf()
function for most of your input. Chapter 7 in Kernighan and Ritchie covers it extensively. Its return value is the number of items it read successfully, and it rather nicely reads items using whitespace as a separator. Likewise, you should use the similar fscanf()
function to perform file input. It might also be useful to use getchar()
or fgetc()
to do single-character input to read names until a tab delimeter or EOF.
Faithful printf()
will let you do the screen output you need to do, but some of the output formatting might look difficult to pull off. The only really tricky thing I did in output formatting was the way I formatted some names. I used the following formatting string for the genre names: "%-12s"
. That formatting string makes a string take up 12 spaces, and the "-
" at the beginning makes the string left-justified instead of the default right justification. The fprintf()
function should be your main tool for file output.
Feel free to use the string library all you want. The strcmp()
function is going to be useful. Also, don't overlook strdup()
. Recall that, given a string as input, strdup()
allocates just enough memory to hold the string, copies the string over, and returns a pointer to the newly made copy. It's a great choice for copying strings over to name
field in the Movie
structure. If you use strdup()
, remember to free the memory created for the strings when you dispose of them.
Provided Files
- movie.h - Header file defining the
Movie
structure and some useful constants - names.c - Source file containing the global genre names array
- movies.tsv - Sample movie input file
- movies.exe - Sample executable
Turn In
Zip the contents of your project directory, including the makefile, the source C file(s), and header files. Upload this zip file to Brightspace. Do not include any object files or executables. Running the make
command must compile all the required C source code files and generate an executable named movies
.
All work must be submitted before Wednesday, April 9, 2025 at 11:59 p.m. unless you are going to use a grace day.
Grading
Your grade will be determined by the following categories.
Category | Weight |
---|---|
add Command
|
10% |
clear Command
|
5% |
delete Command
|
10% |
find Command
|
5% |
help Command
|
5% |
highest Command
|
5% |
load Command
|
10% |
quit Command
|
5% |
revenue Command
|
5% |
save Command
|
10% |
statistics Command
|
5% |
Formatting | 10% |
Error Cases | 10% |
Programming Style | 5% |
Under no circumstances should any member of one team look at the code written by another team. Tools will be used to detect code similarity automatically.
Code that does not compile will automatically score zero points.