Project 1: JSON Parsing
Due by: Monday, February 7, 2025 at 11:59 p.m.
Introduction
JSON files are text files that persistently store JavaScript objects in a human-readable format. This project supports only the following subset of JSON syntax:
- Positive or negative integer values in decimal or octal format (37, -4, 024). A leading 0 marks a value as octal, and any additional leading 0s are ignored, so 024 is the same as 00000024.
- Strings must begin and end with quotation marks. The only escape sequences allowed are \\ (backslash) and \" (quotation mark).
- Objects begin and end with curly braces. Each key is separated from its value by a colon, and key-value pairs are separated by commas. Keys must always be strings, but values can be either strings or integers.
- Whitespace between tokens is ignored, no matter how many whitespace characters appear in a row.
Examples:
- { "key" : "value" }
- {"a":-2}
- { "a":"b", "c":4, "d": "This \"is\" valid \\ here." }
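To make the string rule concrete, the following is a minimal sketch of a scanner that accepts a quoted string containing only the \\ and \" escape sequences. It is written as a switch over explicit states so that the FSM structure is visible, but it does not use the project's fsm_t or event_t types; scan_string() and the SCAN_* states are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; the project defines its own in statemodel.h. */
typedef enum
{
  SCAN_START,
  SCAN_BUILDING,
  SCAN_ESCAPE,
  SCAN_DONE
} scan_state_t;

/* Scans a quoted string beginning at input[0]. On success, copies the
   unescaped contents into out (NUL-terminated), stores the number of
   input characters consumed in *used, and returns true. Returns false on
   a syntax error or if out is too small. */
bool
scan_string (const char *input, char *out, size_t out_size, size_t *used)
{
  scan_state_t state = SCAN_START;
  size_t len = 0;
  size_t i = 0;

  if (out_size == 0)
    return false;

  while (state != SCAN_DONE)
    {
      char c = input[i];
      if (c == '\0')
        return false; /* input ended before the closing quote */
      switch (state)
        {
        case SCAN_START:
          if (c != '"')
            return false; /* strings must begin with a quotation mark */
          state = SCAN_BUILDING;
          break;
        case SCAN_BUILDING:
          if (c == '\\')
            state = SCAN_ESCAPE;
          else if (c == '"')
            state = SCAN_DONE;
          else if (len + 1 < out_size)
            out[len++] = c;
          else
            return false; /* no room left in out */
          break;
        case SCAN_ESCAPE:
          /* Only a backslash or a quote may follow a backslash. */
          if ((c != '\\' && c != '"') || len + 1 >= out_size)
            return false;
          out[len++] = c;
          state = SCAN_BUILDING;
          break;
        case SCAN_DONE:
          break;
        }
      i++;
    }
  out[len] = '\0';
  *used = i;
  return true;
}
```

The used count tells the caller how far to advance its input pointer, which is the same decision the accept_*() drivers have to make.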
Project Structure
Download the archive given below and unzip it into an appropriate directory.
This project uses a finite state machine (FSM) implementation like the one in Assignment 2 as the basis for parsing the JSON file. Using an FSM to recognize the tokens of source code is a foundation of compiler design (lexical analysis), and modern tools can automate this process. The main idea is that each character fires an event in the FSM so that the character is processed appropriately. Your primary task is to build the FSMs from the given state models and to build a parser driver that triggers the appropriate event for each character. You will be working with the following files in the code base:
- statemodel.h, statemodel.c
  - Of these two files, the header is the more important one. It defines the types for all the state machines you need to implement. It also contains a generic fsm_t type that must be extended with additional fields to store the internal data of the state machines. The statemodel.c file mostly serves as a front-end interface to the state machines below.
- stringmodel.c, intmodel.c, valmodel.c, and objmodel.c
  - Implement the state machines here. For each machine, put all of the necessary code (including the effect and entry functions) in one file.
- parser.c
  - This file contains the main driver of the state machines. For each machine, there is an accept_*() function that sends events to the FSM based on the characters it reads from the input. Note that there will be some back-and-forth between these functions and the effect/entry functions regarding when the FSM's internal pointer is advanced. Use your own judgment to decide when to adjust the pointer here and when to do so in the FSM.
- main.c
  - This file is partially complete. It is already set up to parse the command-line arguments that determine which file to open and how to process its contents (integer, string, value, or object). There is also a -d option that turns on a global debug variable, which is helpful for printing out the names of the states as they transition. In this file, you'll need to open the file, read in its contents, and then try to accept the contents using the appropriate parser driver.
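The sketch below shows one way the pieces described above can fit together: a generic machine extended with fields for its internal data, a transition table of next states and effect functions, and an accept-style driver that turns each character into an event. It is a hypothetical toy that recognizes a single lowercase word; word_fsm_t, the W_* states, the E_* events, and accept_word() are not part of the starter code, whose real types live in statemodel.h.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { W_IDLE, W_READING, W_DONE, W_ERROR, W_NSTATES } wstate_t;
typedef enum { E_LETTER, E_SPACE, E_END, E_OTHER, E_NEVENTS } wevent_t;

/* A generic machine "extended" with fields for its internal data. */
typedef struct word_fsm
{
  wstate_t state;
  const char *cur; /* current character pointer */
  char word[16];
  size_t len;
} word_fsm_t;

typedef void (*effect_t) (word_fsm_t *);

/* Effect: append the current character to the word being built. */
static void
append_char (word_fsm_t *fsm)
{
  if (fsm->len + 1 < sizeof fsm->word)
    fsm->word[fsm->len++] = *fsm->cur;
  fsm->word[fsm->len] = '\0';
}

/* Next state for each [state][event] pair. */
static const wstate_t next_state[W_NSTATES][E_NEVENTS] = {
  [W_IDLE] = { W_READING, W_IDLE, W_ERROR, W_ERROR },
  [W_READING] = { W_READING, W_DONE, W_DONE, W_ERROR },
  [W_DONE] = { W_DONE, W_DONE, W_DONE, W_DONE },
  [W_ERROR] = { W_ERROR, W_ERROR, W_ERROR, W_ERROR },
};

/* Effect for each [state][event] pair; omitted entries are NULL. */
static const effect_t effects[W_NSTATES][E_NEVENTS] = {
  [W_IDLE] = { append_char, NULL, NULL, NULL },
  [W_READING] = { append_char, NULL, NULL, NULL },
};

/* Fire one event: run its effect (if any), then change state. */
static void
handle (word_fsm_t *fsm, wevent_t event)
{
  effect_t effect = effects[fsm->state][event];
  if (effect != NULL)
    effect (fsm);
  fsm->state = next_state[fsm->state][event];
}

/* Driver: map each character to an event until the machine stops. */
bool
accept_word (word_fsm_t *fsm, const char *input)
{
  fsm->state = W_IDLE;
  fsm->cur = input;
  fsm->len = 0;
  fsm->word[0] = '\0';
  while (fsm->state != W_DONE && fsm->state != W_ERROR)
    {
      char c = *fsm->cur;
      wevent_t event;
      if (c == '\0')
        event = E_END;
      else if (c >= 'a' && c <= 'z')
        event = E_LETTER;
      else if (c == ' ')
        event = E_SPACE;
      else
        event = E_OTHER;
      handle (fsm, event);
      if (c != '\0')
        fsm->cur++; /* here the driver, not the effect, advances */
    }
  return fsm->state == W_DONE;
}
```

In this toy, the driver advances the pointer after every event; as noted above, the project leaves it to you whether that happens in the driver or in the effect functions.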
This project is designed to be completed iteratively. Start with the string-processing FSM, which has some embedded comments to help you. Once you feel that FSM is complete, move on to the accept_string() function and implement it. Once you're passing those unit tests, edit main.c to process the string input files so that you can work on the corresponding integration tests. Until you're passing all of the string components, don't even look at any of the other state machines.
Also note that the provided test cases do not directly test your FSM implementations for correctness. If you want to do so, you can extend the test cases in tests/public.c as needed. For instance, the following code could be used to test the first transition in the string FSM:
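```c
START_TEST (basic_string_test)
{
  /* Create a fresh string FSM and send it an opening quote. */
  fsm_t string;
  initialize_str_fsm (&string);
  event_t event;
  event.string = OPEN_QUOTE;
  handle_event (&string, event);
  /* The machine should now be building the string. */
  ck_assert_int_eq (string.state.string, BUILDING);
}
END_TEST
```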
This test could be registered by calling the tcase_add_test() function in public_tests() at the bottom of the test file. Note that the second parameter is the name of the test (basic_string_test in this case).
Stage 1: Strings and Integers
Consider the state model for strings given below:
Consider the state model for integers given below:
Work through the following steps:
- Implement the state models for stringmodel.c and intmodel.c.
- Implement the accept_string() and accept_integer() parser drivers to fire events based on the characters in the input string.
- In main.c, open and parse the input file when given the -s or -i flag. For -v and -o, main() should return silently without crashing.
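Before main.c can hand anything to a parser driver, it has to read the file into memory. One possible approach is sketched below: read the file in chunks, doubling a heap buffer with realloc() as needed, so that the driver receives a single NUL-terminated string. The name read_file() is hypothetical, and the starter main.c may be organized differently; whichever approach you use, free() the buffer afterwards, since leaks count against your grade.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at path into a NUL-terminated heap buffer.
   Returns NULL on any error; the caller must free() the result. */
char *
read_file (const char *path)
{
  FILE *fp = fopen (path, "r");
  if (fp == NULL)
    return NULL;

  size_t cap = 64, len = 0;
  char *buf = malloc (cap);
  if (buf == NULL)
    {
      fclose (fp);
      return NULL;
    }

  /* Always leave one byte free for the terminating NUL. */
  size_t n;
  while ((n = fread (buf + len, 1, cap - len - 1, fp)) > 0)
    {
      len += n;
      if (cap - len - 1 == 0)
        {
          char *bigger = realloc (buf, cap * 2);
          if (bigger == NULL)
            {
              free (buf);
              fclose (fp);
              return NULL;
            }
          buf = bigger;
          cap *= 2;
        }
    }
  if (ferror (fp))
    {
      free (buf);
      fclose (fp);
      return NULL;
    }
  fclose (fp);
  buf[len] = '\0';
  return buf;
}
```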
The string model's effect functions are already stubbed out with a comment explaining what the effect should do. For the integer model, the functions should behave as follows:
- SetNegative: Indicate that the final result should be a negative value
- SetMultiplier_*: Indicate whether the characters should be processed as decimal or octal numbers
- MultAndAdd: Since we are reading the number from left to right, for each digit, we multiply the current value by the base (10 or 8) and add the new digit
- SyntaxError: Print a message indicating the error
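As a rough sketch of how these four effects combine, the function below applies them in sequence outside of any state machine. It assumes, as the syntax rules above state, that a leading 0 selects octal. parse_int() is a hypothetical helper, overflow is not checked, and where this sketch simply returns false, the real SyntaxError effect should also print a message.

```c
#include <stdbool.h>

/* Parses an optionally negative decimal or octal integer at s.
   On success, stores the value in *result, points *end at the first
   unconsumed character, and returns true. */
bool
parse_int (const char *s, long *result, const char **end)
{
  bool negative = false;
  long value = 0;
  int base = 10;

  if (*s == '-') /* SetNegative */
    {
      negative = true;
      s++;
    }
  if (*s < '0' || *s > '9')
    return false; /* SyntaxError: no digits */
  if (*s == '0') /* SetMultiplier: a leading 0 means octal */
    base = 8;

  for (; *s >= '0' && *s <= '9'; s++)
    {
      int digit = *s - '0';
      if (digit >= base)
        return false; /* SyntaxError: 8 or 9 in an octal value */
      value = value * base + digit; /* MultAndAdd */
    }
  *result = negative ? -value : value;
  *end = s;
  return true;
}
```

For example, "024" is processed as 0, then 0 * 8 + 2 = 2, then 2 * 8 + 4 = 20. Extra leading zeros contribute 0 * 8 + 0 = 0 at each step, which is why 00000024 yields the same result.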
Stage 2: Values
Consider the state model for values given below:
Work through the following steps:
- Implement the state model for valmodel.c.
- Implement the accept_value() parser driver and extend main() to support the -v flag.
Note that this state model uses entry functions on the BUILD_STR and BUILD_INT states to run the string and integer FSMs as needed. For string values, the ActivateString() function should create a local variable that is an instance of a string-processing FSM and run that machine to completion by calling accept_string(). The result of that call should be stored so that the accept_value() parser driver can determine whether it was successful. ActivateInteger() should do the same with an integer-processing FSM and accept_integer(). As in Stage 1, the SyntaxError effect should print the message indicated in the file.
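The sketch below illustrates only the delegation pattern: an entry function runs a nested machine to completion and records whether it succeeded, and the driver then checks that field. To keep it self-contained, the nested machines are collapsed into small hypothetical helpers (run_string(), run_integer()), and value_fsm_t and accept_value_sketch() are likewise invented names; a real solution would run your string and integer FSMs through accept_string() and accept_integer().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value machine with fields for the nested result. */
typedef struct value_fsm
{
  const char *cur; /* current character pointer */
  bool is_string;  /* which kind of value was found */
  bool nested_ok;  /* did the nested machine succeed? */
  long int_value;
} value_fsm_t;

/* Stand-in for the string FSM: skips a quoted string with \\ or \"
   escapes. Returns a pointer past the closing quote, or NULL. */
static const char *
run_string (const char *p)
{
  if (*p++ != '"')
    return NULL;
  while (*p != '"')
    {
      if (*p == '\0')
        return NULL;
      if (*p == '\\' && *++p != '\\' && *p != '"')
        return NULL;
      p++;
    }
  return p + 1;
}

/* Stand-in for the integer FSM: sign, octal/decimal, MultAndAdd. */
static const char *
run_integer (const char *p, long *out)
{
  bool negative = (*p == '-');
  if (negative)
    p++;
  if (*p < '0' || *p > '9')
    return NULL;
  int base = (*p == '0') ? 8 : 10;
  long value = 0;
  for (; *p >= '0' && *p <= '9'; p++)
    {
      if (*p - '0' >= base)
        return NULL;
      value = value * base + (*p - '0');
    }
  *out = negative ? -value : value;
  return p;
}

/* Entry functions: run the nested machine, then store its outcome. */
static void
activate_string (value_fsm_t *fsm)
{
  const char *after = run_string (fsm->cur);
  fsm->is_string = true;
  fsm->nested_ok = (after != NULL);
  if (after != NULL)
    fsm->cur = after;
}

static void
activate_integer (value_fsm_t *fsm)
{
  const char *after = run_integer (fsm->cur, &fsm->int_value);
  fsm->is_string = false;
  fsm->nested_ok = (after != NULL);
  if (after != NULL)
    fsm->cur = after;
}

/* Simplified driver: skip whitespace, choose a nested machine from the
   first character, and report whether the nested run succeeded. */
bool
accept_value_sketch (value_fsm_t *fsm, const char *input)
{
  fsm->cur = input;
  fsm->nested_ok = false;
  while (*fsm->cur == ' ' || *fsm->cur == '\t' || *fsm->cur == '\n')
    fsm->cur++;
  if (*fsm->cur == '"')
    activate_string (fsm);
  else if (*fsm->cur == '-' || (*fsm->cur >= '0' && *fsm->cur <= '9'))
    activate_integer (fsm);
  else
    return false; /* SyntaxError */
  return fsm->nested_ok;
}
```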
Stage 3: Objects
Consider the state model for JSON objects given to the right:
Work through the following steps:
- Implement the state model for objmodel.c.
- Implement the accept_object() parser driver and extend main() to support the -o flag.
- Eliminate all memory leaks reported by valgrind.
The entry functions and effects should be implemented as follows:
- ActivateString: Create an instance of a string FSM and run accept_string() on it to get the identifier
- SetIdent: Store a copy of the identifier string (if that has not already been done)
- AdvancePointer: Move the current character pointer past the colon
- AppendKeyValuePair: Append the key-value pair, in a specified format, to a string containing all pairs, using realloc() to grow the result string
- SyntaxError: Print a message indicating the error
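The realloc() growth pattern behind AppendKeyValuePair might look like the following minimal sketch. The required output format is not reproduced here, so the "key => value" layout and the ", " separator are placeholders, and append_pair() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Appends one key-value pair to *result, growing it with realloc().
   *result may start as NULL. The "%s => %s" layout and ", " separator
   are placeholders, not the format the assignment specifies. */
bool
append_pair (char **result, const char *key, const char *value)
{
  size_t old_len = (*result == NULL) ? 0 : strlen (*result);
  const char *sep = (old_len == 0) ? "" : ", ";

  /* Measure the new text first so the buffer grows exactly once. */
  int added = snprintf (NULL, 0, "%s%s => %s", sep, key, value);
  if (added < 0)
    return false;

  char *bigger = realloc (*result, old_len + (size_t) added + 1);
  if (bigger == NULL)
    return false; /* *result is unchanged and must still be freed */

  snprintf (bigger + old_len, (size_t) added + 1, "%s%s => %s", sep, key,
            value);
  *result = bigger;
  return true;
}
```

Because all pairs share one buffer, the caller needs only a single free() at the end, which helps with the leak checks that valgrind performs.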
Turn In
First, run make clean to remove all of the executable and object files from your directory structure. Then, zip up the contents of your project directory in either a .zip or a .tar.xz file. Upload this archive to Brightspace. Your project must be submitted by Monday, February 7, 2025 at 11:59 p.m. unless you're going to use a grace day.
All work must be done within assigned teams, though you may discuss general concepts with your classmates. Please refer to the course policies if you have any questions about academic integrity. If you have trouble with the project, I am always available for assistance.
Under no circumstances should any member of one team look at the code written by another team. Tools will be used to detect code similarity automatically.
Code that does not compile will automatically score zero points.
Grading
Your grade will be determined by the following categories.
Category | Description | Weight |
---|---|---|
Unit Tests | Your solution passes the 20 unit tests. | 40% |
Integration Tests | Your solution passes the 24 integration tests. | 40% |
Memory Leaks | Your code has no memory leaks. | 10% |
Style | Your solution conforms to GNU style, passes the style checks, and most importantly, is readable. | 10% |
Credits
This assignment is based on material from CS 361 by Michael S. Kirkpatrick. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.