Assignment 7: Ghostwriter

Due by: Friday, October 27, 2023 at 11:59 p.m.

A ghostwriter is a professional writer who is hired to write content on behalf of someone else. This content could include books, articles, speeches, social media posts, or any other form of written material. The person who hires the ghostwriter is typically credited as the author or creator of the work, while the ghostwriter remains anonymous or receives limited recognition.

Specification

Your mission is to complete a Python program that acts as an automated ghostwriter. It will read in a text sample and then produce a random sequence of words that mimics the style of the sample. All the code, including that which you write and that which is provided, should be stored in a file called assignment7.py.

Provided Code

Processing the contents of a file to remove punctuation requires a fiddly bit of Python that we have not yet introduced. For that reason, the following function is provided that will take a file name, open the file with the given name, and return a list of all the words in the file, in order, in lower case, with punctuation stripped out. (For those who are interested, a word, for our purposes, is defined as a sequence of letters that may optionally contain a single hyphen or apostrophe but starts and ends with a letter.)

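def parseFile(filename):
    import re
    # Read the entire file and convert it to lower case.
    with open(filename, 'r') as file:
        text = file.read().lower()
        # A word is a run of letters with at most one internal hyphen or apostrophe.
        return re.findall("[a-z]+['-]?[a-z]+|[a-z]+", text)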
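To see what the pattern in parseFile() keeps and discards, here is a small sketch that applies the same regular expression to a string instead of a file. The sample sentence and the names PATTERN, sample, and tokens are made up for illustration.

```python
import re

# Same pattern as in parseFile(), applied to a made-up sample string.
PATTERN = "[a-z]+['-]?[a-z]+|[a-z]+"

sample = "Don't panic -- it's a well-known 'trick', I said!".lower()
tokens = re.findall(PATTERN, sample)

# Internal apostrophes and hyphens survive; surrounding punctuation does not.
print(tokens)
# ["don't", 'panic', "it's", 'a', 'well-known', 'trick', 'i', 'said']
```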

Making a Dictionary

The first function you should write builds a dictionary out of the list of words that parseFile() generates. This dictionary will map each word in the list to a list of all the words that immediately follow that word. The list of words can be in any order and should include duplicates. For example, the key 'and' might have the list ['then', 'best', 'then', 'after', …], listing all the words that came after 'and' in the text.

Important: So that we have a way of dealing with words that are not in the text, make an entry in the dictionary whose key is the empty string and whose list contains the first word in the text.
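To make the shape of this dictionary concrete, here is a hand-written illustration for a made-up five-word list. It is not produced by ghostDictionary(); the names tiny_words and expected are assumptions chosen only for this example.

```python
# A made-up word list, in the form parseFile() would return it.
tiny_words = ['the', 'cat', 'and', 'the', 'dog']

# Written out by hand: the dictionary described above for tiny_words.
expected = {
    '': ['the'],            # special entry: the first word in the text
    'the': ['cat', 'dog'],  # 'the' occurs twice, so it has two followers
    'cat': ['and'],
    'and': ['the'],
}

print(expected['the'])    # the words that followed 'the'
print('dog' in expected)  # 'dog' is last and has no follower, so it is not a key
```

Note that the total number of words across all the lists equals the number of words in the text: one for the empty-string entry, plus one for every word that has a predecessor.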

Give this function the following header:

def ghostDictionary(words):

If printed, the output of this function is large: every word in the file appears once in some list, and every distinct word that is followed by another word also appears as a key, plus dictionary formatting. For testing purposes, here is the output produced for the short poem "This Is Just To Say," by William Carlos Williams, whose text is available here.

{'': ['i'], 'i': ['have'], 'have': ['eaten'], 'eaten': ['the'], 'the': ['plums', 'icebox'], 'plums': ['that'], 'that': ['were'], 'were': ['in', 'probably', 'delicious'], 'in': ['the'], 'icebox': ['and'], 'and': ['which', 'so'], 'which': ['you'], 'you': ['were'], 'probably': ['saving'], 'saving': ['for'], 'for': ['breakfast'], 'breakfast': ['forgive'], 'forgive': ['me'], 'me': ['they'], 'they': ['were'], 'delicious': ['so'], 'so': ['sweet', 'cold'], 'sweet': ['and']}

You could produce this output with the following code:

words = parseFile('justtosay.txt')
dictionary = ghostDictionary(words)
print(dictionary)

Before moving on, be sure to test this function thoroughly to be certain it works correctly.
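Because the lists may be in any order, a direct comparison with the sample output can fail even when your dictionary is correct. One possible way to test is to compare while ignoring list order. The helper name sameMapping() and the small dictionaries below are hypothetical, made up for this sketch.

```python
def sameMapping(actual, expected):
    """Hypothetical helper: True if both dictionaries have the same keys
    and each key's list holds the same words, in any order."""
    if actual.keys() != expected.keys():
        return False
    return all(sorted(actual[key]) == sorted(expected[key]) for key in expected)

expected = {'': ['the'], 'the': ['cat', 'dog'], 'cat': ['and'], 'and': ['the']}
reordered = {'': ['the'], 'the': ['dog', 'cat'], 'cat': ['and'], 'and': ['the']}
missing = {'': ['the'], 'the': ['cat'], 'cat': ['and'], 'and': ['the']}

print(sameMapping(reordered, expected))  # True: same words, different order
print(sameMapping(missing, expected))    # False: a follower of 'the' is missing
```

In your own testing, you would pass the result of ghostDictionary() as the first argument and the expected dictionary as the second.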

Creating the Ghostwriter

The second function you should write will print a random sequence of words. The sequence begins with the starting word passed in as word and contains number words in total, counting the starting word. It will operate in this way:

  1. Print the current word
  2. Look up the list of words associated with the current word in the dictionary (or with the empty string if the word isn't in the dictionary)
  3. Randomly choose one of these words from this list to be next
  4. Repeat until the specified number of words has been printed
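The lookup in step 2 can be sketched on its own as follows. This is only one possible approach, not a required one. The helper name followersOf() and the tiny dictionary are assumptions made for illustration, and the sketch relies on the empty-string entry being present.

```python
def followersOf(dictionary, word):
    """Hypothetical helper: the list for word, or the list stored
    under the empty string when word is not a key."""
    return dictionary.get(word, dictionary[''])

tiny = {'': ['i'], 'i': ['have'], 'have': ['eaten']}

print(followersOf(tiny, 'have'))   # 'have' is a key, so its own list is used
print(followersOf(tiny, 'plums'))  # not a key, so fall back to the '' entry
```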

Give this function the following header:

def ghostWriter(dictionary, word, number):

Although your powers of random number generation and list manipulation are certainly capable of dealing with the problem, the random module provides a useful function, random.choice(list), that returns a randomly selected element of list. Remember to import random before using it.
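As a brief illustration of random.choice(), the sketch below picks one word from a made-up list of followers. The names followers and pick are assumptions for the example, and the output varies from run to run.

```python
import random

followers = ['then', 'best', 'then', 'after']

# random.choice() returns one element of a non-empty sequence.
# Because 'then' appears twice, it is twice as likely to be picked
# as 'best' or 'after'.
pick = random.choice(followers)
print(pick)
```

This is why the lists in the dictionary should keep their duplicates: words that follow more often in the text are chosen more often.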

Note that each call to print() normally ends its output with a newline, so printing the words one at a time would put each word on a separate line. You can either build one large string and then output it, or you can pass the end keyword argument to print() so that each item is followed by a space instead of a newline. You can do so as follows:

print(output, end=' ')
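The two options can be compared side by side in a short sketch. The list pieces is a made-up stand-in for the words your function would choose.

```python
pieces = ['good', 'deal', 'to', 'wish']

# Option 1: print each word followed by a space instead of a newline.
for piece in pieces:
    print(piece, end=' ')
print()  # move to the next line once all the words are printed

# Option 2: build one string and print it once.
line = ' '.join(pieces)
print(line)
```

Both options show the same words on one line. The only difference is that the first leaves a trailing space after the last word.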

Although the output is always random, here's a sample of the output of 15 words from this function on a dictionary made from the text of Alice in Wonderland, starting with the word 'good':

good deal to wish i beg your finger very gravely and said to himself upon

Putting It All Together

At the bottom of your program, you should have the following code, which prompts the user for a file name, a starting word, and the number of words of output.

filename = input('Enter file name: ')
words = parseFile(filename)
dictionary = ghostDictionary(words)
word = input('Enter starting word: ')
number = int(input('Enter number of words: '))
ghostWriter(dictionary, word, number)

Provided Files

For testing purposes, here are three files you can use with your program.

A wonderful source for copyright-free text is Project Gutenberg. You can find many books from the early 20th century, 19th century, and before, for both testing and personal enjoyment.

Turn In

Upload assignment7.py to Blackboard.

All work must be submitted before Friday, October 27, 2023 at 11:59 p.m. unless you are going to use a grace day.

All work must be done individually. You may discuss general concepts with your classmates, but it is never acceptable for you to look at someone else's code. Please refer to the course policies if you have any questions about academic integrity. If you have trouble with the assignment, I am always available for assistance.

Grading

Your grade will be determined by the following weights:

Category                                          Weight
Correct functioning of ghostDictionary()          45%
Correct functioning of ghostWriter()              45%
Organizing the code into a functioning program    10%

Under no circumstances should any student look at the code written by another student. Tools will be used to detect code similarity automatically.