lab07 : String Formatting and File IO

num   | ready? | description                   | assigned          | due
lab07 | true   | String Formatting and File IO | Tue 02/25 09:00AM | Tue 03/03 08:59AM

In this lab, you’ll get to practice:

This lab should be done solo.

About this lab

In this lab, you will be opening/closing text files, writing code to read the file contents, and counting the total and the unique words in the file.

Instructions

In this lab, you will need to create the following files: lab07.py, lab07_tests.py, and the input .txt files shown below.

Starter code is provided for you at the bottom of this page.

Some notes about the File I/O functions

Here are simple examples you should try:

How to read a file in Python?

The code below opens, reads, and closes the file.

If you have a file called “input1.txt” in the same directory as your Python code, substitute “input1.txt” for “filename”. If the file is in a different directory, use a path to the file instead of "filename" so that Python can find it.

file = open("filename")
content = file.read()
file.close()

Note that in the functions below, instead of hard-coding the file name, you pass it as an input argument called filepath.
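As a minimal sketch of that idea, the function below takes the path as a parameter instead of hard-coding it. The name readWholeFile and the temporary demo file are assumptions made for illustration only; they are not part of the lab's starter code. The sketch uses a with statement, which closes the file automatically when the block ends, as an alternative to calling close() explicitly.

```python
import os
import tempfile


def readWholeFile(filepath):
    # Hypothetical helper (not part of the lab): returns the file's text.
    # The with statement closes the file when the indented block ends,
    # even if an error occurs while reading.
    with open(filepath) as file:
        return file.read()


# Create a throwaway copy of input1.txt so the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "input1.txt")
with open(demo_path, "w") as demo_file:
    demo_file.write("hello\nhello\nhello world\n")

# repr() makes the newline characters visible in the printed output.
print(repr(readWholeFile(demo_path)))
```

Either style is fine as long as the file is closed before the function returns.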

input1.txt

hello
hello
hello world

input2.txt

hello world
world
world

input3.txt

Hello! Today is a lovely day, isn't it?

input4.txt

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quid est, quod ab ea absolvi et perfici debeat? Philosophi autem in suis lectulis plerumque moriuntur. Paulum, cum regem Persem captum adduceret, eodem flumine invectio? Urgent tamen et nihil remittunt. 

Hoc positum in Phaedro a Platone probavit Epicurus sensitque in omni disputatione id fieri oportere. Eorum enim omnium multa praetermittentium, dum eligant aliquid, quod sequantur, quasi curta sententia; Duo Reges: constructio interrete. O magnam vim ingenii causamque iustam, cur nova existeret disciplina! Perge porro.

Sed ea mala virtuti magnitudine obruebantur. Iam in altera philosophiae parte. Apud ceteros autem philosophos, qui quaesivit aliquid, tacet; Itaque sensibus rationem adiunxit et ratione effecta sensus non reliquit. Sed ille, ut dixi, vitiose. Ergo ita: non posse honeste vivi, nisi honeste vivatur? 

These files are used when running pytest functions in lab07_tests.py.

If you want to test your functions with additional files, name them input5.txt, etc., so that the tests below still work for you.

We will be using additional input files to test your submission on Gradescope.

Read all words into a list

So that we do not have to repeat the code that opens and closes the file in every function, let’s write a helper function getAllWords() that returns the list of words. The split() method will be useful here.

def getAllWords(filepath):
    '''
    Returns a list of all words 
    from the given filepath.
    The function opens the file for reading,
    and closes the file before returning.
    '''
    return "stub"

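For example, with the input1.txt shown above, getAllWords("input1.txt") should return ["hello", "hello", "hello", "world"].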
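To see what split() does before using it inside getAllWords, here is a short sketch that works on a string rather than a file. The string is assumed to match the contents of input1.txt as read() would return them.

```python
# Contents of input1.txt as one string, the way file.read() would return it.
content = "hello\nhello\nhello world\n"

# With no arguments, split() breaks the string on any run of whitespace
# (spaces, tabs, newlines) and never produces empty strings.
words = content.split()
print(words)  # ['hello', 'hello', 'hello', 'world']

# Splitting on a single space instead leaves the newlines inside the pieces.
print(content.split(" "))  # ['hello\nhello\nhello', 'world\n']
```

Calling split() with no arguments is usually the better fit here because the words in the input files are separated by both spaces and newlines.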

Clean the resulting words

getCleanWordList reads a file with the given filepath and returns a list of all words in the file with the specified punctuation characters (for example, ,.!?;) removed. You might find the strip() function helpful.

Note that all words are separated by whitespace characters, and a word contains only characters that do not include the following punctuation characters: ,.!?;". Your code will need to split and strip the strings from the text file appropriately.

For example, getCleanWordList("input3.txt", ",.!?;") returns ["Hello", "Today", "is", "a", "lovely", "day", "isn't", "it"].

Note that if we exclude the exclamation mark ! and the question mark ? from the string passed as the charsToRemove parameter, they should be included as part of the returned words: getCleanWordList("input3.txt", ",.;") returns ["Hello!", "Today", "is", "a", "lovely", "day", "isn't", "it?"].

Hint: you need to call getAllWords inside getCleanWordList.

def getCleanWordList(filepath, charsToRemove):
    '''
    Read a file with given filepath, return a list
    of all words from the file with the characters
    that are stored in the string called
    charsToRemove removed.
    Empty strings should not be added to the
    resulting list of cleaned words.
    Use getAllWords function as a helper function 
    to get the list of all words.
    '''
    return "stub"
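The sketch below illustrates how strip() behaves on individual tokens, including the empty-string case that the docstring mentions. The token list is made up for illustration; in the lab the tokens would come from getAllWords.

```python
charsToRemove = ",.!?;"

# strip() removes the listed characters from both ends of a string only.
print("day,".strip(charsToRemove))   # day
print("isn't".strip(charsToRemove))  # isn't  (the apostrophe is not listed)
print("Hello!".strip(",.;"))         # Hello! (the '!' was not listed, so it stays)

# A token made up entirely of removable characters becomes an empty string,
# which should be left out of the cleaned list.
tokens = ["lovely", "?!", "day,"]  # made-up tokens for illustration
cleaned = []
for token in tokens:
    stripped = token.strip(charsToRemove)
    if stripped != "":
        cleaned.append(stripped)
print(cleaned)  # ['lovely', 'day']
```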

Get unique words (a.k.a. Remove the duplicates)

getUniqueWords reads a file with given filepath and returns a list of all unique words that appeared in the file.

For example, for input1.txt the returned list contains "hello" and "world" exactly once each.

Hint: you need to call getCleanWordList first.

def getUniqueWords(filepath, charsToRemove):
    '''
    Return a list of unique words that appeared 
    in the file with the given filepath.
    '''
    return "stub"
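One possible way to remove duplicates from a list is sketched below. The helper name uniqueInOrder is hypothetical, and keeping the words in order of first appearance is an assumption; the lab text does not say which order the unique words should be in.

```python
def uniqueInOrder(items):
    # Hypothetical helper (not part of the lab): keeps the first
    # occurrence of each item and skips later repeats.
    unique = []
    for item in items:
        if item not in unique:
            unique.append(item)
    return unique


words = ["hello", "hello", "hello", "world"]
print(uniqueInOrder(words))  # ['hello', 'world']
```

The comparison is exact, so "Hello" and "hello" would count as two different words in this sketch.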

Count the number of words

getWordCount reads a file with given filepath and returns a list of lists, where each element is a list of two elements in the format [word, count].

For example, for input1.txt the result contains ["hello", 3] and ["world", 1].

Hint: you need to call getCleanWordList and getUniqueWords in this function.

def getWordCount(filepath, charsToRemove):
    '''
    Get the frequency of each unique word 
    in the file with given filepath, and return 
    a list of lists where each element is a list 
    of two elements in the format [<word>, <count>].
    '''
    return "stub"
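As a rough sketch of the counting step, the code below builds the [word, count] pairs from two in-memory lists. The lists stand in for what getCleanWordList and getUniqueWords would return for input1.txt; the variable names are assumptions for illustration.

```python
# Stand-ins for the results of getCleanWordList and getUniqueWords.
cleanWords = ["hello", "hello", "hello", "world"]
uniqueWords = ["hello", "world"]

# For every unique word, count how often it appears in the full list.
wordCounts = []
for word in uniqueWords:
    wordCounts.append([word, cleanWords.count(word)])

print(wordCounts)  # [['hello', 3], ['world', 1]]
```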

Find frequently occurring words

Finally, mostCommonWord reads a file with given filepath, and returns the most common word in the file.

def mostCommonWord(filepath, charsToRemove):
    '''
    Reads the file from filepath in your function
    and returns the most common word in
    the file (i.e., the word with the highest frequency).
    In the case of ties (i.e., more than one word
    with the same max count), return the word
    that occurs earliest in dictionary order
    (remember string comparisons).
    - Use getWordCount() helper function to count
    the frequency of each word.
    '''
    return "stub"

Notes on computing the most common words

If you run mostCommonWord("input1.txt", ",.!?;"), the function should return the mode of the file (the word that occurs most often), which is "hello" for input1.txt. Use getWordCount to count the words in the file first; mostCommonWord() can then find the max count, collect all words with that count, and return the one that comes first in dictionary order.
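The steps described above can be sketched on a list of [word, count] pairs as follows. The helper name pickMostCommon is hypothetical, and the sketch assumes the list is not empty.

```python
def pickMostCommon(wordCounts):
    # Hypothetical helper (not part of the lab). Assumes wordCounts is a
    # non-empty list of [word, count] pairs.
    # Step 1: find the highest count.
    maxCount = 0
    for pair in wordCounts:
        if pair[1] > maxCount:
            maxCount = pair[1]

    # Step 2: collect every word that has that count.
    tied = []
    for pair in wordCounts:
        if pair[1] == maxCount:
            tied.append(pair[0])

    # Step 3: min() on strings returns the one that compares smallest,
    # i.e. the earliest in dictionary order.
    return min(tied)


print(pickMostCommon([["world", 3], ["hello", 3], ["day", 1]]))  # hello
print(pickMostCommon([["hello", 3], ["world", 1]]))              # hello
```

Keep in mind that Python's string comparison places uppercase letters before lowercase ones, so "Zebra" < "apple" is True.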

Test the other functions accordingly, verifying on a simple input file that the results are correct.

Upload lab07.py, lab07_tests.py, and .txt files to Gradescope

Once you’re done writing your functions, navigate to the Lab assignment “lab07” on Gradescope and upload your lab07.py and lab07_tests.py files together with your .txt input files. Submit all your files at once instead of uploading them one by one: each submission overwrites the previous files. After submitting, check on Gradescope that both Python files and the text files appear in your submission.

Congratulations! You are finished with the lab!
