Defining the problem
We can now view the input as a sequence of words intermixed with paragraph breaks.
The characters between two adjacent words are of interest only if they form a paragraph break; otherwise they merely separate the words.
We can therefore define a token as either a word or a paragraph break.
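
As an illustration, this view of the input can be sketched in Python. This is only a minimal sketch under stated assumptions, not a definitive implementation: the text above does not say what counts as a paragraph break, so the code assumes it is a run of whitespace between two words that contains at least two newlines (that is, one or more blank lines). The names tokenize and PARAGRAPH_BREAK are hypothetical and chosen for this example only.

```python
import re

# Hypothetical sentinel standing for the "paragraph break" token.
PARAGRAPH_BREAK = object()

# Every character belongs either to a word (non-whitespace run)
# or to a gap (whitespace run), so the matches cover the whole input.
_RUN = re.compile(r"(?P<word>\S+)|(?P<gap>\s+)")


def tokenize(text):
    """Yield words and PARAGRAPH_BREAK markers from text.

    Assumption: a gap is a paragraph break when it contains two or
    more newlines. Gaps before the first word or after the last word
    are ignored, since only characters between two adjacent words
    are of interest.
    """
    seen_word = False
    pending_break = False
    for match in _RUN.finditer(text):
        run = match.group()
        if match.lastgroup == "gap":
            # Remember the break; emit it only if another word follows.
            pending_break = seen_word and run.count("\n") >= 2
        else:
            if pending_break:
                yield PARAGRAPH_BREAK
            yield run
            seen_word = True
            pending_break = False


if __name__ == "__main__":
    sample = "one  two\n\n\nthree\nfour\n\n"
    for token in tokenize(sample):
        print("<paragraph break>" if token is PARAGRAPH_BREAK else token)
```

Ordinary gaps produce no token at all, so the consumer of the token stream never sees how many spaces or single newlines separated two words.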