In Quest of Making a Master Poet

You might have noticed that I titled this post “... of Making ...” instead of “... of Being ...”, and you'll soon see why. So I thought: who would write poetry like me after I die? To address the issue, I started thinking of ways I could use my knowledge of Machine Learning to build a bot that could be trained on my poems and the way I write, and be smart enough to spit out rhymes when required.

The simplest first step in this quest is to write a Markov chain text generator. A Markov chain (in layman's terms) is a sequence of states in which state changes are probabilistic and the next state depends only on the present state, independent of all past states.

So, how will this work?

The first step is to get a collection of poems (or any text, for that matter) to use as the corpus. From this training corpus we extract all triplets (three words in succession) and build a map from each pair of consecutive words to all the words that have been seen following that pair. The current state of the Markov chain is represented by the last two words of the sentence generated so far.
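
Here is a minimal sketch of that step in Python. The function name and details are my own for illustration; the actual code in the repository may differ.

    from collections import defaultdict

    def build_map(corpus):
        """Map each pair of consecutive words to the list of words
        that have been seen following that pair in the corpus."""
        words = corpus.split()
        successors = defaultdict(list)
        # Slide a three-word window over the text: the first two words
        # form the current state, the third is a possible next word.
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            successors[(w1, w2)].append(w3)
        return successors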

To generate text, choose two successive words from the corpus at random. Look up that pair in the map to see which words can follow it, choose one of them at random, and repeat with the new last two words.
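
A sketch of that generation loop, building on the build_map function above (again, the names are mine):

    import random

    def generate(successors, max_words=30):
        """Walk the chain: start from a random word pair and repeatedly
        append a randomly chosen successor of the current pair."""
        w1, w2 = random.choice(list(successors))
        output = [w1, w2]
        for _ in range(max_words):
            candidates = successors.get((w1, w2))
            if not candidates:  # the pair only occurs at the very end of the corpus
                break
            w1, w2 = w2, random.choice(candidates)
            output.append(w2)
        return ' '.join(output)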

An example of the method follows.

Input text: Hope is a good thing, maybe the best of things, and no good thing ever dies.


Triplets:
[['Hope', 'is', 'a'],
 ['is', 'a', 'good'],
 ['a', 'good', 'thing,'],
 ['good', 'thing,', 'maybe'],
 ['thing,', 'maybe', 'the'],
 ['maybe', 'the', 'best'],
 ['the', 'best', 'of'],
 ['best', 'of', 'things,'],
 ['of', 'things,', 'and'],
 ['things,', 'and', 'no'],
 ['and', 'no', 'good'],
 ['no', 'good', 'thing'],
 ['good', 'thing', 'ever'],
 ['thing', 'ever', 'dies.']]

Map:
{('maybe', 'the'): ['best'],
 ('no', 'good'): ['thing'],
 ('thing', 'ever'): ['dies.'],
 ('good', 'thing,'): ['maybe'],
 ('the', 'best'): ['of'],
 ('best', 'of'): ['things,'],
 ('things,', 'and'): ['no'],
 ('and', 'no'): ['good'],
 ('thing,', 'maybe'): ['the'],
 ('Hope', 'is'): ['a'],
 ('is', 'a'): ['good'], 
 ('a', 'good'): ['thing,'],
 ('good', 'thing'): ['ever'],
 ('of', 'things,'): ['and']}
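
Running the sketch above on this single sentence yields exactly this pair-to-successor mapping. And since every pair here has only one possible successor, the generator can do nothing but replay a suffix of the original sentence, which is why a much larger corpus is needed for interesting output.

    chain = build_map("Hope is a good thing, maybe the best of things, "
                      "and no good thing ever dies.")
    print(generate(chain))
    # e.g. "the best of things, and no good thing ever dies."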

As I did not have a huge collection of my own poems, I used Robert Frost's poetry collection North of Boston as the corpus. The generator produces gibberish most of the time, but it comes up with something sensible once in a while. A somewhat sensible sample generated by the algorithm follows.

Who else will harbour him At his age for pair,
the pair, you know. We sha'n't have the of
art of what mean
I mean by home. Of the
course the easy job For the next forty it
summers--call it forty. But not
I'm not so drunk I can't here:
stay here: Estelle's take
to take it as much wishing
as wishing him good-night. He went on: sure--I'm
'I'm sure--I'm sure'--as polite as be.
could be. He spoke to his door.

Yes, that's how a drunk Robert Frost writes.

Further work includes using a context-free grammar in Backus–Naur form and Natural Language Processing techniques to make the output more meaningful.

Fork on GitHub.