So what I want is an LSTM like Karpathy's, described at
http://karpathy.github.io/2015/05/21/rnn-effectiveness/ with code at
https://github.com/karpathy/char-rnn My ideal situation is an LSTM that takes any string of characters, preferably millions of them, such as:
letters, numbers, and/or other symbols, and generates a similar string after being seeded.
So let's try Karpathy's suggestion of the HELLO LSTM, looking at only 3 characters at a time.
Let's one-hot encode the letters:
h = [1,0,0,0,0]
e = [0,1,0,0,0]
l = [0,0,1,0,0]
o = [0,0,0,1,0]
_ = [0,0,0,0,1] (note: empty space not underscore)
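The encoding above can be sketched in a few lines of Python. The vocabulary order is taken straight from the list, with `' '` standing in for the empty-space character:

```python
import numpy as np

# One-hot encoding for the 5-character vocabulary above.
# Row i of the identity matrix is the one-hot vector for character i.
vocab = ['h', 'e', 'l', 'o', ' ']
char_to_vec = {c: np.eye(len(vocab), dtype=int)[i] for i, c in enumerate(vocab)}

print(char_to_vec['h'])  # [1 0 0 0 0]
print(char_to_vec['o'])  # [0 0 0 1 0]
```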
For each input letter, the output should be the next letter.
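Here is a quick sketch of what those input/target pairs look like for "hello", both per character and with the 3-character window mentioned earlier (plain Python, no framework assumed):

```python
text = "hello"

# Per-character pairs: each input letter is paired with the letter after it.
pairs = [(text[i], text[i + 1]) for i in range(len(text) - 1)]
print(pairs)  # [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]

# 3-character windows: the network sees 3 letters and predicts the 4th.
seq_len = 3
windows = [(text[i:i + seq_len], text[i + seq_len])
           for i in range(len(text) - seq_len)]
print(windows)  # [('hel', 'l'), ('ell', 'o')]
```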
So I got "Hello" working and have now tried a harder test: this time "The Road Not Taken"
by Robert Frost (1874 - 1963).
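To show what "generating after being seeded" means without waiting for an LSTM to train, here is a toy stand-in: character bigram counts play the role of the trained model, and a greedy loop extends a seed one character at a time. In the real setup, `next_char` would come from the LSTM's softmax output instead; everything here is purely illustrative.

```python
from collections import Counter, defaultdict

# Toy "model": count which character follows which in the training text.
text = "hello hello hello"
counts = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    counts[a][b] += 1

def next_char(c):
    # Greedily pick the most frequent follower (ties broken alphabetically,
    # so the result is deterministic).
    return max(counts[c], key=lambda ch: (counts[c][ch], ch))

def generate(seed, n):
    # Extend the seed by n characters, one at a time.
    out = seed
    for _ in range(n):
        out += next_char(out[-1])
    return out

print(generate("h", 8))  # helo helo
```

A trained char-RNN replaces the bigram table with learned state, but the seeded generation loop has the same shape.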
Warning, this is slow to train. I think there are better ways to do this now.
...