Memory Network: A class of neural networks with an explicit external memory component that can be read from and written to
Softmax: A function that converts a vector of real-valued scores into a probability distribution by exponentiating each entry and normalizing so the outputs sum to 1
bAbI: A set of 20 synthetic question-answering tasks designed to test different types of reasoning (deduction, induction, counting, etc.)
Perplexity: A measure of how well a probability model predicts a sample, computed as the exponential of the average negative log-likelihood per token; lower values indicate better performance
Strong supervision: Training where the model is told exactly which sentences in a story are relevant to the answer
Weak supervision: Training where the model is only given the final answer and must figure out which inputs were relevant
Hop: A single computational step of reading from memory and updating the internal state
BoW (Bag-of-Words): A representation of text that disregards grammar and word order but keeps multiplicity
PE (Position Encoding): A method to inject word-order information into the sentence representation by scaling each word's embedding element-wise with a weight that depends on the word's position in the sentence
RNNsearch: An earlier neural machine translation architecture using attention, similar to the mechanism used here
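LS (Linear Start): A training trick where the softmax in each memory layer is initially removed (making the model linear apart from the final answer prediction) and re-inserted later in training, to help avoid poor local minima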
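
Several of the entries above (Softmax, Perplexity, BoW, PE, Hop) can be illustrated with a minimal sketch in plain Python. This is not the original implementation. All function names are hypothetical. The PE weight l_kj = (1 - j/J) - (k/d)(1 - 2j/J), with 1-based word index j and embedding index k, and the state update u + o inside a hop are assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def perplexity(token_probs):
    """Exponential of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def bow(word_vecs):
    """Bag-of-Words sentence vector: plain sum of word embeddings (order is lost)."""
    return [sum(column) for column in zip(*word_vecs)]


def position_encoding(word_vecs):
    """PE sentence vector: scale each embedding element by a position weight, then sum.

    Assumed weight: l_kj = (1 - j/J) - (k/d) * (1 - 2j/J), with 1-based j and k.
    """
    J = len(word_vecs)      # number of words in the sentence
    d = len(word_vecs[0])   # embedding dimension
    out = [0.0] * d
    for j, vec in enumerate(word_vecs, start=1):
        for k, x in enumerate(vec, start=1):
            weight = (1 - j / J) - (k / d) * (1 - 2 * j / J)
            out[k - 1] += weight * x
    return out


def hop(u, memory_in, memory_out):
    """One hop: softmax attention over memory, weighted read, state update.

    u          -- current internal state (list of floats)
    memory_in  -- one vector per memory slot, used for matching against u
    memory_out -- one vector per memory slot, used for the read-out
    Returns the updated state (assumed here to be u + o) and the attention weights.
    """
    scores = [sum(a * b for a, b in zip(u, m)) for m in memory_in]
    p = softmax(scores)
    o = [sum(p_i * c[k] for p_i, c in zip(p, memory_out)) for k in range(len(u))]
    new_u = [a + b for a, b in zip(u, o)]
    return new_u, p


if __name__ == "__main__":
    sentence = [[1.0, 0.0], [0.0, 1.0]]   # two toy word embeddings
    reversed_sentence = sentence[::-1]
    print("BoW:", bow(sentence), bow(reversed_sentence))
    print("PE: ", position_encoding(sentence), position_encoding(reversed_sentence))
    state, attention = hop([1.0, 0.0], sentence, reversed_sentence)
    print("hop:", state, attention)
    print("perplexity:", perplexity([0.25, 0.25, 0.25, 0.25]))
```

With these definitions, `bow` returns the same vector for a sentence and its reversal, while `position_encoding` generally does not. Stacking several calls to `hop`, each fed the state returned by the previous one, corresponds to a multi-hop read.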