Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
ICLR 2017 | arXiv | OpenReview | code (Theano)
Word-level representations capture semantics, while character-level representations handle sub-word morphology and out-of-vocabulary tokens. Earlier work combined the two by simple concatenation or a scalar weighting.

The authors propose a fine-grained (element-wise) gating mechanism to dynamically combine the two.
Let \(v\) denote a feature vector encoding the token's properties:
- Part-of-speech (POS) tag: the grammatical category of the word (noun, adjective, verb, adverb, etc.) in context.
- Named entity recognition (NER) tag: whether the word is a named entity (person, location, organization, time expression, etc.).
- Binned document frequency.
- Word-level representation.

The gate vector is computed as:
\[
g = \sigma(W_g v + b_g)
\]
The final embedding is an element-wise combination of the character-level representation \(w_c\) and the word-level representation \(w_w\):
\[
h = g \odot w_c + (1-g) \odot w_w
\]

#### Experiments
Evaluated on cloze-style reading comprehension benchmarks and a social media tag prediction task (Twitter).
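The gating above can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's implementation: the dimensions, random parameters, and variable names (`W_g`, `b_g`, `w_c`, `w_w`) are assumptions chosen to mirror the equations, and in the actual model \(W_g\) and \(b_g\) are learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 4  # embedding dimension (illustrative)
f = 6  # dimension of the token feature vector v (POS/NER/frequency features; illustrative)

# Hypothetical gate parameters; learned jointly with the model in practice.
W_g = rng.normal(size=(d, f))
b_g = np.zeros(d)

v = rng.normal(size=f)    # feature vector encoding the token's properties
w_c = rng.normal(size=d)  # character-level representation
w_w = rng.normal(size=d)  # word-level representation

# Fine-grained gate: one value in (0, 1) per embedding dimension,
# so each dimension mixes the two representations independently.
g = sigmoid(W_g @ v + b_g)
h = g * w_c + (1 - g) * w_w  # element-wise convex combination

assert h.shape == (d,)
```

Because each component of `h` is a convex combination of the corresponding components of `w_c` and `w_w`, it always lies between them; a scalar gate would instead apply a single mixing weight to the whole vector.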
Related work:
- Check Gated Word-Character Recurrent Language Model, which uses a scalar gate.
- Check Attending to Characters in Neural Sequence Labeling Models.