Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
ICLR 2017 | arXiv | OpenReview | code (Theano)
Word-level representations capture semantics, while character-level representations handle sub-word morphology and out-of-vocabulary tokens. Earlier work combined the two by simple concatenation or a scalar weighting.

The authors propose a fine-grained (element-wise) gating mechanism to dynamically combine the two.
Let \(v\) denote a feature vector encoding the token's properties:
- Part-of-speech (POS) tag: the grammatical category of the word (noun, adjective, verb, adverb, etc.) in context.
- Named entity recognition (NER) tag: whether the word is a named entity (person, location, organization, time expression, etc.).
- Binned document frequency.
- Word-level representation.

The gate vector is computed as:
\[
g = \sigma(W_g v + b_g)
\]
The final embedding is an element-wise combination of the character-level representation \(w_c\) and the word-level representation \(w_w\):
\[
h = g \odot w_c + (1-g) \odot w_w
\]

#### Experiments
Evaluated on cloze-style reading comprehension benchmarks and a social media tag prediction task (Twitter).
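The gating above can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's implementation: the dimensions, random parameters, and variable names (`W_g`, `b_g`, `w_c`, `w_w`) are assumptions chosen to mirror the equations, and in the actual model \(W_g\) and \(b_g\) are learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 4  # embedding dimension (illustrative)
f = 6  # dimension of the token feature vector v (POS/NER/frequency features; illustrative)

# Hypothetical gate parameters; learned jointly with the model in practice.
W_g = rng.normal(size=(d, f))
b_g = np.zeros(d)

v = rng.normal(size=f)    # feature vector encoding the token's properties
w_c = rng.normal(size=d)  # character-level representation
w_w = rng.normal(size=d)  # word-level representation

# Fine-grained gate: one value in (0, 1) per embedding dimension,
# so each dimension mixes the two representations independently.
g = sigmoid(W_g @ v + b_g)
h = g * w_c + (1 - g) * w_w  # element-wise convex combination

assert h.shape == (d,)
```

Because each component of `h` is a convex combination of the corresponding components of `w_c` and `w_w`, it always lies between them; a scalar gate would instead apply a single mixing weight to the whole vector.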
Related work:
- Check Gated Word-Character Recurrent Language Model, which uses a scalar gate.
- Check Attending to Characters in Neural Sequence Labeling Models.