Sequence modeling via segmentations (SWAN)

Chong Wang, Yining Wang, Po-Sen Huang, Abdelrahman Mohamed, Dengyong Zhou, Li Deng

ICML 2017 | arXiv | video | slides

Problem:

Phrases make much more sense than words as units when generating sequences. How can we segment and generate on the fly?

Contributions:

Propose a dynamic programming algorithm that segments the sequence and estimates its probability by summing over all valid segmentations.

Non-sequential input:

A segmentation \(a_{1:\tau_a}\) of \(y_{1:T}\) is a list of \(\tau_a\) segments whose concatenation (operator \(\pi\)) forms the sequence \(y\), i.e. \(\pi(a_{1:\tau_a})=y_{1:T}\).

Empty segments are not permitted in this version.
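
To make the definition concrete, here is a small sketch that enumerates every segmentation of a short sequence into contiguous, non-empty segments. The function name `segmentations` is an illustrative choice, not something from the paper. With \(T\) tokens there are \(T-1\) interior cut points, so there are \(2^{T-1}\) segmentations.

```python
from itertools import combinations

def segmentations(y):
    """Yield every split of y into contiguous, non-empty segments."""
    T = len(y)
    # Pick any subset of the T-1 interior cut points.
    for k in range(T):
        for cuts in combinations(range(1, T), k):
            bounds = (0, *cuts, T)
            yield [y[i:j] for i, j in zip(bounds, bounds[1:])]

for seg in segmentations("abc"):
    # Concatenating the segments (the role of pi) recovers y.
    assert "".join(seg) == "abc"
    print(seg)
```

For "abc" this lists four segmentations, matching \(2^{3-1}\).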

Each segment is conditioned on the input as well as the previous segments.

The full sequence probability is evaluated by summing over all valid segmentations in \(S_y\), a set whose size grows exponentially with \(T\).
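
The exponential sum can be computed in polynomial time with a forward recursion. The sketch below is a minimal illustration, not the authors' implementation. It assumes that a segment's probability depends on earlier segments only through their concatenation (the prefix of \(y\)), which is what makes the recursion valid. The callable `seg_prob(prefix, segment)` and the toy score `toy_seg_prob` are hypothetical stand-ins for a learned model, and the input \(x\) is left implicit. A brute-force sum over all segmentations is included only to check the recursion on a short sequence.

```python
from itertools import combinations
from math import isclose, prod

def sequence_prob(y, seg_prob):
    """Forward recursion: alpha[j] sums over all segmentations of y[:j]."""
    T = len(y)
    alpha = [0.0] * (T + 1)
    alpha[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for j in range(1, T + 1):
        # The last segment is y[i:j] for some start i < j (never empty).
        alpha[j] = sum(alpha[i] * seg_prob(y[:i], y[i:j]) for i in range(j))
    return alpha[T]

def brute_force_prob(y, seg_prob):
    """Explicit sum over all 2^(T-1) segmentations, for checking only."""
    T = len(y)
    total = 0.0
    for k in range(T):
        for cuts in combinations(range(1, T), k):
            bounds = (0, *cuts, T)
            total += prod(seg_prob(y[:i], y[i:j])
                          for i, j in zip(bounds, bounds[1:]))
    return total

def toy_seg_prob(prefix, segment):
    # Hypothetical, unnormalized score; a real model would use a network here.
    return 0.5 ** len(segment) / (1 + len(prefix))

y = "abcd"
assert isclose(sequence_prob(y, toy_seg_prob), brute_force_prob(y, toy_seg_prob))
```

The recursion makes \(O(T^2)\) calls to `seg_prob`, compared with the \(2^{T-1}\) terms of the explicit sum.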

Sequential input: