Abstract
We propose a simple and efficient approach for pre-training deep learning models, with application to slot filling tasks in spoken language understanding. The proposed approach leverages unlabeled data to train the models and is generic enough to work with any deep learning model. In this study, we consider the CNN2CRF architecture, which combines a Convolutional Neural Network (CNN) with a Conditional Random Field (CRF) as the top layer, since it has shown great potential for learning useful representations for supervised sequence learning tasks. With this architecture, the proposed pre-training approach learns feature representations from both labeled and unlabeled data at the CNN layer, covering features that would not be observed in the limited labeled data. At the CRF layer, the predicted classes of words in the unlabeled data serve as latent sequence labels alongside the labeled sequences. These latent labeled sequences, in principle, have a regularizing effect on training with the labeled sequences, yielding a better-generalized model. This allows the network to learn representations that are useful not only for slot tagging with labeled data but also for learning dependencies both within and between latent clusters of unseen words. The proposed pre-training method with the CNN2CRF architecture achieves significant gains over the strongest semi-supervised baseline.
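The core idea in the abstract, using a model's own predictions on unlabeled sentences as latent sequence labels alongside gold labels, can be illustrated with a minimal, hypothetical sketch. The paper's actual model is a CNN with a CRF top layer; here a toy frequency-based tagger stands in for it, and all names (`ToyTagger`, `pretrain_with_latent_labels`) and the tiny data are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of pre-training with latent sequence labels (hypothetical).
# A toy frequency-based tagger stands in for the CNN2CRF slot tagger.
from collections import Counter, defaultdict

class ToyTagger:
    """Per-word majority-vote tagger; a stand-in for a CNN2CRF model."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sentences, tag_sequences):
        for words, tags in zip(sentences, tag_sequences):
            for w, t in zip(words, tags):
                self.counts[w][t] += 1

    def predict(self, words):
        # Unseen words fall back to the generic "O" (outside) tag.
        return [self.counts[w].most_common(1)[0][0] if self.counts[w] else "O"
                for w in words]

def pretrain_with_latent_labels(tagger, labeled, unlabeled):
    """Train on gold labels, then tag the unlabeled sentences and use those
    predictions as latent labels in a combined second training pass."""
    sents, tags = zip(*labeled)
    tagger.train(sents, tags)                        # supervised pass
    latent = [(s, tagger.predict(s)) for s in unlabeled]
    all_sents, all_tags = zip(*(list(labeled) + latent))
    tagger.train(all_sents, all_tags)                # combined pass
    return tagger

# Illustrative toy slot-filling data (BIO-style tags).
labeled = [(["book", "a", "flight"], ["O", "O", "B-object"]),
           (["to", "boston"], ["O", "B-city"])]
unlabeled = [["a", "flight", "to", "boston"]]
tagger = pretrain_with_latent_labels(ToyTagger(), labeled, unlabeled)
print(tagger.predict(["flight", "to", "boston"]))  # → ['B-object', 'O', 'B-city']
```

In the paper, the second pass would update the CNN features and the CRF transition parameters jointly, so the latent sequences also shape the learned label dependencies; the toy tagger only captures the pseudo-labeling loop itself.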
Original language | English (US) |
---|---|
Pages (from-to) | 3255-3259 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 08-12-September-2016 |
DOIs | |
State | Published - 2016 |
Externally published | Yes |
Event | 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States. Duration: Sep 8 2016 → Sep 12 2016 |
Keywords
- Convolutional neural network
- Semi-supervised slot filling
- Triangular CRF
- Unsupervised pre-training
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation