Abstract
At present, automatic speech recognition technology is based upon constructing models of the various levels of linguistic structure assumed to compose spoken language. These models are either constructed manually or automatically trained by example. A major impediment is the cost, or even the feasibility, of producing models of sufficient fidelity to enable the desired level of performance. The proposed alternative is to build a device capable of acquiring the necessary linguistic skills in the course of performing its task. We call this learning by doing, and contrast it with learning by example. The purpose of this paper is to describe some basic principles and mechanisms upon which such a device might be based, and to recount a rudimentary experiment evaluating their utility.

Spoken language, the original natural language, evolved in order for humans to convey important messages to each other. A first principle, then, is that the primary function of language is to communicate. A consequence of this principle is that language acquisition involves gaining the capability of decoding the message. This is in contrast to much of the research on automated language acquisition, which focuses on discovering syntactic structure, often specifically to the exclusion of meaning. Our first principle leads us to investigate a language acquisition mechanism based on connectionist methods, in which the network builds associations between messages and meaningful responses to them. People learn while performing a task by receiving feedback as to the appropriateness of their actions. A second principle, then, is that the actual construction of the mapping from messages to meaning should be governed by a feedback control system whose error signal is at the level of meaning. This is in contrast to some learning research, which is governed by providing input/output pairs, and where the error signal is a parameter-space distortion measure.
Our second principle leads us to investigate a mechanism for human-machine interaction based on control-theory methods, where the system input is the message and the error signal is a measure of the appropriateness of the machine's response. The utility of these principles is demonstrated and evaluated by applying them to an elementary inward-call-management task, the object of which is to connect a caller to the department of a large organization appropriate to the caller's inquiry. Initially, the system knows nothing about the language of its task: no vocabulary, no grammar, and no semantic associations. In the course of directing incoming calls, the system acquires a vocabulary and learns the meanings of words and some rudimentary grammatical relationships relevant to its task. The mechanism used is a particular connectionist network embedded in a feedback control system, which adjusts the connection weights of the network based on the success or failure of the machine's behavior, as evaluated by the caller's reaction to it. This mechanism has several intriguing mathematical properties.

An experimental evaluation of the system has been conducted using typed rather than spoken input. The system was tested by 12 subjects over a 2-month period. Over 1000 conversations were held, during which the machine acquired a vocabulary of over 1500 words. Subsequent tests showed that the learning was stable, in that the system retained 99% of the knowledge it had acquired in the interactions. Although the experiments conducted thus far are of a rudimentary nature, we consider them to be the early stages in a long-term study of automatic acquisition of intelligence by machines through interaction with a complex environment.
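The abstract does not specify the network's exact update rule, but the described mechanism, word-to-department associations strengthened or weakened by the caller's reaction, can be illustrated with a toy sketch. The class name, the additive reward update, and the example departments below are illustrative assumptions, not the paper's actual algorithm:

```python
class CallRouter:
    """Toy sketch of learning-by-doing call routing (illustrative, not the
    paper's network): word-to-department association weights are adjusted
    by a feedback signal reflecting the caller's reaction."""

    def __init__(self, departments):
        self.departments = departments
        self.weights = {}  # (word, department) -> association weight

    def route(self, message):
        # Score each department by summing the weights of the words present,
        # then respond with the highest-scoring department.
        words = message.lower().split()
        scores = {d: sum(self.weights.get((w, d), 0.0) for w in words)
                  for d in self.departments}
        return max(scores, key=scores.get)

    def feedback(self, message, department, success, lr=1.0):
        # Error signal at the level of meaning: reinforce the word links
        # on success, weaken them on failure (assumed additive update).
        delta = lr if success else -lr
        for w in message.lower().split():
            key = (w, department)
            self.weights[key] = self.weights.get(key, 0.0) + delta
```

For example, after a few reinforced interactions the router begins to associate "bill" with a billing department and "broken" with repairs, with no vocabulary or grammar supplied in advance.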
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 101-132 |
| Number of pages | 32 |
| Journal | Computer Speech and Language |
| Volume | 5 |
| Issue number | 2 |
| DOIs | |
| State | Published - Apr 1991 |
| Externally published | Yes |
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Human-Computer Interaction