[NIFL-ESL:7013] Phonemes vs. gestural routines

From: Charles Jannuzi (jannuzi@edu00.f-edu.fukui-u.ac.jp)
Date: Thu Jan 24 2002 - 06:42:11 EST


Jannuzi vs. Nissen
Nissen vs. Jannuzi

Decide for yourselves. Actually, realize this is a good-natured exchange
between John N. and me. My point is, there are theories and models which
support other ways to account for speech production, transmission and
perception.

Phonemic accounts:
http://www.arts.uwa.edu.au/LingWWW/LIN101-2001/NOTES-101/phonologyI.html

http://www.ling.umd.edu/pablos/Phony_h2.htm

http://www.ling.udel.edu/kabak/conf2001/abstracts/sung.html

Gestural accounts:

http://www.haskins.yale.edu/haskins/MISC/RESEARCH/GesturalModel.html

http://www.sign-lang.uni-hamburg.de/intersign/Workshop2/CrashbornHulstKooij/crasbor_hulst_Kooij.html#A4_2

(this might support John, or it might support me; interesting either way)

And this from

http://www.indiana.edu/~srlweb/publication/manuscript211.pdf


The process of speech perception may be limited to the auditory channel
alone, as in the case of a telephone conversation. However, in everyday
spoken language the visual channel is involved as well, and the study of
multi-modal speech perception and spoken language processing is one of the
central areas of current research. While stimulus variability, perceptual
constancy, and neural representation are core problems in all areas of
perception research, speech perception is unlike other perceptual processes
because the perceiver also produces spoken language and therefore has
intimate knowledge of the signal source. This relationship, combined with
the high communicative load of speech, constrains the signal significantly
and affects both perception and production strategies (Lieberman, 1963;
Fowler & Housman, 1987; Lindblom, 1990). Speech perception is also unique in
its remarkable robustness in the face of a wide range of environmental and
communicative conditions. The listener’s percept remains remarkably constant
in the face of a significant amount of production-related variation in the
signal. Furthermore, even in the worst of environmental conditions, in which
large portions of the signal are distorted or masked, the spoken message is
recovered with little or no error. As we shall see, part of this perceptual
robustness derives from the richness and redundancy of information in the
signal, part of it lies in the highly structured nature of language, and
part comes from the context-dependent nature of spoken language.

Extracting meaning from the acoustic signal may at first glance seem like a
relatively straightforward task. It would seem to be simply a matter of
identifying the acoustically invariant characteristics in the frequency and
time domains of the signal that correspond to the appropriate serially
ordered linguistic units (i.e., reversing the encoding of those mental units
by the production process). From those units the hearer can then retrieve
the appropriate lexical entries from memory. Although stated rather simply
here, this approach is based on an assumption about the process of speech
perception that has been at the core of most symbolic processing approaches
(Studdert-Kennedy, 1976). That is, the process involves the segmentation of
the signal into discrete and abstract linguistic units such as features,
phonemes, or syllables. Before or during segmentation, the extra-linguistic
information is segregated from the intended message and is processed
separately or discarded. For this process to succeed, the spoken signal must
meet two conditions. The first, known as the invariance condition, is that
there is invariant information in the signal that is present in all
instances that correspond to the perceived linguistic unit. The second,
known as the linearity condition, is that the information in the signal is
serially ordered, so that information about the first linguistic unit
precedes and does not completely overlap or follow information about the
next linguistic unit, and so forth.

It has become apparent to speech researchers over the last 40 years that the
invariance and linearity conditions are almost never met in the actual
speech signal (Liberman, 1957; Chomsky & Miller, 1963; Liberman, Cooper,
Shankweiler, & Studdert-Kennedy, 1967). This has led to several innovations
that have achieved varying degrees of success in accommodating some of the
variability and much of the nonlinearity inherent in the speech signal
(Liberman, Cooper, Harris, & MacNeilage, 1963; Liberman & Mattingly, 1985;
Blumstein & Stevens, 1980; Stevens & Blumstein, 1981). However, inter- and
intra-talker variability remains an intractable problem within these
conceptual/theoretical frameworks. Recent approaches that treat the signal
holistically have proven to be promising alternatives. Much of the
variability that researchers sought to strip away in traditional approaches
contains important information about the talker and about the intended
message. Recent approaches, while differing significantly in their view of
perception, treat the signal as information-rich. The information in the
speech signal is both ‘linguistic’, the traditional message of the signal,
and ‘non-linguistic’ or ‘indexical’ (Abercrombie, 1967; Ladefoged &
Broadbent, 1957): information about the talker’s immediate physical and
emotional state, about the talker’s relationship to the environment, the
social context, etc. (Pisoni, 1996). Much of the variability and redundancy
in the signal can be used to enhance the perceptual process rather than
being discarded as noise (Klatt, 1976, 1989; Fowler, 1986; Goldinger, 1990;
Johnson, 1997).
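To make the two conditions in that excerpt concrete, here is a toy sketch
(mine, not the paper's). It assumes a drastic simplification: each instance
of a linguistic unit is reduced to a set of cue labels, and each unit's
information in an utterance occupies one time interval in milliseconds. The
function names and cue labels are hypothetical. Invariance holds only if
some cue survives across every instance; linearity, on one reading of the
excerpt, requires each unit's information to begin and end before the next
unit's does, with partial overlap allowed.

```python
"""Toy predicates for the invariance and linearity conditions.

Assumptions (not from the paper): a unit instance is a set of cue labels,
and a unit's information is a single (start_ms, end_ms) interval.
"""
from typing import Iterable, List, Set, Tuple


def invariant_cues(instances: Iterable[Set[str]]) -> Set[str]:
    """Return the cues present in every instance of a unit.

    The invariance condition holds only if this set is non-empty.
    """
    instance_list = list(instances)
    if not instance_list:
        return set()
    return set.intersection(*instance_list)


def is_linear(spans: List[Tuple[float, float]]) -> bool:
    """Check one reading of the linearity condition.

    Each unit must start before its successor starts and end before its
    successor ends. Partial overlap (coarticulation) is tolerated, but a
    unit may not be wholly contained in, contain, or follow the next one.
    """
    return all(
        s1 < s2 and e1 < e2
        for (s1, e1), (s2, e2) in zip(spans, spans[1:])
    )


if __name__ == "__main__":
    # Hypothetical labels: the same /d/ cued differently before two vowels.
    print(invariant_cues([{"f2_rising"}, {"f2_falling"}]))  # set()
    # Overlapping but still serially ordered spans.
    print(is_linear([(0, 120), (40, 200), (150, 300)]))  # True
    # Second unit's information lies wholly inside the first unit's.
    print(is_linear([(0, 300), (40, 200)]))  # False
```

The empty set and the False result are exactly the kinds of failures the
excerpt says are typical of real speech, which is why it argues for treating
the signal holistically rather than as a string of invariant, non-overlapping
segments.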

This whole paper is well worth reading, though it is a slog, I admit.

Charles Jannuzi



This archive was generated by hypermail 2b30 : Fri Jan 17 2003 - 14:43:56 EST