Acoustic blur, soundshapes, speech streams

I’ve been thinking about an interaction I had in class last week, which I’ve roughly transcribed below. For a bit of context: the language point was ‘going to’ for future plans, and the language had been presented through a listening text. This was the controlled practice stage.

Here’s how things played out (without real student names, obviously!)…

Student A: (quite slowly) What are you going to do after class?

Student B: (quite carefully) I’m going to meet my friends.

Me: OK, cool. That’s fine… *thinks*. OK, Student A – woye.gunne.doowaf.teclass?

Student A: Er…I’m going to eat

Me: weye.gunneet?

Student A: Sorry?

Me: weye.gunneet?

Student A: I don’t… understand

Me: That’s ok. What might I ask you? You said that you’re going to eat…

Student A: Maybe… where?

Me: weye.guneet?

Student A: Oh! Where are you going to eat?

Me: weye.guneet?

Student A: Maybe… Sizzler

Me: Nice. Good steak. (To Student B) Ask me.

Student B: What are you going to /

Me: woye.gunne.doowaf.teclass?

Student B: *laughs* woye…gunnerrr

Me: It’s OK. Try this instead: watcher

Student B: watcher

Me: watcher.gunner


I find myself doing things like this more and more in class. To be fair, if you were to pick this interaction apart, it’s not particularly good teaching. The whole interaction is staggered and unnatural, I’m modelling pronunciation through simple repetition, and I’m leading the exchanges too. But hey, I’m being honest about what sometimes happens in my class; I’m not gonna lie.

I’ve just finished reading Richard Cauldwell’s intriguing book, Phonology for Listening. It aligns somewhat with my current thoughts on teaching pronunciation and listening, and it helps me understand exactly what I’m sometimes trying to help learners achieve in the classroom.

The learners’ exchanges in the dialogue above (before I snuck my way into the conversation, that is) were a direct result of the language modelled in the listening text. That listening, from which the target language was taken, followed the ‘careful speech model’. Cauldwell summarises the features of this model really well in his book, contrasting it with the much more natural ‘spontaneous speech model’.

The careful speech model is neat and tidy – Cauldwell refers to it as a greenhouse/garden, as opposed to the jungle of spontaneous speech. This nice, orderly model for listening texts may not be the most authentic, but it’s more intelligible for learners. It limits the difficulty of a text by using soundshapes for words that are close to their citation forms, and it restricts connected speech to the features that are more easily observable (and often fairly teachable). All our lessons have a pronunciation focus, so it’s no surprise that the focus here was the use of contractions in phrases such as ‘I’m going to…’. It’s manageable enough in relation to the target language.

There are a lot of pronunciation features we teach in our lessons that Cauldwell would probably question. For example, intonation patterns for question forms feature in a fair few of our lessons, but they always seem ambiguous to me (and to Cauldwell!). Sentence stress is often far too simplified: there’s much more going on than just (all) content words being stressed and function words generally being weak. I’m with Cauldwell on that one. Plus, the very fact that I’m talking about sentences instead of utterances, or speech units, suggests to me that my approach (or the approach of our materials) might not be bringing pronunciation and listening alive, or at least not appreciating how transient they are.

Far from criticising the materials I teach, I understand why this is the case. Our listening texts model a task that we want learners to replicate. The listening includes target language they might need, and we focus on using it effectively, with a relevant focus on pronunciation (normally suprasegmental). If the model text were spontaneous speech, we would spend far too long helping learners decode it (rather than doing a quick comprehension test). And given the veritable jungle of language that would appear in a spontaneous, truly authentic text, it might not be particularly clear exactly what we want learners to understand and use.

The problem is, we only have one type of lesson. Nearly every lesson we teach has a listening text, but we don’t factor in any real focus on decoding the speech stream. The poor learners. They must feel like I do when I’m learning Thai. Chuck me in a controlled classroom environment and I understand most of what people are saying (well, for my level). But once I get through the initial exchange of pleasantries with a Thai taxi driver in the real world, the rest of what he or she says is a complete blur, and I’m constantly asking them to repeat it in a more segmented fashion.

Alright, so we focus on connected speech, and that goes some way towards decoding the speech stream. But it doesn’t fully account for the amount of variation in spontaneous speech. You don’t just get an ‘I’m going to…’. You might get ‘um.gonne’, ‘ugunna’, ‘eye-m.gowner’, or even ‘um.prollijus.gunegowohm’ (that’s meant to be ‘I’m probably just going to go home’…!). And that’s just the stream itself, before we get to tones, keys, prominence and all the rest that Cauldwell discusses.

I often feel the need to draw attention to this kind of thing in class. When I do, I always reiterate the same point to the learners:

‘Look. No one is saying that you have to speak like this. Sure, I’m making you practise saying these phrases, as it might help you understand them a bit better: how they flow and how they feel. But the important thing isn’t using them. If you can, great; if you can’t, just make yourself understood in the best way possible. Still, you may well hear this type of phrase in everyday English. People won’t say “I am going to see a…”; they’ll say “Um gonnerseeyer…”.’

In time, maybe we can analyse that utterance (‘Um gonnerseeyer’) using the teachable stuff: the contraction, the elision, the intrusive sounds, whatever. But in the moment, I just try to teach learners how things are going to sound when they encounter them in spontaneous speech. That’s the biggest barrier I find to learning other languages, so I’m really glad that Cauldwell tries to address it. It’s also the biggest barrier I find in our ‘careful speech model’ listening tasks. They’re just too… careful?!


There’s one thing I find really difficult about teaching acoustic blur and speech streams: avoiding the tendency to make things visible. My learners seem by nature to want some citation form of a phrase. As Cauldwell points out, the written form of an utterance is a metaphor, and once written down (in whatever way) a stream of speech becomes de-natured. I agree, but that doesn’t always make for the best route in for teachers and learners. I’ve seen teachers use mondegreen-style transcription as well as the kind of transcription I’ve used in this post (Cauldwell-style). I’ve done so myself and have found it effective at times and misleading at others.

When I was studying for my DipTESOL, I dabbled with tasks focused on understanding ‘fast colloquial speech’, as Marks and Bowen put it, though I’d probably now call it a speech stream. These tasks involved dictation: write down what the teacher says, count how many words you heard, etc. The task then moves on to a focus on how words are connected, squeezed and so on. I found that learners still wanted to amend their written guesses and create their own written versions of these ‘speech squeezes’. I guess it was natural for them to want something tangible to take away, but it rather devalues the importance of ‘invisible, transient and speedy’ speech.

Reading Cauldwell’s book has come at the right time for me, as I’d realised the need to address decoding with my learners. I’d already begun to do so, and I now suspect this has been a feature of my classroom practice for a while, probably stemming more from my experiences as a language learner than as a teacher. Still, I have yet to find the right method to help learners decode the speech stream effectively. And even if I find it, when will I find the time in class to apply it?



  1. I’ve just started looking into my COI for the DipTESOL – I wanted to focus on teaching listening skills, and my tutor recommended Richard Cauldwell. I wasn’t really sure where to begin with it, but this has really helped focus my thoughts – perfect timing, thanks so much!

  2. Hi Pete,
    I found this book fascinating, and really want to read his new book (A Syllabus for Listening, I think). It really helped me to think about teaching pronunciation and listening in a more varied way, as did Field, as mentioned in the other comments. You might like this blog, which explores a lot of the same issues and includes some ideas for how to teach them. Cauldwell’s Speech in Action site is also very useful. I’ll be interested to see what you do with these ideas in the future.

