Chatting with my brother-in-law the other day, I commented on a conference session I had just given on the value of AI-powered pronunciation apps. He smiled and reminisced about the very first computers in junior school classrooms, and the idea that was sold with them that they would soon replace teachers. Thirty years later, here we are again. Or are we? Will AI revolutionise pronunciation and eliminate the physical teacher this time, or is it just part of an evolution that has been going on for decades? Superficially, these apps look like revolution – learn what you want, where you want, when you want, and all with your personal coach. Dig a bit deeper and we might find all that glitters is not gold.
Skills acquisition and pronunciation
One thing sets learning pronunciation apart from everything else we do when we are learning a language. Pronunciation involves muscle movement, and that means acquiring new habits in terms of the muscles of the face and mouth. Because of this, it is essential that both teachers and learners understand how muscles can be trained to do things they have never done before. To understand this, we need to look at training in areas such as sport, music or dance. Typically, such training is seen to develop in three stages: the cognitive stage, the associative stage and the autonomous stage.
In the cognitive stage, the coach-teacher gives the learner specific instructions on how a target action is to be performed. In the case of a new consonant, for example, this might be indications of where the tongue is in relation to the teeth or the lips, and whether to use voicing or not. During this initial stage, most learners make frequent mistakes. They need to be monitored closely by the coach-teacher, who has to give immediate, precise, meaningful feedback on what the learner is not doing correctly. In addition to indicating a problem exists, the feedback must include clear guidance on how to correct it. The importance of meaningful, useable feedback during the cognitive stage cannot be overemphasised.
In the associative stage, learners need to be given abundant practice within a narrow context. In terms of pronunciation, this might mean practising a target feature in a drill, a game or a tongue twister, or in any other activity where learners can give a lot of attention to the pronunciation feature that they are trying to learn. Genuine communication is not the aim at this point. This is the stage in which learners slowly convert what they know into what they can do. Feedback from the teacher is needed from time to time, but at the same time learners will be developing proprioception. This is the ability to sense your own muscle movements, and hence know if you are doing something correctly or not.
The final stage of skills acquisition is the autonomous stage. Here production of the target action is (almost) automatic and, even if it is wrong, the learner is able to correct it without external help. In terms of pronunciation, for example, the production of the target feature(s) has become automatic and rapid, and happens without speakers having to think consciously about what is happening inside their mouths. This is success!
As I suggested in an article on technology and pronunciation back in 2014, there are other factors that need to be taken into account when we assess the usefulness of apps. These, together with what we have just seen about skills acquisition, mean that good AI-powered apps will perform well in the following areas.
- Choice and sequence: Not all learners have the same problems; AI apps need to adapt to each learner. Even if they share the same first language, what is vital for one user is a minor issue for another. In addition, the chance to choose what a learner considers important (choice) when the learner thinks it is important (sequence) is essential for motivation, especially during the repetitive tasks of the associative stage.
- Place and pace: The whole value of apps, as their publicity constantly points out, is to allow learning to take place wherever the learner chooses – at school, at home, on the bus, in the countryside (place) – and at whatever speed the learner is comfortable with (pace).
- Explicit instructions: A major selling point of apps is that they allow learners to work on their own. Because of this, they need to give an explicit explanation about what is being practised and why, as well as clear indications of how to do the activity.
- Abundant repetition: As we have just seen, abundant repetition is essential for skills acquisition. Consequently, apps have to offer multiple opportunities for the repetition of a target feature in order to bring about automation. In that respect, one of the joys of ‘AI teachers’ is that they never lose their patience.
- Feedback and correction: Quality feedback is vital in the cognitive stage, where incomplete learning or first-language psychomotor habits can interfere. In the absence of good feedback, the neuronal pathway of the incorrect pronunciation becomes stronger and stronger, and the incorrect pronunciation action actually becomes reinforced. That is to say, repeated ‘off-target’ attempts of a merely listen-and-repeat type not only fail to generate correct pronunciation, but actually make incorrect pronunciation harder and harder to eliminate.
- Assessment and progress: If learners do not perceive progress, they can easily become so demotivated that they abandon the task. Clear indications of progress promote learning. In addition, clear indications of progress help to justify the cost of the learning, both in terms of any financial outlay and in terms of time and effort.
Some AI-driven apps
There is an increasing number of AI-driven apps for learning pronunciation. I am only going to look at three here: Speakometer, Bold Voice and ELSA. The three come from recommendations from colleagues, and from the fact that they offer enough free-of-charge access to allow me to get a reasonably good feel for them. All three start by asking a series of questions that allow the app to build up a learner profile; from that, they generate a course to suit what the app feels that learner needs. All three offer limited free access and then require you to subscribe for full access to any premium features. The free-access exercises on Speakometer do not require an internet connection, but both Bold Voice and ELSA only work with a good internet connection.
Speakometer
Speakometer is limited to individual words and sounds, although it does offer useful categories like ‘short vs long vowels’ or ‘voiced vs voiceless consonants’. A target word, or a word with a target sound, is presented and modelled. You repeat the word and your attempt is marked using a meter which swings from red through to green. A text comment appears above the meter to indicate how well you have done.
In my experience, the assessment of learner attempts is not totally reliable. I said ‘three’ in multiple ways, including an Irish accent, a Spanish-English accent and last of all, a standard British accent – I was given the same low score for each attempt. With the section ‘Mastery of sounds’, I provided the app with a number of variations of the word ‘worry’, some markedly nonstandard, and I got full marks for each.
There is no apparent feedback other than the meter’s visual score and the comment box. There is no immediate, meaningful indication as to what the user is doing wrong, or how they can go about improving their speech. The ‘Mastery of sounds’ feature does provide simple information and line diagrams to indicate how to make the sound, but without the aid of a trained teacher these might not be of much use.
Bold Voice
Bold Voice is an accent-training app aimed exclusively at a standard American English accent. No other accent is available. An AI-generated female face (Eliza) takes you through an initial process of analysing your pronunciation via set sentences that presumably force you to pronounce segmentals and suprasegmentals that you will find problematic because of your first language. (I told Eliza I was a Spanish B2 learner.)
After recording the set sentences, Bold Voice takes you through the results, starting with what you did well and then pointing out where you need to improve. The free trial then leads you through some exercises based on what you need to improve. In the exercises you listen to a model voice (US white, female) and repeat. This generates immediate feedback, and if you’re not so good, it gives you the chance to try again.
A really interesting feature is ‘Quick fix’. If you are stuck on a word, clicking on this takes you to a video clip of a teacher who explains to you what you might be doing wrong. If this still doesn’t help you, you can go for a ‘Deep dive’. Here the video tutor, a professional, Hollywood-accented coach, gives a more detailed description of what you should try to do. Some of this feedback perfectly replicates what human teachers might do in class with their own students. That said, one explanation had the coach sticking his tongue out of his mouth directly at the camera. There are cultures where this would be unacceptable or even offensive.
ELSA
ELSA is not a pronunciation app. It is designed to help learners become more fluent in conversation, although it does offer specific feedback on pronunciation, grammar and vocabulary. The home page allows you to choose between a role play, a course designed for the user based on their learner profile, or the chance to talk to the AI coach.
I tried a role play, and went for the ‘Surprise me!’ option. This took me to a shop selling lipstick (Robin is a woman’s name in the US), but it corrected immediately to men’s toiletries when I said I was a man, and then to beard oils when I said I didn’t shave! The conversation then continued in a logical direction, so I switched into quite a strong Newcastle accent. This didn’t faze the app at all, so I then changed to increasingly Spanish-accented English. This was interesting as the app dealt successfully with /b/ for /v/ in ‘very’, but incorrectly interpreted ‘Are they expensive?’ as ‘Are they expenses?’ and ‘affordable’ as ‘a for double’. What was amply clear, nevertheless, was just how much the AI was dealing with my speech on the basis of its intelligibility rather than proximity to an L1 accent, as is overwhelmingly the case with Speakometer and Bold Voice. Given that the Council of Europe and all of the major international exam boards reference pronunciation to intelligibility (and not to ‘nativespeakerness’), I found this feature of ELSA very encouraging.
Revolution or evolution?
Will AI-powered apps replace human teachers when it comes to teaching and learning pronunciation? A quick look at Table 1 will make my own answer to this question more than apparent. Why do AI-apps constitute evolution rather than revolution? There are a number of reasons. Firstly, the apps I tried are totally, or almost totally, lacking in any choice of short- or long-term goals. As we have just seen, however, the primary goal of pronunciation teaching today is intelligibility, as opposed to an L1-speaker accent. In that respect, ELSA, though not specifically a pronunciation app, is closer to the goal of intelligibility than the other two apps, which are entirely about accent training.
Effective pronunciation practice apps provide users with the chance to: | Speakometer | Bold Voice | ELSA |
---|---|---|---|
1. Choose their own short- and long-term goals (choice) | ✗ | ✗ | – |
2. Choose what they want to work on at a given moment in time (sequence) | ✗ | ✗ | ? |
3. Work wherever they want to (place) | ? | ✗ | ✗ |
4. Work at whatever speed they want to (pace) | ✔ | ✔ | ✔ |
5. Understand what they are going to do and why, and how they are going to do it (instructions) | ✗ | ✗ | ✗ |
6. Repeat their attempts at a target feature as often as they want to (abundant repetition) | ✔ | ✔ | ? |
7. Receive immediate feedback and/or scoring (feedback) | ✔ | ✔ | ✔ |
8. Receive in-depth, accessible help with correcting problems (correction) | ✗ | ✔ | ✗ |
9. Receive clear indications of progress (progress) | ? | ✔ | ✔ |
A second issue with these apps is the lack of explicit help in the cognitive stage of the learning process. As we saw earlier, good modelling and careful, individualised guidance and correction are vital in this first stage of skills acquisition. None of these apps provide this. The danger of learners repeating incorrect pronunciation (and, therefore, reinforcing and fossilising it) cannot be overstated. That said, Bold Voice does give us an indication of just how good AI-powered feedback could get if later apps put their energies and creativity into providing it.
Do I see myself encouraging my learners to use apps like the three described here? Yes, but only after I am sure they have progressed far enough into the associative stage to have developed basic proprioception – and with this, the ability to correct themselves when using the app. In short, the cognitive stage of learning new pronunciation still needs the presence of a human teacher, with all of the pedagogical skills and experience they bring to it. AI-powered apps could be hugely useful in the associative stage, where abundant repetitive practice is needed, but only if they allow learners to choose what they want to do. That is not the case at the moment, since all of these AI apps create a programme for the user to follow.
Finally, the question everyone is asking – which is the best app of the three? Speakometer is useful in that some sections of the app work when you are offline. Bold Voice is excellent if you want an American accent, but for me the best is ELSA because it works with the user’s intelligibility.