Lesson 405: Phonology 1.5: Consonants

In this lesson, we will learn about the particulars of Japanese consonant pronunciation and the rules behind it in the context of Japanese phonology. Terms will be defined as necessary, and connections to English will be made occasionally, though these will be limited because of the overwhelming dialectal diversity found across the English-speaking world.

Lesson Note: This lesson assumes that you have basic knowledge of IPA symbols. However, as IPA symbols and terminology are defined in this lesson, you do not necessarily have to know them before reading it. Additionally, [] will be used to enclose phonetic transcriptions, whereas // will enclose the phonemes of the language.

Consonants 子音


A stop is a sound that stops air flow completely. The Japanese non-nasal stops are [p, b, t, d, k, g]. If we were to organize these by voicing and place of articulation, we would get the following chart.

        Bilabial  Alveolar  Velar
 [-V]   [p]       [t]       [k]
 [+V]   [b]       [d]       [g]

For this analysis of Japanese phonology, unvoiced consonants will be treated as the unmarked pronunciation of consonants in Japanese, as there is convincing evidence that voicing was an innovation which the language did not start out with.

Bilabial sounds are made by bringing the upper and lower lips together, and the two languages don’t differ greatly here. Though the Japanese [t] and [d] are alveolar as in English, they are pronounced with the tongue tip almost touching the back of the upper teeth, which is not the case in English. For some speakers, they sound fully dental, as in Spanish.

The voiced velar [g] has an interesting [+nasal] allophone [ŋ] word-medially and after the syllabic uvular nasal /ɴ/, which will be discussed at length later on. /ɴ/ is written between slashes rather than square brackets because it is important to view it here as a phoneme: it coarticulates with /g/, and its pronunciation is determined by how /g/ is realized. The two allophones of /g/, [g] and [ŋ], are in overlapping distribution. The latter is optional and may be used only in the environments just mentioned, but [g] can be used freely anywhere /g/ can appear.
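
The distribution of the two allophones of /g/ can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `permitted_g_allophones` and the representation of a word as a list of phoneme strings are hypothetical, and "word-medial" is approximated as any non-initial position, which also covers the position after /ɴ/.

```python
def permitted_g_allophones(phonemes: list[str], index: int) -> set[str]:
    """Return the surface forms that /g/ may take at phonemes[index].

    Hypothetical helper: a word is a list of phoneme strings, and any
    non-initial position is treated as word-medial.
    """
    if phonemes[index] != "g":
        raise ValueError("the segment at this index is not /g/")
    allowed = {"g"}       # [g] may appear anywhere /g/ can appear
    if index > 0:         # word-medial, including after the moraic nasal
        allowed.add("ŋ")  # the nasal allophone is optional here
    return allowed


# Word-initial /g/: only [g] is available.
print(permitted_g_allophones(["g", "a", "k", "ɯ"], 0))
# Word-medial /g/: both [g] and [ŋ] are available.
print(permitted_g_allophones(["k", "a", "g", "i"], 2))
```

The function returns a set rather than a single symbol because the two allophones overlap in distribution: in medial position, the choice between them is left to the speaker.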

In English, the unvoiced stops are heavily aspirated, especially in word-initial position before a vowel. Japanese does not have this default per se, and if a Japanese speaker does aspirate them, the aspiration is never going to be as strong as the average English speaker's. However, aspiration is stronger in Japanese on average than in Spanish. Sometimes you will hear people, especially when singing in particular genres, use pronounced aspiration, which may in fact be a sign that they are aware of how aspiration generally works in English. As there is nothing in Japanese governing the use of aspiration, it varies widely. So, finding a speaker with a lot of aspiration or a speaker with none does not negate this statement. Rather, it confirms that aspiration is not an important or functional aspect of Japanese phonology.


Fricatives are very noisy sounds. Their turbulence, which is often perceived as hissing, is created by forcing air through a narrowed opening in the vocal tract. You are essentially making friction with your tongue and mouth, which is why these sounds are called fricatives. Japanese has quite a few of them, and most do not have exact equivalents in English.

        Bilabial  Alveolar  Alveolo-Palatal  Palatal  Velar   Glottal
 [-V]   ([ɸ])     [s]       [ɕ]              ([ç])            [h]
 [+V]   ([β])     [z]       [ʑ]                       ([ɣ])

Chart Note: In this chart, we include phonemes and allophones of phonemes that happen to be pronounced as fricatives. To distinguish them in the chart, allophones of phonemes pronounced as fricatives are shown in parentheses. 

Alveolo-palatal sounds in Japanese are made without rounding the lips, with the blade of the tongue behind the alveolar ridge and the body of the tongue raised toward the palate. This is unlike alveolar sounds, which are made on or slightly in front of the alveolar ridge.

Phonemically, there are only five fricatives in Japanese, but the rest of the chart is nearly filled in when we include allophones of other phonemes, fricative or otherwise. In the first column, we see [ɸ] and [β]. The first is an allophone of /h/ when it is followed by the back unrounded vowel [ɯᵝ]. The latter is an optional allophone of /b/ in word-medial position in rapid speech, but it is not that common, and even in the rapid and/or vulgar speech in which it would appear, it is not guaranteed to do so.

[s] and [z] are fundamental fricatives of Japanese. The hissing of [s] is slightly more amplified in Japanese to many English ears, but this difference is minute and is not shared by all Japanese speakers. Speakers often restrict [z] to word-medial position, and in Standard Japanese it is not supposed to appear word-initially. When word-initial or after the uvular [ɴ], /z/ is pronounced with the affricate allophone [dz]. We will learn more about affricates in the next section. Because this split depends so heavily on speaker variation, we will say that these allophones are in free variation, with [dz] being the most frequent pronunciation.

The alveolo-palatal fricatives [ɕ] and [ʑ] have an odd status in Japanese phonology. Some reconstructions of older forms of Japanese suggest that [s] and [z] were in fact [ɕ] and [ʑ], or at least were so before [-low] vowels such as [i] and [e], because to this day there are dialects in which /si/ and /se/ are realized as [ɕi] and [ɕe] respectively. These sounds became phonemes in Japanese through the introduction of loanwords from Chinese.

In Standard Japanese, /s/ has the obligatory allophone [ɕ] before the high front vowel [i]. This makes /s/ partially neutralized with /ɕ/. This allophonic variation existed before the introduction of the phoneme /ɕ/. The same can be said for /z/, which must become [ʑ] before [i]. This means that the phoneme /ʑ/ becomes indistinguishable from the allophone [ʑ] of /z/ in that environment.

Unlike [z] and [dz], which are still in free variation, [ʑ] is disappearing. Regardless of position, it is being replaced entirely by the allophone [dʑ].

The palatal [ç] is an obligatory allophone of /h/ before the high front vowel [i] in Standard Japanese, but this is only typical of Eastern Japanese dialects. So, [hi] can be found in other dialects.
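
The obligatory fricative alternations of Standard Japanese described so far (/s/ and /z/ before [i], /h/ before [i] and [ɯᵝ]) can be collected into a small lookup table. The sketch below is illustrative only: the names `FRICATIVE_ALLOPHONES` and `realize_fricative` are made up for this example, the vowel [ɯᵝ] is written as plain `ɯ` for simplicity, and optional or dialectal variants are ignored.

```python
# (phoneme, following vowel) -> obligatory allophone in Standard Japanese.
# "ɯ" stands in for the compressed vowel [ɯᵝ] (a simplification).
FRICATIVE_ALLOPHONES = {
    ("s", "i"): "ɕ",
    ("z", "i"): "ʑ",
    ("h", "i"): "ç",
    ("h", "ɯ"): "ɸ",
}


def realize_fricative(phoneme: str, following_vowel: str) -> str:
    """Return the surface fricative for a phoneme before the given vowel.

    Pairs that are not listed in the table surface unchanged.
    """
    return FRICATIVE_ALLOPHONES.get((phoneme, following_vowel), phoneme)


# Walk /h/ through the five vowels to see where it changes.
for vowel in ["a", "i", "ɯ", "e", "o"]:
    print("h +", vowel, "->", realize_fricative("h", vowel) + vowel)
```

A table keyed on the consonant and the following vowel keeps each rule visible on its own line, which makes it easy to compare against the prose description.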

Like the bilabial fricative [β], the velar fricative [ɣ] is an optional allophone of /g/ in rapid and/or vulgar speech, though it is somewhat more common. As it is not an obligatory feature, many speakers cannot even pronounce it. This allophone only appears word-medially.

The glottal fricative [h] is very similar to the English [h] and can be accompanied by vocal cord vibration intervocalically, just as in English. To many English speakers, it may sound slightly more tense than the English [h] but relatively weak before the low vowel [a].


We have actually already said a lot about the affricates of Japanese. As far as manner of production is concerned, an affricate is the combination of a stop and a fricative. In Japanese, the voiced alveolo-palatal fricatives and voiced affricates have essentially collapsed together, with the affricate pronunciation being the standard pronunciation. The alveolo-palatal fricative pronunciations linger on as increasingly rare allophones of the affricates.

        Alveolar  Alveolo-palatal
 [-V]   [ts]      [tɕ]
 [+V]   [dz]      [dʑ]

[ts] has traditionally been an allophone of /t/ before the back unrounded vowel [ɯᵝ], but due to the introduction of loanwords from modern foreign languages, it has now become a phoneme of Japanese. It has also become a phoneme independently in other Japanese dialects without the help of borrowings.

[tɕ] has traditionally been an allophone of /t/ before the high front vowel [i] but became a phoneme of Japanese through borrowings from Chinese, and thanks to borrowings from modern foreign languages, it may now be used with all of the vowels of Japanese. Traditionally, it was not paired with the mid vowel [e].

The voiced affricates [dʑ] and [dz] initially entered the language via Chinese borrowings in Middle Japanese. They were maintained as distinct sounds from the voiced fricatives, but the voiced fricatives began to partially neutralize with the voiced affricates before high vowels. Complete collapse has occurred for many speakers, for whom all voiced fricatives have become voiced affricates. Once [z] disappears, we will be able to say that complete collapse has occurred for all speakers. It is important to note that in some regions, [dz] has become [d]. This means that /d/, /z/, and /dz/ are completely neutralized for some speakers.


“Approximant” is not the best term in the world, but here it refers to liquids and glides, which are all sonorants. Sonorants are continuant sounds created with no obstruction to air flow. Liquids differ from glides in that liquids are [+consonantal] and glides are not. Glides are treated phonologically as consonants in Japanese, but their articulation most resembles that of a vowel.

               Alveolar  Palatal  Velar
 liquid [+V]   [ɾ]
 glide  [+V]             [j]      [w]

The alveolar tap/flap [ɾ] does have allophones. It is neither the English [l] nor [ɻ]. It is often described as sounding like the English [d], though the English [d] is semi-voiced whereas the Japanese [ɾ] is fully voiced and created by merely tapping the alveolar ridge. This tap exists obligatorily in American English between two sonorants in an unstressed syllable and optionally in Canadian English in the same environment. However, because the environments of this sound differ so strikingly in the two languages, coarticulation makes them sound quite different. Before [i] and [j], it usually sounds like the tap [ɾ], but before [o], it often sounds like the alveolar lateral flap [ɺ]. It may also be a trill, [r], in vulgar or casual speech.

The palatal glide [j] is seen only before [a, o, ɯᵝ] in native and Sino-Japanese vocabulary. [je] has been introduced in loans, but it is frequently replaced with [i.e], especially by the older generation. [ji] has not been successfully introduced, though a Katakana digraph does exist for it (イィ). Japanese stops and fricatives can also be palatalized, which enlarges the Japanese phonemic inventory.

                  Bilabial  Alveolar    Velar
 Stop [-V]        [pj]      [tj = tɕ]   [kj]
 Stop [+V]        [bj]      [dj = dʑ]   [gj]
 Fricative [-V]             [sj = ɕ]
 Fricative [+V]             [zj = ʑ]

This chart shows that, phonologically, the sounds in the alveolar column are combinations of a stop or fricative with a palatal glide in the underlying representation. However, this underlying representation does not reflect the surface pronunciation, which is shown to the right of the equals sign. [ɕ] and [ʑ] before [i] are the result of palatalization, but not of juxtaposition with a palatal glide. After palatalized consonants, traditionally only the vowels [a, i, o, ɯᵝ] would follow. The vowel [e] has only been accepted after palatalized consonants in recent loanwords.
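
The relationship between the underlying consonant-plus-glide sequences and their surface forms in the chart can be written out as a short Python sketch. The names `COALESCED`, `PALATALIZABLE` and `surface_palatalized` are invented for this illustration, and the function deliberately covers only the eight consonants that appear in the chart.

```python
# Alveolar consonants fuse with the glide into a single alveolo-palatal sound.
COALESCED = {"t": "tɕ", "d": "dʑ", "s": "ɕ", "z": "ʑ"}

# Only the consonants listed in the chart are handled in this sketch.
PALATALIZABLE = {"p", "b", "t", "d", "k", "g", "s", "z"}


def surface_palatalized(consonant: str) -> str:
    """Map an underlying consonant + /j/ sequence to its surface form."""
    if consonant not in PALATALIZABLE:
        raise ValueError(f"{consonant!r} is not covered by this sketch")
    # Bilabials and velars simply keep the glide: /kj/ surfaces as [kj].
    return COALESCED.get(consonant, consonant + "j")


for consonant in ["p", "t", "k", "s"]:
    print(f"/{consonant}j/ -> [{surface_palatalized(consonant)}]")
```

Keeping the fused alveolar outcomes in their own dictionary mirrors the point made above: only the alveolar column has a surface form that differs from its underlying representation.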

The Japanese [w] is not accompanied by the large protrusion and rounding of the lips found in English; it is compressed like its true vowel counterpart [ɯᵝ]. An ad hoc IPA representation of this is a double arrow (⇔) below w, but this is not standard by any means. This phoneme is now mostly restricted to the position before [a] in native words. It survives before [o] among a sizable minority of speakers in just one morpheme, [-(w)o] (the accusative marker). In loans, it is seen with all vowels but [ɯᵝ]. When wu is transcribed, it is typically rendered as [ɯᵝː], but attempts are being made to introduce it, as seen in the Katakana digraph ウゥ.

Labialization used to be a secondary feature of pronunciation. [kw] and [gw] were once phonemes of Japanese but have since merged completely with [k] and [g] respectively. These phonemes have arguably been reintroduced via modern loanwords, but most speakers would pronounce something like kwo as [kɯᵝ.o] instead.


There were two stops which we did not see in our discussion above, the nasals [m] and [n]. These sounds are undoubtedly phonemes of Japanese, but the one nasal sound that causes headaches for learners and confusion for natives is the syllabic/moraic uvular nasal [ɴ]. In total, it has at least seven allophones, which are arguably in complementary distribution under normal circumstances. This sound coarticulates with the following sound, making this regressive (anticipatory) nasal assimilation.

              Bilabial  Alveolar  Alveolo-palatal  Palatal  Velar  Uvular
 nasal [+V]   [m]       [n]       [nj]             [ɲ]      [ŋ]    [ɴ]

Before bilabials, /ɴ/ becomes [m]. Before non-approximant alveolars, it becomes [n]. These two cases are not examples of partial neutralization because the resulting sounds are [+syllabic] whereas the phonemes /m/ and /n/ are [-syllabic]; they differ by one feature. Before velars, it becomes [ŋ]. Before alveolo-palatal and palatal sounds, it is an alveolo-palatal [nj] and a palatal [ɲ] respectively. Before approximants and vowels, it is either [ĩ] or [ɯ̃ᵝ]. The first appears before [i], and the latter occurs before everything else. This shows us that [a], despite being a central vowel, is treated like a back vowel in Japanese phonology.
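
The assimilation rules for /ɴ/ listed above lend themselves to an ordered rule table. The following Python sketch is one possible encoding and rests on several assumptions that are not part of the lesson: the name `realize_moraic_nasal`, the exact membership of each sound class, the use of plain `ɯ` for [ɯᵝ], the decision to treat [j] as an approximant rather than as a palatal, and the fallback to [ɴ] when nothing follows or when the following sound is not covered by any rule.

```python
from typing import Optional

NASALIZED_I = "i\u0303"  # [ĩ], written with a combining tilde
NASALIZED_U = "ɯ\u0303"  # stands in for [ɯ̃ᵝ]

# Each entry pairs a class of following sounds with the allophone of /ɴ/.
# Class membership is an assumption made for this sketch.
NASAL_RULES = [
    ({"p", "b", "m"}, "m"),                          # bilabials
    ({"t", "d", "n", "s", "z", "ts", "dz"}, "n"),    # non-approximant alveolars
    ({"k", "g"}, "ŋ"),                               # velars
    ({"ɕ", "ʑ", "tɕ", "dʑ"}, "nj"),                  # alveolo-palatals
    ({"ç"}, "ɲ"),                                    # palatals
    ({"i"}, NASALIZED_I),                            # before [i]
    ({"a", "e", "o", "ɯ", "j", "w", "ɾ"}, NASALIZED_U),  # other vowels, approximants
]


def realize_moraic_nasal(following: Optional[str] = None) -> str:
    """Return the allophone of /ɴ/ before the given sound.

    Falls back to the uvular [ɴ] when no rule applies, for example
    when nothing follows.
    """
    for environment, allophone in NASAL_RULES:
        if following in environment:
            return allophone
    return "ɴ"


for sound in ["b", "t", "g", "tɕ", "a", None]:
    print(repr(sound), "->", realize_moraic_nasal(sound))
```

Because the sound classes in this sketch do not overlap, the order of the rules does not affect the result; the list form simply follows the order in which the environments are introduced in the text.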

Consonant Gemination

Japanese arguably has long consonants, or geminates. They can be transcribed like long vowels with the length mark (technically a symbol that looks like a colon with triangles stacked on top of each other instead of dots) or by doubling the consonant letter. One can interpret this as consonant fortition, as the insertion of a glottal stop before the consonant, or as something similar, because the result is a consonant that usually occupies two morae (natives internalize it as two morae regardless of whether it is truly uttered as such phonetically).

Thus, the symbol Q has been used by some Japanese phonologists who believe it is a moraic obstruent. After a vowel at the end of an abrupt utterance, a glottal stop is realized, and because the Kana scripts treat these two things as the same sound, some have argued that, underlyingly, a phonemic glottal stop precedes a consonant to make it a geminate in Japanese. The argument that Q is an archiphoneme realized as a copy of the sound that follows it is, though, more plausible.
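
The archiphoneme analysis of Q can be made concrete with a short Python sketch. It is only an illustration: the function name `realize_q` is hypothetical, a word is assumed to be a list of segment strings in which Q always precedes a consonant or ends the utterance, and copying only the first symbol of a following affricate (so that Q before [tɕ] surfaces as [t]) is an assumption of this sketch rather than a claim made in the lesson.

```python
def realize_q(segments: list[str]) -> list[str]:
    """Replace each Q with a copy of the sound that follows it.

    Utterance-final Q is realized as a glottal stop. Before an affricate
    such as "tɕ", only the initial stop portion is copied (an assumption).
    """
    surface = []
    for position, segment in enumerate(segments):
        if segment != "Q":
            surface.append(segment)
        elif position + 1 < len(segments):
            # Copy the first symbol of the following segment.
            surface.append(segments[position + 1][0])
        else:
            # Nothing follows: abrupt ending with a glottal stop.
            surface.append("ʔ")
    return surface


print("".join(realize_q(["k", "i", "Q", "t", "e"])))
print("".join(realize_q(["a", "Q"])))
```

Treating Q as a placeholder that is filled in from its right-hand neighbor is what distinguishes this analysis from the glottal-stop analysis, in which the same symbol would be inserted in every position.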

Anyway, there are restrictions on consonant gemination in Japanese. Aside from rare loans and geminate nasals arising from the juxtaposition of a nasal stop and the syllabic /ɴ/, geminates are supposed to be unvoiced. According to Kawahara (2006), Japanese has a suffix -ɾi that contains a “floating mora” that triggers gemination in certain cases, e.g. |tap| + |ri| > [tappɯᵝɾi] (‘a lot of’). When this would lead to a geminated voiced obstruent, a moraic nasal appears instead as a sort of “partial gemination”, e.g. |zabu| + |ri| > [zambɯᵝɾi] (‘splashing’).

Summary Chart of All Japanese Sounds including those restricted as Allophones

                    Bilabial  Alveolar   Alveolo-palatal  Palatal  Velar  Uvular  Glottal
 Nasal              m         n          n̠ʲ               ɲ        ŋ      ɴ
 Plosive            b, p      t, d, ts   tɕ, dʑ                    k, g           ʔ
 Fricative          ɸ, β      s, z, dz   ɕ, (ʑ)           ç        (ɣ)            h
 Trill                        r
 Liquid                       ɽ, ɾ, ɺ
 Glide                                                    j        w͍
 Moraic Obstruent                                                                 Q


2. Tsujimura, Natsuko (1996). An Introduction to Japanese Linguistics.
3. Kawahara, Shigeto (2006). “A faithfulness ranking projected from a perceptibility scale: The case of [+voice] in Japanese”. Language 82.