Converting Chinese Characters to Pinyin in Java with pinyin4j: A Usage Example

Author：Eve Cole Update Time：2025-07-01 08:00:03

This article shows how to convert Chinese characters to pinyin in Java using pinyin4j. It is shared here for reference:

Converting Chinese characters to pinyin is a common task in day-to-day development. For example, on 12306 (China's railway ticketing site), entering "WH" as the place name brings up places such as "Wuhan", "Wuhu", and "Weihai", while entering "WUHU" brings up "Wuhu".
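The lookup behaviour described above can be sketched roughly as follows. This is only an illustration, not how 12306 is actually implemented: the class name PlaceNameSuggester, its methods, and the prefix-matching rule are all assumptions, and the pinyin syllables are hard-coded rather than produced by a library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: matches typed letters against pinyin initials or full pinyin. */
final class PlaceNameSuggester {
    // Place name -> its pinyin syllables (non-empty, lowercase, no tone digits).
    // Hard-coded by the caller here; in practice they could come from a pinyin library.
    private final Map<String, String[]> places = new LinkedHashMap<>();

    void add(String name, String... syllables) {
        places.put(name, syllables);
    }

    /** Returns names whose initials or full pinyin start with the query, ignoring case. */
    List<String> suggest(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : places.entrySet()) {
            StringBuilder initials = new StringBuilder();
            StringBuilder full = new StringBuilder();
            for (String syllable : entry.getValue()) {
                initials.append(syllable.charAt(0)); // first letter of each syllable
                full.append(syllable);               // syllables joined together
            }
            if (initials.toString().startsWith(q) || full.toString().startsWith(q)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

With "Wuhan" (wu, han), "Wuhu" (wu, hu), and "Weihai" (wei, hai) registered, suggest("WH") matches all three through their initials, while suggest("WUHU") matches only "Wuhu" through its full pinyin.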

In Java, the pinyin4j library handles the job of obtaining the pinyin of Chinese characters well.

Download address: http://sourceforge.net/projects/pinyin4j/

Download and unzip the archive. It contains pinyin4j-2.5.0.jar, which is the library to add to your project.

Chinese characters to pinyin:
The code is as follows:
String[] pinyin = PinyinHelper.toHanyuPinyinStringArray('重');

The line above converts a single Chinese character to pinyin. For the character "重", for example, it returns an array of type String:

"zhong4"

"chong2"

"重" is a polyphonic character, so the array returned by this method contains the pinyin of every pronunciation of the character. The trailing digit of each pronunciation is the tone (first, second, third, or fourth tone).
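If the syllable and the tone are needed separately, a returned string such as "zhong4" can be split on its trailing digit. The sketch below is a minimal illustration using only the standard library; the class TonedPinyin and its parse method are hypothetical names and are not part of pinyin4j.

```java
/** Hypothetical helper: splits a numbered pinyin string such as "zhong4". */
final class TonedPinyin {
    final String syllable; // letters without the tone digit, e.g. "zhong"
    final int tone;        // trailing digit, or 0 when the string has none

    private TonedPinyin(String syllable, int tone) {
        this.syllable = syllable;
        this.tone = tone;
    }

    static TonedPinyin parse(String pinyin) {
        int last = pinyin.length() - 1;
        if (last >= 0) {
            char c = pinyin.charAt(last);
            if (c >= '0' && c <= '9') {
                // Everything before the digit is the syllable; the digit is the tone.
                return new TonedPinyin(pinyin.substring(0, last), c - '0');
            }
        }
        // No trailing digit: treat the whole string as the syllable.
        return new TonedPinyin(pinyin, 0);
    }
}
```

For example, TonedPinyin.parse("zhong4") yields the syllable "zhong" with tone 4, and TonedPinyin.parse("chong2") yields "chong" with tone 2.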

That is the simplest way to get the pinyin of a single Chinese character. You can also use HanyuPinyinOutputFormat to control the format of the returned pinyin.

 HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();

 // UPPERCASE: uppercase (ZHONG)
 // LOWERCASE: lowercase (zhong)
 format.setCaseType(HanyuPinyinCaseType.LOWERCASE);

 // WITHOUT_TONE: no tone (zhong)
 // WITH_TONE_NUMBER: the digits 1-4 represent the tone (zhong4)
 // WITH_TONE_MARK: write the tone mark directly (zhòng);
 //   WITH_U_UNICODE must also be set, otherwise an exception is thrown
 format.setToneType(HanyuPinyinToneType.WITH_TONE_MARK);

 // WITH_V: use v to represent ü (nv)
 // WITH_U_AND_COLON: use "u:" to represent ü (nu:)
 // WITH_U_UNICODE: use ü directly (nü)
 format.setVCharType(HanyuPinyinVCharType.WITH_U_UNICODE);

 String[] pinyin = PinyinHelper.toHanyuPinyinStringArray('重', format);

If the character passed to toHanyuPinyinStringArray is not a Chinese character and cannot be converted to pinyin, the method returns null.

Although pinyin4j is very useful, it has limitations. The code above can only obtain the pinyin of a single Chinese character; it cannot determine the pinyin of a word that contains polyphonic characters. For "重庆" (Chongqing), for example, it cannot tell whether the reading is "chongqing" or "zhongqing", because pinyin4j does not use context to judge the pronunciation of a polyphonic character.

Therefore, when obtaining the pronunciation of a word that contains polyphonic characters, you can return a list of candidate readings; the correct pronunciation then has to be judged and selected manually.
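One way to build such a candidate list is to combine every reading of each character with every reading of the others. The sketch below is only an illustration under stated assumptions: PinyinCandidates and combine are hypothetical names, and the per-character readings are passed in as plain arrays (tone digits already removed, no null or empty entries) rather than looked up through pinyin4j.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists every combination of per-character readings. */
final class PinyinCandidates {
    /**
     * readingsPerChar[i] holds all readings of the i-th character of the word.
     * Each inner array is assumed to be non-null and non-empty.
     */
    static List<String> combine(String[][] readingsPerChar) {
        List<String> results = new ArrayList<>();
        results.add(""); // start from a single empty prefix
        for (String[] readings : readingsPerChar) {
            List<String> next = new ArrayList<>();
            // Extend every prefix built so far with every reading of this character.
            for (String prefix : results) {
                for (String reading : readings) {
                    next.add(prefix + reading);
                }
            }
            results = next;
        }
        return results;
    }
}
```

Called with the readings {"zhong", "chong"} for the first character and {"qing"} for the second, combine returns "zhongqing" and "chongqing", leaving the choice between them to the user. The number of candidates grows multiplicatively with each polyphonic character, so this approach suits short words such as place names.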

I hope this article is helpful for your Java programming.