Tuesday, March 22, 2005
DHTML Hanyu Pinyin to Hanzi Converter
I needed a simple way to type a small number of Chinese characters. I looked around for a Chinese "input method" for Linux, and they all seemed very hard to install and configure. Many also had documentation only in Chinese, for some reason.
Eventually I thought, "why don't I build it myself?" After all, I knew I could get the pinyin-to-hanzi conversion data from unicode.org. I just needed to build a UI around it. So that's what I did.
By the way, Hanyu pinyin (or just "pinyin" for short) is the standard romanization system for Mandarin Chinese in mainland China, and it mostly consists of characters that are relatively easy to type on a US keyboard. Hanzi is what Chinese characters are called in Chinese.
I first wrote a Python script to extract the appropriate information from the Unihan database. That script's output is actually JavaScript that initializes a map (i.e., an "associative array") from pinyin strings to hanzi characters.
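The extraction step might look roughly like the sketch below. This is not the original script. It assumes tab-separated Unihan lines of the form "U+XXXX, kMandarin, readings" with numbered-tone readings, and the names SAMPLE, build_map, to_javascript and pinyinToHanzi are all made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample in the tab-separated Unihan layout. The numbered-tone
# readings are an assumption about the data, not copied from the real file.
SAMPLE = """\
# comment lines start with '#'
U+4F5C\tkMandarin\tZUO4
U+505A\tkMandarin\tZUO4
U+5750\tkMandarin\tZUO4
U+5EA7\tkMandarin\tZUO4
U+4F60\tkMandarin\tNI3
U+4F60\tkCantonese\tnei5
"""


def build_map(lines):
    """Group hanzi by lowercase pinyin reading, keeping input order."""
    table = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        codepoint, field, value = line.split("\t", 2)
        if field != "kMandarin":
            continue  # only Mandarin readings are of interest here
        hanzi = chr(int(codepoint[2:], 16))  # "U+4F5C" -> the character itself
        for reading in value.split():
            table[reading.lower()].append(hanzi)
    return {reading: "".join(chars) for reading, chars in table.items()}


def to_javascript(table, name="pinyinToHanzi"):
    """Render the map as a JavaScript variable initializer."""
    body = json.dumps(table, ensure_ascii=False, indent=2, sort_keys=True)
    return f"var {name} = {body};"


if __name__ == "__main__":
    print(to_javascript(build_map(SAMPLE.splitlines())))
```

Using json.dumps for the object literal is a convenience choice: it handles quoting and never emits a trailing comma.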
With that I was then able to build a simple Hanyu Pinyin to Hanzi converter.
Note that this isn't meant to be a full substitute for a real Chinese input method, but it does the job if you only need a few characters, and it's a lot easier to install.
Yeah, it turns out that SCIM is only an "apt-get install" away on Debian, and it's pretty easy to use. Thanks for the tip, Evan.
"zuo4": "作做坐座",
}
That final comma causes syntax errors in those browsers. I haven't checked whether it really violates the ECMAScript standard.
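For what it's worth, the ECMAScript 3 grammar does not allow a trailing comma in an object literal; ECMAScript 5 later permitted it. If the map is produced by a generator script, one way to sidestep the issue is to join the entries with separators instead of ending every entry with a comma. A minimal Python sketch of the difference follows; both function names are hypothetical, and this is not the generator used for the converter.

```python
import json


def emit_with_trailing_comma(table):
    """Naive emitter: every entry ends with a comma, including the last."""
    lines = ["{"]
    for key, chars in table.items():
        lines.append(f'  "{key}": "{chars}",')
    lines.append("}")
    return "\n".join(lines)


def emit_joined(table):
    """Safer emitter: commas only go between entries."""
    entries = [
        f"  {json.dumps(key)}: {json.dumps(chars, ensure_ascii=False)}"
        for key, chars in table.items()
    ]
    return "{\n" + ",\n".join(entries) + "\n}"


if __name__ == "__main__":
    sample = {"zuo3": "左", "zuo4": "作做坐座"}
    print(emit_with_trailing_comma(sample))  # last entry ends with a comma
    print(emit_joined(sample))               # no comma before the closing brace
```

The joined output also happens to be valid JSON, which makes it easy to check mechanically.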
It translates words or names from Japanese to English and romaji, and converts romanized characters (romaji) to Japanese katakana (results tend to show variations in other scripts, too).
The way it works, it can be a little time-consuming to find the katakana, hiragana, or kanji characters for the particular term you're looking for, plus the romaji, and match them with the correct English definition. It doesn't list the romaji beside the katakana, hiragana, and kanji words, so it's a bit confusing if you don't know how to pronounce the characters.
But it works.