Tuesday, March 22, 2005

DHTML Hanyu Pinyin to Hanzi Converter

I needed a simple way to type a small number of Chinese characters. I did a bit of looking around for a Chinese "input method" for Linux, and they all seemed very hard to install and configure. Many also only had documentation in Chinese for some reason.

Eventually I thought "why don't I build it myself?". After all, I know I can get the pinyin to hanzi conversion information from unicode.org. I just needed to build a UI around it. So that's what I did.

By the way, Hanyu pinyin (or just "pinyin" for short) is the standard romanization system for Mandarin Chinese in mainland China, and it mostly consists of characters that are relatively easy to type on a US keyboard. Hanzi is what Chinese characters are called in Chinese.

I first wrote a Python script to extract the appropriate information from the Unihan database. That script's output is actually JavaScript for initializing a map (i.e., an "associative array") from pinyin strings to hanzi characters.
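For anyone curious what such a script might look like, here is a minimal sketch rather than the original script. It assumes the Unihan data arrives as tab-separated lines of the form "U+4F5C<TAB>kMandarin<TAB>ZUO4", with tone-numbered readings separated by spaces; the function names, the sample lines, and the JavaScript variable name pinyinToHanzi are all made up for illustration.

```python
import json
from collections import defaultdict


def build_pinyin_map(lines):
    """Map each tone-numbered pinyin reading (e.g. "zuo4") to a string of hanzi.

    Assumes tab-separated Unihan-style lines: code point, field name, value.
    """
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != "kMandarin":
            continue  # only the Mandarin readings are of interest here
        codepoint, _, readings = parts
        hanzi = chr(int(codepoint[2:], 16))  # "U+4F5C" -> the character itself
        for reading in readings.split():
            mapping[reading.lower()].append(hanzi)
    return {reading: "".join(chars) for reading, chars in mapping.items()}


def to_javascript(mapping, var_name="pinyinToHanzi"):
    """Render the map as a JavaScript object literal.

    Entries are joined with commas, so no comma follows the last one.
    """
    entries = [
        "  %s: %s" % (json.dumps(key), json.dumps(value, ensure_ascii=False))
        for key, value in sorted(mapping.items())
    ]
    return "var %s = {\n%s\n};\n" % (var_name, ",\n".join(entries))


# Hypothetical sample input, not taken from the real Unihan file.
SAMPLE_LINES = [
    "# sample data",
    "U+4F5C\tkMandarin\tZUO4 ZUO1",
    "U+505A\tkMandarin\tZUO4",
    "U+5750\tkMandarin\tZUO4",
    "U+5EA7\tkMandarin\tZUO4",
]

if __name__ == "__main__":
    print(to_javascript(build_pinyin_map(SAMPLE_LINES)))
```

Joining the entries with ",\n" instead of appending a comma after each one keeps the generated object literal free of a trailing comma, which some browsers reject.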

With that I was then able to build a simple Hanyu Pinyin to Hanzi converter.

Note that this isn't meant to be a full substitute for a real Chinese input method, but it does the job if you only need a few characters, and it's a lot easier to install.

posted Tuesday, March 22, 2005
This reminds me of a site that I've used a few times for Japanese conversion, called JEDI (http://poets.notredame.ac.jp/cgi-bin/jedi).

It translates words or names from Japanese to English & romaji; and converts romanized characters (romaji) to Japanese katakana (results tend to show variations with other alphabets, too).

The way it works, it can be a little time-consuming to try to get the katakana/hiragana/kanji characters for the particular term you're looking for, plus the romaji, and matching it with the correct English definition. It doesn't list the romaji beside the katakana, hiragana, and kanji words. So it's a bit confusing if you don't know how to pronounce the characters.

But it works.  
  Anonymous Em on April 05, 2005
  Blogger Evan Martin on August 23, 2005
Why'd you have to destroy all of my fun? :-)

Yeah, it turns out that SCIM is only an "apt-get install" away on Debian, and it's pretty easy to use. Thanks for the tip, Evan.  
  Blogger Laurence on August 27, 2005
To make this work in Opera and IE you should remove the last comma in your hash array literal:

"zuo4": "作做坐座",

That final comma causes syntax errors in those browsers. I haven't checked if it really is an ECMA standard violation.  
  Anonymous Hallvord on September 01, 2005
OK, checked with Opera's Mr. ECMAScript and this is a standards violation tolerated by Gecko and nobody else.  
  Anonymous Hallvord on September 01, 2005
Thanks for the bug report, Hallvord. I've removed the spurious comma.  
  Blogger Laurence on September 02, 2005
seems not to be working still...please help  
  Anonymous cartomanzia on November 10, 2007
Hallvord, you have done excellent work, thank you
  Anonymous cartomanzia telefonica on November 15, 2007
thanks to you all for your suggestions
  Anonymous auto on November 21, 2007
Do you have any idea which website has a hanzi to pinyin converter? Btw, your converter doesn't seem to work. Thanks =)
  Anonymous Jolisa on March 02, 2009