A system for encoding a Unicode string (which may include characters from just about any language, as well as other symbols and WeirdStuff like combining characters) into 'normal' ASCII, and decoding it back again.

This obviously takes more bytes than would otherwise be needed (since you are restricted to fewer symbols), but the resulting text can be used in many more places, such as in a DomainName (the original application), which can be useful. It is similar in intent to UUEncode.
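For a quick feel of the round trip, Python's standard library happens to ship `punycode` and `idna` codecs. This is just one convenient implementation, and the example word is illustrative:

```python
# Round-trip a Unicode string through Python's built-in "punycode" codec.
word = "bücher"
encoded = word.encode("punycode")      # bytes containing only ASCII
decoded = encoded.decode("punycode")   # back to the original str

print(encoded)   # b'bcher-kva'
print(decoded)   # bücher

# For domain names, the "idna" codec encodes each non-ASCII label
# and adds the "xn--" prefix that marks it as Punycode.
print("bücher.example".encode("idna"))  # b'xn--bcher-kva.example'
```

The `xn--` prefix is how a resolver tells an encoded label apart from an ordinary ASCII one.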


So it's not just a silly nickname for Unicode?? YouLearnSomethingNewEveryDay. --AC

To expand on the meaning of 'normal' ASCII, from the Punycode RFC (RFC 3492): "It uniquely and reversibly transforms a Unicode string into an ASCII string.  ASCII characters in the Unicode string are represented literally, and non-ASCII characters are represented by ASCII characters that are allowed in host name labels (letters, digits, and hyphens)." -- i.e. it is not the same as UTF-7. --Bobacus
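To make the quoted behaviour concrete, here is a minimal Python sketch of the encoding procedure described in RFC 3492. The literal ASCII characters come first, then a hyphen delimiter, then each non-ASCII character as a variable-length integer written with the 36 digits a to z and 0 to 9. The function and constant names are our own, and the sketch leaves out decoding and the overflow checks the RFC calls for:

```python
# Minimal sketch of the RFC 3492 Punycode encoder (no overflow checks).
BASE, TMIN, TMAX = 36, 1, 26
SKEW, DAMP = 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _digit(d: int) -> str:
    # 0..25 -> 'a'..'z', 26..35 -> '0'..'9'
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def _adapt(delta: int, numpoints: int, first_time: bool) -> int:
    # Bias adaptation after each encoded delta.
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def punycode_encode(text: str) -> str:
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    # ASCII ("basic") code points are copied through literally, in order.
    output = [c for c in text if ord(c) < 128]
    h = b = len(output)
    if b > 0:
        output.append("-")  # delimiter between literal and encoded parts
    while h < len(text):
        # Smallest code point not yet handled.
        m = min(ord(c) for c in text if ord(c) >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in text:
            cp = ord(c)
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    if k <= bias:
                        t = TMIN
                    elif k >= bias + TMAX:
                        t = TMAX
                    else:
                        t = k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("bücher"))  # bcher-kva
```

Python's built-in `punycode` codec should give the same results, so in practice it is the safer choice; the sketch is only meant to show where the letters, digits and hyphen come from.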
Huh, I didn't realise this: it means some funny characters are still allowed in Punycode, including ones we can't easily use here. That limits its applicability. --Vitenka
Well, I think the idea is that ASCII characters that weren't allowed before are still not allowed, so in practice they wouldn't occur. --Bobacus
Well, yes, but it'd be nice if it compressed extended ASCII into ASCII you can use in forums etc. It would be more generally useful then. --Vitenka
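Vitenka's observation can be checked with Python's built-in `punycode` codec. Raw Punycode copies every ASCII character through literally, punctuation and all, so the output is only hostname-safe if the ASCII part of the input already was. This is a sketch: `hostname_safe` is a hypothetical helper, and the IDNA layer that sits on top of Punycode for real domain names is not modelled here.

```python
import string


# Hypothetical helper: True if s uses only letters, digits and hyphens,
# i.e. the characters allowed in host name labels.
def hostname_safe(s: str) -> bool:
    allowed = set(string.ascii_letters + string.digits + "-")
    return all(c in allowed for c in s)


for text in ["bücher", "bü<b>cher!"]:
    # Python's built-in codec implements raw Punycode (no IDNA checks).
    out = text.encode("punycode").decode("ascii")
    print(f"{text!r} -> {out!r}, hostname-safe: {hostname_safe(out)}")
```

The first string encodes to pure letters, digits and a hyphen; the second keeps its `<`, `>` and `!` verbatim in front of the delimiter.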

CategoryKittenTechnicalMatters, CategoryLanguage, CategoryComputing

August 12, 2005