O’Reilly news

Unicode Explained: New Guide Makes Unicode Accessible and Usable to All

June 27, 2006

For Immediate Release

For more information, a review copy, cover art, or an interview with the author, contact: Kathryn Barrett (707) 827-7094 or kathrynb@oreilly.com

Sebastopol, CA--Characters often seem simple on the surface, but they are at the heart of a wide variety of data communications and data processing problems, including text processing, typesetting, text styling, text databases, and the transmission of textual information. With markets opening up the world over, gone are the days when the 128 characters of ASCII or the 256 code positions of the ANSI set were enough. Today's developers need to write software that can cope with more and more languages and character sets. There are hundreds of encoding systems for mapping characters to numbers, but Unicode promises a single mapping, which makes a single product solution possible across a multitude of platforms, languages, and countries. It's no wonder that industry giants like Apple, Hewlett-Packard, IBM, and Microsoft have all adopted Unicode.
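To make the problem of "hundreds of encoding systems" concrete, here is a minimal Python sketch (not taken from the book). It decodes one and the same byte with three legacy code pages and prints the Unicode code point of each result. The helper names decode_everywhere and code_point are hypothetical, and the three codecs (ISO 8859-1, Windows Cyrillic cp1251, and the original IBM PC cp437) are arbitrary examples of Python's built-in codecs.

```python
# Sketch: one byte value read through three legacy code pages yields three
# different characters. The codec choice is illustrative only.

LEGACY_CODECS = ("latin-1", "cp1251", "cp437")


def decode_everywhere(raw: bytes, codecs=LEGACY_CODECS) -> dict:
    """Decode the same bytes with several legacy encodings (hypothetical helper)."""
    return {codec: raw.decode(codec) for codec in codecs}


def code_point(ch: str) -> str:
    """Return the single Unicode code point of a character in U+XXXX form."""
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    for codec, ch in decode_everywhere(bytes([0xE4])).items():
        # Each legacy reading is a different character, yet each of them
        # has exactly one number in Unicode.
        print(f"{codec:8} byte 0xE4 -> {ch!r} = {code_point(ch)}")
```

Under these assumptions the byte 0xE4 becomes a Latin "ä", a Cyrillic "д", or a Greek "Σ" depending on which code page the receiver assumes, while Unicode gives each of those characters its own distinct number.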

The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends. By providing a unique number for every character--regardless of platform, program, or language--Unicode enables a single software product or web site to be targeted across multiple platforms, languages, and countries without re-engineering, and allows data to be transported through different systems without corruption. Incorporating Unicode into client-server or multi-tiered applications and web sites offers significant cost savings over the use of legacy character sets.
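The "unique number for every character" and "transported without corruption" claims can be illustrated with a short Python sketch (an illustration, not the book's code). The helpers round_trip and fits_in are hypothetical names; the sample string is arbitrary, and ISO 8859-1 ("latin-1") stands in for a typical 8-bit legacy set.

```python
def round_trip(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes (as if sending it) and decode it back."""
    return text.encode(encoding).decode(encoding)


def fits_in(text: str, encoding: str) -> bool:
    """True if every character of text can be represented in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Mixed-script sample text (arbitrary example).
sample = "naïve café · Ελληνικά · 日本語 · €"

if __name__ == "__main__":
    # One number per character, whatever the script.
    print([f"U+{ord(ch):04X}" for ch in "é€日"])
    # A Unicode encoding such as UTF-8 can carry the whole sample...
    print(fits_in(sample, "utf-8"))
    # ...while a legacy 8-bit set cannot represent the Greek, CJK, or euro sign.
    print(fits_in(sample, "latin-1"))
    # Encoding and decoding with the same Unicode encoding loses nothing.
    print(round_trip(sample) == sample)
```

The design point is that the mapping from characters to numbers stays fixed; only the byte serialization (UTF-8, UTF-16, and so on) varies, so any system that agrees on a Unicode encoding receives the text intact.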

As Jukka K. Korpela, author of Unicode Explained (O'Reilly, US $59.99), observes, "The technological basis of using Unicode, though still imperfect, is much better than most people's capabilities for making use of it. Even computer professionals often don't know how to work with large repertoires of characters. The bottleneck is a lack of basic knowledge and skills, not a lack of hardware or software." Fortunately, Korpela's book remedies this problem, making Unicode accessible to anyone who cares to learn and use it.

"Developments in Unicode implementations, Unicode-aware software tools, and font technologies are making Unicode usable by everyone," says Korpela. "There is increasing demand for Unicode due to wider awareness of the need to support different languages as well as notational systems that utilize a large repertoire of characters."

Indeed, as Korpela points out, knowledge of Unicode has become necessary for anyone who is, or wants to be, an IT professional. "Especially since the topic is usually not well covered in the education of IT professionals," says Korpela. "There are short presentations of Unicode that are often quite unsatisfactory, relating to very old versions of Unicode and often containing fundamental errors."

For end users of IT systems, too, Unicode and character encodings are becoming increasingly important as Unicode grows more common and even becomes the default. "In particular, people who work with modern databases, publishing systems, and web publishing belong to the avant-garde in this respect," says Korpela. He notes that Windows Vista will have considerably expanded coverage of the Unicode character repertoire in some important fonts.

Unicode Explained is a comprehensive reference that contains everything needed to understand Unicode. It takes readers on a detailed tour of the complex character world. For starters, it explains how to identify and classify characters--from the common to the specialized. Then it shows how to type these characters, interpret their properties, and process character data in a robust manner.
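As a small taste of what "identify and classify characters", "interpret their properties", and "process character data in a robust manner" can mean in practice, here is a minimal sketch using Python's standard unicodedata module (not an excerpt from the book). The helper names describe and same_text are hypothetical, and results follow the Unicode version bundled with the running Python, which is newer than the one current when this release was written.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch))


def same_text(a: str, b: str) -> bool:
    """Compare strings after NFC normalization, so precomposed and
    decomposed spellings of the same text compare as equal."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # A Latin letter, a currency symbol, and an Arabic-Indic digit.
    for ch in "\u00e9\u20ac\u0663":
        print(describe(ch))
    # "é" as one precomposed character vs. "e" + COMBINING ACUTE ACCENT:
    # a naive comparison says they differ, a normalized one says they match.
    print("\u00e9" == "e\u0301", same_text("\u00e9", "e\u0301"))
```

The general category (for example "Ll" for a lowercase letter, "Sc" for a currency symbol, "Nd" for a decimal digit) is one of the character properties a program can consult instead of hard-coding ASCII ranges, and normalization is one common way to avoid treating visually identical text as different.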

The first few chapters teach the basics of Unicode and character data, giving readers a firm grasp of the terminology they need to refer to the standard's various components. Korpela then offers more detailed information about using Unicode, examining the principles and methods behind defining character codes, code conversion techniques, properties of characters, and more. The final chapters cover more advanced material, such as programming to support Unicode.

"There are great opportunities to improve the quality of text representation and processing using Unicode, and serious problems that arise from clashes between Unicode and 'old technology,'" says Korpela, referring to software that cannot handle Unicode or has serious limitations with regard to it. "This book will equip developers and other readers with the capabilities for handling these possibilities and problems, and therefore increase their skill level and ability to help others."

Additional Resources:

Unicode Explained
Jukka K. Korpela
ISBN: 0-596-10121-X, 678 pages, $59.99 US
1-800-998-9938; 1-707-827-7000

About O’Reilly

O’Reilly Media spreads the knowledge of innovators through its books, online services, magazines, and conferences. Since 1978, O’Reilly Media has been a chronicler and catalyst of cutting-edge development, homing in on the technology trends that really matter and spurring their adoption by amplifying “faint signals” from the alpha geeks who are creating the future. An active participant in the technology community, the company has a long history of advocacy, meme-making, and evangelism.
