Displaying Text

Copyright © 1998-2001 John O'Conner

Have you ever tried to display text from a different language, perhaps even from a different script, only to be disappointed by jumbled characters on the screen? For example, instead of seeing 郵便番号 in a label, you see ????. Knowing that the Java platform's native character set is Unicode, developers and customers expect to be able to view any character. The runtime is supposed to display these characters, right? The answer is yes, maybe. Correct character display depends on four elements of your environment: the physical fonts installed on your host, the font.properties configuration behind logical fonts, your choice of AWT or Swing components, and the platform's text rendering capabilities.

Installing adequate physical fonts, properly configuring the font.properties file for logical fonts, and using Swing components can make all the difference in whether you see multilingual text or mangled gibberish in your application. Additionally, the Java platform's text rendering capabilities affect how accurately text appears. This chapter discusses what is required to correctly display a wide range of characters in graphical applications.

Physical Fonts

The Java platform relies on your underlying physical fonts to render character glyphs.1 Glyphs are stored in fonts, and the font is responsible for providing character glyphs to the display system. Without physical fonts, you would not be able to display any characters. Physical fonts are available in many formats. On Sun Solaris, Linux, or Microsoft Windows systems, typical fonts are TrueType, OpenType, PostScript, and bitmap fonts.

These systems store fonts in different places. For example, a Windows system's fonts typically exist in the %WINROOT%\fonts directory, where %WINROOT% is the base directory of your OS. Windows uses TrueType fonts, so you will see many files with a .TTF extension. Other systems use TrueType and OpenType fonts as well, but they may spread them across several locations; Solaris and Linux, for example, store fonts in several subdirectories. Font paths are often locale dependent, so the set of fonts available on Solaris and Linux can differ depending on locale settings.

To install a font on a Microsoft Windows host, open the Control Panel and double-click the Fonts folder. This launches an Explorer window that shows the fonts on your system. When you select File > Install New Font..., you will find directions that guide you through the process.

Figure 1 Install new fonts on Windows by accessing the font installer through the Control Panel.

Normally, you would not want to rely upon a specific physical font in your application. Your application will run on various platforms that may not have the same available fonts. However, starting with the Java 2 platform, Sun includes a complete Lucida font family with every runtime or SDK environment. This allows you to write multilingual applications using a font set available wherever the Java 2 runtime exists.

The runtime provides a set of TrueType physical fonts. Several varieties of the Lucida font are available in the JRE's lib/fonts directory:

Using these fonts has several benefits:

There are some limitations, however. These Lucida fonts do not contain Asian characters, although future versions of the platform may include Chinese, Japanese, and Korean fonts. Also, the Lucida fonts are a rather straightforward, no-nonsense font set. If you need more elaborate fonts, you may have to install them yourself.

The Java runtime can find and use fonts that are on your host system's font path or that are copied into the JRE's lib/fonts directory. This means that you have a couple of options when installing new fonts: you can use your host's installation process to place fonts in an appropriate system location, or you can simply copy them to your JRE's lib/fonts directory.
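To see which font files a particular runtime carries, you can list its lib/fonts directory. The sketch below assumes the JRE layout described above (java.home/lib/fonts); the class name FontDirLister is hypothetical, and newer runtimes may not have this directory at all, in which case the list is simply empty.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class FontDirLister {

    // Keep only names ending in .ttf or .otf (any case), sorted by name.
    static List<String> fontFileNames(String[] names) {
        List<String> result = new ArrayList<>();
        if (names == null) {
            return result;
        }
        for (String name : names) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".ttf") || lower.endsWith(".otf")) {
                result.add(name);
            }
        }
        Collections.sort(result);
        return result;
    }

    // File.list() returns null when the directory does not exist.
    static List<String> listFontFiles(File dir) {
        return fontFileNames(dir.list());
    }

    public static void main(String[] args) {
        // Assumed JRE layout: <java.home>/lib/fonts
        File fontDir = new File(System.getProperty("java.home"),
                                "lib" + File.separator + "fonts");
        for (String name : listFontFiles(fontDir)) {
            System.out.println(name);
        }
    }
}
```

Files you copy into that directory yourself should show up in the same listing.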

Since J2SE 1.2, applications have been able to select fonts by their physical font family names. This allows your application to choose from any available font on your system. However, if the font doesn't provide glyphs for the characters you need, the runtime cannot display them. When a font doesn't contain a particular character, a substitution character appears in its place. Various substitution characters are used, sometimes a black box or square, but most often a question mark '?'. This replacement marker signals that the font doesn't contain the requested character.

Using physical fonts is simple. Create the font using the font family name, a style and size:

// Font is java.awt.Font; textComponent is any AWT or Swing text component
Font aFont = new Font("Lucida Sans", Font.PLAIN, 12);
textComponent.setFont(aFont);

Instead of embedding a font choice in your application, however, store the font choices in a ResourceBundle. Allowing your users to select the font from a combo box or list may be an even better option, since users can then select fonts that support their character set and other preferences. The following code snippet shows how you can display the available fonts to your customer using a font selector component:

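// Inside a JFrame subclass; needs java.awt.GraphicsEnvironment and javax.swing.*
JList fontList;
JScrollPane scrollPane = new JScrollPane();
...
public void run() {
    GraphicsEnvironment g =
        GraphicsEnvironment.getLocalGraphicsEnvironment();
    // Ask for the family name of every font installed on the host
    String[] fontNames = g.getAvailableFontFamilyNames();
    fontList = new JList(fontNames);
    scrollPane.setViewportView(fontList);
    pack();   // resize the enclosing frame to fit the list
}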

The result of placing the scrollpane in a JFrame is shown in Figure 2. Your application can use a similar scrollable list to allow the user to select a preferred font.

Figure 2 Allow the customer to select a font.

A JComboBox creates an even better font selector component. The following code snippet shows part of a new component that you can create to allow customers to select from available fonts on their system2:

// Needs java.awt.*, java.awt.event.*, and javax.swing.*
public class FontSelector extends JPanel implements ActionListener {
    // some code has been removed for brevity
    // removed code indicated by ...
    ...
    JComboBox cbFont;
    JLabel labelFontSelect = new JLabel();
    private java.awt.Font selectedFont;
    GridBagLayout gridBagLayout1 = new GridBagLayout();

    public FontSelector() {
        try {
            jbInit();
        } catch(Exception ex) {
            ex.printStackTrace();
        }
    }

    void jbInit() throws Exception {
        ...
        // Fill the combo box with every font family installed on the host
        GraphicsEnvironment gr =
            GraphicsEnvironment.getLocalGraphicsEnvironment();
        String[] fontFamilyNames = gr.getAvailableFontFamilyNames();
        cbFont = new JComboBox(fontFamilyNames);
        this.setLayout(gridBagLayout1);
        labelFontSelect.setLabelFor(cbFont);
        // actionPerformed(), among the removed code, handles new selections
        cbFont.addActionListener(this);
        add(labelFontSelect, new GridBagConstraints(0, 0, 1, 1, 0.0,
            0.0, GridBagConstraints.WEST, GridBagConstraints.NONE,
            new Insets(5, 5, 5, 5), 0, 0));
        add(cbFont, new GridBagConstraints(1, 0, 1, 1, 0.5, 0.0,
            GridBagConstraints.WEST, GridBagConstraints.HORIZONTAL,
            new Insets(5, 5, 5, 5), 0, 0));
        // Start with Lucida Sans, which ships with the Java 2 runtime;
        // fontSize and setSelectedFont() are among the removed code
        setSelectedFont("Lucida Sans", fontSize);
    }
    ...
}

You can add the above FontSelector component to your own user interface. Figure 3 shows the FontSelector at work in a different application.

Figure 3 A JComboBox helps create a great font selector widget.

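If asking for the user's intervention is not appropriate for some reason, your application may have to choose a font itself. Knowing that fonts do not typically support all characters, you need a way to ask a font whether it supports a specific character or set of characters. You do this with the following methods of java.awt.Font:

  boolean canDisplay(char c)
  int canDisplayUpTo(String str)
  int canDisplayUpTo(char[] text, int start, int limit)
  int canDisplayUpTo(CharacterIterator iter, int start, int limit)

canDisplay() returns true if the font has a glyph for the character. The canDisplayUpTo() methods return -1 if the font can display the entire text; otherwise they return the index of the first character the font cannot display.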

Given a Font[], you can iterate through each Font, calling one or more of the above methods to determine whether the font can display the given text. The following code retrieves a list of all fonts on your system and then finds the first font that can display the given text.

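// Needs java.awt.Font and java.awt.GraphicsEnvironment
GraphicsEnvironment g = GraphicsEnvironment.getLocalGraphicsEnvironment();
Font[] allFonts = g.getAllFonts();   // one-point instances of every font
String str = "abc日本語";
int x;
for (x = 0; x < allFonts.length; x++) {
    // canDisplayUpTo() returns -1 when the font can display the whole string
    if (allFonts[x].canDisplayUpTo(str) == -1) {
        System.out.println(allFonts[x].getFontName());
        break;
    }
}
// At this point, x < allFonts.length only if a capable font was found. (note 3)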

The canDisplay() and canDisplayUpTo() methods tell you whether a specific font can display your text. Once you find a capable font, call setFont() on the GUI component that will display your text before actually assigning the text to the component. This algorithm is potentially quite expensive, so if you use it, do so when the user is not expecting immediate feedback. You may also choose to limit your check to a subset of fonts. Also, the font you find is probably not usable as is: getAllFonts() returns one-point instances, so it will be far too small. After you find a capable font, create a new Font that meets your size and style requirements. Assuming the font at allFonts[x] can display your characters, you might do something like this:

String fontName = allFonts[x].getFontName();

// pick up style and size from a ResourceBundle or user preference

Font myFont = new Font(fontName, style, size);
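Putting these pieces together, a small helper can scan a font array with canDisplayUpTo() and then resize the match. This is only a sketch: the class name FontFinder and the chosen style and size are illustrative, and it uses Font.deriveFont(int, float) instead of the Font constructor so that the matching face is kept exactly.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;

public class FontFinder {

    // Index of the first font that can display all of text, or -1 if none can.
    static int firstCapable(Font[] fonts, String text) {
        for (int i = 0; i < fonts.length; i++) {
            if (fonts[i].canDisplayUpTo(text) == -1) {
                return i;
            }
        }
        return -1;
    }

    // getAllFonts() returns one-point fonts, so derive a usable size and style.
    static Font resize(Font found, int style, float size) {
        return found.deriveFont(style, size);
    }

    public static void main(String[] args) {
        Font[] all = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        int i = firstCapable(all, "abc\u65E5\u672C\u8A9E");
        if (i == -1) {
            System.out.println("No installed font covers this text");
        } else {
            Font f = resize(all[i], Font.PLAIN, 14f);
            System.out.println(f.getFontName() + " at " + f.getSize() + " pt");
        }
    }
}
```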

Although some fonts may support all Unicode characters, most fonts only support a generous subset, usually the characters associated with a specific script or legacy character set. Providing a Unicode subset is not an attempt to cripple your multilingual abilities. Most users never need most Unicode characters, and supplying a complete Unicode set would mean creating a font of enormous size. Font vendors usually opt for a smaller file size and provide characters you are most likely to use from one or two scripts.
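Because most fonts cover only one or two scripts, it can help to know which Unicode blocks a piece of text actually uses before you go looking for a font. The sketch below groups the code points of a string by Character.UnicodeBlock; the class name BlockReport is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class BlockReport {

    // Collect, in order of first appearance, the Unicode blocks used by text.
    static Set<Character.UnicodeBlock> blocksUsed(String text) {
        Set<Character.UnicodeBlock> blocks = new LinkedHashSet<>();
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            if (block != null) {   // code points outside any block are ignored
                blocks.add(block);
            }
        });
        return blocks;
    }

    public static void main(String[] args) {
        // The sample string used earlier mixes Latin letters and CJK ideographs
        System.out.println(blocksUsed("abc\u65E5\u672C\u8A9E"));
    }
}
```

A font that reports coverage for every block in the result is a good candidate; otherwise you may need a logical font that combines several physical fonts.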

Logical Fonts

Before version 1.2 of the Java platform, applications could not directly interact with or load physical fonts. Instead, applications used logical fonts provided by the Java platform. Although you can now use specific physical fonts in JRE 1.2 and later, logical fonts are still available. Logical fonts are created by mapping one or more physical font names to a logical font name. Logical fonts are runtime objects, and don't exist on your host. However, logical fonts do reference physical fonts that actually exist on your host.

The runtime environment maps logical font names to specific physical fonts. On some platforms, this mapping is accomplished via a font.properties configuration file4, which differs from one host environment to another. This font mapping allows you to use a logical font name in your applications without sacrificing platform independence. The logical font names are available to applications on any J2SE or J2EE environment, but they may be mapped to different physical fonts on different hosts. Five logical fonts are defined: Serif, SansSerif, Monospaced, Dialog, and DialogInput.
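Because the logical names are the same on every host, code can request them directly and let the runtime pick the physical fonts. The sketch below creates a plain font for each logical name. The string constants Font.SERIF, Font.SANS_SERIF, Font.MONOSPACED, Font.DIALOG, and Font.DIALOG_INPUT were added to java.awt.Font in Java SE 6; on earlier runtimes, use the literal names. The class name LogicalFonts is hypothetical.

```java
import java.awt.Font;

public class LogicalFonts {

    // The five logical font names; the runtime maps each to physical fonts.
    static final String[] NAMES = {
        Font.SERIF, Font.SANS_SERIF, Font.MONOSPACED, Font.DIALOG, Font.DIALOG_INPUT
    };

    // One plain font per logical name, at the requested point size.
    static Font[] plainFonts(int size) {
        Font[] fonts = new Font[NAMES.length];
        for (int i = 0; i < NAMES.length; i++) {
            fonts[i] = new Font(NAMES[i], Font.PLAIN, size);
        }
        return fonts;
    }

    public static void main(String[] args) {
        for (Font f : plainFonts(12)) {
            System.out.println(f.getName() + " " + f.getSize());
        }
    }
}
```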

The font.properties file is important to you because it helps define your JRE's ability to display text with logical font names. When applications fail to display valid characters using logical fonts, sometimes the solution involves understanding and modifying the font.properties file to use more capable physical fonts. Typically you will never have to see or understand this file. In fact, you should try to avoid any dependence on this file since it is very platform dependent and is subject to change without notice. However, if you have a multilingual or internationalized application that uses logical fonts instead of physical fonts, you can modify this file to enhance its display abilities.

The font.properties file's structure will be slightly different depending on your operating system (OS). However, this file's basic content and purpose remain the same on all platforms that implement it. Although the following description of the font.properties file is based on the Solaris and Windows implementations, you should be able to apply the concepts to your own platform's implementation, whether you use Linux, HP-UX, AIX, or something else entirely.

Many font.properties files exist in your host's %JAVA_HOME%/lib subdirectory. When applications use logical fonts, the JRE searches for the file that is most appropriate for your default locale, file encoding, and operating system. Despite their .properties extension, font.properties files are not PropertyResourceBundles. Although font.properties files are customized for specific locales, the locale designation in their name comes at the very end, unlike resource bundles. So instead of accessing font_ja.properties for a Japanese locale, your JRE will look for font.properties.ja.

JVMs that implement font.properties search for a specific font.properties file in the following order5:

  1. font.properties.<lang>_<region>_<encoding>.<osName><osVersion>
  2. font.properties.<lang>_<region>_<encoding>.<osName>
  3. font.properties.<lang>_<region>_<encoding>
  4. font.properties.<lang>_<encoding>.<osName><osVersion>
  5. font.properties.<lang>_<encoding>.<osName>
  6. font.properties.<lang>_<encoding>
  7. font.properties.<lang>.<osName><osVersion>
  8. font.properties.<lang>.<osName>
  9. font.properties.<lang>
  10. font.properties.<encoding>.<osName><osVersion>
  11. font.properties.<encoding>.<osName>
  12. font.properties.<encoding>
  13. font.properties.<osName><osVersion>
  14. font.properties.<osName>
  15. font.properties

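In the above search, <lang> and <region> are the language and region (country) codes of the default locale, <encoding> is the default file encoding, and <osName> and <osVersion> are the name and version of the host operating system.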

Another significant difference between a font.properties file and a resource bundle property file is that a resource bundle inherits entries from its parent bundle, but a font.properties file does not. So font.properties.ja does not inherit anything from font.properties. Although the association seems natural because of the .properties extension and internal key=value structure, the font file is quite different from a PropertyResourceBundle.
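Since exactly one file is chosen and nothing is inherited, the search order amounts to building the fifteen candidate names and taking the first one that exists. The sketch below reproduces the order as listed above; the class and method names are hypothetical, and how the real runtime derives its encoding and OS strings is not modeled here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FontPropertiesSearch {

    // Candidate file names, most specific first, in the documented order.
    static List<String> candidates(String lang, String region, String enc,
                                   String osName, String osVersion) {
        String[] suffixes = {
            "." + lang + "_" + region + "_" + enc,
            "." + lang + "_" + enc,
            "." + lang,
            "." + enc,
            ""
        };
        List<String> names = new ArrayList<>();
        for (String s : suffixes) {
            String base = "font.properties" + s;
            names.add(base + "." + osName + osVersion);
            names.add(base + "." + osName);
            names.add(base);
        }
        return names;
    }

    // The first candidate that exists in dir wins; null if none exists.
    static File find(File dir, List<String> names) {
        for (String name : names) {
            File f = new File(dir, name);
            if (f.isFile()) {
                return f;
            }
        }
        return null;
    }
}
```

For example, with the made-up arguments ("ja", "JP", "sjis", "Windows", "NT"), the list begins with font.properties.ja_JP_sjis.WindowsNT and ends with plain font.properties.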

As you learn more about this file, you may want to edit it to support your character display needs better. For example, you may have a set of preferred fonts that provide better Unicode character coverage than the default fonts in your OS installation. Make sure you edit the file with the correct locale, encoding, and OS extensions for your environment.

If you decide that you must modify or add a font.properties file, please note that you have choices about where to put the file. If you want your change to be a system wide modification, edit the files in the JRE's lib subdirectory. If the change is for a single user, create a file in the user's home directory. The JRE always prefers font.properties files from the user's home directory if the files exist.

The font file has several sections, each serving different but related purposes, each contributing to the correct display of characters in your application. This file changes over time as the platform becomes generally less dependent on it on all host systems. Since it changes rapidly from one version of the JRE to the next, any discussion of it will probably become outdated rather quickly. For this reason, this discussion will limit its description to this file as it exists for versions 1.3.x and 1.4 of the Java platform. The file has the following sections:

Font Map

The first part of the font.properties file is the font map section. This section maps the five logical font names to physical fonts on your system. Part of a font map for JRE 1.3.x from a typical Windows implementation is shown below; the NEED_CONVERTED flag6 that appears on some lines is described later in this section:

dialog.0=Arial,ANSI_CHARSET

dialog.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED

dialog.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED

 

serif.0=Times New Roman,ANSI_CHARSET

serif.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED

serif.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED

 

serif.italic.0=Times New Roman Italic,ANSI_CHARSET

serif.italic.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED

serif.italic.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED

 

The mapping for a Solaris implementation follows:

dialog.0=-monotype-arial-regular-r-normal--*-%d-*-*-p-*-iso8859-1

dialog.1=-urw-itc zapfdingbats-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific

dialog.2=-*-symbol-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific

serif.0=-monotype-times new roman-regular-r---*-%d-*-*-p-*-iso8859-1

serif.1=-urw-itc zapfdingbats-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific

serif.2=-*-symbol-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific

serif.italic.0=-monotype-times new roman-regular-i---*-%d-*-*-p-*-iso8859-1

serif.italic.1=-urw-itc zapfdingbats-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific

serif.italic.2=-*-symbol-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific

This file snippet maps the dialog and serif logical font names to physical fonts on the host system. Each line of the Windows implementation in this section has the following syntax:

<logical font>.<style>.<order>=<physical font name>,<charset>,<conversion info>

The Solaris implementation is similar, but the physical font names follow the X11 font naming standard. On a Solaris system, the font map lines have the form:

<logical font>.<style>.<order>=<X11 font name>

In the Solaris font naming system, the font name itself contains information about the font's size, character set, and stylistic attributes.

The <logical font> element is either dialog, dialoginput, serif, sansserif, or monospaced. This element always begins the font map.

The <style> element is bold, italic, or bolditalic; when it is omitted, as in dialog.0, the mapping applies to the plain style. By changing this element of a font mapping, you can use different physical fonts for different styles of the same logical font.

The <order> element allows you to map multiple physical fonts to the same logical font. Each new physical font that you add to a logical font mapping should get the next number in the order sequence. For example, if you have two fonts, Arial and Lucida, that map to the sansserif logical font, the logical font mapping for the plain style could look like this:

sansserif.0=Arial,ANSI_CHARSET

sansserif.1=Lucida,ANSI_CHARSET

In the above mapping, the plain style is implicit in the definition. This mapping tells the runtime to look at the Arial font first when searching for a character glyph. If it doesn't exist in the Arial font, search the Lucida font next.
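To make the Windows syntax concrete, here is a small parser for font map lines. It is a sketch, not the runtime's own parser: the FontMapEntry class is hypothetical, it treats a missing <style> as plain (as in the mapping above), and it does not handle the Solaris X11 form.

```java
public class FontMapEntry {
    final String logicalFont;
    final String style;
    final int order;
    final String physicalFont;
    final String charset;
    final boolean needConverted;

    FontMapEntry(String logicalFont, String style, int order,
                 String physicalFont, String charset, boolean needConverted) {
        this.logicalFont = logicalFont;
        this.style = style;
        this.order = order;
        this.physicalFont = physicalFont;
        this.charset = charset;
        this.needConverted = needConverted;
    }

    // Parses "<logical font>[.<style>].<order>=<physical font>,<charset>[,NEED_CONVERTED]"
    static FontMapEntry parse(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("No '=' in: " + line);
        }
        String[] key = line.substring(0, eq).trim().split("\\.");
        String[] value = line.substring(eq + 1).split(",");
        if (key.length < 2 || key.length > 3 || value.length < 2) {
            throw new IllegalArgumentException("Unexpected font map line: " + line);
        }
        String style = key.length == 3 ? key[1] : "plain";   // plain is implicit
        int order = Integer.parseInt(key[key.length - 1]);
        boolean convert = value.length > 2 && value[2].trim().equals("NEED_CONVERTED");
        return new FontMapEntry(key[0], style, order,
                value[0].trim(), value[1].trim(), convert);
    }
}
```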

The <physical font> element is the name of a real font on your system. What font name should go here? You should use the physical font's name, which is not necessarily the filename. Figure 4 shows a listing of a %WINROOT%/fonts directory. The font names are under the "Font Name" column.

Figure 4 Font names are typically longer than their filename.

The <charset> element in the Windows implementation shows just how OS specific this file really is. The Windows values come straight out of the Win32 Software Development Kit (SDK) header file WINGDI.H, which defines these character sets as integer constants. Native code uses them in the Win32 LOGFONT and CHARSETINFO structures to identify the character set that a physical font represents. This element exists only in the Windows implementation; neither the Solaris nor the Linux font file uses it.

Figure 5 Some of the charsets used in the Windows font.properties files.

ANSI_CHARSET

JOHAB_CHARSET

DEFAULT_CHARSET

HEBREW_CHARSET

SYMBOL_CHARSET

ARABIC_CHARSET

SHIFTJIS_CHARSET

GREEK_CHARSET

HANGEUL_CHARSET

TURKISH_CHARSET

HANGUL_CHARSET

VIETNAMESE_CHARSET

GB2312_CHARSET

THAI_CHARSET

CHINESEBIG5_CHARSET

EASTEUROPE_CHARSET

OEM_CHARSET

RUSSIAN_CHARSET

The last element is an optional <conversion info>7. The only valid value for this optional element is NEED_CONVERTED. If you add the NEED_CONVERTED element to a font map line, you tell the runtime environment that the physical font does not use a Unicode index (cmap) for finding character glyphs. Instead, the font uses a native or legacy character set index. You may have noticed that the Symbol font uses the NEED_CONVERTED element. When your application uses the Symbol font via the logical font Dialog, the Java runtime must use the Symbol font's non-Unicode indexing. The indices are determined by converting the Unicode char values to values in a different encoding. Most TrueType fonts will not require this item in the font map. However, if you use this element for a font, you must supply the encoding information later in the file.
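The conversion itself is ordinary character encoding: each Unicode char becomes the byte value that the legacy font uses as its glyph index. The runtime's converters for Symbol and WingDings (such as sun.awt.CharToByteSymbol) are internal classes, so this sketch uses the standard ISO-8859-1 charset as a stand-in single-byte legacy encoding. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LegacyIndex {

    // Single-byte index of c in a legacy encoding, or -1 if the encoding
    // cannot represent c. A fresh encoder reports unmappable characters
    // as exceptions instead of substituting a replacement byte.
    static int legacyIndex(char c, Charset legacy) {
        try {
            ByteBuffer bytes = legacy.newEncoder().encode(CharBuffer.wrap(String.valueOf(c)));
            return bytes.get(0) & 0xFF;   // treat the byte as unsigned
        } catch (CharacterCodingException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(legacyIndex('\u00E9', StandardCharsets.ISO_8859_1)); // prints 233
    }
}
```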

One important thing to know about this section is that the Java runtime finds character glyphs by searching physical fonts in the order they exist in the logical font. For example, suppose you want to display the character '亦', which is CJK UNIFIED IDEOGRAPH-4EA6, in the dialog font. The runtime uses the <order> portion of the mapping to determine the order in which to search physical fonts. Using the dialog logical font, the runtime looks at dialog.0, dialog.1, dialog.2, and then dialog.3 in that order. The runtime will use the first glyph that it finds for the character. If it finds a glyph in dialog.0 and no exclusion range8 prevents its use, other dialog definitions won't be searched.
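The search can be modeled in a few lines. In this sketch each slot of a logical font (dialog.0, dialog.1, and so on) is reduced to a hypothetical Slot holding a name, a test for whether the font has a glyph, and its exclusion ranges; the runtime's real data structures are different and are not shown. The first slot that has the glyph and does not exclude the character wins.

```java
import java.util.List;
import java.util.function.IntPredicate;

public class CompositeLookup {

    // One physical font inside a logical font (hypothetical model).
    static class Slot {
        final String name;
        final IntPredicate hasGlyph;
        final int[][] exclusions;   // inclusive {low, high} pairs

        Slot(String name, IntPredicate hasGlyph, int[][] exclusions) {
            this.name = name;
            this.hasGlyph = hasGlyph;
            this.exclusions = exclusions;
        }

        boolean excludes(int cp) {
            for (int[] r : exclusions) {
                if (cp >= r[0] && cp <= r[1]) {
                    return true;
                }
            }
            return false;
        }
    }

    // Search slots in <order> sequence; return the first usable one, or null.
    static Slot find(List<Slot> slots, int cp) {
        for (Slot s : slots) {
            if (!s.excludes(cp) && s.hasGlyph.test(cp)) {
                return s;
            }
        }
        return null;   // no slot can supply a glyph for this character
    }
}
```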

Alias Map

The next major section of the font file is the alias map. This section is for backward compatibility with the JDK 1.01, which allowed Java font names like timesroman and helvetica. You will never need to modify this section, but here is a sample from a Windows implementation:

# name aliases

alias.timesroman=serif

alias.helvetica=sansserif

alias.courier=monospaced

Older JRE versions defined these logical fonts. Defining them here allows older applications to migrate forward to newer runtime versions. However, your newer applications should not use the timesroman, helvetica, or courier logical fonts.

Default Character Map

The default character map section specifies a substitute character that should be displayed whenever the logical font doesn't contain a character's glyph. Also, this character should be used by AWT components when their host charset doesn't contain a character. The default character can be anything you like, but it should be a character that you will not confuse with commonly used characters. Also, your default character's glyph should be available in your fonts.

The preset default character is '❑' (\u2751). This character's name is LOWER RIGHT SHADOWED WHITE SQUARE in the Unicode Standard. Its appearance in a component should alert you that a character may not be displayed properly.

# Default font definition

#

default.char=2751

Both versions 1.3.x and 1.4 of the runtime environment contain flaws regarding this setting. The mapping does not work at all for Swing components: if a Swing component's font doesn't support a particular character, the component displays whatever character the font itself designates as its default, typically a box-shaped character such as WHITE VERTICAL RECTANGLE (\u25AF). Although the mapping does work for some AWT components (like Label and Button), it doesn't work for TextField and TextArea components, which tend to display a question mark '?' when the native peer's charset doesn't support a character. Unfortunately, there is no simple workaround to make this work correctly and consistently. The default.char setting is deprecated as of J2SE 1.4.

Character Conversion Map

Prior to J2SE, Version 1.3

Most TrueType fonts have a Unicode based index. That means that you can simply ask for a character's glyph by handing over the Unicode code point value. However, some fonts don't use a Unicode index. Instead they may use a legacy character set index. Some examples of non-Unicode indexed fonts are the Symbol and Wingdings fonts. These fonts have the NEED_CONVERTED property set in their font map line. Every line with this property requires an additional entry in the font.properties file. For example, since the Symbol and Wingdings fonts have the NEED_CONVERTED property in both the dialog and the serif composite font, you must have an entry for these fonts in the conversion map section of the font.properties file as shown below. The conversion map simply converts Unicode character values to legacy character set values.

fontcharset.dialog.1=sun.awt.windows.CharToByteWingDings

fontcharset.dialog.2=sun.awt.CharToByteSymbol

fontcharset.serif.1=sun.awt.windows.CharToByteWingDings

fontcharset.serif.2=sun.awt.CharToByteSymbol

Each line has the following syntax:

fontcharset.<composite font>.<order>=<converter>

The <composite font> and <order> values should be exactly the same values as used in the font map section of this file. The <composite font> and <order> must be the same so that an association can be made from the real font to a conversion class.

In most situations, you should not have any reason to modify this section since most TrueType fonts will have a Unicode index. Of course, if you add a font to this file and suspect that it needs a conversion map line, you should consult your font vendor for details about how it's indexed.

J2SE Version 1.3 and Later

This section of the font.properties file changed significantly with version 1.3. Now every font mapping requires a fontcharset map. Moreover, this mapping is only used for AWT components. This section is not utilized for Swing components.

This section no longer directly refers to a font's character index or cmap tables. Instead, it associates a charset encoding with each physical font in a logical font, effectively representing the characters that a font can display in an AWT component. Before sending a character to an AWT component, the runtime checks whether the char can be converted to the font's charset encoding. If the encoding cannot represent the char, the associated font will not be used to display the character. Instead, the next candidate physical font in the logical font is checked in the same way.
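The check described here can be sketched with java.nio.charset: walk the physical fonts in order and pick the first whose charset can encode the character. The font names LatinFont and WideFont and their charset pairings are hypothetical, and standard Charset objects stand in for the runtime's internal converters (such as sun.io.CharToByteCp1252).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class AwtCharsetCheck {

    // fonts lists physical fonts in <order> sequence with the charset each
    // represents; returns the first font whose charset can encode c, or null.
    static String firstEncodable(Map<String, Charset> fonts, char c) {
        for (Map.Entry<String, Charset> entry : fonts.entrySet()) {
            if (entry.getValue().newEncoder().canEncode(c)) {
                return entry.getKey();
            }
        }
        return null;   // no font in this logical font can show c in AWT
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, mirroring dialog.0, dialog.1, ...
        Map<String, Charset> dialog = new LinkedHashMap<>();
        dialog.put("LatinFont", StandardCharsets.ISO_8859_1);   // hypothetical
        dialog.put("WideFont", StandardCharsets.UTF_8);         // hypothetical
        System.out.println(firstEncodable(dialog, '\u00E9'));
        System.out.println(firstEncodable(dialog, '\u65E5'));
    }
}
```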

Although the fontcharset mappings work for AWT components, the runtime does not apply them to Swing components. Swing components are capable of displaying characters that the native host often cannot.

Exclusion Range Map

The exclusion range section allows you to specify the physical font that should be used to render a character or range of characters for a logical font definition. This is useful if you have several physical fonts that contain overlapping character ranges. If you prefer one font's glyphs over another's, then you should use this section to clarify that preference.

Imagine that you want to display the character Ж, which is CYRILLIC CAPITAL LETTER ZHE (\u0416). You specify the serif font, and you know that both the Times New Roman and Cyberbit fonts support this character. If you prefer the Cyberbit font, you can edit the exclusion range section to prevent the Times font in serif.0 from providing a glyph. In the file portion below, serif.0 blocks everything from \u0100 through \uffff, so the runtime skips serif.0; serif.1 and serif.2 exclude \u0416 as well, so the search continues to serif.3. The character exists in the Cyberbit font, so the runtime uses the Cyberbit glyph to display it. You can specify multiple ranges or single characters per font by separating them with commas. If you have added the Cyberbit font to the serif logical font as serif.3, you could modify the exclusion ranges like this:

# Exclusion Range info.

#

exclusion.dialog.0=0100-ffff

exclusion.dialoginput.0=0100-ffff

exclusion.serif.0=0100-ffff

exclusion.serif.1=0416-0416

exclusion.serif.2=0416-0416

exclusion.sansserif.0=0100-ffff

exclusion.monospaced.0=0100-ffff
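Each exclusion value is a comma-separated list of hexadecimal ranges. The sketch below parses such a value and checks whether a character falls inside it. The class name ExclusionRanges is hypothetical; treating a lone value such as 0416 as a one-character range follows the remark above that single characters may be listed.

```java
import java.util.ArrayList;
import java.util.List;

public class ExclusionRanges {

    // Parse a value such as "0500-20ab,20ad-ffff" into inclusive {low, high} pairs.
    static List<int[]> parse(String value) {
        List<int[]> ranges = new ArrayList<>();
        for (String part : value.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue;
            }
            int dash = p.indexOf('-');
            int low = Integer.parseInt(dash < 0 ? p : p.substring(0, dash), 16);
            int high = dash < 0 ? low : Integer.parseInt(p.substring(dash + 1), 16);
            ranges.add(new int[] { low, high });
        }
        return ranges;
    }

    // True if c falls inside any range of the exclusion value.
    static boolean excludes(String value, char c) {
        for (int[] r : parse(value)) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }
}
```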

Input Charset Map

The last section defines the character set used for AWT text input fields. TextField and TextArea components are examples of areas where you can enter text into your application. This map simply tells the runtime to use a font that supports the character set you indicate. It does not apply to Swing components, and it is deprecated since version 1.4.

Modifying a font.properties File

Imagine that you need to add another font to your system to get better character coverage. You must first install the font using your host's font installation mechanisms or by copying the physical font to the JRE's lib/fonts directory. Then you must edit the font.properties file so that the runtime becomes aware of how you want to use it. You must associate the physical font with a serif, sansserif, dialog, dialoginput, or monospaced font and set the appropriate options.

If you are using a JRE on a U.S. English system, the runtime uses the default font.properties file. In its default state, that file typically supports only Western European and American languages. For example, depending on which physical fonts are mapped, none of the logical fonts may have glyphs for Japanese, Chinese, or Korean, so with the default settings you probably won't be able to display those scripts in either AWT or Swing components using logical fonts. However, you can modify the font.properties file.

The first step is to find a suitable font for the characters you want to see. Suppose that you want to add the Arial Unicode MS font to your dialog logical font. Next, you must install this font. Font installation procedures vary among operating systems, but the section 'Physical Fonts' describes this for a Microsoft Windows environment.

Next, add the Arial Unicode MS font to the font.properties file. The first change is in the font map section, which is the first section of the file:

dialog.0=Arial,ANSI_CHARSET

dialog.1=WingDings,SYMBOL_CHARSET

dialog.2=Symbol,SYMBOL_CHARSET

dialog.3=Arial Unicode MS,ANSI_CHARSET

 

dialog.bold.0=Arial Bold,ANSI_CHARSET

dialog.bold.1=WingDings,SYMBOL_CHARSET

dialog.bold.2=Symbol,SYMBOL_CHARSET

dialog.bold.3=Arial Unicode MS,ANSI_CHARSET

 

dialog.italic.0=Arial Italic,ANSI_CHARSET

dialog.italic.1=WingDings,SYMBOL_CHARSET

dialog.italic.2=Symbol,SYMBOL_CHARSET

dialog.italic.3=Arial Unicode MS,ANSI_CHARSET

 

dialog.bolditalic.0=Arial Bold Italic,ANSI_CHARSET

dialog.bolditalic.1=WingDings,SYMBOL_CHARSET

dialog.bolditalic.2=Symbol,SYMBOL_CHARSET

dialog.bolditalic.3=Arial Unicode MS,ANSI_CHARSET

Notice that the above changes affect all four dialog styles: plain, bold, italic, and bolditalic. In this example, we've added the Arial Unicode MS font as a fourth physical font in the dialog logical font. When the runtime searches for glyphs, it will search this new font last. If your new font contains commonly used character glyphs, you can insert its definition higher in the search order.

Next, move down in the font.properties file to the filename mapping section. Add lines that map each physical font name to the name of the installed font file:

filename.Arial=ARIAL.TTF

filename.Arial_Bold=ARIALBD.TTF

filename.Arial_Italic=ARIALI.TTF

filename.Arial_Bold_Italic=ARIALBI.TTF

 

filename.Arial_Unicode_MS=ARIALUNI.TTF

filename.Arial_Unicode_MS_Bold=ARIALUNI.TTF

filename.Arial_Unicode_MS_Italic=ARIALUNI.TTF

filename.Arial_Unicode_MS_Bold_Italic=ARIALUNI.TTF

Some font families ship a separate file for each style; Arial, for example, uses ARIAL.TTF, ARIALBD.TTF, ARIALI.TTF, and ARIALBI.TTF. For Arial Unicode MS, the single ARIALUNI.TTF file serves all four basic styles: plain, bold, italic, and bolditalic.
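The keys in the filename section follow a visible pattern: the physical font name with spaces replaced by underscores, prefixed by filename. The helper below only reproduces that pattern as it appears in the listing above; the class name FilenameKey is hypothetical, and it does not check that the file exists.

```java
public class FilenameKey {

    // "Arial Unicode MS Bold" becomes "filename.Arial_Unicode_MS_Bold"
    static String keyFor(String physicalFontName) {
        return "filename." + physicalFontName.trim().replace(' ', '_');
    }

    // Build a complete line, e.g. "filename.Arial_Bold=ARIALBD.TTF"
    static String line(String physicalFontName, String fileName) {
        return keyFor(physicalFontName) + "=" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(line("Arial Unicode MS", "ARIALUNI.TTF"));
    }
}
```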

The next change is in the fontcharset map. Add a line so that the JRE knows that AWT components must be limited to the CP1252 charset (on U.S. English Windows systems):

fontcharset.dialog.0=sun.io.CharToByteCp1252

fontcharset.dialog.1=sun.awt.windows.CharToByteWingDings

fontcharset.dialog.2=sun.awt.CharToByteSymbol

fontcharset.dialog.3=sun.io.CharToByteCp1252

Finally, the exclusion ranges need modification:

exclusion.dialog.0=0500-20ab,20ad-ffff

exclusion.dialog.1=3400-4dff,4e00-9fff,f900-faff

exclusion.dialog.2=3400-4dff,4e00-9fff,f900-faff

These lines tell the JRE that the physical fonts assigned to dialog.0, dialog.1, and dialog.2 should not be used for any of the three Chinese, Japanese, and Korean (CJK) ideograph ranges in Unicode. When your application asks for a character in the \u3400-\u4dff range, for example, the JRE sees that these fonts are excluded from supplying glyphs in that range. It then searches the last font in the series, dialog.3. Dialog.3, mapped to the Arial Unicode MS font, has no exclusion ranges, so it supplies the requested character glyphs.

AWT vs Swing Components

The AWT and Swing graphical components have different text display abilities. The differences exist because AWT components are peered components. A peered component is a native graphical object created through the host's API. Peered components are sometimes called heavyweight components because they rely on a native, resource-hungry window that is controlled primarily by the host. Many Swing components, however, are drawn and controlled by the Java runtime itself.

AWT text components have an encoded character set, which is controlled by the language and regional settings of the host platform. The host's character set is usually a regional set, which is typically a subset of Unicode. When applications use AWT text components, displayable text is limited to the characters defined in the host's character set. If a character is in the host's character set, AWT components can display it. If it is not, the character is typically converted to a default character, which often appears as a '?' in the displayed text. This limitation exists whether your AWT components use physical or logical fonts. Figure 6 shows the conversion of Unicode text containing Hebrew characters to a more restrictive charset such as ISO-8859-1 (Latin-1). The native charset varies with the host's locale and settings.

Figure 6 AWT components apply character conversions to text.
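The conversion in Figure 6 can be reproduced outside the AWT. The sketch below encodes a string containing Hebrew into ISO-8859-1 and decodes it again. String.getBytes(Charset) replaces each character the charset cannot map with the charset's replacement byte, which is '?' for ISO-8859-1. This is only a simulation of the effect using java.nio.charset.StandardCharsets (Java 7 or later), not the code path AWT peers actually use; LossyConversion is a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;

// Simulates the lossy conversion an AWT component applies: characters
// outside the target charset are replaced with '?'.
class LossyConversion {
    static String roundTrip(String text) {
        // getBytes(Charset) substitutes the replacement byte for unmappable chars.
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // "shalom" in Hebrew letters
        System.out.println(roundTrip("Hello " + hebrew)); // prints "Hello ????"
    }
}
```

The Latin text survives the round trip unchanged, while each of the four Hebrew letters becomes a '?'.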

Swing components use Unicode as their native character set. Their ability to display a character is not affected by lossy character conversions like those in the AWT, so they can typically display any Unicode character.9 The same text shown in Figure 6 is easily displayed by a Swing text component because the Swing component does not perform character conversions. Given a font that provides Hebrew characters, a JTextField renders the text correctly, as shown in Figure 7.

Figure 7 Swing components do not perform character conversions.

In the next example, the application combines AWT and Swing components to show the difference in their abilities. The example uses the dialog logical font. Figure 8 and Figure 9 show multilingual text before and after editing the font.properties file. Notice that the AWT components in Figure 9 cannot display any of the Thai, Hebrew, Korean, or Japanese characters in the text. The Swing components, however, can display the Thai and Hebrew characters because the default dialog logical font on the author's U.S. English system contains those glyphs. Even though those characters are in the font, the AWT components are limited to the characters in code page 1252 on U.S. English Windows systems. The result is a stream of '?' and other placeholder glyphs that mark nondisplayable characters.

Figure 8 Using the dialog font before modifying font.properties.

Figure 9 Using the dialog font after modifying font.properties.

Once you have updated font.properties as described in the previous section, the dialog font includes the physical font Arial Unicode MS. This font contains the Korean and Japanese characters that are missing from Figure 8, and after the font.properties change those characters display correctly in the Swing components in Figure 9.

Of course, your application doesn't need to use logical fonts. It can use physical fonts directly. Assuming your application uses the Arial Unicode MS font directly, the result would look the same as in Figure 9 if the application is running on a U.S. English host. Results would be different if the application were on a Korean system. The Swing output would not change, but the AWT components would display the Korean correctly.
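The following sketch shows one way to use a physical font directly in a Swing component. It assumes Arial Unicode MS is installed, uses Font.canDisplayUpTo to confirm that the font covers the sample text, and otherwise falls back to the dialog logical font. The success test accepts both -1 and the string length to cover the J2SDK 1.3 behavior described in note 3. PhysicalFontDemo and its methods are hypothetical names, and the lambda requires Java 8 or later.

```java
import java.awt.Font;
import javax.swing.JFrame;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

// Sketch: use the Arial Unicode MS physical font in a Swing text field,
// falling back to the dialog logical font if it cannot display the text.
class PhysicalFontDemo {
    // Font.canDisplayUpTo returns -1 when every character is displayable.
    // J2SDK 1.3 and 1.3.1 returned the string length instead (note 3),
    // so both values are treated as success.
    static boolean displaysAll(int canDisplayUpToResult, String text) {
        return canDisplayUpToResult == -1 || canDisplayUpToResult == text.length();
    }

    static Font chooseFont(String text) {
        Font preferred = new Font("Arial Unicode MS", Font.PLAIN, 16);
        if (displaysAll(preferred.canDisplayUpTo(text), text)) {
            return preferred;
        }
        return new Font("Dialog", Font.PLAIN, 16);
    }

    public static void main(String[] args) {
        // Hebrew and Korean sample text written as Unicode escapes.
        final String text = "\u05E9\u05DC\u05D5\u05DD \uD55C\uAD6D\uC5B4";
        SwingUtilities.invokeLater(() -> {
            JTextField field = new JTextField(text, 20);
            field.setFont(chooseFont(text));
            JFrame frame = new JFrame("Physical font demo");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.getContentPane().add(field);
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

If the requested font is not installed, the runtime typically substitutes a default font for the Font object, so the canDisplayUpTo check also guards against that case.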

Text Rendering Engine

A text rendering engine is responsible for positioning glyphs correctly and in a visually appealing manner. Text engines must understand the sometimes peculiar characteristics of scripts to position glyphs correctly on a screen or page. For scripts like Latin, the rules for text layout can be relatively simple. More complex scripts like Thai stack diacritics at several levels, which makes glyph positioning more difficult. Scripts like Arabic have many ligatures and glyph forms that change depending on the character's position in a word. Both Arabic and Hebrew are bidirectional scripts: text in either script flows primarily from right to left, but embedded phrases, such as English words, can flow from left to right. Simple text engines might display these characters in some form, but the result would probably not look correct to native readers and speakers.
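The java.text.Bidi class, available since J2SE 1.4, implements the Unicode Bidirectional Algorithm and reports the directional runs described above. The sketch below analyzes a Hebrew word followed by an embedded English word. It is illustrative only: Swing text components and Java 2D layout normally perform this analysis themselves, so most applications never call Bidi directly. BidiRuns is a hypothetical class name.

```java
import java.text.Bidi;

// Uses java.text.Bidi to find the directional runs in Hebrew text that
// contains an embedded English word.
class BidiRuns {
    static Bidi analyze(String text) {
        // DIRECTION_DEFAULT_LEFT_TO_RIGHT: the paragraph direction comes from
        // the first strong character, defaulting to left-to-right if none.
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
    }

    public static void main(String[] args) {
        String text = "\u05E9\u05DC\u05D5\u05DD Java"; // Hebrew word, then "Java"
        Bidi bidi = analyze(text);
        System.out.println("base left-to-right: " + bidi.baseIsLeftToRight());
        for (int i = 0; i < bidi.getRunCount(); i++) {
            int start = bidi.getRunStart(i);
            int limit = bidi.getRunLimit(i);
            // Odd embedding levels are right-to-left; even levels are left-to-right.
            System.out.println("run " + i + " [" + start + ", " + limit + ") level "
                + bidi.getRunLevel(i) + ": " + text.substring(start, limit));
        }
    }
}
```

For this input the base direction is right-to-left, and Bidi reports two runs: the Hebrew run at embedding level 1 and the embedded English run at level 2.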

The following table lists the writing systems supported in 1.4. Supported scripts may not be displayable in all text components. Specifically, AWT components typically require that the underlying host be localized to a language that uses a given script in order to display characters in that script. Up-to-date notes on supported scripts10 can be found at http://java.sun.com/j2se/1.4/docs/guide/intl/locale.doc.html. Other scripts may be displayable, but they are neither tested nor officially supported.

Figure 10 Supported Writing Systems in J2RE version 1.4

Script                     Languages
Arabic                     Arabic
Chinese (Simplified)       Chinese
Chinese (Traditional)      Chinese
Cyrillic                   Belorussian, Russian, etc.
Devanagari                 Hindi
Greek                      Greek
Japanese                   Japanese
Korean                     Korean
Latin                      English, French, German, Italian, Spanish, Swedish, etc.
Latin                      Latvian, Lithuanian
Latin (Central European)   Czech, Hungarian, Polish, etc.
Thai                       Thai
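To check a piece of text against this table, you can first find out which scripts it uses. The sketch below relies on Character.UnicodeScript, which was added in Java 7 and so postdates the 1.4 release the table describes. Its script names do not line up one-to-one with the table rows; for example, Chinese characters report HAN, and Japanese text mixes HAN, HIRAGANA, and KATAKANA. ScriptsInText is a hypothetical helper.

```java
import java.util.EnumSet;
import java.util.Set;

// Lists the Unicode scripts used in a string so they can be compared with
// the supported writing systems table. Requires Java 8 or later
// (Character.UnicodeScript is Java 7; String.codePoints is Java 8).
class ScriptsInText {
    static Set<Character.UnicodeScript> scriptsOf(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript s = Character.UnicodeScript.of(cp);
            // COMMON (spaces, punctuation, digits) and INHERITED (some combining
            // marks) do not identify a particular writing system.
            if (s != Character.UnicodeScript.COMMON && s != Character.UnicodeScript.INHERITED) {
                scripts.add(s);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // Latin, Hebrew, Thai, and Korean sample words.
        String text = "Hello \u05E9\u05DC\u05D5\u05DD \u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35 \uD55C\uAD6D\uC5B4";
        System.out.println(scriptsOf(text));
    }
}
```

An application could compare the returned set against the scripts it knows are supported before choosing between AWT and Swing components for a given piece of text.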

The Java platform's text rendering abilities for Swing components have evolved over time. Some of the more interesting developments are support for Thai, Hindi, Hebrew, and Arabic, all of which require special consideration beyond the needs of Western European scripts. All of these scripts are supported in J2RE version 1.4; Arabic and Hebrew have been supported since 1.2. Figure 11 shows multiple scripts in a single document. The Java 2D font engine is responsible for rendering these scripts. Correct rendering handles the ligatures, glyph reordering, and glyph reshaping that are common in Thai, Arabic, and Hindi.

Figure 11 Multiple scripts can exist in a single document.

Summary

The correct display of characters depends on many factors:

Physical fonts contain the glyphs that visually represent characters. The Java platform uses physical fonts that have been installed in your system font directories or in the JRE's lib/fonts subdirectory. Applications are dependent on physical fonts to provide glyphs. Since the release of JRE 1.2, the Java platform allows applications to use physical fonts directly. The best way to use physical fonts is to allow the user to choose them.

Logical fonts were the only fonts available to applications prior to JRE 1.2. Applications can access five logical fonts: dialog, dialoginput, serif, sansserif, and monospaced. These fonts are mapped to physical fonts on the host system. Some JVM implementations provide this mapping via a font.properties file. This file can be modified to support additional characters by adding more physical font mappings to the existing logical font names.

AWT components are peered, native objects supported directly by the host platform. They have an encoded character set that limits their ability to display characters. AWT components can display only the characters defined in their charset. Swing components, however, do not have native charset limitations. Provided with an adequate font, they can typically display a wide range of Unicode characters.

The Java platform's ability to correctly position and lay out glyphs depends on its text rendering engine. A text rendering engine understands how to lay out text in various scripts. Some scripts are more complex than others. Text engines must understand how to position diacritics, when to use ligatures, and how to use variant forms of glyphs depending on a character's position in a text stream. The text engine in the Java platform evolves over time to support additional writing systems.

 


1. See Chapter 3, Characters, for more information about glyphs.

2. Complete code examples are online. See the Preface for more information.

3. A bug in J2SDK 1.3 and 1.3.1 causes the canDisplayUpTo(String str) method to return the string's length when it can display all the chars. In this example, a successful font would return the value 6. One workaround compares the return value with the string's length. If they are the same, the font can display all the characters. This has been corrected in version 1.4 so that the method returns -1 when all chars are displayable.

4. Although many JRE implementations have this file, its existence is not mandatory. The Macintosh, for example, implements logical fonts without the font.properties file.

5. At each step in the search algorithm, the JVM looks first in the directory specified by the user.home system property and then in the JRE's lib subdirectory. It uses the first properties file that it finds.

6. The NEED_CONVERTED tag is not present in version 1.4 of the platform. In version 1.3.x it is present in the file but is not used.

7. The <conversion info> tag, although present in version 1.3.x font.properties files, is not used, and it is not present in version 1.4 font.properties files. This description applies to versions earlier than 1.3.

8. Font exclusion ranges will be described in 'Exclusion Range Map'.

9. Of course, Swing components still rely on supporting fonts and text rendering engines.

10. This list of supported scripts was originally authored by Norbert Lindenberg of Sun Microsystems.