The Unibook Character Browser

Contents

Version: 5.0.0 (build 237)
Last revised: September 18, 2006

Overview

1 Setting up and Running Unibook
1.1 Installation
1.2 Views and Navigation
1.3 Inspecting Characters
1.4 Viewing Character Properties
1.5 Viewing Fonts
1.6 Registry
1.7 Reporting Problems
 
2 Command Reference
2.1 Keyboard
2.2 Mouse
2.3 Toolbar
2.4 Menu Commands
 
3 The Input Files and their Formats
3.1 Combining Font File
3.2 Formatting Configuration File
3.3 Nameslist File
3.4 Font File
3.5 Other Files
 
4 Troubleshooting
4.1 Won't Run
4.2 Displaying Characters
4.3 Installing Additional Fonts
4.4 Files
4.5 Other Tips
 

Overview

The Unibook Character Browser is a tool designed to present information about the characters defined in the Unicode Standard and the International Standard ISO/IEC 10646. Using the public nameslist file and a private font suite, Unibook has been used to produce the printed and online code charts for The Unicode Standard since Version 3.0, as well as code charts for all editions of ISO/IEC 10646 since 2000.

Unibook takes a simple text file containing a character name list, plus some font and formatting configuration files, and produces both code charts and character name tables in ISO or Unicode formats. The code charts can be used interactively to look up information on particular characters or character properties, or they can be printed in a format resembling the standard documents.

Unibook allows you to view and print cross mappings for several other character sets to the Unicode Standard. (In Version 5.0.0 this function is limited to information built into and provided by the operating system.) Another useful feature is the ability to create lists of characters and load them into Unibook for proofing.

The program can also be used to prepare drafts and proposals for future additions to the Unicode and ISO standards. While there is no editing support in the program itself, all changes in content can be made to the input files using plain text editors. You can also change the formatting of the resulting charts and nameslist by changing the formatting parameters in the dialogs and saving these format settings as files.

Unibook requires Microsoft Windows™ NT 4.0 or later (including Windows 2000 or XP) as well as a suitable collection of fonts to view the characters of interest. Version 5.0.0 has been tested on Windows XP only. For Windows 95, 98, and ME, use Unibook 3.0.

Unibook Version 5.0.0 has a small number of nifty new features, in addition to supporting additional properties and characters defined for Unicode 5.0.0. The distribution comes complete with a set of data files from the Unicode Character Database.

1. Setting up and Running Unibook

1.1 Installation

Unibook will run best if all the supplied files are kept in the same directory.

  1. Copy the file unibook.exe and all other provided files into a directory on your hard disk.
  2. Run the INSTALL.BAT script from that directory to run Unibook for the first time.
  3. When you run Unibook for the first time, it will ask you for registration information.
  4. Initially, the program comes up in a built-in default view using the fonts selected in default.cfl.

You may manually install a shortcut in the taskbar by selecting Start/Settings/Taskbar/Advanced/Add and following the prompts.

NOTE: If you are installing version 4.1 on a system where earlier versions had been installed previously, you may need to clear the registry as described in section 4.1.2, or manually reload the default files as described in the next section.

1.1.1 Configuring for additional fonts

By default, Unibook opens the files Default.fmt and Default.cfl. These are preconfigured to make use of some of the multilingual fonts available via Microsoft Office 2000, Internet Explorer 5.0 and later, or Microsoft Windows 2000 and later. Unibook will still run without these fonts installed, but it may not be able to show as many Unicode characters as it could with these fonts on your system. Usually, all you need to do to activate the use of these fonts is to install them on your system. The Default.cfl file is also set up to work with two large shareware fonts, Code2000 and Everson Mono Unicode. If you have one or both of these fonts installed, they will be used to display any characters not already covered by other fonts.

If you have additional fonts from other sources, first make sure they are installed in the Windows fonts directory. You can then modify and load a combining font list (*.cfl) file and a corresponding formatting configuration (*.fmt) file. For information on how to edit the sample files provided, see section 3 on the input files and their formats. When loading *.fmt and *.cfl files, the best sequence to follow is to:

  1. Open an optional formatting configuration (*.fmt) file.
  2. Open a combining font list (*.cfl) file to tell the program what fonts to use.

The second step will cause a complete re-layout and re-pagination. Once the program is initialized with a particular set of *.cfl and *.fmt files, it will reload the same files upon startup until a different set of files has been opened manually.

1.1.2 Working with multiple configurations

By using a Unibook Project (*.upr) file, all necessary files are loaded at once. Normally, it is not necessary to manipulate project files directly, since Unibook always remembers the last settings and starts up with the same settings the next time it is started. However, if you want to work with multiple configurations, for example in order to inspect an older version of a nameslist or to switch between different formats, saving all the settings (including the location of the nameslist) in a project file can be very handy. Project files can also be opened at startup by placing their file name on the command line.

NOTE: Project files contain a list of filenames for the various files for a given project, such as the nameslist, formatting options file, font configuration file, etc. If the contents of any of the files in a project are edited outside Unibook, the changes take effect whenever the project is reloaded. To save formatting options in the current format configuration (*.fmt) file, use the File/Save... command. To save changes to a project file (i.e. after adding or removing a file used in the project), use the File/Save Project As... command.

1.1.3 Opening a different Character Names List

By default, Unibook opens the file NamesList.lst. You can use the File / Open... command to open a different character names list (*.lst) file. This will cause a complete re-layout and re-pagination. After loading the names list, you may adjust the settings in the View / Show As... dialog to view the information in one of five modes, or select the View/ Character Set... command to switch into character mapping view. Once the program is initialized one way, it will always return to the last selected configuration upon startup. 

1.1.4 Readme file

Please read the readme file for additional information.

1.2 Views and Navigation

You can use Unibook to view and print characters in the following ways, called views:

  Index view     Index view
  Iso view       ISO/IEC 10646 view
  Unicode view   Unicode Book style view (charts, names, or both)
  Character Set  Character set mapping view

To select a view, use the View/Show As... command or the View/Characters... command or use the corresponding toolbar button.

To navigate within a view, use the navigation toolbar buttons (including ◄◄ and ►►), or use the PgDn / PgUp, Home and End keys in combination with the Ctrl and Shift keys (see keyboard reference). The first page of some views will display a summary of file statistics and the filename, or it may be a blank page. Use the PgDn key to begin viewing the contents. Use the Backspace key or the Go Back button to return to a previously viewed page.

To navigate within each page, use the arrow keys.

Any of the four basic views can be modified to display the results of highlighting a combination of properties:

  Highlight property or search result
  Use alternate highlight

To search for a character by any part of its name, use Ctrl-F or the Goto/Find... command. All characters matching the search expression will be highlighted. To go to a character by Unicode code value, use Ctrl-G or the Goto/Go to... command. To locate a character by Unicode block, use Ctrl-B or the Goto/Block... command.

1.3 Inspecting Characters

In addition to the major views, Unibook provides several ways to inspect individual characters via small popup windows which are accessed by clicking on a character image or 4-digit hex code in chart or name list view. For more details on the available information, see character popups in the command reference.

The screen shot shows a sample Character Entry popup in Index View. A character entry is all the information for a given character in the Unicode nameslist. By using the Ctrl or Shift key while clicking, other styles of character information can be displayed.

Clicking on a character code in the popup brings up information about that character. This can be handy when following a cross reference. Double clicking on a character code will jump to the associated charts or list page. To remove a popup, simply click elsewhere on the page.

For a complete explanation of the special symbols used, and the meaning of each field, please see the description of the character code charts in The Unicode Standard.

Once a character is selected and the popup is displayed, its character code can be copied to the clipboard using Ctrl + Ins or Ctrl + C, or its character name can be copied via the right mouse button.

Character popups are also accessible in other views. On nameslist pages clicking on any part of a line that contains a character code (group of 4 hex digits) will access the character popup for the associated character. In ISO view, the Character Entry popup will reflect the layout and contents of the ISO style name list. (Some character popups require that a character nameslist is loaded).

1.4 Viewing Character Properties

Unibook provides a way to show all characters that share a given character property, for example various types of punctuation characters. Use the View / Properties... command or the corresponding toolbar button (for either the primary or the alternate highlight) to select from several sources of character property information. Within each source, you can then select the property and its value. For example, you can select the "General category" property and the value "Po - punctuation, other".

All characters that have been assigned this value of the property will then be highlighted. By using the F7 or F8 key, or their equivalent toolbar buttons, such as F8, you can quickly navigate through all the ranges of characters with that property value.
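
To cross-check a highlighted set outside Unibook, the following is a minimal Python sketch, not part of Unibook itself. It assumes Python's standard unicodedata module, whose data reflects the Unicode version bundled with the interpreter rather than Unicode 5.0, so results for more recently added characters may differ from what Unibook shows. The helper name chars_with_category is hypothetical.

```python
import unicodedata

def chars_with_category(category, start=0x0000, end=0x007F):
    """Return code points in [start, end] whose General Category is `category`."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) == category]

# List the "Po" (punctuation, other) characters in the Basic Latin range.
for cp in chars_with_category("Po"):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

The default range is kept small on purpose; pass a larger start and end to scan other blocks.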

Use the second toolbar button to select another property to highlight for comparison, or use the Goto / Find... command to highlight all characters with a common part in their name or description, for example the word "Mark". Characters that share both properties are highlighted with split colors.

1.5 Viewing Fonts

Unibook can also be used to view all the characters covered by a given font. Select a font to view with the Options / Font... command, or open a TrueType font file with the File / Open... command. If the font contains characters in the private use area, make sure that "Index" is selected in the View / Show As... dialog and that the checkmarks for "Private Use" or "All Blocks" are enabled. In the Character Display tab under Options / Format..., uncheck both "Show only valid characters" and "Mark unassigned codes". Finally, in Goto / Find..., select "Font Coverage", type in the exact font name, and press OK. You can now navigate all areas covered by the font using the F7 and F8 keys. To get back to where you were before viewing the font, use the list of recent files in the File menu to reload a project, nameslist, font configuration or format file.

A new View / View Font command is under development to streamline these operations. Feature requests welcome.

1.6 Registry

Unibook always stores the latest values and settings for configuration and formatting options in the registry. You can save a particular configuration via the Save or Save As... commands, and restore it via the Open command by opening the corresponding *.fmt file. Once a *.fmt file has been loaded, the information is kept in the registry. If you edit a *.fmt file with a text editor outside Unibook, you must reload it manually via the File / Open command for the change to take effect. You can use the File / Exit and Discard command to bypass saving changes to the registry. You cannot save the name of a font file opened for viewing, nor changes made to the font table.

Manually clearing the registry key associated with Unibook restores the program to factory settings. See Returning to Factory Defaults in the section on troubleshooting at the end of this file.

1.7 Reporting Problems

First see the section on troubleshooting at the end of this file. For ways to report problems or make suggestions, or to check on the availability of updates to the program or this file, please see http://www.unicode.org/unibook/.


2 Command Reference

2.1 Keyboard

Key  Action
+  zoom in
-  zoom out
=  normal view
P  zoom out to view full page
W  zoom to current window width
PgUp  go to the previous page (does not scroll on the same page)
PgDn  go to the next page (does not scroll on the same page)
Ctrl+PgUp  go to the previous 'section' (previous 16 pages in index view, previous block otherwise)
Ctrl+PgDn  go to the next 'section' (next 16 pages in index view, next block otherwise)
Ctrl + Ins  place the selected character on the clipboard
Ctrl + C  place the selected character on the clipboard
Arrows  scroll inside a page (up/down arrows will not scroll to a new page)
Ctrl + Home  jump to the first page
Ctrl + End  jump to the last page
Home  go to the first page on the plane (skips empty pages in some views)
End  go to the last page on the plane (skips empty pages in some views)
Ctrl+B  go to a given Block
Ctrl+G   Go to
Ctrl+P  Print
Ctrl+S  Save to current configuration file
Ctrl+O  Open File
Backspace  Return to last page viewed
F1   Open help file (this file)
F5  Show the View / Character set dialog
F6  Show View / Properties  dialog for the primary highlight
Shift + F6  Show View / Properties  dialog for the alternate highlight
F7  Jump to the preceding page with characters of the currently highlighted property or search result
F8  Jump to the next page with characters of the currently highlighted property or search result
F9  Opens the View Font dialog

2.2 Mouse

Clicking on any character image in a chart, or on or close to any character code in the nameslist

Left mouse button:

Brings up more information about the character or its glyph.

By using the Ctrl or Shift key when clicking, the style of information presented can be selected:

  • Ctrl + Shift selects the enlarged character image with bounding box
  • Ctrl + click selects the font name popup

To permanently select the default style of information shown, use the Options/Character Popup... dialog, or click on the character with the right mouse button.

Using Ctrl + Ins or Ctrl + C after a character is selected places its character code on the clipboard.

Double clicking on a character in the code charts will jump to the nameslist page for that character. Double clicking on a character in the nameslist will jump to the code chart page for that character.

Right mouse button:

Select modes for viewing characters:

  • an enlarged character image
  • an enlarged character image with bounding box drawn in
  • the full character entry from the nameslist pages
  • the requested and actual font face used to render this character

Select whether to copy the character code (Unicode value) or the character code plus name (in the U+XXXX CHARACTER NAME format) onto the clipboard.

Clicking on any character code in a popup

Left mouse button: Brings up more information about the character. Double clicking will jump to the charts page for the character.

Right mouse button: Select modes for viewing characters.

Clicking on any other part of the page

Left mouse button: No action.

Right mouse button: Presents a context menu of applicable operations.

2.3 Toolbar

Toolbar - The Unibook toolbar gives access to some of the more frequently accessed commands. When hovering with the mouse over each button a short description of the button will pop up. The following list describes some of the buttons in more detail.

Button  Action
Go Back  Go back to a previously visited page.
Unicode view  Format the code charts as presented in The Unicode Standard.
Index view  Format the code charts as a 16 × 16 grid.
Iso view  Format the code charts as presented in ISO/IEC 10646.
(primary highlight)  Select a primary property to highlight. The button remains depressed until it is clicked a second time. When released, Unibook no longer highlights the primary property.
(alternate highlight)  Select an alternate property to highlight. The button remains depressed until it is clicked a second time. When released, Unibook no longer highlights the alternate property.
F8  Skip to the next page with a highlighted property or search result (whether primary or alternate highlight).
Character Set  Show a character set. The Show Character Set button remains depressed until it is clicked a second time. When released, Unibook returns to showing Unicode, instead of a selected mapping to another character set.

Window Title - The title area of the main window shows information such as the currently viewed block, the selected properties or character sets (in Index view only), or the % completed during loading of a nameslist.

2.4 Menu Commands

FILE
VIEW
GOTO
OPTIONS
TOOLS
HELP

Page Format Tab
Character Display Tab
Page Headers Tab
Namelist Layout Tab

FILE

Open...

A variety of files can be opened. The name of any opened file will be stored in the registry and, if possible, the file will be reloaded upon startup. The file also becomes part of the current project and can be saved to the project file with the Save Project As... command.

*.lst  nameslist files
*.fmt  format files
*.cfl  combining font list
*.txt  highlight set
*.cmb  list of combining marks
*.rtl  list of combining marks that overhang to the right
*.ttf  a font file for viewing
Project...  Show the files that are currently loaded. The files shown will be reloaded when the program is restarted. Font files open for viewing are not shown in the list, nor are they reloaded.
Save Project As...  Save the current list of open files into a new project.
Statistics...  Show the number of entities parsed. This is most useful when editing the nameslist.
Save  Save the configuration to the registry and the current *.fmt file. This is automatically done on exit.
Save As...  Save the configuration to a specific *.fmt file.
Print all pages  This setting affects the Print... command. If selected, all pages will be printed.
Print even pages only  This setting affects the Print... command. If selected, only even pages will be printed. Use this to make a double sided printout in two passes.
Print odd pages only  This setting affects the Print... command. If selected, only odd pages will be printed. Use this to make a double sided printout in two passes.
Print...  Print the document in the currently selected view. In the print dialog, you can specify a range of pages, all pages, or the current page (some versions of the Windows print dialog show this choice as 'selection').
Recent Files  A list of recently opened files (up to nine).
Exit  Terminate the program after saving the current configuration to the registry. If there are unsaved changes to the configuration or the project file, Unibook will ask whether these should be saved to their original files.
Exit & Discard  Exit the program but do not save anything to the registry.

VIEW

Show As...

Selects between major views and enables optional content.

Index - A compact 16x16 matrix format used by default when no nameslist is loaded. This is similar to the index style in the Unicode 1.0 book.

Charts Only - Display charts only.

Names Only - Just display the nameslist tables without intervening charts

Book Style - Interleave charts and tables

ISO Style - this will create framed nameslist tables with simple entries, i.e. without the extra annotation and cross reference lines present in a Unicode name entry.

Show In Index - These options are only available in index view. If all are turned on, it is possible to print a complete index for all code locations in the standard up to 0x10FFFF. (See also the section on Viewing Fonts).

Show Optional Content: Han & Hangul - With this option disabled, Hangul and Han are not part of the charts. When turned off, names for Hangul characters are also not available from the character popups.

Show Optional Content: Empty Charts - With this option disabled, charts that are empty because they cover an empty part of a block, or because of lack of font coverage, are suppressed. This also disables character information popups for the affected ranges. 

Show Optional Content: File Comments / File Statistics - The NamesList.lst file may contain some comments. These and the file statistics can be displayed on the first few pages by checking these options. Using these options will affect page numbering for all following pages.

Note: Switching views will reset the view to the starting page.

Characters...

Select another character set to view (currently this works only with character sets installed into Windows NT).

Properties...

Alternate Properties...

Select one or more character properties to highlight using the primary highlight color. You may select properties that are built into the Windows operating system, or load various external files (*.txt) from the Unicode Character Database or user defined files (External Property).

The alternate properties command is identical, except that it uses the alternate highlight color.

All characters matching the chosen property value will be shown by highlighting with the current highlight colors. Use the F8 toolbar button or the F7 and F8 keys to jump to the next page containing characters with the selected property. The color for highlighting can be selected; it applies to any currently selected property. See section 3.5, Other Files, for the file format of external property files.

Selecting multiple lines in the listbox ORs the properties together, that is, all characters matching any of the selected values will be highlighted. There are additional ways to highlight a combination of properties, see Set Union and Set Intersection. Multiple lines are ORed before applying Set Union, Set Intersection or Set Complement.

Pressing the corresponding toolbar button (for either the primary or the alternate highlight) will toggle between clearing the highlighting for the selected properties and invoking this dialog.

Windows - This tab allows you to inspect three groups of properties built into Windows.

Unicode Character Database - Use this tab to select one of eight groups of properties from the Unicode Character Database and associated files. For example, selecting "General Category" and "Letter, Uppercase" will highlight all upper case letters. Selecting a property from this tab will open a local copy of the corresponding data file from the Unicode Character Database.

Additional Properties - Access a set of additional properties published by the Unicode Consortium. Some of these are collections of Boolean properties.

External Property - Highlight a single property defined by a list of characters loaded from a user-specified file using the format described in the section on Other Files.

Set Complement - Use the set complement, in other words, highlight all characters that do not match the property value.

Set Union - Highlight all characters that match the current OR a previously selected property. Disabled if no other property is selected. Use Apply after selecting the first property and OK after selecting the property with which to union it.

Set Intersection - Highlight all characters that match both the current AND a previously selected property. Disabled if no other property is selected. Use Apply after selecting the first property and OK after selecting the property with which to intersect it.

Foreground color - Selects the text color to use for highlighting. Depending on how this command was invoked, it affects the primary or the alternate highlighting.

Background color - Selects the background color to use for highlighting.
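
The Set Complement, Set Union and Set Intersection options described above behave like ordinary set operations on code points. The following Python sketch illustrates that reading. It is only a model of the documented behavior, not Unibook's implementation; the function name highlight, the mode strings and the sample sets are hypothetical.

```python
def highlight(selected, previous=None, mode=None, universe=None):
    """Model of the Properties dialog options as set operations.

    selected -- list of sets, one per line selected in the listbox
    previous -- the previously selected (applied) property, as a set
    mode     -- None, "complement", "union" or "intersection"
    universe -- all code points under consideration (needed for complement)
    """
    # Multiple selected lines are ORed together before anything else.
    current = set().union(*selected)
    if mode == "complement":
        return universe - current
    if mode == "union":
        return current | previous
    if mode == "intersection":
        return current & previous
    return current

# Hypothetical sample data: a few code points per property value.
uppercase = {0x41, 0x42, 0x43}
lowercase = {0x61, 0x62, 0x63}
hex_digit = {0x30, 0x31, 0x41, 0x42, 0x61, 0x62}

letters = highlight([uppercase, lowercase])            # two lines selected
print(sorted(highlight([hex_digit], letters, "intersection")))
```

In this model, applying the first property and then choosing Set Intersection with a second one yields only the code points common to both.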

Zoom In  Enlarge the view by 25%. Display only. No zoom settings have any effect on printing.
Zoom Out  Reduce the view by 25%.
Page Width  Scale the view so that a page fits the width of the window.
Entire Page  Scale the view so that an entire page fits the window.
100%  Normal view
200%  Double size view
300%  Triple size view
400%  Quadruple size view

GOTO

Back  Return to the last page viewed.
Page  Navigate by page.
Section  Navigate by section. In index view, a section is 4096 Unicode characters or 16 pages; in all other views, a section is a block.
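
The index view arithmetic can be sketched in Python as follows. This is an illustration only: it assumes that each index page holds one 16 × 16 grid of 256 code points, that numbering is zero-based, and that no front-matter pages or suppressed blocks shift the count. The function name index_location is hypothetical.

```python
CHARS_PER_PAGE = 16 * 16                                  # one 16 x 16 grid
PAGES_PER_SECTION = 16
CHARS_PER_SECTION = CHARS_PER_PAGE * PAGES_PER_SECTION    # 4096

def index_location(code_point):
    """Return (section, page within section), both zero-based."""
    section, offset = divmod(code_point, CHARS_PER_SECTION)
    page_in_section = offset // CHARS_PER_PAGE
    return section, page_in_section

for cp in (0x0041, 0x20AC, 0x1F600, 0x10FFFF):
    section, page = index_location(cp)
    print(f"U+{cp:04X}: section {section}, page {page} of that section")
```

Because 4096 is 0x1000 and 256 is 0x100, the section and page can also be read directly from the leading hex digits of the code point.
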
Find...

Locate and highlight all character entries matching the string. A limited form of regular expression search is supported:

^ matches the beginning of a line
$ matches the end of a line
. matches any single character
* matches any number of the character preceding the *
\ escapes the special characters
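
The five operators listed above have the same meaning in Python's re module, so search expressions can be tried out there first. The sketch below is an approximation: Unibook's matcher may differ in case sensitivity and in how it treats characters such as + or ?, which are special in Python but are not listed above. The sample entries and the helper name find are hypothetical.

```python
import re

# Hypothetical sample entries: character names plus one annotation line.
ENTRIES = [
    "LATIN CAPITAL LETTER A",
    "COMBINING GRAVE ACCENT",
    "MUSICAL SYMBOL COMBINING STEM",
    "BLACK STAR",
    "* used in U.S. English",
]

def find(pattern, entries=ENTRIES):
    """Return the entries in which `pattern` matches anywhere in the line."""
    rx = re.compile(pattern)
    return [entry for entry in entries if rx.search(entry)]

print(find("^COMBINING"))   # ^ anchors the match at the beginning of a line
print(find("STAR$"))        # $ anchors the match at the end of a line
print(find("L.TTER"))       # . matches any single character
print(find("LET*ER"))       # * repeats the preceding character
print(find(r"U\.S\."))      # \ makes the following . literal
```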

Character information - select this option to search character names or annotations

Ranges and blocks - select this option to search for groups of characters by the title of their closest enclosing subheader or block name. This is most useful in looking for groups of characters, such as 'stars'. This option ignores multiple headers for the same range. For example in the Tamil block, both the block name (Tamil) and the first subheader (Based on ISCII 1988) would be ignored as the first character range is enclosed by its own subheader (Various signs).

Font coverage - select to locate all characters covered by a given font (the font must be part of the font definition file, or have been selected via the Options / Font command)

Use alternate highlight - keeps track of results in a separate set displayed in a separate color, so results of more than one find operation can be viewed simultaneously. If unchecked, the find operation sets only the normal highlight. If checked, it will display the result of the new search using the second highlight.

Options - limit the search to Exact Match, matching case, information in the character name only, or information in character names plus aliases.

To jump to the next page containing a search result, use the F8 toolbar button or the F7 and F8 keys.

To clear the highlighting, use the corresponding toolbar button: the primary highlight button, or the alternate highlight button if alternate highlight is used.

Block...

Select from a list of blocks to jump to. This list is taken from the block headers present in the NamesList.lst file. Blocks can be sorted by range or name for easier access.

Go To...

Select a character, page or plane to jump to. Goto by character will jump to the names list page containing the character.  


OPTIONS

Font

Overrides the current *.cfl file with a single font. This is useful to quickly inspect the contents of a given font. This feature supports TrueType and OpenType fonts that are Unicode-encoded. This command ignores any non-zero offsets in the ASCII offset field. Since the override is not remembered when the program exits, use a *.cfl file for permanent changes in font assignments. 

This command works best in Index view, see the View / Show As... command. To limit the display to only those characters supported by the font, use the font search feature of the Goto / Find... command.

To restore your previous font settings, use File / Open... to reopen your latest *.cfl (or *.upr) file, or reload the latest *.cfl or *.upr file from the list of Recent Files.

Format..

The format command provides access to these dialogs:

Page Setup  set page margins, pagination and numbering options
Character Display  select size and shading of character cells
Headers and Footers  select content and placement of headers and footers
Nameslist Layout  customize the look and feel of the nameslist

The *.fmt files provided with Unibook.exe will set up consistent sets of page setup values that are independent of the nameslist or output document. The items that are expected to vary from job to job are the initial table and page number. The chapter number is used only when "include in page number" is selected.

All format choices are retained in the Windows Registry upon program exit, but can also be saved explicitly.

Character Popup..

Select the format of the popup. Supported formats are

Large character  
  This format provides an enlarged view of the representative glyph for a given character code. 

This can be useful when the glyph contains a lot of details, or when the zoom is set too small to view each individual glyph in a chart.

Use Ctrl+C to copy the character code to paste into another application.

Note: the size of this popup changes in proportion with the zoom value selected.

Glyph Information  
  This popup provides an enlarged view of the representative glyph together with additional information placed on a background grid.

The black line is the baseline. The red box outlines the ink, or black box of the glyph. The blue rectangle extends this to the top and bottom of the character cell, while the green rectangle extends from the character origin to the advance width.

Note: the size of the character popup on screen is independent of the zoom value selected.

Character Entry  
 
  This format provides the full entry for the character from the character names list. Use Ctrl-C to copy the character code and the right mouse button to access a command to copy the character code in U+XXXX notation together with the character name. Note: this format requires that a character names list file has been loaded.

The types of information that appear in a character entry are described in Chapter 17, "Code Charts", of Unicode 5.0. In addition, Unibook automatically constructs lines starting with "←". These lines refer to character entries that cross reference the current entry.

By clicking on any character code displayed in a character entry, you can navigate to that character's entry.
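
For comparison, the U+XXXX CHARACTER NAME notation mentioned above can be produced in Python as sketched below. This assumes Python's standard unicodedata module, which may carry a different Unicode version than the nameslist loaded in Unibook. The function name format_entry and the placeholder for unnamed characters are hypothetical.

```python
import unicodedata

def format_entry(ch):
    """Format one character as 'U+XXXX CHARACTER NAME'."""
    code = f"U+{ord(ch):04X}"            # at least four hex digits
    # Control characters and unassigned code points have no name here,
    # so fall back to a placeholder instead of raising ValueError.
    name = unicodedata.name(ch, "<no name>")
    return f"{code} {name}"

print(format_entry("A"))
print(format_entry("\u20ac"))
```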

Font Information  
  Use this format to view the requested and actual font face used to show the glyph, as well as the font size and offset.

TOOLS

Bidi

Runs the bidi sample code. The code executed is the sample code published with Unicode Standard Annex #9 Unicode Bidirectional Algorithm. This demo uses a pseudo-alphabet as input and displays several sets of internal values used by the algorithm.

LineBreak

Runs the line break sample code. The code executed uses the pair table published with Unicode Standard Annex #14 Line Breaking Properties, using the sample driver functions published as sample code. This demo uses a pseudo-alphabet as input and displays several sets of internal values used by the algorithm.

Save Selected Characters

If you highlighted a property (or combination of properties), loaded an external file, or created a search result, you can use this command to save a shorthand list of all character codes affected. The result will be a plain-text file, formatted similarly to the data files in the Unicode Character Database. You can read the saved file with the Open File... command on the External Property tab of the View / Properties... dialog.
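
The exact layout of the saved file is not described here. Assuming it follows the usual Unicode Character Database conventions (one code point or XXXX..YYYY range per line, optional fields after ";", comments after "#"), such a file could be read back in Python roughly as follows. The function name parse_ucd_style and the sample lines are hypothetical.

```python
def parse_ucd_style(lines):
    """Collect code points from UCD-style lines (assumed format)."""
    code_points = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()     # drop comments
        if not data:
            continue                             # blank or comment-only line
        first_field = data.split(";", 1)[0].strip()
        if ".." in first_field:
            low, high = first_field.split("..", 1)
        else:
            low = high = first_field
        code_points.update(range(int(low, 16), int(high, 16) + 1))
    return code_points

# Hypothetical sample content in the assumed format.
sample = [
    "# saved selection",
    "0021..0023 ; Po # EXCLAMATION MARK..NUMBER SIGN",
    "0025",
    "",
]
print(sorted(f"{cp:04X}" for cp in parse_ucd_style(sample)))
```
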
Font Dump

Useful tool for debugging fonts and *.cfl files. Creates a text file showing which range of characters is associated with which font.

Print Block

Print the current block. (The current block is the last block selected with the Goto / Block... command).

Print All Blocks

Prints all blocks to file, one block per file. Select a PostScript printer before using this and the next command.

Print highlighted blocks

Same as the previous command, but prints only blocks that have highlighted characters in them.

HELP

Info

Gives a pointer to this file.

About

Brief information about the copyright, version number and the authors. Also gives access to the legal license text.

Page Setup Tab

Note: The default settings are shown. The page size values can be varied, but the paper size values are fixed. To print on A4 paper, use a printer driver that can center an 8.5 × 11 print image on an A4 page. The default margins are narrow enough so that the resulting image fits on an A4 page.

Use charts/list combination - print narrow charts that have only a single column of names on the same page as their names.

Page size - change the page size. (Only a fixed page size is available; however, the default margin settings are adjusted so that, when printing to PDF, the output can be centered on either an A4 or a US Letter page.)

Character Display Tab

Note on the "Characters" settings:

Mark unassigned codes - This draws a diagonal hatch in all unassigned character locations as well as in private use. Disable to view fonts that have glyphs in the private use area.

Show only valid characters - This blanks out any characters for which there are no entries in the nameslist. Disable to view fonts that have glyphs in the private use area or glyphs for characters of a later version of Unicode than the nameslist loaded by the program.

Blank characters not in font - suppresses the 'default glyph' supplied by the font.

Do not mark private use - don't mark private use area as unassigned - useful for viewing fonts that have glyphs for private use characters.

Proposal style view - replaces part of the character code with X or XX to indicate that code positions are tentative. Useful when using Unibook to create proposed code tables.

Adjustable width - reduces the cell width for wide charts to fit 16 columns per page.

Note on the "Special Characters" settings:

Reserved: the character code for the glyph used to show a reserved character

Not a character: code for the glyph for the not a character symbol

Dotted circle: code for the dotted circle glyph used to show combining characters

ASCII offset: This is needed since there are many characters (e.g. SPACE, NON-BREAKING SPACE and TAB) which are used both as non-printing characters and shown as special printing symbols. Using a non-zero offset and translating a font to the same offset allows the program to switch between these offset and non-offset codes to select between chart (offset) symbols and text (non-offset) characters. This value is ignored when using Unibook to view individual fonts (via the Options/Font... dialog).

Page Headers Tab

Note: The default settings are shown. The ## in the page number field is a placeholder for the page number; for example, "Page ##" would print the word "Page" in front of the page number. For the print date field, enclose any literal string in quotes, as in the example. Use d, M, and y singly, or repeated up to four times, to select different formats for day, month and year. These may be placed in any order.

Nameslist Layout Tab

Note: The default settings are shown. The indents and tabs work together in aligning the elements in a character entry, with the indents being relative to the second tab stop value. Some of the values are unused in the current version of Unibook and the corresponding input fields have been disabled.


3 The Input Files and their Formats

You can create your own character charts. To create a character chart you must supply

a formatting configuration file
a combining font list
a nameslist file
several auxiliary files

These input files are described in more detail below. A project file (*.upr) is a list of filenames for a consistent set of files and is useful when working with multiple configurations. Unibook can read files using little-endian UTF-16, marked with a byte order mark (BOM), or using ASCII, ISO-Latin-1 or Windows code page 1252. There is no support for UTF-8.

3.1 Formatting Configuration File (*.fmt)

A *.fmt file is a simple text file that follows the form

key = value

where the values are either hexadecimal numbers or strings. Empty lines and lines starting with ; are ignored. The keys are defined by the program and correspond to entries made in the formatting dialogs. Once a configuration is loaded, or created by changing options in the program, its information is stored in the Windows registry and directly accessed from there. This file is normally not edited outside the program.

Any configuration can be saved to a new file at any time with the File/Save As... command.
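
The line format described above can be read with a few lines of script, which may be handy for comparing two saved configurations outside the program. The following Python sketch is an illustration only, not part of Unibook: the function name parse_fmt and the two sample keys are hypothetical (the real keys are defined by the program), and skipping lines whose first non-blank character is ; is an assumption about how leading whitespace is treated.

```python
def parse_fmt(text):
    """Collect 'key = value' pairs; skip empty lines and ';' comment lines."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue
        key, sep, value = stripped.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {line!r}")
        # Values stay as text: whether a value is a hexadecimal number or a
        # string depends on the key, which only the program knows.
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical keys, for illustration only.
sample = """\
; a comment line
ExampleOffset = F000

ExampleTitle = Draft chart
"""
settings = parse_fmt(sample)
```

A value known to be hexadecimal can then be converted with int(settings["ExampleOffset"], 16).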

3.2 Combining Font List (*.cfl)

3.2.1 Syntax

The *.cfl files list the fonts to be used for formatting. Fonts are described with size and attributes as well as with what character code-ranges they should or should not be used for. The list of fonts is searched in order from top to bottom for each character until a font is found that contains a glyph image for the given character.

CFL files are plain text files (for example, they can be edited in Notepad). Unibook accepts files either in an active Windows code page or as little-endian, byte-order marked UTF-16. The UTF-16 form is handy whenever font family names contain non-ASCII characters. The easiest way to create a CFL file that can capitalize on the fonts available on your system is to edit the Default.cfl file and save it under a different name. Note that Unibook complains about redundant (unused) entries in any CFL file other than Default.cfl. Just remove or comment out any unused lines.

The following summarizes the syntax for a font entry:

<Facename>,<point size>{, <charset>}{, {B}{I}{U}} {<params>} {<switches>}
where:
<switches>:	{/S=xxxx | /O=xxxx} <limits>
        	{/Q=xxxx /R=<range>} <limits>
        	{/U=xxxx /E=xxxx} <limits>
<limits>:       {{/X=<range>} {/I=<range>}}*
<range>:	xxxx-xxxx
<params>:	{/M=ddd }{/C=xxx}
<charset>	- one of SYMBOL, ANSI, SHIFTJIS, HANGEUL, GB2312, CHINESEBIG5 
B I U		- indicate bold, italic, underline respectively
/S=xxxx		- first character code in font for a "chart font"
/O=xxxx		- offset to add to character to access glyph in font
/X=<range>	- exclude the following range, i.e. don't use this font for this range
/I=<range>	- include the following range, i.e. override any /X for this range
/Q=xxxx		- allows arbitrary selection of a <range> of glyphs
/R=<range> 	  from a font starting at character xxxx

/U=xxxx		- UTF-16 coded font (not offset), starting at xxxx
/E=xxxx		- gives ending code location for UTF-16 coded font
/M=ddd          - smallest effective point size
/C=xxxx         - use this glyph for base character
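
To make the syntax summary more concrete, here is a small Python sketch that splits one font entry into its parts. It is a reading aid under stated assumptions, not Unibook's own parser: the function name parse_cfl_entry and the layout of the returned dictionary are hypothetical, xxxx values are read as hexadecimal, ddd (the /M value) is assumed to be decimal, and no validation of switch combinations is attempted.

```python
import re

CHARSETS = {"SYMBOL", "ANSI", "SHIFTJIS", "HANGEUL", "GB2312", "CHINESEBIG5"}
SWITCH = re.compile(r"/([A-Z])=(\S+)")


def parse_cfl_entry(line):
    """Split one font entry into face name, size, charset, style and switches."""
    first_switch = SWITCH.search(line)
    head = line[:first_switch.start()] if first_switch else line
    fields = [f.strip() for f in head.split(",") if f.strip()]
    if len(fields) < 2:
        raise ValueError(f"expected '<Facename>,<point size>': {line!r}")
    entry = {"face": fields[0], "size": int(fields[1]),
             "charset": None, "style": "", "switches": []}
    for field in fields[2:]:
        if field in CHARSETS:
            entry["charset"] = field
        elif set(field) <= set("BIU"):
            entry["style"] = field          # any of B, I, U
        else:
            raise ValueError(f"unrecognized field {field!r}")
    for letter, value in SWITCH.findall(line):
        if "-" in value:                    # <range>: xxxx-xxxx
            low, high = value.split("-")
            parsed = (int(low, 16), int(high, 16))
        elif letter == "M":                 # assumption: ddd is decimal
            parsed = int(value)
        else:                               # xxxx: hexadecimal
            parsed = int(value, 16)
        entry["switches"].append((letter, parsed))
    return entry


entry = parse_cfl_entry("Font Name,22 /Q=1D400 /R=E000-E000")
```

For the entry above, entry["switches"] holds the /Q start code 1D400 followed by the /R range E000-E000.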

Additional restrictions and requirements:

3.2.2 Unicode Chart Fonts

A Unicode Chart Font is a TrueType font using the SYMBOL_CHARSET, and which contains the characters for a given half block of Unicode characters in the upper 128 character positions. A chart font will always claim to support all 128 characters, unless an /X switch is provided.

By convention, both the font and the file are named in a way that indicates the half block in question. For example, for the lower Cyrillic half block, the file is named unico040.ttf, with an internal font name of "Unico040". The program does not enforce this naming convention, but instead expects the character corresponding to the offset value given with the /S command switch to be at position 0x80 in the font.
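
The position arithmetic implied by the description above can be written out as follows. This Python sketch assumes that the 128 characters of the half block are laid out consecutively from position 0x80 to 0xFF; the function name chart_font_position is hypothetical.

```python
def chart_font_position(code, start):
    """Return the position of `code` in a chart font declared with /S=start.

    The character at `start` sits at 0x80 and the half block fills the
    upper 128 positions. Returns None for codes outside the half block.
    """
    index = code - start
    if 0 <= index < 128:
        return 0x80 + index
    return None


# Lower Cyrillic half block, /S=0400: U+0410 lands at position 0x90.
position = chart_font_position(0x0410, 0x0400)
```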

Note: Unicode chart fonts are a convenient means to provide fonts for draft code charts and are extensively used for that purpose in preparing the Unicode Standard and its extensions. They are easy to create with common font editing tools, but none are commercially available.

3.3 Nameslist File (*.lst)

The name list is a plain text file that contains Unicode character codes, character names, cross references, block headers and many annotations. Normally there is no need to make any changes to this file. However, a detailed syntax description is provided in the Unicode Character Database, and by following this syntax it is possible to create name lists for proposals for characters not yet encoded. By default, Unibook loads the file NamesList.lst. Note that Unibook expects that the filename for the nameslist end in an extension ".lst" to distinguish the nameslist file from the data files for character properties. The most up-to-date version of this file always resides on http://www.unicode.org/Public/UNIDATA/NamesList.txt. If you want to upgrade to a more up-to-date copy, just save this file to your Unibook directory and rename it so the file name ends with ".lst".

Note that Unibook will complain loudly and insistently if there are syntax errors in a character nameslist. The public beta versions of the nameslist sometimes contain such errors. Usually clicking 'ignore' will safely let Unibook continue (if you know how, you can always fix your copy of the nameslist using a plain text editor).

If you make edits to the nameslist file, make sure to save it either as 8-bit format (not UTF-8, but Latin-1, or Windows code page 1252), or as little endian UTF-16 (with leading BOM).

3.4 Font File (*.ttf)

In addition to viewing fonts already installed in the Windows font folder, you can use Unibook to load any TrueType or TrueType-based OpenType font contained in a file with the *.ttf extension. After loading the file, Unibook will open the Choose Font... dialog, just as if you had used the Options / Font... command. However, the list of fonts will now contain the fonts from the font file that has been loaded. Loading additional fonts unloads previously loaded fonts. All fonts are unloaded when Unibook exits. Font files for viewing are not part of the current project.

3.5 Other Files (*.cmb, *.txt, *.rtl)

All other files are simple lists of character codes, or character code ranges, one code or one range per line. Comments are allowed, and all text following the code on the line is ignored. Character codes must be 4-6 hex digits long and may not use lower case.

Example:

; this is an example comment
007E
10AB	;this text gets ignored
2224
4E00..AC00

The meaning of the file depends on the extension or on the command used to open it.
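
Before loading a long list, it can be worth checking it against the rules above with a short script. The Python sketch below is one possible check, not a description of how Unibook itself reads these files: the function name parse_code_list is hypothetical, and treating blank lines as ignorable is an assumption.

```python
import re

# One code, or a range written as first..last; 4-6 upper-case hex digits each.
ENTRY = re.compile(r"([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?(?![0-9A-Fa-f])")


def parse_code_list(text):
    """Return a list of (first, last) code pairs, one per data line."""
    ranges = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue                      # blank line or comment
        match = ENTRY.match(stripped)
        if match is None:
            raise ValueError(f"line {number}: bad character code: {line!r}")
        first = int(match.group(1), 16)
        last = int(match.group(2), 16) if match.group(2) else first
        ranges.append((first, last))      # text after the code is ignored
    return ranges


example = """\
; this is an example comment
007E
10AB\t;this text gets ignored
2224
4E00..AC00
"""
codes = parse_code_list(example)
```

Run on the example file shown above, this yields three single codes and one range; a line such as 10ab is rejected because lower case is not allowed.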

A *.cmb file is used to list all the characters that should be displayed with a dotted circle. Without this file, the program uses the information provided by Windows NT.

A *.rtl file is used to list all the combining characters that overhang to the right instead of to the left. Whether or not this information is needed depends on your fonts.

A *.txt file can be used to list all the characters that should be highlighted, via the View / Properties command. This is very useful for quickly verifying lists of characters. Transfer the list into the format given above and load it with the View / Properties / External Property command, select the highlight colors, and you can view the list by paging through the list of characters, easily spotting missing or extra characters in your file.

The Unicode Character Database and Additional Properties tabs load specific files from the Unicode Character Database. These files have a multi-column format requiring additional parsing support, which is not enabled when selecting the External Properties tab. Currently, not all files are supported.


4 Troubleshooting

The following sections contain some brief troubleshooting tips.

4.1 Won't Run

4.1.1 Program won't run

For versions after 3.0, the unibook.exe program will not work on Windows 95, Windows 98 or Windows ME. Make sure that you use Unibook 3.0 on these systems. (Due to limited Unicode support on these platforms, there will be differences in behavior.) It is recommended to keep all files together and to start Unibook from within its directory.

4.1.2 Won't run after an update

Try manually clearing out the registry from a previous version. On the desktop click on Start / Run... In the edit field type REGEDIT or REGEDT32. Click OK. In the Registry editor go to HKEY_CURRENT_USER / Software. Select ASMUS-Inc then select Unibook. Delete this key. (This allows Unibook to start with a clean slate).

Note: editing the Windows registry can cause Windows to malfunction. Be sure only to edit the parts of the registry specific to Unibook.

Before deleting the registry key, consider exporting a copy of it with the File / Export command in RegEdit. If you forward a copy of the exported .reg file to unibook@unicode.org, it would aid in providing an eventual fix for this problem.

4.1.3 Pages appear to be missing

This is caused by the program detecting that there are no glyphs in the font for the given chart. Try repaginating using the step in 4.5.1 and make sure to set the ASCII offset field to the correct value. In addition, the Index view can be set to show empty pages. See the View/Show As... command.

4.1.4 Navigation with F7/F8 does not work

Make sure the highlighting is enabled. If necessary, click on the property icon. If highlighting is enabled, but there are missing pages due to limitations in the font, navigation via F7/F8 may not work correctly. Select View/Show As... and select Index View and make sure Show Empty pages is selected.

4.1.5 Highlighting properties or search terms isn't working

Make sure that your foreground and background colors for highlighting are not inadvertently set to black on white.

If a property applies to Surrogate Code points, Private Use code points, noncharacters or unassigned characters (including those labeled <reserved> in the code charts), Unibook cannot show a highlight for these ranges. F7 and F8 may still jump to the page, but no cells will be colored with the highlight color. If Do not mark private use is checked on the Character Display tab in the Options/Format... dialog, then Unibook can highlight properties for private use characters.

4.1.6 Unibook complains about a missing printer

When laying out the document, Unibook references the default printer installed on your system. If the printer is not available or not configured correctly, Unibook will base the layout on the current screen device. You will be able to use Unibook normally on-screen, but not be able to print. However, in some instances, printer drivers have been known to fail when queried about their availability. In such a case, you may need to define a different default printer before being able to use Unibook.

4.2 Displaying Characters

4.2.1 Blank cells

Either your selected fonts do not cover the scripts you are viewing or, if the only blank cells are for characters between 0000 and 00FF, the setting of the ASCII offset in the Options/ Format/ Character Display tab does not match your font or combining font list. In the latter case only, try setting this value to zero or F000.

4.2.2 Glyphs are too large to fit into the cell

Duplicate the entry for the font in the CFL file, exclude the glyph range in question on the first entry using the /X switch, and select a smaller font size on the second entry.

4.2.3 Combining marks don't overlay right

If your font already contains the little dotted circle, remove the character entry from the *.cmb file. If your font requires a RTL convention for combining marks preceding the base character, add an entry to the *.rtl file (the same code must also be entered in the *.cmb file).

4.2.4 Seeing boxes instead of dotted circles

You can set the character code used for showing the dotted circle in Options/ Format/ Character Display. A single value is used for the whole file. Select a value that matches a dotted circle character in one of the fonts loaded. Character U+25CC DOTTED CIRCLE is used by many fonts for this purpose, even though the size and position of the character relative to its baseline are different from the glyph used to indicate combining characters in the code charts. There is a dotted circle character in Specials.ttf that matches the glyph used in the code charts. When using default.cfl, the offset to use is E000.

4.2.6 Not seeing dotted circle characters on Windows 95

Windows 95 does not contain built-in information about combining characters. You must load a *.cmb file to tell Unibook which characters are combining. 

4.2.7 Seeing multiple dotted circles

Unibook adds dotted circles on the fly in order to display combining marks. If you are using a special purpose font that is intended for code chart viewing, as opposed to real text usage, it may have dotted circles built in. In this case, just remove the corresponding entries in your *.cmb file (If the File/Project.. command doesn't show a *.cmb file loaded, Unibook is using information from the operating system or from the Unicode property files, and you need to provide an explicit *.cmb file instead to enable this override).

4.2.8 Seeing Wingdings instead of characters

Most likely the font contained in the *.cfl file is not installed on your system. Fonts used with the /S command, or with a SYMBOL setting for the charset field will be opened as symbol fonts. If no matching font is on your system, Wingdings (or some other Symbol font on your system) will be used instead by Windows.

4.2.9 Use of /O vs. /S in the combining font list

Both the /O and the /S command implement offsetting for a range of 128 characters. Use the /O for non-symbol fonts, and the /S for symbol fonts. Use the /Q switch to access a range of characters in a non-symbol font, transposed by some amount.

4.2.10 Error message "3,1 Subtable not found"

This is usually caused by a font with an unusual internal cmap table format. Try using the font with an entry in the *.cfl file that uses the /S command.

4.2.11 Clusters of 4 boxes in the character name list

Add a one line statement like this to the top of your *.cfl file:
Arial Narrow,22, /O=E200

4.2.12 Error messages when reading *.cfl files

Unibook checks each *.cfl file for consistency and redundant entries. The Default.cfl is an exception, since it must contain the names of many fonts that may not be available on some machines. If you rename the Default.cfl file, Unibook will loudly complain about any redundant entries (fonts that are listed but not used). Just remove or comment out these entries to get rid of the warnings.

4.2.13 Supplementary character problems

Unibook fully supports supplementary characters (non-BMP characters, that is, characters with code points beyond U+FFFF). If you have trouble displaying supplementary characters, even though you are using a font that has glyphs for these characters, add the following setting to your registry.

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
  SURROGATE=(REG_DWORD)0x00000002

This setting enables Windows 2000 and later to use the internal Uniscribe module to display supplementary characters. If you have installed any of the language packs that cause Uniscribe to be loaded, the install should have made the appropriate changes already and you should not need to apply this setting manually.

Using the /Q command in a *.cfl file, you can also use a font that assigns glyphs for supplementary characters to other locations, for example in the private use area, and have them displayed at the correct location in the code charts. To show a glyph for the character at 1D400 from a font where it is located at E000, follow this example:

Font Name,22 /Q=1D400 /R=E000-E000
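
The remapping in this example can be expressed as simple arithmetic. The Python sketch below reflects one reading of the /Q and /R switches, namely that the chart code given with /Q corresponds to the first font position of the /R range and that longer ranges continue consecutively from there; the function name font_code_for_q_switch is hypothetical.

```python
def font_code_for_q_switch(code, q_start, r_first, r_last):
    """Map a chart character code to the font position selected by /Q and /R.

    q_start is the /Q value; r_first and r_last are the ends of the /R range.
    Returns None if the code is not covered by the entry.
    """
    index = code - q_start
    if 0 <= index <= r_last - r_first:
        return r_first + index
    return None


# The entry above: /Q=1D400 /R=E000-E000 covers exactly one character.
font_code = font_code_for_q_switch(0x1D400, 0x1D400, 0xE000, 0xE000)
```

With this entry, 1D400 is taken from font position E000, while 1D401 is not covered and falls through to the next font in the list.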

4.3 Installing Additional Fonts

4.3.1 How to install multilingual fonts for Microsoft Office, or Microsoft Windows

The website http://www.unicode.org/help/display_problems.html is regularly updated with instructions on how to install additional fonts for Microsoft Office and Microsoft Windows.

4.3.2 How to install Everson Mono Unicode

Everson Mono Unicode is a large monospaced font created by Michael Everson of Everson Typography. It is distributed as shareware. See http://www.evertype.com/emono for details. After downloading the font and extracting the TTF file into a folder on your disk, open the Windows fonts folder with the Start/Control Panel/Fonts command. From the menu, select Install... and in the Install dialog select your folder and double click on Everson Mono Unicode when it appears in the window.

4.3.3 How to install Code2000

Code2000 is a large proportionally spaced font created by James Kass. It is distributed as shareware. See http://home.att.net/~jameskass/code2000_page.htm for details. After downloading the font and extracting the TTF file into a folder on your disk, open the Windows fonts folder with the Start/Control Panel/Fonts command. From the menu, select Install... and in the Install dialog select your folder and double click on Code2000 when it appears in the window.

4.3.4 Designating a large font as a default or last resort font

If you want a large font, such as Arial Unicode MS, Code2000, or Everson Mono Unicode to be your default font, edit the Default.cfl file to move the line containing it near the beginning, but after the entries for the special characters. That way, it will always be used for any character it supports. If instead you want the font to be your font of last resort, move it to the end of the file; that way, it will be used anytime no other font has a glyph for a given character.
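
The effect of the position in the list follows from the top-to-bottom search. The Python sketch below models only that search order; the font names and coverage sets are made up for illustration, and the /X and /I limits are left out.

```python
def pick_font(code, fonts):
    """Return the name of the first font in list order that covers `code`."""
    for name, coverage in fonts:
        if code in coverage:
            return name
    return None


# Hypothetical fonts and coverage, for illustration only.
specials = ("Special Symbols", {0xE000})
large = ("Large Font", set(range(0x0020, 0x3000)))
greek = ("Greek Font", set(range(0x0370, 0x0400)))

as_default = [specials, large, greek]      # large font near the beginning
as_last_resort = [specials, greek, large]  # large font at the end

alpha_default = pick_font(0x03B1, as_default)
alpha_last_resort = pick_font(0x03B1, as_last_resort)
```

In the first ordering the large font supplies GREEK SMALL LETTER ALPHA; in the second, the specialized font does, and the large font is used only for characters that no earlier font covers.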

4.4 Files

4.4.1 The last character or line in file is ignored

Add an empty line. Unibook generally requires files to have a terminal line feed.

4.4.2 Unicode in files

Unibook can read Unicode-encoded plain text files, as long as they are prefixed with a BOM (U+FEFF) and are in little-Endian byte order. This is useful for creating *.cfl files that use fonts which only have localized names. Using Unicode for name lists has not been tested. UTF-8 is not supported.
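
If you are unsure how an editor saved a file, a quick check of the first bytes can help. The Python sketch below is a hypothetical pre-check, not Unibook code: it treats a file starting with the little-endian BOM as UTF-16, rejects the two cases that can be recognized as unsupported, and otherwise assumes an 8-bit file in Windows code page 1252. UTF-8 saved without a signature cannot be detected this way.

```python
import codecs


def decode_input_file(data, eight_bit="cp1252"):
    """Decode file bytes according to the encodings Unibook accepts."""
    if data.startswith(codecs.BOM_UTF16_LE):
        # Little-endian UTF-16 with leading BOM (U+FEFF): supported.
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        raise ValueError("big-endian UTF-16: resave as little-endian")
    if data.startswith(codecs.BOM_UTF8):
        raise ValueError("UTF-8 is not supported: resave as UTF-16 or 8-bit")
    # No BOM: assume an 8-bit file (code page 1252 unless told otherwise).
    return data.decode(eight_bit)


utf16_bytes = codecs.BOM_UTF16_LE + "Ω,22\n".encode("utf-16-le")
text = decode_input_file(utf16_bytes)
```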

4.4.3 Error messages when reading the nameslist (*.lst) files

These should happen only when opening nameslists that have been edited by the user, or occasionally for beta versions of this file. Usually, simply hitting ignore will be sufficient to allow the file to open. For a permanent fix, edit the offending line(s) in the file to fix the errors. (See the section describing the Nameslist File.) Unibook maintains an internal database of "known issues" with prior public versions of the official Unicode nameslist files, some of which contain minor syntax errors. If one of those files is detected, any known errors for that file are ignored permanently.

4.5 Other Tips

4.5.1 Forcing repagination

Simply select Options/Format/Nameslist Layout and click OK. This will force a re-layout, even when no changes were made. Changing a setting in the View / Show As dialog, or opening the *.lst file via File/Open.. will also cause a re-layout.

4.5.2 Return to factory defaults

Follow the steps in 4.1.2. This resets all stored user information and configurations. The next time the program starts, you will be asked to sign in again.

4.5.3 Reloading a file

Use the list of recent files in the File menu to reload a project, nameslist, font configuration or format file. If the list is empty, try opening the file default.upr with the File/ Open.. command.

4.5.4 ISO or Unicode margins and tabs look odd

Both the Unicode and the ISO format need specific margin and tab settings to look good. While it is possible to switch between the views with a button, the margin and tab settings remain unchanged. The default.fmt that comes with Unibook uses a set of margins and tabs that give somewhat acceptable results for both, but do not match the actual margin or tab settings for either publication. If you create margin and tab settings that are specific to either view, save your preferred settings into one or more *.fmt files with the File/Save As.. command and load these files to switch views.

4.5.5 Viewing the private use area

Unibook normally suppresses all unassigned blocks or private use areas. By default, it also suppresses the display of any character code not defined in the nameslist. See the section on viewing fonts for instructions on how to view fonts with characters in the private use area.


Copyright © 1995-2006 ASMUS, Inc. All Rights Reserved. This version of Unibook is distributed by the Unicode Consortium under a license from ASMUS, Inc., subject to the end user license agreement shown during startup and viewable via the Help/About/License.. command. This documentation file may not be republished in full or in part, except for the purpose of reviewing Unibook. Unibook and ASMUS are trademarks of ASMUS, Inc. Unicode is a registered trademark of the Unicode Consortium. Microsoft Windows and Microsoft Office are trademarks of Microsoft Corporation. Other terms may be trademarks of their respective trademark owners, whether identified or not.