Portfolio - John Rowley

L.I.S.A	Industry article on internal help tool
Symantec's Houdini

Symantec Houdini:
How Symantec addresses some of its help localisation issues

by John Rowley, Symantec

Localising Windows Help should be a straightforward process. After all, it's simply a matter of translating the Rich Text Format (RTF) files used to build the help text and making sure they compile without errors, isn't it?

Well, it is if you have been given accurate counts for words and graphics, receive the help files when they are completely stable, and have complete confidence that your translators will be consistent in their cross-referencing (even if the US writers were not) and will not destroy any of the internal links in the help.

And then there's the real world.

Help files are more likely to be delivered piecemeal, either as individual RTF files or, worse, as individual topics. Then, towards the end, the writers decide to change their browse sequences, alter the context strings and improve their keyword indexing. New topics also get added at the last minute to take account of feature creep, existing topics have their cross-referencing improved, and topics already in translation get deleted because the writers do some last-minute restructuring.

How do you keep track of it all? Translation memory systems may offer a solution, but there are alternatives. Symantec has developed an internal tool, Houdini, to cut down the time spent testing help and comparing it against the original files. Houdini has been distributed to our current vendor base and has attracted so much positive comment that we are considering releasing it commercially with the next release of Symantec C++.

This article looks at the general issues associated with localising help, and how Houdini tries to address those issues.

Help Localisation Issues

Help localisation issues can be grouped into five categories:

Generating project statistics

Building the Help

Ensuring the consistency of the help system

Maintaining the integrity of the help

Managing project updates

Generating project statistics

Estimating the word count for a help project is particularly troublesome. You need to open each RTF file in a word processor and generate a word count. A product like Norton Utilities for Windows 95 has approximately 10 help projects, using about 35 RTF files. Getting a word count for each RTF file in a project like this is slow and tedious. So imagine the headache of trying to get help writers to supply this information on a regular basis.
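As a rough illustration of how the word count could be automated, the Python sketch below strips RTF control words and counts what is left. It is a minimal sketch under stated assumptions, not Houdini's method: it skips the font table, colour table, style sheet and info groups plus any `\*` destination, and drops hidden (`\v`) text such as jump context strings, but it still counts footnote text (titles, keywords and context strings), so treat the result as an estimate. The names `rtf_visible_text` and `count_words` are hypothetical.

```python
import re

# Destinations holding formatting metadata rather than text (assumed list; extend as needed).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"    # control word with optional numeric parameter
    r"|\\'([0-9a-fA-F]{2})"       # hex-escaped character, e.g. \'e9
    r"|\\(.)"                     # control symbol, e.g. \~ or \{
    r"|([{}])"                    # group delimiters
    r"|([^\\{}]+)",               # plain text
    re.S,
)

def rtf_visible_text(rtf: str) -> str:
    """Return the text a reader sees: no metadata groups and no hidden text."""
    out, stack = [], []           # stack saves (skip, hidden) for each open group
    skip = hidden = False
    for m in TOKEN.finditer(rtf):
        word, arg, hexcode, sym, brace, text = m.groups()
        if brace == "{":
            stack.append((skip, hidden))
        elif brace == "}":
            skip, hidden = stack.pop() if stack else (False, False)
        elif skip:
            continue
        elif word in SKIP_DESTINATIONS or sym == "*":   # \* marks an ignorable destination
            skip = True
        elif word == "v":
            hidden = arg != "0"       # \v turns hidden text on, \v0 turns it off
        elif word == "plain":
            hidden = False
        elif hidden:
            continue
        elif word in ("par", "line", "tab"):
            out.append(" ")
        elif hexcode:
            out.append(bytes.fromhex(hexcode).decode("cp1252", errors="replace"))
        elif sym in ("\\", "{", "}"):
            out.append(sym)
        elif sym == "~":
            out.append(" ")           # non-breaking space
        elif text:
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)

def count_words(rtf: str) -> int:
    """Count whitespace-separated words, roughly as a word processor would."""
    return len(rtf_visible_text(rtf).split())
```

Running `count_words` over every RTF file in a project and summing the results gives a per-project figure that can be regenerated each time the writers deliver new files.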

Another problem is identifying the number of bitmaps used in a project. Ideally you are working with a clean help project, where the only bitmaps in the help directory are those used by the project. However, it's not unusual for writers to leave redundant bitmaps in the directory as they build and test their help.

Traditionally, one way to find out which bitmaps are used by the project is to remove all the bitmaps from the directory, compile the help file and note the errors reported for missing bitmaps. Once again, this is time-consuming and not particularly efficient, either for the publisher or the vendor.
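Instead of removing bitmaps and recompiling, the references can be read straight from the RTF. The sketch below assumes bitmaps are placed with the `bmc`, `bml` and `bmr` commands (and their variants ending in `t`), which RTF saves with escaped braces such as `\{bmc logo.bmp\}`, and that no formatting codes fall inside the reference. Bitmaps embedded directly in the RTF, or named only in the project file, are not covered, and the list of bitmap file extensions is an assumption. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed form of a bitmap reference as saved in RTF, e.g. \{bmc logo.bmp\}.
BITMAP_REF = re.compile(r"\\\{bm[clr]t?\s+([^\\}]+?)\s*\\\}", re.I)
BITMAP_EXTS = {".bmp", ".shg", ".mrb", ".wmf", ".dib"}   # assumed bitmap file types

def referenced_bitmaps(rtf_files) -> set:
    """Collect the (lower-cased) bitmap names referenced by the given RTF files."""
    used = set()
    for path in rtf_files:
        text = Path(path).read_text(encoding="cp1252", errors="replace")
        used.update(name.strip().lower() for name in BITMAP_REF.findall(text))
    return used

def bitmap_report(help_dir) -> dict:
    """Split bitmaps into used, redundant (on disk only) and missing (referenced only)."""
    help_dir = Path(help_dir)
    files = list(help_dir.iterdir())
    rtf_files = [p for p in files if p.suffix.lower() == ".rtf"]
    on_disk = {p.name.lower() for p in files if p.suffix.lower() in BITMAP_EXTS}
    used = referenced_bitmaps(rtf_files)
    return {
        "used": sorted(used & on_disk),
        "redundant": sorted(on_disk - used),
        "missing": sorted(used - on_disk),
    }
```

The "used" list is what the vendor needs to capture and translate; the "redundant" list can be cleared out before the project is handed over.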

Generally it's useful to know where topics are located within the help system, how they refer to each other and what attributes are associated with each topic (such as context strings, titles and browse sequences). Mapping out a help system like this is difficult, but having such a map makes it much easier to track down and fix problems.
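One way to build such a map is to split each RTF file into topics at `\page` and read the topic footnotes (`#` context string, `$` title, `K` keywords, `+` browse sequence) and the hidden text that follows each hotspot. The sketch below assumes simplified RTF in which footnotes look like `#{\footnote SETUP_TOPIC}` and hotspots are written as `{\uldb text}{\v TARGET}`; RTF saved by a word processor usually carries much more formatting, so a real tool would need a proper RTF parser. The `Topic` record and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Topic:
    file: str
    context: str = ""                             # '#' footnote
    title: str = ""                               # '$' footnote
    browse: str = ""                              # '+' footnote
    keywords: list = field(default_factory=list)  # 'K' footnote, ';'-separated
    jumps: list = field(default_factory=list)     # (hotspot text, hidden target)

FOOTNOTE = re.compile(r"([#$K+])\s*\{\\footnote\s+")
HOTSPOT = re.compile(r"\{\\(?:uldb|ul)\s+([^{}]*)\}\s*\{\\v\s+([^{}]*)\}")

def clean(text: str) -> str:
    """Drop control words and braces, leaving plain text."""
    return re.sub(r"\\[a-zA-Z]+-?\d* ?|[{}]", "", text).strip()

def group_body(rtf: str, start: int) -> str:
    """Return the text from start up to the brace that closes the current group."""
    depth, i = 1, start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2                      # skip escaped characters such as \{ and \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return rtf[start:i]
        i += 1
    return rtf[start:]

def build_topic_map(files: dict) -> dict:
    """Map each context string to its Topic; files maps RTF file name -> RTF source."""
    topics = {}
    for name, rtf in files.items():
        for chunk in re.split(r"\\page\b", rtf):   # \page ends a topic
            topic = Topic(file=name)
            for m in FOOTNOTE.finditer(chunk):
                mark, text = m.group(1), clean(group_body(chunk, m.end()))
                if mark == "#":
                    topic.context = text
                elif mark == "$":
                    topic.title = text
                elif mark == "+":
                    topic.browse = text
                else:
                    topic.keywords = [k.strip() for k in text.split(";") if k.strip()]
            topic.jumps = [(clean(t), clean(c)) for t, c in HOTSPOT.findall(chunk)]
            if topic.context:
                topics[topic.context] = topic   # a later duplicate overwrites an earlier one
    return topics
```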

Building the Help

Building the help involves organising and capturing any new bitmaps that need to be translated, including segmented bitmaps (bitmaps containing jumps), translating various options in the help project file, and finally compiling the help project. Most of this is a fairly straightforward process.
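The help project (HPJ) file is an INI-style text file, so the options that need translating can be listed automatically. The sketch below assumes only the `TITLE` and `COPYRIGHT` entries in the `[OPTIONS]` section are user-visible; that set is a non-exhaustive assumption, and window captions or other entries may also need attention. The function name is hypothetical.

```python
# Assumed, non-exhaustive set of [OPTIONS] keys whose values end users see.
TRANSLATABLE_OPTIONS = {"TITLE", "COPYRIGHT"}

def translatable_hpj_lines(hpj_text: str) -> list:
    """Return (line number, key, value) for user-visible [OPTIONS] entries."""
    found, section = [], ""
    for lineno, raw in enumerate(hpj_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(";"):
            continue                                  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().upper()      # e.g. OPTIONS, FILES, MAP
            continue
        key, sep, value = line.partition("=")
        if section == "OPTIONS" and sep and key.strip().upper() in TRANSLATABLE_OPTIONS:
            found.append((lineno, key.strip().upper(), value.strip()))
    return found
```

Reporting line numbers lets a translator jump straight to each entry without touching the rest of the project file.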

Ensuring the consistency of the help

Maintaining consistency covers two areas: cross-referencing and formatting.

Checking the consistency of cross-referencing usually means looking for inconsistencies between:

Page titles and footnote titles --ideally, these should match

The jump text and the page title of the topics being jumped to

Keyword usage --for example, to ensure there are no instances of mixed-case keywords (such as Utilities and utilities).
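Once topics have been extracted, the three cross-referencing checks above reduce to simple comparisons. The sketch below assumes each topic is a dictionary holding its on-screen heading, its `$` footnote title, its keywords and its jumps as (hotspot text, target context string) pairs; those field names are hypothetical. Comparing jump text with titles case-insensitively is a design choice, and the output is a review list rather than a list of errors, since jump text legitimately differs from titles in many places.

```python
from collections import defaultdict

def consistency_report(topics: dict) -> dict:
    """topics maps context string -> {'heading', 'title', 'keywords', 'jumps'}."""
    # Page title (first visible heading) versus footnote title.
    title_mismatches = [ctx for ctx, t in sorted(topics.items())
                        if t["heading"].strip() != t["title"].strip()]

    # Jump text versus the title of the topic being jumped to.
    jump_mismatches = []
    for ctx, t in sorted(topics.items()):
        for text, target in t["jumps"]:
            dest = topics.get(target)
            if dest and text.strip().lower() != dest["title"].strip().lower():
                jump_mismatches.append((ctx, text, target, dest["title"]))

    # Keywords that appear with more than one capitalisation.
    spellings = defaultdict(set)
    for t in topics.values():
        for kw in t["keywords"]:
            spellings[kw.lower()].add(kw)
    mixed_case = {k: sorted(v) for k, v in sorted(spellings.items()) if len(v) > 1}

    return {"titles": title_mismatches, "jumps": jump_mismatches, "keywords": mixed_case}
```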

Checking the consistency of formatting looks for inconsistencies in:

Formatting of topics designed to appear as popups. Popup topics that have a non-scrolling region defined in the heading (in other words, the style attribute "keep with next" is set) will not display correctly --a problem the help compiler does not report.

"Orphaned" hotspots --where the underlined text (a jump or popup) is accidentally separated from the hidden text (the context string), usually by a space.

Paragraph formatting --a common problem occurs when paragraphs are formatted with the "hidden" attribute, which causes text to display incorrectly on screen.
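The first two formatting checks can be sketched with pattern matching on the RTF. The sketch assumes hotspots are saved as separate groups, `{\ul text}` or `{\uldb text}` followed by `{\v TARGET}`, that topics end at `\page`, that each topic's context string sits in a `#{\footnote ...}` group, and that "keep with next" shows up as the `\keepn` control word. Word processors can write the same formatting in other ways, so these patterns are illustrative only, and the function names are hypothetical.

```python
import re

HOTSPOT_GAP = re.compile(r"\{\\(uldb|ul)\s+([^{}]*)\}[ \t]+\{\\v\s+([^{}]*)\}")
POPUP = re.compile(r"\{\\ul\s+[^{}]*\}\s*\{\\v\s+([^{}]*)\}")
CONTEXT = re.compile(r"#\s*\{\\footnote\s+([^{}]*)\}")

def orphaned_hotspots(rtf: str) -> list:
    """Hotspots whose hidden context string is separated from them by whitespace."""
    return [(text.strip(), target.strip()) for _, text, target in HOTSPOT_GAP.findall(rtf)]

def popups_with_nonscrolling_region(rtf: str) -> list:
    """Popup target topics that contain keep-with-next paragraph formatting."""
    targets = {t.strip() for t in POPUP.findall(rtf)}
    flagged = []
    for chunk in re.split(r"\\page\b", rtf):
        ctx = CONTEXT.search(chunk)
        if ctx and ctx.group(1).strip() in targets and re.search(r"\\keepn\b", chunk):
            flagged.append(ctx.group(1).strip())
    return sorted(flagged)
```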

Maintaining the integrity of the help

If you are translating a help file, particularly topic by topic during a simultaneous translation, there's always a serious risk that your files will get out of step with those of the US teams.

Integrity inspections try to identify:

Help topics with duplicate titles --this inspection is optional, but duplicate titles can confuse end-users when they search for help on a particular subject.

Missing topics --these are topics referred to in the help which are not present in the help project

Duplicate topics --although the help compiler reports the problem, irritatingly, it does not tell you where the two topics are located, which makes them difficult to track down.

References to external topics (topics contained in other help files) --the help compiler has no means of checking if these topics exist.

Structural differences between the English help project and the translated help project. For example, you need to ensure that every topic in the English files has a matching topic in the translated files, and that the original and translated topics have the same browse sequences, build tags, cross-references, bitmaps and so forth.
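Given topics already extracted into simple records, the integrity inspections above can be sketched as follows. The `Topic` record and its fields are hypothetical. The sketch treats a jump target containing `@` as a reference to another help file and strips any `>window` suffix before looking the context string up; external targets are only listed, because, as noted above, their existence cannot be checked from this project alone. Duplicate titles are compared case-insensitively, which is a design choice.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Topic:
    file: str                            # RTF file containing the topic
    context: str                         # context string (# footnote)
    title: str = ""                      # topic title ($ footnote)
    browse: str = ""                     # browse sequence (+ footnote)
    build_tags: frozenset = frozenset()  # build tags (* footnote)
    jumps: tuple = ()                    # hidden-text targets of jumps and popups
    bitmaps: tuple = ()                  # bitmap files referenced by the topic

def split_target(target: str):
    """Return (context string, is_external); '@' marks a topic in another help file."""
    return re.split(r"[>@]", target, maxsplit=1)[0].strip(), "@" in target

def integrity_report(topics: list) -> dict:
    by_context, by_title = defaultdict(list), defaultdict(list)
    for t in topics:
        by_context[t.context].append(t.file)       # records every location of a topic
        if t.title:
            by_title[t.title.lower()].append(t.context)
    missing, external = [], []
    for t in topics:
        for target in t.jumps:
            ctx, is_external = split_target(target)
            if is_external:
                external.append((t.context, target))
            elif ctx not in by_context:
                missing.append((t.file, t.context, ctx))
    return {
        "duplicate_topics": {c: f for c, f in by_context.items() if len(f) > 1},
        "duplicate_titles": {k: v for k, v in by_title.items() if len(v) > 1},
        "missing": missing,
        "external": external,
    }

def compare_structure(english: list, translated: list) -> dict:
    """List structural differences per context string (titles are expected to differ)."""
    en = {t.context: t for t in english}
    tr = {t.context: t for t in translated}
    issues = {}
    for ctx in sorted(en):
        if ctx not in tr:
            issues[ctx] = ["missing from translation"]
            continue
        a, b = en[ctx], tr[ctx]
        diffs = [name for name in ("browse", "build_tags") if getattr(a, name) != getattr(b, name)]
        if sorted(a.jumps) != sorted(b.jumps):
            diffs.append("cross-references")
        if sorted(a.bitmaps) != sorted(b.bitmaps):
            diffs.append("bitmaps")
        if diffs:
            issues[ctx] = diffs
    for ctx in sorted(tr.keys() - en.keys()):
        issues[ctx] = ["not in English project"]
    return issues
```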

Managing Updates

Managing updates can involve tracking weekly changes between help projects during a simultaneous ship, or identifying changes between two significant product releases. Tracking is easy if writers record their changes on tracking sheets, but manually maintained tracking sheets are difficult to enforce, especially as deadlines draw close. On a simultaneous ship this information becomes even more crucial: you need to get it on a regular basis so that you can monitor the progress of the help system.

From a localisation perspective, the crucial task is to determine:

Which topics can be deleted from the project --these can be manually extracted before the files are sent on to vendors.

Which new topics have been added to the project --once identified, these can be extracted and sent out to vendors for translation while further analysis is going on.

What structural changes have been made to topics common to both releases. For example, has a topic's word count changed, and have cross-references been added, deleted or modified in any way?
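With a snapshot of each topic taken from both releases, the three questions above become set operations on context strings. The `TopicSnapshot` record and its fields are hypothetical, and the sketch assumes a topic keeps its context string between releases; a renamed topic therefore shows up as one deletion plus one addition. Cross-references are compared as sets of target context strings, so a change to the jump text alone is not detected.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TopicSnapshot:
    title: str
    word_count: int
    jumps: frozenset = frozenset()    # target context strings of the topic's jumps

def diff_releases(old: dict, new: dict) -> dict:
    """Compare two releases, each mapping context string -> TopicSnapshot."""
    deleted = sorted(old.keys() - new.keys())   # candidates for removal before hand-off
    added = sorted(new.keys() - old.keys())     # can go to vendors straight away
    changed = {}
    for ctx in sorted(old.keys() & new.keys()):
        a, b = old[ctx], new[ctx]
        notes = []
        if a.title != b.title:
            notes.append(f"title {a.title!r} -> {b.title!r}")
        if a.word_count != b.word_count:
            notes.append(f"word count {a.word_count} -> {b.word_count}")
        if a.jumps - b.jumps:
            notes.append("cross-references removed: " + ", ".join(sorted(a.jumps - b.jumps)))
        if b.jumps - a.jumps:
            notes.append("cross-references added: " + ", ".join(sorted(b.jumps - a.jumps)))
        if notes:
            changed[ctx] = notes
    return {"deleted": deleted, "added": added, "changed": changed}
```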

Once you've identified the changes, you'll probably find that the biggest problem is modifying the topics common to both releases: changing browse sequences, updating the keyword list and (an absolute nightmare) modifying each topic's unique identifier (the context string) because the writers renamed it in the new project. (On one Symantec project, a simple one-hour change in the US required about 96 hours of work on the translated edition.)
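Part of the context-string problem can be automated. The sketch below makes two assumptions: a deleted topic and an added topic with the same, unique title are probably the same topic renamed (a guess that should be reviewed by hand), and context strings appear in the RTF only as `#{\footnote ...}` groups and `{\v ...}` hidden jump targets. Targets containing `@` point into another help file and are left alone. Context strings listed in the help project file (for example in its `[MAP]` or `[ALIAS]` sections) are not handled. The function names are hypothetical.

```python
import re
from collections import Counter

FOOTNOTE_CONTEXT = re.compile(r"(#\s*\{\\footnote\s+)([^{}]*?)(\s*\})")
HIDDEN_TARGET = re.compile(r"(\{\\v\s+)([^{}]*?)(\s*\})")

def guess_renames(deleted: dict, added: dict) -> dict:
    """Pair deleted and added context strings whose titles match exactly and uniquely.

    Both arguments map context string -> topic title.
    """
    old_titles = Counter(deleted.values())
    new_by_title = {}
    for ctx, title in added.items():
        new_by_title.setdefault(title, []).append(ctx)
    return {ctx: new_by_title[title][0]
            for ctx, title in deleted.items()
            if old_titles[title] == 1 and len(new_by_title.get(title, [])) == 1}

def rename_context_strings(rtf: str, renames: dict) -> str:
    """Apply old -> new context strings to # footnotes and hidden jump targets."""
    def fix_footnote(m):
        ctx = m.group(2).strip()
        return m.group(1) + renames.get(ctx, ctx) + m.group(3)

    def fix_target(m):
        target = m.group(2)
        if "@" in target:                          # topic in another help file
            return m.group(0)
        ctx, sep, window = target.partition(">")   # keep any >window suffix
        ctx = ctx.strip()
        return m.group(1) + renames.get(ctx, ctx) + sep + window + m.group(3)

    rtf = FOOTNOTE_CONTEXT.sub(fix_footnote, rtf)
    return HIDDEN_TARGET.sub(fix_target, rtf)
```

Applied to the translated RTF files, the rename map updates every topic and jump in one pass instead of by hand.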
