
Go back to complete story (French) / Voir l’article détaillé en français...

(Jul-2011) — New version available on Indiscripts.com

RELEASE NOTE (Jul-2011).— The files of IndexBrutal and IndexMatic 1 are still available in the Indiscripts Repository for archiving purposes. However, those ‘ancient’ scripts have been definitively superseded by IndexMatic 2 for InDesign CS3/CS4/CS5+. IndexMatic 2 integrates all the functionalities of IndexBrutal and IndexMatic 1, and it also provides unparalleled features and results thanks to its ability to combine a large number of settings and filters. Please check out the new release: www.indiscripts.com/category/projects/IndexMatic.

IMPORTANT NOTE (Sep-2008).— As explained in the FAQ section, the current version (2.1) is transitional and suffers from a bug related to the search/replace clean-up step. In some cases, you’ll have to manually reset the search/replace fields in all tabs (“Text”, “Grep”...) before running the script. In all cases, don’t forget to save your document(s) first!


Overview


IndexBrutal is a script for InDesign CS/CS2/CS3. Its main task is to find a set of keywords within the working document(s), considering each keyword one by one. By “keyword”, I mean any character string that you could enter manually in a Find Text field. The script operates exactly as InDesign would, but it automates multiple searches and records the page number(s) of each keyword occurrence it encounters. Given an input list of strings, it generates a corresponding output table reporting the page references.
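As a rough model of this process, here is a minimal sketch in plain JavaScript. It is not taken from IndexBrutal.js: the function name `buildIndex`, the sample pages, and the use of a simple substring test in place of InDesign's Find Text are all assumptions made for illustration. It targets a modern engine such as Node.js, so InDesign's older scripting engine would need adjustments.

```javascript
// Rough model with made-up data: each page is a label plus its text, and a
// plain substring test stands in for InDesign's Find Text.
function buildIndex(keywords, pages) {
  var index = {};
  keywords.forEach(function (keyword) {
    index[keyword] = pages
      .filter(function (page) {
        return page.text.toLowerCase().indexOf(keyword.toLowerCase()) !== -1;
      })
      .map(function (page) { return page.label; });
  });
  return index;
}

var pages = [
  { label: "1", text: "Cricket is popular in Scotland." },
  { label: "2", text: "Football, too." },
  { label: "3", text: "More about cricket." }
];
console.log(buildIndex(["cricket", "football"], pages));
// { cricket: [ '1', '3' ], football: [ '2' ] }
```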

Here is the basic scenario :

1. Create the input word list

When you run the script, you have to provide a text file containing your word list. In most cases, an entry line simply consists of the keyword (the string that the script will search for), which is also the string displayed in the output table (the indexed term). In some cases, you’ll need to dissociate the input keyword from the output term (see below: “Entry Lines Syntax : Special Operators”).

2. Open the InDesign document(s) you need to handle

If you want to target a multi-document Book, remember to open all the documents involved in the process before you call the script. IndexBrutal’s dialog box lets you switch between the “Active document” and “All (opened) documents” targets.

3. Run the script and set the options

Open the Scripts panel, double-click on IndexBrutal.js. Choose or confirm the input file location and adjust the indexing options from the dialog box (target, output mode...). Press OK.

Save the generated index table as a text file, or paste it directly into a new text frame (if you checked the Clipboard output option).

Sample of IndexBrutal table (text file)

Sample of IndexBrutal table (layout)

Entry Lines Syntax : Special Operators

The input file created before you run IndexBrutal mainly consists of words separated by a “carriage return” (CR). By default, each entry line is parsed as a simple keyword. Lowercase strings are tracked case-insensitively in the target document(s). For instance, the key smith will match “smith”, “Smith”, “SMITH”. On the other hand, if the key contains at least one capital letter (Smith, GATES), the program looks for the exact string. This makes it possible to properly handle biographical or geographical names and uppercase acronyms without confusing them with occurrences of common nouns (the smith who forges iron, logic gates...). Lastly, keys without spaces or punctuation are handled in “whole word” mode by default.

Three special characters (called operators) allow you to change the rules :

Operator   Char name              Usage
|          Vertical bar (pipe)    Key-to-term linker (used to rename or group keys)
>          Greater-than sign      Partial search operator (disables the “whole word” mode)
!          Exclamation mark       “Case handling” operator (reverses the default case-sensitivity policy)

1) Key-to-term linker ( key|term ). — A string like foo|bar tells the script to search for each occurrence of the keyword foo, but to replace it with the term bar in the index table. That way, you can rename keys or group several keys under the same topic:

noise
cacophony|noise
clangor|noise
discordance|noise
roar|noise

The first line above tells the script to track the key noise as itself. The next lines link other words (“cacophony”, “clangor”...) to the same term noise. So, the output index will just display “noise” with the combined page numbers of the five keys.
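The grouping described above can be sketched as follows. This is only an illustration: `splitEntry` and `groupByTerm` are hypothetical names, the page numbers are made-up sample data standing in for real search results, and pages are assumed to be plain integers.

```javascript
// Hypothetical sketch: split "key|term" lines and merge page numbers per term.
function splitEntry(line) {
  var bar = line.indexOf("|");
  return bar < 0
    ? { key: line, term: line }
    : { key: line.slice(0, bar), term: line.slice(bar + 1) };
}

// pagesByKey is made-up sample data standing in for the search results.
function groupByTerm(lines, pagesByKey) {
  var index = {};
  lines.forEach(function (line) {
    var entry = splitEntry(line);
    var pages = pagesByKey[entry.key] || [];
    index[entry.term] = (index[entry.term] || []).concat(pages);
  });
  Object.keys(index).forEach(function (term) {
    // Remove duplicates, then sort numerically.
    index[term] = index[term]
      .filter(function (p, i, all) { return all.indexOf(p) === i; })
      .sort(function (a, b) { return a - b; });
  });
  return index;
}

var lines = ["noise", "cacophony|noise", "clangor|noise"];
var found = { noise: [4, 9], cacophony: [9, 12], clangor: [2] };
console.log(groupByTerm(lines, found)); // { noise: [ 2, 4, 9, 12 ] }
```

Page 9 is found by two different keys but appears only once under the shared term.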

2) Partial search operator ( >key ). — Placing the > operator before a keyword tells the script to search for it as a partial string (reversing the default “whole word” mode). This syntax extends the flexibility of the indexing process. It’s especially useful when you need to handle the root of similar words instead of keying the whole set:

>punish

The above command will handle all the words containing punish: “dispunishable”, “punisher”, “punishment”, “unpunished”... and the word “punish” itself. All the results will be reported under the term punish. Note that you could rename the topic by using the | operator: >punish|PUNISHMENT (for example).

Be careful when using short partial keys, like >oil or >fish, which could match hundreds of words!
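To see the difference between the two modes, here is a small sketch that uses a JavaScript regular expression as a stand-in for InDesign's search engine. The function `findWords` and the sample sentence are invented for the example, and the sketch assumes keys made of plain letters only.

```javascript
// Stand-in for InDesign's find engine: a regular expression built from the key.
// Assumption: keys contain plain letters only (no regex metacharacters).
function findWords(text, key, partial) {
  var pattern = partial ? "\\w*" + key + "\\w*" : "\\b" + key + "\\b";
  return text.match(new RegExp(pattern, "gi")) || [];
}

var text = "To punish is one thing; punishment left unpunished is another.";
console.log(findWords(text, "punish", false)); // [ 'punish' ]
console.log(findWords(text, "punish", true));  // [ 'punish', 'punishment', 'unpunished' ]
```

The same mechanism explains why a short partial key can match far more words than expected.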

3) “Case handling” operator ( !key ). — The ! operator reverses the default policy of the program. It makes a lowercase key case-sensitive and a non-lowercase key case-insensitive:

!smith
!UNESCO

The first command line strictly handles the lowercase word “smith”, rejecting “Smith” or “SMITH” for example. Conversely, the second line will handle all of the key variants (“UNESCO”, “Unesco”, or even “unesco”) and will display the term UNESCO. This command has the same effect as unesco|UNESCO.

If you combine the two operators > and !, use them in this order (> then !).

Here’s a simple example of an entry file:

cricket
Scotland
>football
!gates|gate
>zealand|New Zealand

How it works :

      — the key cricket (whole word, case insensitive) matches “cricket”, “Cricket”, “CRICKET”, but not “crickets” or “cricketers” ;

      — the key Scotland (whole word, case sensitive) matches exactly itself and nothing else ;

      — the command >football is case insensitive and matches all words containing “football”, like “footballs”, “Footballer”... ;

      — the command !gates|gate matches exactly the string “gates” (excluding “Gates”) and displays it as gate ;

      — the line >zealand|New Zealand handles strings like “Zealand” or “zealander” and attaches all occurrences to the topic New Zealand.
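The rules applied to these five lines can be summarized in a short parser. This is a sketch under assumptions, not the code of IndexBrutal.js: `parseEntry` and its property names are hypothetical, and "no space or punctuation" is interpreted here as "letters and digits only".

```javascript
// Hypothetical parser for one entry line: [>][!]key[|term].
function parseEntry(line) {
  var partial = false, reversed = false, rest = line;
  if (rest.charAt(0) === ">") { partial = true; rest = rest.slice(1); }
  if (rest.charAt(0) === "!") { reversed = true; rest = rest.slice(1); }
  var bar = rest.indexOf("|");
  var key = bar < 0 ? rest : rest.slice(0, bar);
  var term = bar < 0 ? rest : rest.slice(bar + 1);
  var hasCapital = key !== key.toLowerCase();
  return {
    key: key,
    term: term,
    // "!" flips the default: capitals normally mean an exact-case search.
    caseSensitive: reversed ? !hasCapital : hasCapital,
    // ">" disables whole-word mode. Assumption: keys containing anything
    // other than letters and digits are not searched as whole words.
    wholeWord: !partial && /^[\p{L}\p{N}]+$/u.test(key)
  };
}

["cricket", "Scotland", ">football", "!gates|gate", ">zealand|New Zealand"]
  .forEach(function (line) { console.log(parseEntry(line)); });
```

Checking for > before ! mirrors the operator order required when both are combined.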

Dialog Box Preferences

1. Word list file selection. — When the program starts, it looks for a file named “words.txt” in the script file directory. If found, this file is the default word list. If you cancel or the file is missing, IndexBrutal will ask you for the file location.

The word list file is labelled “Indexor” in the dialog top panel. A field also indicates the number of entry lines.

2. Target. — If available, the Target panel lets you work selectively on the active document or on all opened documents (IndexBrutal only handles opened documents). Don’t forget that the indexing process depends on your page numbering settings. For example, if each of two documents has a page labelled “3”, then the index table will not distinguish between the two locations.

3. Output Mode. — The Output panel lets you choose between “Text file” and “Clipboard” as the destination of the index table. If “Text file” is checked (default), IndexBrutal will save the output file to disk and open it in the default system text editor (most of the time: TextEdit on Mac OS, Notepad on Windows). The “Clipboard” option needs no explanation, except that it builds a temporary InDesign text frame to transfer the index data to the Clipboard. Keep in mind that this feature will overwrite any previous clipboard contents.

4. Formatting options. — The first one (“Marks not found entries with...”) controls the appearance of missing keys in the table. If this field is left empty, the script doesn’t report them. If you enter one or more characters, they will be used as a marker string for each search with no match.

The second option (“Regroup pages sequences with...”) controls the grouping of adjacent page numbers (ranges). If you specify a marker (a hyphen, for example), then IndexBrutal will compact page sequences like 12, 13, 14, 15 into the format 12-15. Leave the field empty to disable this function.
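The regrouping of page sequences can be sketched like this. The function `regroupPages` is a hypothetical name, and the sketch assumes sorted, unique integer pages and treats any run of two or more adjacent pages as a range, which may differ from what the script actually does.

```javascript
// Hypothetical sketch: compact runs of consecutive page numbers.
// Assumptions: pages are sorted, unique integers; any run of two or more
// adjacent pages becomes a range. An empty marker disables the feature.
function regroupPages(pages, rangeMarker) {
  if (!rangeMarker) return pages.map(String);
  var out = [];
  for (var i = 0; i < pages.length; i++) {
    var start = pages[i];
    // Advance while the next page directly follows the current one.
    while (i + 1 < pages.length && pages[i + 1] === pages[i] + 1) i++;
    out.push(start === pages[i] ? String(start) : start + rangeMarker + pages[i]);
  }
  return out;
}

console.log(regroupPages([3, 12, 13, 14, 15, 20], "-").join(", ")); // 3, 12-15, 20
console.log(regroupPages([3, 12, 13, 14, 15, 20], "").join(", "));  // 3, 12, 13, 14, 15, 20
```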

5. Index generation and sort. — Once the dialog box has been validated, IndexBrutal starts to check occurrences in the selected document(s). Of course, the duration of the process depends on the complexity of the keys and on the number of pages/documents to scan. In the output file, terms are sorted according to the basic Latin alphabet order [a...z] (with no sensitivity to case and diacritics).
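A sort that ignores case and diacritics can be approximated as shown below. This is a sketch for a modern JavaScript engine, not the script's own code: `sortKey` and `sortTerms` are invented names, and `String.prototype.normalize` is not available in InDesign's older scripting engine.

```javascript
// Sketch of a case- and diacritic-insensitive sort (not the script's own code).
function sortKey(term) {
  return term
    .normalize("NFD")                 // split letters from their accents
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining accents
    .toLowerCase();
}

function sortTerms(terms) {
  return terms.slice().sort(function (a, b) {
    var ka = sortKey(a), kb = sortKey(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

console.log(sortTerms(["zebra", "\u00C9lan", "apple", "Echo", "eagle"]));
// [ 'apple', 'eagle', 'Echo', 'Élan', 'zebra' ]
```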

FAQ, tips & tricks

Q. — Some strings containing white space and/or special characters are not indexed. Why ?

A. — For the moment, IndexBrutal does not parse and “extrapolate” white spaces and special characters in a “generic” way. For example, a simple space typed in the input word list will match only the ASCII space (U+0020) in the InDesign document. So non-breaking spaces, en/em/thin spaces, etc., are not tracked. Tip: If you need to capture some special or generic characters, use meta-characters like ^t (tab) or ^w (any white space). We describe some of them here.
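The mismatch is easy to reproduce outside InDesign. The snippet below is a plain JavaScript illustration with an invented sample text; it does not call InDesign's search, it only shows why a literal space fails and why a generic white-space match succeeds.

```javascript
// Illustration in plain JavaScript (InDesign's own search is not involved).
var key = "New Zealand";                  // typed with an ordinary space (U+0020)
var text = "Flights to New\u00A0Zealand"; // the document uses a non-breaking space

console.log(text.indexOf(key) !== -1);    // false: U+0020 does not match U+00A0

// Treating every whitespace run in the key as "any white space" fixes it,
// which is the role the ^w meta-character plays in the word list.
var generic = new RegExp(key.split(/\s+/).join("\\s+"));
console.log(generic.test(text));          // true
```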

Q. — What about text frames on the pasteboard (outside of the page layout) ?

A. — No page number, no index item !

Q. — How to customize IndexBrutal ?

A. (For advanced users only!) Open the IndexBrutal.js file in a simple text editor and take a look at the “SCRIPT GLOBAL SETTINGS & PREFERENCES” section (at the beginning of the code). There are some interesting global variables:

      — ENTRIES_FILE (defaulted to "words.txt") sets the default name of the input word list (expected to be in the IndexBrutal script directory) ;

      — SORT_TERMS (defaulted to true) enables the alphabetic sort of output terms. Set it to false to disable this feature ;

      — TERM_SEPARATOR (defaulted to "\t") generates the tab separation between terms and page numbers. Use another string if you need something else ;

      — PAGE_SEPARATOR (defaulted to ", ") inserts spaced commas between page numbers (or ranges). Replace it with your preferred separator.
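To picture how the two separators shape an output line, here is a minimal sketch. The setting names and default values come from the list above; `formatLine` is a hypothetical helper and not a function of IndexBrutal.js.

```javascript
// The setting names come from the list above; formatLine is a hypothetical
// helper, not a function from IndexBrutal.js.
var TERM_SEPARATOR = "\t";
var PAGE_SEPARATOR = ", ";

function formatLine(term, pages) {
  return term + TERM_SEPARATOR + pages.join(PAGE_SEPARATOR);
}

// JSON.stringify makes the tab visible in the console.
console.log(JSON.stringify(formatLine("noise", ["2", "4", "9-12"])));
// "noise\t2, 4, 9-12"
```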

Q. — Does the script process include footnotes (CS2/CS3) ?

A. — Because of a CS2-specific bug, IndexBrutal can’t handle footnotes within ID-CS2 documents! But they work fine within ID-CS3. (Note that “including footnotes” will be optional in the next version of the script.)

Q. — Could IndexBrutal sort the terms correctly in other languages ?

A. — Yes it could!... but you need to patch the script. I privately wrote a special version for Norwegian (tell me if you need it). The next version will provide better “i18n” integration.
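One possible shape for such a patch is a comparison function driven by a custom alphabet. The sketch below is an assumption about how this could be done, not the private Norwegian version mentioned above; `ALPHABET`, `rank`, and `compareTerms` are invented names. It relies only on the fact that the Norwegian alphabet places æ, ø, å after z.

```javascript
// Hypothetical patch idea: rank letters by their position in a custom alphabet.
// The Norwegian alphabet ends with æ, ø, å after z.
var ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00E6\u00F8\u00E5";

function rank(ch) {
  var i = ALPHABET.indexOf(ch);
  return i < 0 ? ALPHABET.length : i; // unknown characters sort last
}

function compareTerms(a, b) {
  a = a.toLowerCase();
  b = b.toLowerCase();
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    var diff = rank(a.charAt(i)) - rank(b.charAt(i));
    if (diff !== 0) return diff;
  }
  return a.length - b.length; // a shorter prefix comes first
}

console.log(["\u00E5l", "zebra", "\u00F8l", "and"].sort(compareTerms));
// [ 'and', 'zebra', 'øl', 'ål' ]
```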

Q. — Is there any possibility to discriminate homonyms ?

A. — No. Remember that IndexBrutal is “just a text machine”. It can’t decide between relevant and irrelevant occurrences of the same word, which it treats as a simple string of characters.

Q. — It seems that IndexBrutal doesn’t restore, after the job, the previous settings of the InDesign search/replace box. Does it ?

A. — Some users have reported this bug. The current version of the script deals with a multi-version API (CS/CS2/CS3) and, as far as search/replace is concerned, the clean-up process is not perfect. A future release of IndexBrutal will fix it.

Q. — Where to report bugs ?

A. — Here : marcautret(at)free(dot)fr.

Q. — What about “IndexMatic” ?

A. — IndexMatic is a CS3-compatible InDesign script which reuses and reworks some of IndexBrutal’s building blocks in a different way, making it possible to automate indexing without an input file, based on character and/or paragraph styles. This new script is GREP oriented and introduces some cool indexing features like case formatting, a “minimum word length” option, a progress bar... The obvious future of my project is to blend IndexBrutal into IndexMatic, leading to a complete indexing tool for InDesign users...

Q. — How can you write English so wrongly ?

A. — Excuse me... in the real life, I just speak French.

InDesignSecrets review... / This script will be updated soon @ Indiscripts.com


BlogNot! is an art and design playground fed by Marc Autret since 2004. Not sure you could really enjoy it without an XHTML/CSS-compliant browser...
Feedback: marcautret(at)free(dot)fr