

4.4.1 Notes on Various Filters and Filter Modes

Aspell now has filter support. You can either select individual filters or choose a filter mode. To select a filter mode, use the mode option. You may choose from `none', `url', `email', `sgml', `ccpp', `tex', and any others available on your system. The default mode is `url'. Individual filters can be added with the add-filter option and removed with the rem-filter option. The currently available filters are `url', `email', `sgml', `tex', `latex' (an alias for `tex'), `nroff', and `context', as well as a number of filters that translate text from one format to another.

To check which filters are available, use aspell dump filters. To check which filter modes are available, use aspell dump modes. The aspell help command also lists all available filters and filter modes.

4.4.1.1 None Filter Mode

The none mode is exactly what it says. It turns off all filters.

4.4.1.2 URL Filter

The url filter/mode skips over URLs, host names, and email addresses. Because this filter is almost always useful and rarely does any harm, it is enabled in all modes except none. To turn it off, either select the none mode or use the rem-filter option after the desired mode has been selected.
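To make the idea concrete, here is a minimal Python sketch of masking URL-like tokens before words are extracted for checking. The regular expression and the function name words_to_check are hypothetical illustrations; they are not the heuristics Aspell's url filter actually uses.

```python
import re

# Hypothetical heuristic, NOT Aspell's actual rule: treat anything that looks
# like a URL, an email address, or a dotted host name as "not a word".
URL_LIKE = re.compile(
    r"""
      [a-z][a-z0-9+.-]*://\S+          # scheme://anything-up-to-whitespace
    | [\w.+-]+@[\w-]+(?:\.[\w-]+)+     # user@host.domain
    | [\w-]+(?:\.[\w-]+){2,}           # host name with at least two dots
    """,
    re.VERBOSE | re.IGNORECASE,
)


def words_to_check(text: str) -> list[str]:
    # Blank out each URL-like span, then pull out the remaining words.
    masked = URL_LIKE.sub(lambda m: " " * len(m.group(0)), text)
    return re.findall(r"[A-Za-z']+", masked)
```

Blanking with spaces rather than deleting keeps the positions of the remaining words unchanged, which matters if misspellings are later reported by offset.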

4.4.1.3 Email Filter

The email filter mode skips over quoted text. It currently does not support skipping over headers; however, a future version should. In the meantime I suggest you use Aspell with Newsbody, which can be found at http://home.worldonline.dk/~byrial/newsbody/. The option email-skip controls the number of characters that can appear before the email quote character; the default is 10. The option add|rem-email-quote controls which characters are considered quote characters; the defaults are `>' and `|'.
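The following Python sketch shows one way to read the email-skip and quote-character settings. It assumes that a line is quoted when a quote character occurs with at most email-skip characters in front of it; the helper names is_quoted and lines_to_check are hypothetical, and Aspell's own filter may treat edge cases differently.

```python
# Assumed rule (a sketch, not Aspell's code): a line is quoted if one of the
# quote characters appears within its first email_skip + 1 positions, i.e.
# with at most email_skip characters (such as "KA") before it.
def is_quoted(line: str, quote_chars: str = ">|", email_skip: int = 10) -> bool:
    return any(ch in quote_chars for ch in line[: email_skip + 1])


def lines_to_check(message: str, quote_chars: str = ">|", email_skip: int = 10) -> list[str]:
    # Drop quoted lines; everything else is passed on to the spell checker.
    return [
        line
        for line in message.splitlines()
        if not is_quoted(line, quote_chars, email_skip)
    ]
```

Under this simple rule a line such as `a | b' is also treated as quoted.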

4.4.1.4 SGML Filter

The SGML filter allows you to spell check SGML, HTML, XHTML, and XML files. In most cases everything within a tag `<tag attrib=value attrib2="a whole sentence">' will be skipped by the spell checker. The SGML/HTML/XML that Aspell supports is a slight superset of most DTDs (Document Type Definitions) and can spell check the often non-conforming HTML found on the web.

Two configuration options, sgml-skip and sgml-check, allow you to control what is spell checked. The tag and attribute names specified are case insensitive.

sgml-skip
This is a list of tags whose contents will also be skipped by the spell checker. For example, if you wish to leave a misspelling in a document and not have it flagged, you could surround it with a <nospellcheck> tag:
            <TD><FONT size=2><NOSPELLCHECK>leviosa</NOSPELLCHECK>
            is what Mr. Potter said</FONT></TD>
     

Then add the tag name to the skip list with the add-sgml-skip directive:

          add-sgml-skip nospellcheck
     

sgml-check
This is a list of attributes whose values you do want spell checked. By default, 'alt' (<img> alternate text) is a member of the check list since it is text that is seen by a web page viewer. You may also want 'value' to be on the check list since that is the text put on buttons:
          add-sgml-check value
     

In this case the word `Donr' in `<input type=button value="Donr">' will be flagged as a misspelling.
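As an illustration of how sgml-skip and sgml-check interact, the sketch below uses Python's standard html.parser module to collect the text that would be passed on for checking: data outside skipped tags plus the values of checked attributes. The class CheckableText and the function checkable_text are hypothetical; Aspell uses its own SGML parser, not this one. The default skip list here mirrors the HTML-mode defaults described for the html filter.

```python
from html.parser import HTMLParser


class CheckableText(HTMLParser):
    """Collect text an sgml-skip / sgml-check style filter would pass on."""

    def __init__(self, skip=("script", "style"), check=("alt",)):
        super().__init__()
        # Tag and attribute names are compared case-insensitively.
        self.skip = {name.lower() for name in skip}
        self.check = {name.lower() for name in check}
        self.depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag in self.skip:
            self.depth += 1
        elif self.depth == 0:
            for name, value in attrs:
                if name in self.check and value:
                    self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in self.skip and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def checkable_text(markup, skip=("script", "style"), check=("alt",)):
    parser = CheckableText(skip, check)
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)
```

With nospellcheck added to the skip list, the word leviosa from the sgml-skip example is no longer returned, and adding value to the check list makes Donr appear in the output.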

This filter will also translate numeric character references of the form `&#num;'. Other entity references such as `&amp;' are simply skipped over, so that the word `amp', for example, is not spell checked. Eventually full support for properly translating SGML entities will be added.
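A minimal Python sketch of the character handling just described, assuming decimal references only: numeric references are decoded, and named references are replaced with blanks so their names are never checked. The function translate_refs is a hypothetical name, not part of Aspell.

```python
import re

NUMERIC_REF = re.compile(r"&#(\d+);")
NAMED_REF = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")


def _decode(m: re.Match) -> str:
    code = int(m.group(1))
    # Leave out-of-range references untouched rather than failing.
    return chr(code) if code <= 0x10FFFF else m.group(0)


def translate_refs(text: str) -> str:
    # Decimal references such as &#233; become the character they name.
    text = NUMERIC_REF.sub(_decode, text)
    # Named references such as &amp; are blanked out so "amp" is never checked.
    return NAMED_REF.sub(lambda m: " " * len(m.group(0)), text)
```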

4.4.1.5 HTML Filter

The html filter is like the SGML filter but specialized for HTML. By default, 'script' and 'style' are members of the skip list in HTML mode.

4.4.1.6 TeX/LaTeX Filter

The tex (all lowercase) filter mode skips over TeX commands and parameters and/or options to certain commands. It also skips over TeX comments by default. The option [dont-]tex-check-comments controls whether or not Aspell will skip over TeX comments. The option add|rem-tex-command controls which TeX commands should have certain parameters and/or options also skipped over. Commands that are not specified will have all their parameters and/or options checked. The format for each item is

     <command> <a list of p,P,o and Os>

The first item is simply the command name. The second item controls which parameters to skip over. A 'p' skips over a parameter while a 'P' doesn't. Similarly, an 'o' skips over an optional parameter while an 'O' doesn't. The first letter on the list applies to the first parameter, the second letter to the second parameter, and so on. If there are more parameters than letters, Aspell will simply check the remaining ones as normal. For example, the option

     add-tex-command rule pp

will skip over the first two parameters of the rule command while the option

     add-tex-command foo Pop

will check the first parameter of the foo command, skip over the next optional parameter if it is present, skip over the second parameter (even if the optional parameter is not present), and check any additional parameters.
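The p/P/o/O rules can be sketched in Python as follows. The sketch assumes the parameters are given as a string of {...} and [...] groups and that mandatory letters line up with {...} groups; split_args and classify are hypothetical helpers, not Aspell's parser. True means the parameter is checked, False means it is skipped.

```python
def split_args(s: str) -> list[tuple[str, str]]:
    """Split the leading {...} and [...] groups off s, allowing nesting."""
    args, i = [], 0
    while i < len(s) and s[i] in "{[":
        open_ch = s[i]
        close_ch = "}" if open_ch == "{" else "]"
        depth, j = 0, i
        while j < len(s):
            if s[j] == open_ch:
                depth += 1
            elif s[j] == close_ch:
                depth -= 1
                if depth == 0:
                    break
            j += 1
        args.append((open_ch, s[i + 1 : j]))
        i = j + 1
    return args


def classify(args, spec: str) -> list[tuple[str, bool]]:
    """Pair each parameter with True (check) or False (skip) using a p/P/o/O spec."""
    result, k = [], 0
    for letter in spec:
        if k >= len(args):
            break
        kind, text = args[k]
        if letter in "oO" and kind != "[":
            continue  # optional parameter absent: its letter is still used up
        result.append((text, letter in "PO"))  # uppercase = check, lowercase = skip
        k += 1
    # Parameters beyond the end of the spec are checked as normal.
    result.extend((text, True) for _, text in args[k:])
    return result
```

For foo with the spec Pop, the groups {one}[two]{three}{four} come out as checked, skipped, skipped, checked; when [two] is missing, {three} is still skipped because the o letter is consumed either way.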

A `*' at the end of the command is simply ignored. For example, the option

     add-tex-command enlargethispage p

will skip over the first parameter in both enlargethispage and enlargethispage*.

To remove a command simply use the rem-tex-command option. For example

     rem-tex-command foo

will remove the command foo, if present, from the list of TeX commands.
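A small Python sketch of the add/rem bookkeeping, including the rule that a trailing `*' is ignored. The table is a plain dictionary from command name to its p/P/o/O string; the function names are hypothetical and this is not how Aspell stores its configuration.

```python
def add_tex_command(table: dict[str, str], entry: str) -> None:
    """Handle an add-tex-command style entry such as 'rule pp'."""
    name, spec = entry.split()
    table[name.rstrip("*")] = spec  # a trailing '*' is ignored


def rem_tex_command(table: dict[str, str], name: str) -> None:
    table.pop(name.rstrip("*"), None)  # removing an absent command is a no-op


def lookup(table: dict[str, str], name: str) -> str:
    # \enlargethispage and \enlargethispage* share one entry; an empty
    # spec means every parameter of the command is checked.
    return table.get(name.rstrip("*"), "")
```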

The TeX filter can also understand TeX commands which are used to encode accents or other non-ASCII characters, and is able to skip over the TeX hyphen command. Exactly what it understands is given by the tex.conv file.

By default, when giving suggestions for a misspelled word, the TeX commands will not be used to encode non-ASCII characters. To use the TeX commands, set the tex-form option to `multi'.

The TeX filter mode is also available under the alias name latex.

4.4.1.7 Texinfo Filter

The texinfo filter allows you to spell check Texinfo files. It will skip over any Texinfo commands and their parameters when appropriate. It will also skip over some Texinfo environments, such as example. The list option texinfo-ignore controls which commands have their parameters ignored, and the list option texinfo-ignore-env controls which Texinfo environments to ignore.

The Texinfo filter has special code to deal with the @table and related commands. It applies the formatting command to each of the @item or @itemx commands, just as Texinfo does. This means that if the formatting command is @code and @code is a member of the texinfo-ignore option, then the Texinfo filter will ignore the parameter of the @item command as if it were the parameter of the @code command.
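The @table behaviour can be sketched in Python as a stack of formatting commands, one per open @table, @ftable or @vtable. This is a simplified, line-based illustration: checkable_texinfo and its default ignore list are hypothetical, and the real filter also handles inline commands and environments.

```python
def checkable_texinfo(lines, ignore=("code", "samp")):
    """Return the lines (or @item text) left for spell checking.

    Simplified sketch: only the @table/@ftable/@vtable handling is shown.
    """
    formats = []  # stack holding the formatting command of each open table
    out = []
    for line in lines:
        word, _, rest = line.strip().partition(" ")
        rest = rest.strip()
        if word in ("@table", "@ftable", "@vtable"):
            formats.append(rest.lstrip("@"))  # "@table @code" -> "code"
        elif word == "@end" and rest in ("table", "ftable", "vtable"):
            if formats:
                formats.pop()
        elif word in ("@item", "@itemx") and formats:
            # Treat the item text as the argument of the table's formatting command.
            if formats[-1] not in ignore:
                out.append(rest)
        else:
            out.append(line)
    return out
```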

The Texinfo filter will also skip over the `\input texinfo' line.

The Texinfo filter can also understand Texinfo commands which are used to encode accents or other non-ASCII characters, and is able to skip over the Texinfo hyphen command. Exactly what it understands is given by the texinfo.conv file.

By default, when giving suggestions for a misspelled word, the Texinfo commands will be used to encode non-ASCII characters, since Texinfo does not work well with non-ASCII characters. To avoid using Texinfo commands, set the tex-form option to `single'.

4.4.1.8 Nroff Filter

The nroff filter mode allows you to check the spelling of Nroff documents. The mode is enabled by giving the --add-filter=nroff or -n command-line option to aspell. It is also enabled automatically if the first three characters of the file being checked are .\" (an nroff comment marker) or the file name ends in one of the following suffixes:

This filter mode skips the following nroff language elements:

4.4.1.9 Genconv Filter

The genconv (Generic Convert) filter converts text from a single-character encoding (such as Unicode) to a multi-character encoding (such as the ASCII sequences traditionally used to encode accents). It can also convert text the other way around.

The data file used is given by the genconv-file option.

The genconv-form option sets the preferred form for the output: either `single', or `multi' to use the multi-character encoding.

Format of the Genconv data file

A genconv data file should have an extension of .conv and have the following format:

     name name
     table
     data

Where name is the name of the conversion and should be the same as the base name of the file, i.e., the file should be named name.conv.

data is the actual data for the conversion, which is a white-space-delimited table. Non-ASCII characters are expected to be in UTF-8. The first column is the single-character encoding of the letter, which will generally consist of a single Unicode character encoded in UTF-8. The second column is the preferred multi-character encoding of the letter; setting genconv-form to `multi' will use the encoding in this column. The remaining columns are alternate multi-character encodings that should be recognized.

A `.' in the first column has a special meaning: it represents the empty string. Strings found in the other columns of this row will be converted to nothing. This is useful for removing discretionary hyphens such as the TeX `\-'. If you need to include a literal `.', use `\.'.

As an example here is a small part of the TeX conversion file, tex.conv:

     name tex
     table
     . \\-
     æ \\ae
     ç \\c{c}
     ü \\"u \\"{u}
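To show how such a table can be read and applied in both directions, here is a Python sketch. It assumes that a backslash in the data file makes the next character literal (consistent with `\\-' standing for `\-' and `\.' for a literal dot) and that longer multi-character forms should win over shorter ones; parse_conv, to_single and to_multi are hypothetical names.

```python
import re


def _unescape(field: str) -> str:
    # Assumed escaping: a backslash makes the next character literal.
    return re.sub(r"\\(.)", r"\1", field)


def parse_conv(text: str):
    """Return (name, rows) where rows are (single, [preferred, *alternates])."""
    name, rows, in_table = None, [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not in_table:
            key, _, value = line.partition(" ")
            if key == "name":
                name = value.strip()
            elif key == "table":
                in_table = True
            continue
        cols = line.split()
        single = "" if cols[0] == "." else _unescape(cols[0])  # "." = empty string
        rows.append((single, [_unescape(c) for c in cols[1:]]))
    return name, rows


def to_single(text: str, rows) -> str:
    """Replace every recognized multi-character form with its single form."""
    pairs = [(multi, single) for single, multis in rows for multi in multis]
    if not pairs:
        return text
    pairs.sort(key=lambda p: len(p[0]), reverse=True)  # longest match first
    lookup = dict(pairs)
    pattern = re.compile("|".join(re.escape(m) for m, _ in pairs))
    return pattern.sub(lambda m: lookup[m.group(0)], text)


def to_multi(text: str, rows) -> str:
    """Encode characters with their preferred (second-column) form."""
    table = {single: multis[0] for single, multis in rows if single and multis}
    return "".join(table.get(ch, ch) for ch in text)
```

In this sketch to_multi roughly corresponds to a genconv-form of `multi', and to_single to `single'; the empty-string row only ever applies in the to_single direction.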

4.4.1.10 Context Filter

The context filter allows Aspell to distinguish between visible and invisible contexts. Visible contexts will be spell checked and invisible ones will be ignored. The contexts are distinguished by specific and unique delimiter characters or character sequences. Whether the delimited contexts are visible or invisible is determined solely by the value of the [dont-]context-visible-first option, not by the delimiters.

The context delimiters are specified as pairs of delimiters via the add|rem-context-delimiters option. The delimiters enclosing a specific context are specified as a space-separated pair. If more than one delimiter pair is specified in a single add|rem-context-delimiters call, the pairs must be combined into a comma-separated list. To indicate that a context is always closed by the end of the line, use the \0 sequence as the closing delimiter.
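The sketch below illustrates the delimiter syntax and the effect of [dont-]context-visible-first in Python. It assumes that with context-visible-first turned off the text outside the delimiters is invisible and the delimited contexts are visible, and that an unterminated context runs to the end of the text; parse_delimiters and split_contexts are hypothetical names.

```python
def parse_delimiters(spec: str) -> list[tuple[str, str]]:
    """Parse 'open close, open close, ...'; a closing \\0 means end of line."""
    pairs = []
    for item in spec.split(","):
        opening, closing = item.split()
        pairs.append((opening, "\n" if closing == r"\0" else closing))
    return pairs


def split_contexts(text: str, pairs, visible_first: bool = False) -> list[str]:
    """Return the pieces of text that would be spell checked.

    Assumption: visible_first=False means text outside the delimiters is
    invisible and the delimited contexts are visible; True flips that.
    """
    out, i = [], 0
    while i < len(text):
        # Find the earliest opening delimiter at or after position i.
        hits = [(text.find(o, i), o, c) for o, c in pairs]
        hits = [h for h in hits if h[0] != -1]
        if not hits:
            if visible_first:
                out.append(text[i:])
            break
        start, opening, closing = min(hits, key=lambda h: h[0])
        if visible_first:
            out.append(text[i:start])
        body_start = start + len(opening)
        end = text.find(closing, body_start)
        if end == -1:  # unterminated context runs to the end of the text
            end = len(text)
        if not visible_first:
            out.append(text[body_start:end])
        i = end + len(closing)
    return out
```

For instance, with the pairs `" "', `/* */' and `// \0', split_contexts returns only the string literal and the comment bodies of a line of C code, and passing visible_first=True returns the code between them instead.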

4.4.1.11 Ccpp Filter Mode

The ccpp filter mode will limit spell checking to C/C++ comments and string literals. Any code in between will be left alone.