Word skipping by context - GNU Aspell 0.61-cvs


D.3.1 Word skipping by context

This was posted on the Aspell mailing list on January 1, 1999:

I had an idea for a general way to determine whether a word should be skipped: decide which words to skip based on the symbols that (almost) always surround them.

For example, when asked to check the following C++ code:

     cout << "My age is: " << num << endl;
     cout << "Next year I will be " << num + 1 << endl;

cout, num, and endl will all be skipped. cout will be skipped because it is always followed by a `<<'. num will be skipped because it is always preceded by a `<<'. And endl will be skipped because it is always between a `<<' and a `;'.
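The heuristic applied to the C++ example above can be sketched roughly as follows. This is only an illustration in Python, not Aspell code: the function name `words_to_skip`, the `min_count` and `threshold` parameters, and the crude regular-expression tokenizer are all assumptions made for the sketch. A word is skipped when it occurs at least `min_count` times and the token immediately to its left, or immediately to its right, is (almost) always the same run of punctuation.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: identifiers, numbers, or runs of punctuation.
# Whitespace is dropped, so "neighbour" means the nearest non-space token.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]+")
WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
SYMBOL_RE = re.compile(r"[^\sA-Za-z0-9_]+")


def words_to_skip(text, min_count=2, threshold=1.0):
    """Return the words whose left or right neighbour is (almost) always
    the same symbol. Both parameters are assumptions of this sketch."""
    tokens = TOKEN_RE.findall(text)
    left = defaultdict(Counter)   # word -> counts of the preceding token
    right = defaultdict(Counter)  # word -> counts of the following token
    for i, tok in enumerate(tokens):
        if not WORD_RE.fullmatch(tok):
            continue
        left[tok][tokens[i - 1] if i > 0 else None] += 1
        right[tok][tokens[i + 1] if i + 1 < len(tokens) else None] += 1

    def has_stable_symbol(neighbours):
        total = sum(neighbours.values())
        if total < min_count:
            return False  # too few occurrences to call it "always"
        token, hits = neighbours.most_common(1)[0]
        is_symbol = token is not None and SYMBOL_RE.fullmatch(token) is not None
        return is_symbol and hits / total >= threshold

    return {w for w in left
            if has_stable_symbol(left[w]) or has_stable_symbol(right[w])}


sample = '''cout << "My age is: " << num << endl;
cout << "Next year I will be " << num + 1 << endl;'''

print(sorted(words_to_skip(sample)))  # ['cout', 'endl', 'num']
```

The `min_count` guard is what keeps words such as My and Next, which each appear once next to a quotation mark, from being skipped. Lowering `threshold` below 1.0 would give the "(almost) always" behaviour. Punctuation runs are compared verbatim here, which is a simplification a real implementation would probably need to refine.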

Given the following HTML code:

     <table width=50% cellspacing=0 cellpadding=1>
     <tr><td>One<td>Two<td>Three
     <tr><td>1<td>2<td>3
     </table>
     
     <table cellspacing=0 cellpadding=1>
     </table>

table, width, cellspacing, cellpadding, tr, and td will all be skipped because they are always enclosed in `<>'. Of course, table and width would be marked as correct anyway, but there is no harm in skipping them.

So I was wondering if anyone on this list has any experience in writing this sort of context recognition code or could give me some pointers in the right direction.

This sort of word skipping will be very powerful if done right. I imagine that it could replace specific spell checker modes for TeX, Nroff, SGML, etc., because it will automatically be able to figure out where it should skip words. It could also probably do a very good job on programming language code.

If you are interested in helping me out with this, or just have general comments about the idea, please let me know.