Next: C. Credits
Up: GNU Aspell 0.50.5
Previous: A. Changelog
Words in bold indicate how you should refer to the item when discussing
it with me or others.
These items are already done, or close to it, in the development version
of Aspell.
- Convert manual from LyX/LaTeX to Texinfo.
- Add gettext support to Aspell.
- Allow filters to be loaded at run-time. At the moment all filters
must be compiled in.
- Integrate Kevin Hendricks' affix compression code into Aspell. His
code is already in use in OpenOffice as part of the lingucomponent
component. More information can be found at http://lingucomponent.openoffice.org/.
The latest version of his code is also available there.
- Make Aspell thread safe. Even though Aspell itself is not multi-threaded,
I would like it to be thread safe so that it can be used by multi-threaded
programs. There are several areas of Aspell that are potentially
thread unsafe (such as accessing a global pool) and several
classes which have the potential of being used by more than one thread
(such as the personal dictionary). [In Progress]
- Enhance ispell.el so that it will work better with the new Aspell.
[In Progress]
B.2 Things that need to be done
These items need to be done before I consider Aspell finished. If
you are interested in helping me with one of these tasks please email
me. Good C++ skills are needed for most of these tasks involving coding.
I hope to have these all done by Aspell 0.51.
- Clean up copyright notices and bring the Aspell package up to GNU
Standards.
- Allow Aspell to check documents which are in UTF-8. I don't know the
proper way to use Unicode characters with the curses library, and
I can't seem to find any concrete documentation on how to do it. If
you have experience in this area I would really appreciate it if you
could enlighten me.
- Come up with an nroff mode for spell checking. I know nothing
about nroff. I would gladly write the filter if someone would be willing
to work with me in developing one. All I really need to know is what
to skip.
I would like to get these done. However, I may still consider Aspell
finished without them. They will probably get implemented eventually,
but I could still use help with them.
- Use Lawrence Philips' new Double Metaphone algorithm. See http://aspell.sourceforge.net/metaphone/.
The main task involved here is converting the algorithm into table
form. This will take some time, but no real programming experience
is required. If you want to help with Aspell but don't have any real
programming experience, this would be a great place to start.
- Create a C++ interface for Aspell, possibly on top of the C one.
- Write a GUI for the aspell utility. Ideally it should be able to do
everything the aspell utility can do and not just be able to spell check
a document.
- Better support for compound words. If you speak a language which has
a lot of compound or run-together words I would appreciate hearing
back from you. The current support for conditional compound
words will disappear in Aspell 0.51 since no one seems to be using
it. Support for unconditional compound words will still be
available. However, several people have informed me that they need
more. I attempted to provide that, but it wasn't powerful enough,
and hence unused. Thus, I am going to start from scratch, but I need
to know exactly what is involved in correct compound formation.
These items all sound like good ideas; however, I am not sure when I
will get to implementing them, if ever.
- Come up with a plug-in for gEdit, the GNOME text editor.
- Change languages (and thus dictionaries) based on the information
in the actual document.
- Come up with a mode that will skip words based on the symbols that
(almost) always surround the word. (Word skipping by context)
- Create two server modes for Aspell: one that uses the DICT
protocol and one that uses the ispell -a method of communication
on some arbitrary port.
- Come up with thread safe personal dictionaries.
- Use the Hidden Markov Model to base the suggestions on not
only the word itself but on the context around the word.
- Have a way to email the personal dictionary and/or
replacement list to a particular address, either periodically or when
it grows to a certain size.
- Be able to accept words with spaces in them, as many
languages have words, such as a word in a foreign phrase, which
only make sense when followed by other words.
The following good ideas were found in the ispell WISHES file so
I thought I would pass them on.
- Ispell should be smart enough to ignore hyphenation signs, such as
the TeX \- hyphenation indicator.
- (Jeff Edmonds) The personal dictionary should be able to remove certain
words from the master dictionary, so that obscure words like "wether"
wouldn't mask favorite typos.
- (Jeff Edmonds) It would be wonderful if ispell could correct inserted
spaces such as "th e" for "the"
or even "can not" for "cannot".
- Since ispell has dictionaries available to it, it is conceivable that
it could automatically determine the language of a particular file
by choosing the dictionary that produced the fewest spelling errors
on the first few lines.
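The language-detection wish above can be sketched roughly as follows.
This is only an illustration, not how ispell or Aspell does anything:
the function name guess_language, the max_lines parameter, and the tiny
word lists are all hypothetical, and a real checker would load full
dictionaries instead.

```python
import re

# Hypothetical miniature word lists; a real checker would load full dictionaries.
DICTIONARIES = {
    "english": {"the", "cat", "sat", "on", "mat", "and", "is", "a"},
    "german": {"die", "katze", "sitzt", "auf", "der", "matte", "und", "ist", "eine"},
}


def guess_language(text, dictionaries, max_lines=5):
    """Return the language whose dictionary rejects the fewest words
    found in the first max_lines lines of text."""
    head = "\n".join(text.splitlines()[:max_lines])
    # Runs of letters only; digits and underscores are not treated as words.
    words = [w.lower() for w in re.findall(r"[^\W\d_]+", head)]

    def misses(lang):
        return sum(1 for w in words if w not in dictionaries[lang])

    # On a tie, the language listed first in the mapping wins.
    return min(dictionaries, key=misses)
```

Only the first few lines are examined, as the wish suggests, so the
guess stays cheap even for long files.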
I want to implement word skipping by context to give other people an
idea of how it should be done and because I am really sick of having to
spell check through URLs and email addresses.
This was posted on the Aspell mailing list on January 1, 1999:
I had an idea on a great general way to determine if a word should
be skipped. Determine the words to skip based on the symbols that
(almost) always surround the word.
For example when asked to check the following C++ code:
    cout << "My age is: " << num << endl;
    cout << "Next year I will be " << num + 1 << endl;

cout, num, and endl will all be skipped. "cout" will be skipped because
it is always followed by a <<. "num" will be skipped because it is
always preceded by a <<. And "endl" will be skipped because it is always
between a << and a ;.
Given the following HTML code:

    <table width=50% cellspacing=0 cellpadding=1>
    <tr><td>One<td>Two<td>Three
    <tr><td>1<td>2<td>3
    </table>
    <table cellspacing=0 cellpadding=1>
    </table>

table, width, cellspacing, cellpadding, tr, and td will all be skipped
because they are always enclosed in "<>". Now of
course table and width would be marked as correct anyway; however, there
is no harm in skipping them.
So I was wondering if anyone on this list has any experience in writing
this sort of context recognition code or could give me some pointers
in the right direction.
This sort of word skipping will be very powerful if done right. I
imagine that it could replace specific spell checker modes for TeX,
nroff, SGML, etc. because it will automatically be able to figure out
where it should skip words. It could also probably do a very good
job on programming language code.
If you are interested in helping me out with this or just have general
comments about the idea please let me know.
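The idea in the post above might be sketched like this. It is a rough
illustration under simplifying assumptions, not an existing Aspell
filter: the names neighbor_symbol and words_to_skip, the min_count and
ratio thresholds, and the choice to look at only a single neighboring
punctuation character are all made up for the example.

```python
import re
from collections import Counter, defaultdict


def neighbor_symbol(text, pos, step):
    """Starting at pos and moving in direction step, skip blanks and
    return the first character found if it is punctuation, else None."""
    while 0 <= pos < len(text) and text[pos] in " \t":
        pos += step
    if 0 <= pos < len(text):
        ch = text[pos]
        if not ch.isalnum() and not ch.isspace():
            return ch
    return None


def words_to_skip(text, min_count=2, ratio=0.9):
    """Return words that are (almost) always preceded, or (almost)
    always followed, by the same punctuation character."""
    before = defaultdict(Counter)
    after = defaultdict(Counter)
    total = Counter()
    for m in re.finditer(r"[A-Za-z]+", text):
        word = m.group()
        total[word] += 1
        before[word][neighbor_symbol(text, m.start() - 1, -1)] += 1
        after[word][neighbor_symbol(text, m.end(), +1)] += 1

    skip = set()
    for word, n in total.items():
        if n < min_count:
            continue  # too few occurrences to call anything "always"
        for counts in (before[word], after[word]):
            # Count of the most frequent real symbol (None means "no symbol").
            hits = max((c for s, c in counts.items() if s is not None), default=0)
            if hits / n >= ratio:
                skip.add(word)
    return skip
```

Run on the two C++ lines from the post, this picks out cout, num, and
endl. It would not yet handle the HTML case, where "table" appears both
as "<table" and as "/table>": recognizing enclosure in a pair such as
"<>" needs a wider context than one neighboring symbol.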
Knud Haugaard Sørensen suggested the Hidden Markov Model idea. From his
email on the Aspell mailing list:
consider this examples.
a fone number. -> a phone number.
a fone dress. -> a fine dress.
the example illustrates that the right correction might depend on
the context of the word. So I suggest that you take a look on HMM
to solve this problem.
This might also provide a good base to include grammar correction
in aspell.
see this link http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node7.html
I think it is a great idea. Unfortunately, it will probably
be very complicated to implement. Perhaps in the far future.
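To make the "fone" example concrete, here is a much simpler stand-in
for a Hidden Markov Model: candidate corrections are ranked by how
often each one precedes the following word. This is a plain bigram
lookup, not an HMM, and the counts, the BIGRAM_COUNTS table, and the
rank_by_context function are invented purely for illustration.

```python
# Hypothetical counts of (word, following word) pairs; real counts
# would have to be gathered from a large corpus.
BIGRAM_COUNTS = {
    ("phone", "number"): 50,
    ("fine", "number"): 1,
    ("fine", "dress"): 20,
}


def rank_by_context(candidates, next_word, bigram_counts):
    """Order candidate corrections so that the ones seen most often
    before next_word come first. Unseen pairs count as zero."""
    return sorted(
        candidates,
        key=lambda c: bigram_counts.get((c, next_word), 0),
        reverse=True,
    )
```

With these toy counts, "phone" ranks first before "number" and "fine"
ranks first before "dress". A real model would also need to weigh how
close each candidate is to the misspelling itself.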
Someone suggested in a personal email:
Have you thought of adding a function to aspell, that - when the personal
dictionary has grown significantly - sends the user's personal dictionary
to the maintainer of the corresponding aspell dictionary? (if the
user allows it)
It would be a very useful service to the dictionary maintainers, and
I think most users can see their benefit in it too.
And I replied:
Yes, I have considered something like that, but not for the personal
dictionaries but rather the replacement word list in order to get
better test data for http://aspell.sourceforge.net/test/. The
problem is I don't know of a good way to do this since Aspell can
also be used as a library. It also is not a real high priority, especially
since I would first need to learn how to send email within a C++ program.
While accepting words with spaces in them is something I would like to
do, it is not a simple task. The basic problem is that when tokenizing a
string there is no good way to keep phrases together. So the solution is
to somehow add special conditions to certain words which will dictate
which words can come before or after them. Then there is also the
problem of how to come up with intelligent suggestions. What further
complicates things is that many applications send words to Aspell a word
at a time. So even if Aspell did support such a thing, many applications
that use Aspell would not. So, in order for this to work, applications
will need to send text to Aspell a document or at least a sentence at a
time. Unfortunately the framework for doing this is not there yet. It
will be once I finish the filter interface. Another possibility is to
provide callback functions with which Aspell will be able to request the
previous or next word. Yet again the framework for doing this is not
there. Perhaps sometime in the near future.
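As a small illustration of why whole sentences are needed, the sketch
below keeps known phrases together when given a sentence at a time. It
uses a plain phrase list with greedy longest-match, which is a simpler
technique than the per-word before/after conditions described above.
The PHRASES set and the tokenize_with_phrases function are hypothetical,
and splitting on whitespace ignores punctuation entirely.

```python
# Hypothetical list of multi-word entries, stored as tuples of lowercase words.
PHRASES = {("ad", "hoc"), ("vice", "versa")}


def tokenize_with_phrases(sentence, phrases):
    """Split a sentence on whitespace, but emit any run of words that
    matches a known phrase as a single token (longest match first)."""
    words = sentence.split()
    max_len = max((len(p) for p in phrases), default=1)
    tokens = []
    i = 0
    while i < len(words):
        for n in range(max_len, 1, -1):
            chunk = tuple(w.lower() for w in words[i:i + n])
            if len(chunk) == n and chunk in phrases:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here, so emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens
```

An application that hands over one word at a time never gives the
checker the chance to see "ad" and "hoc" side by side, which is the
limitation described above.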
Kevin Atkinson
2004-02-10