.TH hunspell 3 "2017-11-20"
.LO 1
.hy 0
.SH NAME
\fBhunspell\fR - spell checking, stemming, morphological generation and analysis
.SH SYNOPSIS
\fB#include <hunspell.hxx> /* or */\fR
.br
\fB#include <hunspell.h>\fR
.br
.sp
.BI "Hunspell(const char *" affpath ", const char *" dpath );
.sp
.BI "Hunspell(const char *" affpath ", const char *" dpath ", const char *" key );
.sp
.BI "~Hunspell(" );
.sp
.BI "int add_dic(const char *" dpath );
.sp
.BI "int add_dic(const char *" dpath ", const char *" key );
.sp
.BI "int spell(const char *" word );
.sp
.BI "int spell(const char *" word ", int *" info ", char **" root );
.sp
.BI "int suggest(char***" slst ", const char *" word );
.sp
.BI "int analyze(char***" slst ", const char *" word );
.sp
.BI "int stem(char***" slst ", const char *" word );
.sp
.BI "int stem(char***" slst ", char **" morph ", int " n );
.sp
.BI "int generate(char***" slst ", const char *" word ", const char *" word2 );
.sp
.BI "int generate(char***" slst ", const char *" word ", char **" desc ", int " n );
.sp
.BI "void free_list(char ***" slst ", int " n );
.sp
.BI "int add(const char *" word );
.sp
.BI "int add_with_affix(const char *" word ", const char *" example );
.sp
.BI "int remove(const char *" word );
.sp
.BI "char * get_dic_encoding(" );
.sp
.BI "const char * get_wordchars(" );
.sp
.BI "unsigned short * get_wordchars_utf16(int *" len );
.sp
.BI "struct cs_info * get_csconv(" );
.sp
.BI "const char * get_version(" );
.SH DESCRIPTION
The \fBHunspell\fR library routines give the user word-level
linguistic functions: spell checking and correction, stemming,
morphological generation and analysis in item-and-arrangement style.
.PP
The optional C header contains the C interface of the C++ library:
Hunspell_create() and Hunspell_destroy() replace the constructor and
destructor, and the wrapper functions take an extra Hunhandle
parameter (the allocated object); see the C header file \fBhunspell.h\fR.
.PP
The basic spelling functions, \fBspell()\fR and \fBsuggest()\fR, can
also be used for stemming, morphological generation and analysis
when they are given XML input (see XML API).
.
.SS Constructor and destructor
Hunspell's constructor needs the paths of the affix and dictionary files.
(In a WIN32 environment, use UTF-8 encoded paths starting with the long
path prefix \\\\?\\ to handle system-independent character encoding and
very long path names.)
See the \fBhunspell\fR(4) manual page for the dictionary format.
The optional \fBkey\fR parameter is for dictionaries encrypted by
the \fBhzip\fR tool of the Hunspell distribution.
.
.SS Extra dictionaries
The add_dic() function loads an extra dictionary file.
The extra dictionaries use the affix file of the allocated Hunspell
object. The maximum number of extra dictionaries is limited in the source code (20).
.
.SS Spelling and correction
The spell() function returns a non-zero value if the input word is recognised
by the spell checker, and zero if not. Optional reference
variables return a bit array (info) and the root word of the input word.
The info bits, checked with the SPELL_COMPOUND, SPELL_FORBIDDEN or SPELL_WARN
macros, mark compound words, explicitly forbidden words and probably bad words.
From version 1.3, the non-zero return value is 2 for dictionary
words with the flag "WARN" (probably bad words).
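.PP
As an illustration of the return values and info bits described above,
the following sketch maps a spell() result to a short label.
It does not call the library: describe_spell_result() is a hypothetical
helper, and the three bit constants are stand-in values assumed for this
example only. Real code would test the info value with the SPELL_COMPOUND,
SPELL_FORBIDDEN and SPELL_WARN macros from the Hunspell header.
.PP
.RS
.nf
```cpp
#include <string>

// Stand-in bit values for this sketch only (assumption). Real code should
// use the SPELL_COMPOUND, SPELL_FORBIDDEN and SPELL_WARN macros instead.
const int kCompoundBit = 1 << 0;
const int kForbiddenBit = 1 << 1;
const int kWarnBit = 1 << 6;

// Turns the return value and the info bits of spell() into a readable label.
std::string describe_spell_result(int ret, int info) {
    if (info & kForbiddenBit) return "forbidden";
    if (ret == 0) return "unknown";
    if (ret == 2 || (info & kWarnBit)) return "accepted, probably bad";
    if (info & kCompoundBit) return "accepted, compound";
    return "accepted";
}
```
.fi
.RE
.PP
With a Hunspell object, the two arguments would be the return value of the
three-argument spell() call and the value it stores in info.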
.PP
The suggest() function has two parameters: a reference variable
for the output suggestion list, and an input word. The function returns
the number of suggestions. The reference variable
will contain the address of the newly allocated suggestion list, or NULL
if the return value of suggest() is zero. The maximum number of suggestions
is limited in the source code.
.PP
The spell() and suggest() functions can recognize XML input; see the XML API section.
.
.SS Morphological functions
The plain stem() and analyze() functions are similar to suggest(), but
instead of suggestions they return stems and the results of the morphological
analysis. The plain generate() expects a second word, too. This extra word
and its affixation serve as the model for the morphological generation of
the requested forms of the first word.
.PP
The extended stem() and generate() use the results of a
morphological analysis:
.PP
.RS
.nf
char ** result, ** result2;
int n1 = analyze(&result, "words");
int n2 = stem(&result2, result, n1);
.fi
.RE
.PP
The morphological annotation of the Hunspell library has fixed
field identifiers (two letters and a colon); see the
\fBhunspell\fR(4) manual page.
.PP
.RS
.nf
char ** result;
char affix[] = "is:plural"; // the description depends on the dictionary, too
char * desc = affix;
int n = generate(&result, "word", &desc, 1);
for (int i = 0; i < n; i++) printf("%s\\n", result[i]);
.fi
.RE
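.PP
A minimal sketch of how the two-letter field identifiers could be used to
split one analysis line into its fields. It assumes that the fields of an
analysis are separated by whitespace. The parse_fields() helper is
hypothetical and not part of the library, and the sample line in the usage
note is made up, since real output depends on the dictionary.
.PP
.RS
.nf
```cpp
#include <map>
#include <sstream>
#include <string>

// Splits one analysis line into (field identifier, value) pairs.
// Only tokens of the form "xx:value" (two characters and a colon) are kept.
// A multimap is used because an identifier may occur more than once.
std::multimap<std::string, std::string> parse_fields(const std::string & line) {
    std::multimap<std::string, std::string> fields;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        if (token.size() >= 3 && token[2] == ':') {
            fields.insert({token.substr(0, 2), token.substr(3)});
        }
    }
    return fields;
}
```
.fi
.RE
.PP
For example, parse_fields("st:word po:noun is:plural") yields three pairs
with the identifiers st, po and is.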
.PP
.SS Memory deallocation
The free_list() function frees the memory allocated by the suggest(),
analyze(), generate() and stem() functions.
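.PP
Because the result lists are owned by the library until they are freed, one
possible pattern is to copy a list into caller-owned storage first. The
sketch below shows only the copying step. The copy_list() helper is
hypothetical and does not call the library, so it can be tried without a
dictionary.
.PP
.RS
.nf
```cpp
#include <string>
#include <vector>

// Copies a C string list of n entries (the shape of the lists returned by
// suggest(), analyze(), stem() and generate()) into a vector.
// A NULL list is treated as empty, matching the documented zero-result case.
std::vector<std::string> copy_list(char ** slst, int n) {
    std::vector<std::string> out;
    if (slst == nullptr) return out;
    for (int i = 0; i < n; i++) {
        if (slst[i] != nullptr) out.push_back(slst[i]);
    }
    return out;
}
```
.fi
.RE
.PP
With a Hunspell object, the intended order would be: call suggest(), pass
the list and the returned count to copy_list(), then release the original
list with free_list().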
.SS Other functions
The add(), add_with_affix() and remove() functions are helpers for a
personal dictionary implementation: they add words to and remove words from the
base dictionary at run time. The add_with_affix() function uses a second root word
as the model for the allowed affixation and compounding of the new word.
.PP
The get_dic_encoding() function returns "ISO8859-1" or the character
encoding defined in the affix file with the "SET" keyword.
.PP
The get_csconv() function returns the 8-bit character case table of the
encoding of the dictionary.
.PP
The get_wordchars() and get_wordchars_utf16() functions return the
extra word characters defined in the affix file by
the "WORDCHARS" keyword for tokenization.
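.PP
To show what the extra word characters are for, here is a rough tokenizer
sketch for 8-bit text. The tokenize() helper is hypothetical. It assumes
that alphanumeric ASCII characters always belong to a word and that the
second argument holds the extra word characters, for example the
get_wordchars() result. A real tokenizer would also have to respect the
dictionary encoding.
.PP
.RS
.nf
```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits text into words. A character belongs to a word if it is
// alphanumeric (in the current C locale) or listed in extra.
std::vector<std::string> tokenize(const std::string & text,
                                  const std::string & extra) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        bool in_word = std::isalnum(c) != 0 ||
            extra.find(static_cast<char>(c)) != std::string::npos;
        if (in_word) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```
.fi
.RE
.PP
With "-" as the extra character set, "well-known" stays one token. With an
empty set it is split into "well" and "known".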
.PP
The get_version() function returns the version string of the library.
.SS XML API
The spell() function returns non-zero for the "<?xml?>" input,
indicating XML API support.
.PP
The suggest() function stems, analyzes and generates the forms of the
input word if it is given in one of the following "SPELLML" syntaxes
(the type="add" queries add the word to the dictionary instead):
.PP
.RS
.nf
<?xml?>
<query type="analyze">
<word>dogs</word>
</query>
.fi
.RE
.PP
.RS
.nf
<?xml?>
<query type="stem">
<word>dogs</word>
</query>
.fi
.RE
.PP
.RS
.nf
<?xml?>
<query type="generate">
<word>dog</word>
<word>cats</word>
</query>
.fi
.RE
.PP
.RS
.nf
<?xml?>
<query type="generate">
<word>dog</word>
<code><a>is:pl</a><a>is:poss</a></code>
</query>
.fi
.RE
.PP
.RS
.nf
<?xml?>
<query type="add">
<word>word</word>
</query>
.fi
.RE
.PP
.RS
.nf
<?xml?>
<query type="add">
<word>word</word>
<word>model_word_for_affixation_and_compounding</word>
</query>
.fi
.RE
.PP
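.PP
The word-based queries above can be assembled with a small string helper.
This is only a sketch: spellml_query() is a hypothetical name, the query is
written on a single line on the assumption that the line breaks in the
examples are there for readability only, and the words are inserted
verbatim without any XML escaping.
.PP
.RS
.nf
```cpp
#include <string>
#include <vector>

// Builds a SPELLML query of the given type ("analyze", "stem", "generate"
// or "add") with one <word> element per entry of words.
std::string spellml_query(const std::string & type,
                          const std::vector<std::string> & words) {
    std::string q = R"(<?xml?><query type=")" + type + R"(">)";
    for (const std::string & w : words) {
        q += "<word>" + w + "</word>";
    }
    q += "</query>";
    return q;
}
```
.fi
.RE
.PP
The resulting string would be passed as the word argument of suggest(),
for example the result of spellml_query("generate", {"dog", "cats"}).
.PP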
The outputs of the type="stem" query and the stem() library function
are the same. The output of the type="analyze" query is a string containing
a <code><a>result1</a><a>result2</a>...</code> element. This
element can be used in the second syntax of the type="generate" query.
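.PP
If the individual results of a type="analyze" query are needed, the items
of the <code> element can be pulled out with plain string searches. The
sketch below assumes the element has exactly the shape shown above.
The extract_items() helper is hypothetical, and it does not decode XML
entities.
.PP
.RS
.nf
```cpp
#include <string>
#include <vector>

// Returns the text of every <a>...</a> item found in s, in order.
std::vector<std::string> extract_items(const std::string & s) {
    const std::string open = "<a>";
    const std::string close = "</a>";
    std::vector<std::string> items;
    std::string::size_type pos = 0;
    while ((pos = s.find(open, pos)) != std::string::npos) {
        std::string::size_type start = pos + open.size();
        std::string::size_type end = s.find(close, start);
        if (end == std::string::npos) break;  // unterminated item: stop
        items.push_back(s.substr(start, end - start));
        pos = end + close.size();
    }
    return items;
}
```
.fi
.RE
.PP
Applied to the second type="generate" example above, the function returns
the two items is:pl and is:poss.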
.SH EXAMPLE
See analyze.cxx in the Hunspell distribution.
.SH AUTHORS
Hunspell is based on Ispell's spell checking algorithms and OpenOffice.org's MySpell source code.
.PP
The author of International Ispell is Geoff Kuenning.
.PP
The author of MySpell is Kevin Hendricks.
.PP
The author of Hunspell is László Németh.
.PP
The author of the original C API is Caolan McNamara.
.PP
The author of the Aspell table-driven phonetic transcription algorithm and code is Björn Jacke.
.PP
See also the THANKS and Changelog files of the Hunspell distribution.