dtsrcreate (user cmd)
NAME
dtsrcreate - create and initialize a DtSearch database
SYNOPSIS
dtsrcreate [−q] [−o] [−fd | −fa] [−aabstr] [−ddir] [−wnmin] [−wxmax] [−llang] dbname
DESCRIPTION
The dtsrcreate command creates and initializes an
instance of a DtSearch database. A DtSearch database consists of a set of
related files. If the specified database already exists, after prompting for
confirmation, dtsrcreate will erase and reinitialize the
preexisting database.
Database Name
The dbname argument is the database
name. It is a string of 1 to 8 ASCII characters, used at creation time as a base
file name and thereafter as a general database identifier. Each database
file is named by appending a period and a suffix of 1 to 3 ASCII characters
to the base name. The database names dtsearch and austext are reserved and may not be specified.
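The naming rules above can be checked before dtsrcreate is run. The following is a minimal sketch, not part of DtSearch: valid_dbname is a hypothetical helper that expects a bare name with no path prefix. It assumes that "ASCII character" means a printable, non-space ASCII character and that the reserved names are compared case-sensitively; neither detail is stated above.

```shell
# Hypothetical helper: returns 0 if $1 is an acceptable bare database name
# under the rules stated above, 1 otherwise.
valid_dbname() {
    name=$1
    # Reject any byte outside printable, non-space ASCII (assumption).
    rest=$(printf '%s' "$name" | LC_ALL=C tr -d '\041-\176')
    [ -z "$rest" ] || return 1
    # Length must be 1 to 8 characters.
    len=${#name}
    [ "$len" -ge 1 ] || return 1
    [ "$len" -le 8 ] || return 1
    # The names dtsearch and austext are reserved.
    case $name in
        dtsearch|austext) return 1 ;;
    esac
    return 0
}
```

For example, valid_dbname mydb succeeds, while valid_dbname austext and valid_dbname toolongname both fail.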
Target Directory
The dbname argument can include an
optional path prefix. If it does, the database files will be created and initialized
in the specified target directory. If no path prefix is specified, the target
directory is the current working directory.
Model File
One of the created database files is based on a model file,
dtsearch.dbe, provided with DtSearch. Database creation will fail
if the model file cannot be found. dtsrcreate looks for
the model file first in the directory specified by a command line option,
if any; secondly in the current working directory; and thirdly in the optional dbname target directory.
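The search order described above can be reproduced in a script to confirm that a model file is reachable before a build starts. This is a sketch under stated assumptions: find_model_file is a hypothetical helper, and it only mirrors the documented order; dtsrcreate itself may behave differently in edge cases.

```shell
# Hypothetical helper: prints the path of the first dtsearch.dbe found,
# searching in the documented order. Returns 1 if none is found.
#   $1 = directory given with the -d option ("" if the option is absent)
#   $2 = dbname operand, possibly with a path prefix
find_model_file() {
    optdir=$1
    dbname=$2
    # The target directory is the path prefix of dbname, or "." if none.
    case $dbname in
        */*) target=${dbname%/*} ;;
        *)   target=. ;;
    esac
    [ -n "$target" ] || target=/      # dbname of the form "/mydb"
    for dir in "$optdir" . "$target"; do
        [ -n "$dir" ] || continue     # skip an absent -d directory
        if [ -f "$dir/dtsearch.dbe" ]; then
            printf '%s\n' "$dir/dtsearch.dbe"
            return 0
        fi
    done
    return 1
}
```

A build script might call find_model_file "" /data/db/mydb (the path is illustrative) and stop early if it returns a non-zero status.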
Configuration Options
DtSearch databases can be customized with a number of configuration
options that are specified only at creation time. Initialization consists
of loading into the database a configuration and status record identifying
the configuration options for the particular database instance. After initialization, dtsrcreate prints a small report of the current contents of the
configuration record to stdout. (See also dtsrdbrec,
which prints the report without changing the database.)
Database Types
The customizable features available at database creation time fall into
clusters of related capabilities that constitute a set of basic database types.
When you select a database type, you prespecify a number of features that
are optimized for the basic type of database you want.
In the DtSearch database type, documents are not
stored in a repository and are not available from the search engine after
a search. The abstract returned from a search typically contains a document
reference, usually the file name, and the application is itself responsible
for accessing the document. Highlighting of search words is possible when the
application passes the document cleartext back to the DtSearch API.
In an AusText database type, compressed documents
are stored directly into a repository and the originals are thereafter ignored.
The abstracts returned from searches are typically descriptive of the documents
they represent, and are displayed directly to users. Documents can be retrieved
from an AusText type database through the API, and the
search words are highlighted as desired.
OPTIONS
The following options are available:
If an option takes a value, the value must be directly appended to the
option name without white space.
−q
Suppresses printing of configuration record report.
−o
Suppresses overwrite prompt; preauthorizes erasure and reinitialization
of preexisting database.
−ddir
Specifies where to find the model dtsearch.dbe
file, rather than in the current working directory or target directory.
−fd
Configure a DtSearch type database. This is the default.
−fa
Configure an AusText type database.
−aabstr
Set the maximum abstract size to abstr
bytes. This is the maximum permitted length of an abstract
string. To make efficient use of space in the database, the requested abstract
length may be adjusted upward. The default size depends on the specified database
type. (See dtsrfzkfiles and DtSearch
for more information about abstract fields.)
−wnmin
Change the minimum word size to min characters.
This is the length of the shortest word that will be indexed in the database.
Document and query words shorter than the minimum are treated as stop list
words (see dtsrfzkfiles). The minimum can be overridden
for specific individual words by adding them to the optional include list
file (see dtsrfzkfiles). For most natural languages the
default minimum word size is usually correct; permitting very short words
usually causes a significant increase in the storage requirements for
the database. This option is typically applicable to single-byte European
languages and may be ignored by multibyte language processors. (See
DtSearch for more information about DtSearch word sizes.)
−wxmax
Change the maximum word size to max characters.
Smaller values are better, because extraordinarily
long words in most documents are not words at all but nonsemantic
symbol strings. To make efficient use of space in the database, the requested
maximum word size will usually be adjusted upward. For most natural languages
the default maximum word size is usually correct. This option is typically
applicable to single-byte European languages and may be ignored by multibyte
language processors. (See DtSearch for more information
about DtSearch word sizes.)
−llang
Change the language number to lang.
The default is 0.
Supported languages include:
Number  Name        Description
0       DtSrLaENG   English, ASCII character set
1       DtSrLaENG2  English, ISO Latin-1 character set
2       DtSrLaESP   Spanish, ISO Latin-1 character set
3       DtSrLaFRA   French, ISO Latin-1 character set
4       DtSrLaITA   Italian, ISO Latin-1 character set
5       DtSrLaDEU   German, ISO Latin-1 character set
6       DtSrLaJPN   Japanese, packed EUC character set; all possible kanji substrings are indexed
7       DtSrLaJPN2  Japanese, packed EUC character set; only individual kanjis are indexed, plus compounds from a knj language file
Specifying an unsupported language number will establish a DtSearch
custom language for the database. (See DtSearch for
information about DtSearch languages.)
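Because option values are appended directly to the option name, scripts that assemble a command line need to join them without white space. The sketch below shows one way to do so using the language numbers listed above. The helper names lang_number and build_dtsrcreate_cmd, and the type labels "dtsearch" and "austext" accepted as the first argument, are inventions of this example and not part of DtSearch. The command line is printed, not executed.

```shell
# Maps the symbolic language names listed above to their numbers.
lang_number() {
    case $1 in
        DtSrLaENG)  echo 0 ;;
        DtSrLaENG2) echo 1 ;;
        DtSrLaESP)  echo 2 ;;
        DtSrLaFRA)  echo 3 ;;
        DtSrLaITA)  echo 4 ;;
        DtSrLaDEU)  echo 5 ;;
        DtSrLaJPN)  echo 6 ;;
        DtSrLaJPN2) echo 7 ;;
        *) echo "unknown language: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: prints a dtsrcreate command line without running it.
#   $1 = "dtsearch" or "austext" (type labels chosen for this sketch)
#   $2 = symbolic language name
#   $3 = dbname
build_dtsrcreate_cmd() {
    case $1 in
        dtsearch) type_opt=-fd ;;
        austext)  type_opt=-fa ;;
        *) echo "unknown database type: $1" >&2; return 1 ;;
    esac
    lang=$(lang_number "$2") || return 1
    # The value is appended directly to the option name, with no space.
    printf 'dtsrcreate %s -l%s %s\n' "$type_opt" "$lang" "$3"
}
```

For example, build_dtsrcreate_cmd austext DtSrLaJPN jpndb prints dtsrcreate -fa -l6 jpndb.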
OPERAND
The dbname operand specifies the new
DtSearch database. It consists of an optional path prefix, a 1- to 8-character
database name, an optional period, and an optional 1- to 3-character extension.
This is the name that the other build tools and the search API will use
to reference the database.
ENVIRONMENT VARIABLES
None.
RESOURCES
None.
ACTIONS/MESSAGES
None.
RETURN VALUES
The return values are as follows:
0
dtsrcreate completed successfully.
non-zero
dtsrcreate encountered an error.
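In an unattended build, the return value is the only signal that creation failed. The following sketch wraps the call and reports the status; create_db is a hypothetical function name. Note that −o preauthorizes erasure of an existing database, so a wrapper like this should only be used where that is intended.

```shell
# Hypothetical wrapper for scripted builds.
#   $1 = dbname
create_db() {
    # -o skips the overwrite prompt; -q suppresses the configuration report.
    if dtsrcreate -o -q "$1"; then
        echo "created $1"
    else
        status=$?
        echo "dtsrcreate failed for $1 (exit status $status)" >&2
        return "$status"
    fi
}
```

A caller can then write create_db mydb || exit 1 to stop the build on the first failure.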
FILES
dtsrcreate reads dtsearch.dbe.
It creates or reinitializes the following database files:
dbname.d00
dbname.d01
dbname.d21
dbname.d22
dbname.d23
dbname.k00
dbname.k01
dbname.k21
dbname.k22
dbname.k23
It deletes the file dbname.d99.
Note that not all necessary database files are created by dtsrcreate. Some additional files are included in the DtSearch distribution,
are created by later database build programs, or may be provided by the developer.
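After a successful run, a script can confirm that the files listed above are present. This is a minimal sketch; check_db_files is a hypothetical helper, and it checks only the ten files this page says dtsrcreate creates, not the additional files supplied elsewhere.

```shell
# Hypothetical helper: reports each listed database file that is missing.
#   $1 = dbname, with its path prefix if any
# Returns 0 if all ten files exist, 1 otherwise.
check_db_files() {
    result=0
    for suffix in d00 d01 d21 d22 d23 k00 k01 k21 k22 k23; do
        if [ ! -f "$1.$suffix" ]; then
            echo "missing: $1.$suffix" >&2
            result=1
        fi
    done
    return "$result"
}
```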
EXAMPLES
Create a standard DtSearch type database named mydb
that will index ASCII English words of standard length for that language.
dtsrcreate mydb
Create an AusText type database named jpndb. It will
index Japanese words expressed in packed EUC, with automatic compounding of
all kanji substrings. When the text contains embedded ASCII, words that are
between 2 and 20 characters long will be indexed. At least 150 bytes will
be available for the abstract field.
dtsrcreate -fa -a150 -wn2 -wx20 -l6 jpndb
SEE ALSO
dtsrdbrec, DtSrAPI,
dtsrdbfiles, DtSearch