<!-- $XConsortium: srload.sgm /main/7 1996/09/08 19:57:05 rws $ -->
<!-- (c) Copyright 1996 Digital Equipment Corporation. -->
<!-- (c) Copyright 1996 Hewlett-Packard Company. -->
<!-- (c) Copyright 1996 International Business Machines Corp. -->
<!-- (c) Copyright 1996 Sun Microsystems, Inc. -->
<!-- (c) Copyright 1996 Novell, Inc. -->
<!-- (c) Copyright 1996 FUJITSU LIMITED. -->
<!-- (c) Copyright 1996 Hitachi. -->
<![%CDE.C.CDE; [<refentry id="CDE.SEARCH.dtsrload">]]>
<refmeta><refentrytitle>dtsrload</refentrytitle><manvolnum>user cmd</manvolnum>
</refmeta>
<refnamediv><refname><command>dtsrload</command></refname><refpurpose>Load
document objects into a database</refpurpose></refnamediv>
<refsynopsisdiv>
<cmdsynopsis>
<command>dtsrload</command>
<arg choice="plain">−d<replaceable>dbname</replaceable></arg>
<arg choice="opt">−c</arg>
<arg choice="opt">−t<replaceable>etxstr</replaceable></arg>
<arg choice="opt"><group choice="plain"><arg choice="plain">−h0</arg>
<arg choice="plain">−h<replaceable>hashsz</replaceable></arg>
</group></arg>
<arg choice="opt">−e<replaceable>hufname</replaceable></arg>
<arg choice="opt">−p<replaceable>dotcount</replaceable></arg>
<arg choice="plain"><replaceable>file</replaceable></arg>
</cmdsynopsis>
</refsynopsisdiv>
<refsect1>
<title>DESCRIPTION</title>
<para><command>dtsrload</command> loads document header information and, in
AusText type databases, the documents themselves into a DtSearch database.
The input is a file of one or more documents in a simple canonical
format (fzk file). An fzk file can be generated by
<command>dtsrhan</command>, manually with a text editor, or by a special
application program created for the purpose. Typically the same fzk file
is used for <command>dtsrload</command> and
<command>dtsrindex</command>, but this is not required, and there are
situations where it may not be desirable. (See
&cdeman.dtsrfzkfiles; for information about DtSearch fzk files.)
</para>
<para><command>dtsrload</command> also maintains the current total document
count in the database's configuration and status record.
</para>
<para>If a document's unique key in the fzk file does not already exist in the
database, <command>dtsrload</command> considers the document to be new
and adds it to the database. If the document's key already
exists in the database, <command>dtsrload</command> completely replaces its
record with the one in the fzk file. When duplicate record ids are
encountered in a single fzk file, only the first occurrence of the
document is loaded into the database; the second one is discarded.
Duplicate record ids are detected during execution using a hash table.
</para>
<para><command>dtsrload</command> also performs a data compression function for
documents that are actually stored in a database repository (that is,
AusText type databases). To do this, a compression encode
(huf) file must be available.
(See &cdeman.huffcode; for information about DtSearch document compression.)
</para>
<para><command>dtsrload</command> does not index the words used to access the
database. This is done by <command>dtsrindex</command>. To prevent
database link corruption, execute <command>dtsrindex</command>
immediately after <command>dtsrload</command>.
</para>
<caution>
<para>To prevent database corruption, execute <command>dtsrload</command> only
after all users of a preexisting database have exited their search
programs. For a single fzk file,
<command>dtsrload</command> must be executed immediately before
<command>dtsrindex</command> so that <command>dtsrindex</command> can
map the words it indexes to the correct internal database addresses.
Only after both programs have completed successfully may users again
be allowed to perform online searches of the database.
</para>
</caution>
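<para>The load-then-index order described in the caution can be sketched as a
small shell function. This is only an illustration: the name
<literal>load_then_index</literal> is hypothetical, the sketch assumes that
<command>dtsrindex</command> accepts the same database option and file operand
as <command>dtsrload</command>, and continuing to index after an exit status
of 1 is an assumption rather than documented behavior.
</para>
<programlisting>
```shell
# Hypothetical wrapper (not part of DtSearch): load an fzk file,
# then index the same file right away.
load_then_index() {
    dbname=$1    # database name, appended directly to -d
    file=$2      # fzk input file
    status=0
    dtsrload "-d$dbname" "$file" || status=$?
    # An exit status greater than 1 is a fatal error: do not index.
    if [ "$status" -gt 1 ]; then
        echo "dtsrload failed with status $status"
        return "$status"
    fi
    # Status 0 or 1: the load finished (assumption: a status of 1
    # still requires indexing), so run dtsrindex immediately.
    dtsrindex "-d$dbname" "$file"
}
```
</programlisting>
<para>A call such as <literal>load_then_index mydb batch1</literal> would be
made only after all users have exited their search programs.
</para>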
</refsect1>
<refsect1>
<title>OPTIONS</title>
<para>The following options are available:</para>
<note>
<para>If an option takes a value, the value must be directly appended to
the option name without white space.</para>
</note>
<variablelist>
<varlistentry><term><literal>−d</literal><Symbol Role="Variable">dbname</Symbol></term>
<listitem>
<para>Specifies the name of the database to be updated, 1 to 8 ASCII
characters long.
If a directory path is not prepended to the database
name, <command>dtsrload</command> attempts to open the database in
the current working directory. File name extensions for database
files are appended automatically.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>−c</literal></term>
<listitem>
<para>Instructs <command>dtsrload</command> to initialize the database total
document count by counting existing records before loading the current
batch. This option is usually not required.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>−t</literal><Symbol Role="Variable">etxstr</Symbol></term>
<listitem>
<para>Specifies the end of document text delimiter string. The default
document separator in an fzk file is an ASCII form feed character
followed by an ASCII line feed ('\f\n'). For certain multibyte languages
it may be more convenient to specify a non-ASCII string as the document
delimiter.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>−h0</literal></term>
<listitem>
<para>Instructs <command>dtsrload</command> to not check for duplicate
record ids. This option should not be specified unless it
is certain that there are no duplicate ids in the fzk file.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>−h</literal><Symbol Role="Variable">hashsz</Symbol></term>
<listitem>
<para>Sets the duplicate record id hash table size to
<Symbol Role="Variable">hashsz</Symbol>. The default is 3000.
<command>dtsrload</command> will execute more efficiently if the
specified table size is larger than the number of documents in the fzk
file.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>−e</literal><Symbol Role="Variable">hufname</Symbol></term>
<listitem>
<para>Sets the compression encode file name to
<Symbol Role="Variable">hufname</Symbol>. The default is
<filename>ophuf.huf</filename>. The file name can include a path prefix.
This option is ignored unless the database type is AusText.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>−p</literal><Symbol Role="Variable">dotcount</Symbol></term>
<listitem>
<para>Instructs <command>dtsrload</command> to print a progress character to
stdout for every <Symbol Role="Variable">dotcount</Symbol> documents
processed. The default is 20.
</para>
</listitem>
</varlistentry>
</variablelist>
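<para>The rule that option values are appended directly to the option name
can be illustrated with a small shell function that prints a
<command>dtsrload</command> command line. The function name
<literal>dtsrload_cmdline</literal> and the sample values are hypothetical;
only the option letters come from the list above.
</para>
<programlisting>
```shell
# Hypothetical helper (not part of DtSearch): print a dtsrload
# command line. Each value is appended directly to its option
# letter, with no white space in between.
dtsrload_cmdline() {
    dbname=$1      # database name for -d
    hashsz=$2      # duplicate id hash table size for -h
    dotcount=$3    # progress interval for -p
    file=$4        # fzk input file
    printf 'dtsrload -d%s -h%s -p%s %s\n' \
        "$dbname" "$hashsz" "$dotcount" "$file"
}

# Sample values chosen only for illustration.
dtsrload_cmdline mydb 10000 100 batch1
```
</programlisting>
<para>With these sample values the function prints
<literal>dtsrload -dmydb -h10000 -p100 batch1</literal>.
</para>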
</refsect1>
<refsect1>
<title>OPERANDS</title>
<para>The required input file name (<Symbol Role="Variable">file</Symbol>)
identifies the file to be processed by <command>dtsrload</command>. It
can optionally include a path prefix, either from root or relative to
the current working directory. If a file name extension is not
specified, <command>dtsrload</command> assumes a default extension of
<Filename>.fzk</Filename>.
</para>
</refsect1>
<refsect1>
<title>ENVIRONMENT VARIABLES</title>
<para>None.</para>
</refsect1>
<refsect1>
<title>RESOURCES</title>
<para>None.</para>
</refsect1>
<refsect1>
<title>ACTIONS/MESSAGES</title>
<para>None.</para>
</refsect1>
<refsect1>
<title>RETURN VALUES</title>
<para>The return values are as follows:</para>
<variablelist>
<varlistentry><term>0</term>
<listitem>
<para><command>dtsrload</command> completed successfully.</para>
</listitem>
</varlistentry>
<varlistentry><term>1</term>
<listitem>
<para><command>dtsrload</command> successfully
recovered from an error. This occurs when one or more
documents were discarded because of a partially invalid
fzk file format, duplicate record ids, or empty record text.
</para>
</listitem>
</varlistentry>
<varlistentry><term>>1</term>
<listitem>
<para><command>dtsrload</command> encountered a fatal error.
</para>
</listitem>
</varlistentry>
</variablelist>
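<para>A script that runs <command>dtsrload</command> might translate the
return value along the following lines. The function name
<literal>describe_dtsrload_status</literal> and the message wording are
hypothetical; only the three cases come from the list above.
</para>
<programlisting>
```shell
# Hypothetical helper (not part of DtSearch): describe a dtsrload
# return value using the three cases listed above.
describe_dtsrload_status() {
    case "$1" in
        0) echo "completed successfully" ;;
        1) echo "completed, but one or more documents were discarded" ;;
        *) echo "fatal error" ;;   # any value greater than 1
    esac
}
```
</programlisting>
<para>A caller would typically pass <literal>$?</literal> to the function
immediately after <command>dtsrload</command> exits.
</para>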
</refsect1>
<refsect1>
<title>FILES</title>
<para><command>dtsrload</command> reads the specified fzk file and opens
all the database and related language files for the specified
database name.
</para>
<para>For AusText type databases, it also reads the compression encode file
<filename>ophuf.huf</filename>.
</para>
<para><command>dtsrload</command> updates the following database files:
</para>
<simplelist>
<member><symbol role="Variable">dbname</symbol>.d00</member>
<member><symbol role="Variable">dbname</symbol>.d01</member>
<member><symbol role="Variable">dbname</symbol>.k00</member>
<member><symbol role="Variable">dbname</symbol>.k01</member>
</simplelist>
</refsect1>
<refsect1>
<title>EXAMPLES</title>
<para>Load database <filename>mydb</filename> with the documents specified in
the fzk file named <filename>batch1.fzk</filename> in the current
working directory.
</para>
<programlisting>
dtsrload -dmydb batch1
</programlisting>
<para>Load database <filename>mydb</filename> with the documents specified in
the fzk file <filename>/u/dtsearch/jpndocs.1</filename>. Three ASCII
plus signs at the bottom of each document signal the end of document
text and the beginning of the next fzk file record.
</para>
<programlisting>
dtsrload -dmydb -t+++ /u/dtsearch/jpndocs.1
</programlisting>
</refsect1>
<refsect1>
<title>SEE ALSO</title>
<para>&cdeman.dtsrhan;,
&cdeman.dtsrindex;,
&cdeman.huffcode;,
&cdeman.dtsrfzkfiles;,
&cdeman.DtSearch;
</para>
</refsect1>
</refentry>