<!-- $XConsortium: srload.sgm /main/7 1996/09/08 19:57:05 rws $ -->
<!-- (c) Copyright 1996 Digital Equipment Corporation. -->
<!-- (c) Copyright 1996 Hewlett-Packard Company. -->
<!-- (c) Copyright 1996 International Business Machines Corp. -->
<!-- (c) Copyright 1996 Sun Microsystems, Inc. -->
<!-- (c) Copyright 1996 Novell, Inc. -->
<!-- (c) Copyright 1996 FUJITSU LIMITED. -->
<!-- (c) Copyright 1996 Hitachi. -->
<![%CDE.C.CDE; [<refentry id="CDE.SEARCH.dtsrload">]]>
<refmeta><refentrytitle>dtsrload</refentrytitle><manvolnum>user cmd</manvolnum>
</refmeta>
<refnamediv><refname><command>dtsrload</command></refname><refpurpose>Load
document objects into a database</refpurpose></refnamediv>
<refsynopsisdiv>
<cmdsynopsis>
<command>dtsrload</command>
<arg choice="plain">&minus;d<replaceable>dbname</replaceable></arg>
<arg choice="opt">&minus;c</arg>
<arg choice="opt">&minus;t<replaceable>etxstr</replaceable></arg>
<arg choice="opt"><group choice="plain"><arg choice="plain">&minus;h0</arg>
<arg choice="plain">&minus;h<replaceable>hashsz</replaceable></arg>
</group></arg>
<arg choice="opt">&minus;e<replaceable>hufname</replaceable></arg>
<arg choice="opt">&minus;p<replaceable>dotcnt</replaceable></arg>
<arg choice="plain"><replaceable>file</replaceable></arg>
</cmdsynopsis>
</refsynopsisdiv>
<refsect1>
<title>DESCRIPTION</title>
<para><command>dtsrload</command> loads document header information and, in
AusText type databases, documents themselves into a DtSearch database.
The input is a file of one or more documents in a simple canonical
format (fzk file). An fzk file can be generated by
<command>dtsrhan</command>, manually with a text editor, or by a special
application program created for the purpose. Typically the same fzk file
is used for <command>dtsrload</command> and
<command>dtsrindex</command>, but this is not required, and there are
situations where it may not be desirable. (See
&cdeman.dtsrfzkfiles; for information about DtSearch fzk files.)
</para>
<para><command>dtsrload</command> also maintains the current total document
count in the database's configuration and status record.
</para>
<para>If a document's unique key in the fzk file does not already exist in the
database, <command>dtsrload</command> considers the document to be new
and adds it to the database. If the document's key already
exists in the database, <command>dtsrload</command> completely replaces its
record with the one in the fzk file. When duplicate record ids are
encountered in a single fzk file, only the first occurrence of the
document is loaded into the database; any later occurrence is discarded.
Record ids seen during execution are tracked in a hash table so that
duplicates can be detected.
</para>
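<para>The duplicate-key rule can be illustrated with a small shell sketch.
This is not how <command>dtsrload</command> is implemented. It assumes a
hypothetical input of one record key per line, and the helper names
<literal>first_occurrences</literal> and
<literal>discarded_duplicates</literal> are invented for the illustration.
An awk associative array stands in for the hash table.
</para>

```shell
# Illustration only: not dtsrload's actual code.
# Input (hypothetical): one record key per line on stdin.

# Print each key the first time it is seen ("first occurrence wins").
first_occurrences() {
    awk '!seen[$0]++'
}

# Print every later repeat of a key, that is, the occurrences
# that would be discarded.
discarded_duplicates() {
    awk 'seen[$0]++'
}
```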
<para><command>dtsrload</command> also performs a data compression function for
documents that are actually stored in a database repository (that is,
AusText type databases). To do this, a compression encode (huf) file
must be available.
(See &cdeman.huffcode; for information about DtSearch document compression.)
</para>
<para><command>dtsrload</command> does not index the words used to access the
database. This is done by <command>dtsrindex</command>. To prevent
database link corruption, execute <command>dtsrindex</command>
immediately after <command>dtsrload</command>.
</para>
<caution>
<para>To prevent database corruption, execute <command>dtsrload</command> only
after all users of a preexisting database have exited their search
programs. For a single fzk file,
<command>dtsrload</command> must be executed immediately before
<command>dtsrindex</command> so that <command>dtsrindex</command> can
map the words it indexes to the correct internal database addresses.
Users may resume online searches of the database only after both
programs have completed successfully.
</para>
</caution>
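<para>The load-then-index sequence might be wrapped in a shell function
along the following lines. This is a sketch under stated assumptions: the
function name <literal>load_and_index</literal> and the
<literal>DTSRLOAD</literal> and <literal>DTSRINDEX</literal> override
variables are hypothetical, and it assumes that
<command>dtsrindex</command> accepts the same
<literal>&minus;d</literal><Symbol Role="Variable">dbname</Symbol> option
and file operand. It skips indexing when <command>dtsrload</command>
returns a value greater than 1.
</para>

```shell
# Hypothetical wrapper: load an fzk file, then index it immediately.
# DTSRLOAD and DTSRINDEX may be set to override the program names.
# Assumption: dtsrindex takes the same -d<dbname> option and file operand.
load_and_index() {
    db=$1
    fzk=$2

    "${DTSRLOAD:-dtsrload}" "-d$db" "$fzk"
    status=$?

    # A return value above 1 is a fatal error: do not index a failed load.
    if [ "$status" -gt 1 ]; then
        echo "dtsrload failed with status $status; skipping dtsrindex" >&2
        return "$status"
    fi

    # Status 0 or 1: the load finished (possibly with discarded documents),
    # so index right away and pass on the indexer's exit status.
    "${DTSRINDEX:-dtsrindex}" "-d$db" "$fzk"
}
```

<para>For example, <literal>load_and_index mydb batch1</literal> would run
both programs against database <filename>mydb</filename>. Making sure that
no users are searching the database remains the caller's responsibility.
</para>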
</refsect1>
<refsect1>
<title>OPTIONS</title>
<para>The following options are available:</para>
<note>
<para>If an option takes a value, the value must be directly appended to
the option name without white space.</para>
</note>
<variablelist>
<varlistentry><term><literal>&minus;d</literal><Symbol Role="Variable">dbname</Symbol></term>
<listitem>
<para>Specifies the name of the database to be updated. The name is 1 to 8
ASCII characters long.
If a directory path is not prepended to the database
name, <command>dtsrload</command> attempts to open the database in
the current working directory. File name extensions for database
files are appended automatically.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>&minus;c</literal></term>
<listitem>
<para>Instructs <command>dtsrload</command> to initialize the database total
document count by counting existing records before loading the current
batch. This option is usually not required.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>&minus;t</literal><Symbol Role="Variable">etxstr</Symbol></term>
<listitem>
<para>Specifies the end-of-document text delimiter string. The default
document separator in an fzk file is an ASCII form feed character
followed by an ASCII line feed ('\f\n'). For certain multibyte languages
it may be more convenient to specify a non-ASCII string as the document
delimiter.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>&minus;h0</literal></term>
<listitem>
<para>Instructs <command>dtsrload</command> to not check for duplicate
record ids. This option should not be specified unless it
is certain that there are no duplicate ids in the fzk file.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>&minus;h</literal><Symbol Role="Variable">hashsz</Symbol></term>
<listitem>
<para>Sets the duplicate record id hash table size to
<Symbol Role="Variable">hashsz</Symbol>. The default is 3000.
<command>dtsrload</command> will execute more efficiently if the
specified table size is larger than the number of documents in the fzk
file.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>&minus;e</literal><Symbol Role="Variable">hufname</Symbol></term>
<listitem>
<para>Sets the compression encode file name to
<Symbol Role="Variable">hufname</Symbol>. The default is
<filename>ophuf.huf</filename>. The file name can include a path prefix.
This option is ignored unless the database type is AusText.
</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>&minus;p</literal><Symbol Role="Variable">dotcount</Symbol></term>
<listitem>
<para>Instructs <command>dtsrload</command> to print a progress character to
stdout for every <Symbol Role="Variable">dotcount</Symbol> documents
processed. The default is 20.
</para>
</listitem>
</varlistentry>
</variablelist>
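<para>One way to pick a value for <literal>&minus;h</literal> is to count
the documents in the fzk file first. The sketch below assumes the default
'\f\n' delimiter and counts lines that end in a form feed. The function
names, the 1500-document threshold, and the doubling margin are
illustrative choices, not documented rules.
</para>

```shell
# Count documents in an fzk file that uses the default '\f\n' delimiter
# by counting lines that end in a form feed character.
count_fzk_documents() {
    ff=$(printf '\f')
    grep -c "${ff}\$" "$1"
}

# Suggest a hash table size larger than the document count.
# The threshold and the factor of 2 are arbitrary illustrative margins.
suggest_hashsz() {
    n=$(count_fzk_documents "$1")
    if [ "$n" -lt 1500 ]; then
        echo 3000            # the documented default is already large enough
    else
        echo $((n * 2))
    fi
}
```

<para>The result can then be appended directly to the option name, for
example <literal>dtsrload -dmydb -h"$(suggest_hashsz batch1.fzk)" batch1</literal>.
</para>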
</refsect1>
<refsect1>
<title>OPERANDS</title>
<para>The required input file name (<Symbol Role="Variable">file</Symbol>)
identifies the file to be processed by <command>dtsrload</command>. It
can optionally include a path prefix, either from root or relative to
the current working directory. If a file name extension is not
specified, <command>dtsrload</command> assumes a default extension of
<Filename>.fzk</Filename>.
</para>
</refsect1>
<refsect1>
<title>ENVIRONMENT VARIABLES</title>
<para>None.</para>
</refsect1>
<refsect1>
<title>RESOURCES</title>
<para>None.</para>
</refsect1>
<refsect1>
<title>ACTIONS/MESSAGES</title>
<para>None.</para>
</refsect1>
<refsect1>
<title>RETURN VALUES</title>
<para>The return values are as follows:</para>
<variablelist>
<varlistentry><term>0</term>
<listitem>
<para><command>dtsrload</command> completed successfully.</para>
</listitem>
</varlistentry>
<varlistentry><term>1</term>
<listitem>
<para><command>dtsrload</command> successfully
recovered from an error. This occurs when one or more
documents were discarded because of a partially invalid
fzk file format, duplicate record ids, or empty record text.
</para>
</listitem>
</varlistentry>
<varlistentry><term>&gt;1</term>
<listitem>
<para><command>dtsrload</command> encountered a fatal error.
</para>
</listitem>
</varlistentry>
</variablelist>
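<para>A calling script could translate these return values into log
messages as sketched below. The function name
<literal>describe_dtsrload_status</literal> and the message wording are
hypothetical; only the mapping of values to outcomes comes from the list
above.
</para>

```shell
# Map a dtsrload exit status (passed as $1) to a short description.
describe_dtsrload_status() {
    case $1 in
        0) echo "success" ;;
        1) echo "completed; one or more documents were discarded" ;;
        *) echo "fatal error" ;;
    esac
}
```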
</refsect1>
<refsect1>
<title>FILES</title>
<para><command>dtsrload</command> reads the specified fzk file and opens
all the database and related language files for the specified
database name.
</para>
<para>For AusText type databases, it also reads the compression encode file,
<filename>ophuf.huf</filename> by default.
</para>
<para><command>dtsrload</command> updates the following database files:
</para>
<simplelist>
<member><symbol role="Variable">dbname</symbol>.d00</member>
<member><symbol role="Variable">dbname</symbol>.d01</member>
<member><symbol role="Variable">dbname</symbol>.k00</member>
<member><symbol role="Variable">dbname</symbol>.k01</member>
</simplelist>
</refsect1>
<refsect1>
<title>EXAMPLES</title>
<para>Load database <filename>mydb</filename> with the documents specified in
the fzk file named <filename>batch1.fzk</filename> in the current
working directory.
</para>
<programlisting>
dtsrload -dmydb batch1
</programlisting>
<para>Load database <filename>mydb</filename> with the documents specified in
the fzk file <filename>/u/dtsearch/jpndocs.1</filename>. Three ASCII
plus signs at the bottom of each document signal the end of document
text and the beginning of the next fzk file record.
</para>
<programlisting>
dtsrload -dmydb -t+++ /u/dtsearch/jpndocs.1
</programlisting>
</refsect1>
<refsect1>
<title>SEE ALSO</title>
<para>&cdeman.dtsrhan;,
&cdeman.dtsrindex;,
&cdeman.huffcode;,
&cdeman.dtsrfzkfiles;,
&cdeman.DtSearch;
</para>
</refsect1>
</refentry>