Initial import of the CDE 2.1.30 sources from the Open Group.

This commit is contained in:
Peter Howkins
2012-03-10 18:21:40 +00:00
commit 83b6996daa
18978 changed files with 3945623 additions and 0 deletions

@@ -0,0 +1,405 @@
<!-- $XConsortium: archform.htm /main/1 1996/09/22 18:14:21 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>Architectural Form Processing</TITLE>
</HEAD>
<BODY>
<H1>Architectural Form Processing</H1>
<P>
The HyTime standard (ISO/IEC 10744) introduced the concept of
architectural forms. This document assumes you are already familiar
with this concept. The first Technical Corrigendum to HyTime, which is
soon to be published, generalizes this, and makes it possible to have
an <I>architecture engine</I> which can perform architectural form
processing for arbitrary architectures. SP now includes such an
architecture engine.
<P>
Non-markup sensitive applications built using SP now support
architectural form processing using the <SAMP>-A
<VAR>archname</VAR></SAMP> option. When this option is specified, the
document will be validated against all declared base architectures,
and the output will be for the architectural document for that
architecture: the element types, notations and attributes will be
those defined in the meta-DTD.
<P>
This option is experimental and has not been subject to much testing.
Please be sure to report any bugs or problems you encounter.
<P>
Although spam does not support the <SAMP>-A</SAMP> option because it
works with the markup of your document, sgmlnorm does.
<H2>Architectural Support Attributes</H2>
<P>
To use the <SAMP>-A</SAMP> option with a document, you must add
<UL>
<LI>
an architecture base declaration for <SAMP><VAR>archname</VAR></SAMP>,
<LI>
a notation declaration and associated attribute definition list
declaration for <SAMP><VAR>archname</VAR></SAMP>;
this is called the <I>architecture notation declaration</I>.
</UL>
<P>
An architecture base declaration is a processing instruction of the form:
<PRE>
&lt;?ArcBase <VAR>archname</VAR>&gt;
</PRE>
<P>
The processing instruction is recognized either in the DTD or in an
active LPD.
<P>
The architecture notation declaration and associated attribute
definition list declaration serve to declare a number of architectural
support attributes which control the architecture engine. The value
for each architecture support attribute is taken from the default
value, if any, specified for that attribute in the attribute
definition list declaration. It is an error to declare an
architecture support attribute as <SAMP>#REQUIRED</SAMP>.
<P>
The following architectural support attributes are recognized:
<DL>
<DT>
<SAMP>ArcDTD</SAMP>
<DD>
The name of an external entity that contains the meta-DTD.
This attribute is required.
If the name starts with the PERO delimiter <SAMP>%</SAMP>,
the entity is a parameter entity,
otherwise it is a general entity.
<DT>
<SAMP>ArcQuant</SAMP>
<DD>
A list of tokens that looks like what follows <SAMP>QUANTITY SGMLREF</SAMP>
in the quantity set section of an SGML declaration.
The quantities used for parsing the meta-DTD
and validating the architectural document
will be the maximum of the quantities in the document's concrete syntax
and the quantities specified here.
<DT>
<SAMP>ArcDocF</SAMP>
<DD>
The name of the document element type in the meta-DTD.
This would be <SAMP>HyDoc</SAMP> for HyTime.
This defaults to <SAMP><VAR>archname</VAR></SAMP>.
<DT>
<SAMP>ArcFormA</SAMP>
<DD>
The name of the attribute that elements use to specify the
corresponding element type, if any, in the meta-DTD.
Data entities also use this attribute to specify the corresponding
notation in the meta-DTD.
This would be <SAMP>HyTime</SAMP> for HyTime.
This defaults to <SAMP><VAR>archname</VAR></SAMP>.
<DT>
<SAMP>ArcNamrA</SAMP>
<DD>
The name of the attribute that elements use to specify substitutes for
the names of attributes in the meta-DTD. A value of
<SAMP>#DEFAULT</SAMP> is allowed for a substitute name; this inhibits
mapping of an attribute to an architectural attribute, but specifies
that the value of the architectural attribute should be defaulted
rather than taken from the value of another attribute in the document.
For HyTime the value of this attribute would be <SAMP>HyNames</SAMP>.
By default no attribute name substitution is done.
<DT>
<SAMP>ArcSuprA</SAMP>
<DD>
The name of an attribute that elements may use to suppress processing
of their descendants. This attribute is not recognized for data
entities. The value of the attribute must be one of the following
tokens:
<DL>
<DT>
<SAMP>sArcAll</SAMP>
<DD>
Completely suppress all architectural processing of descendants.
It is not possible to restore architectural processing
for a descendant.
<DT>
<SAMP>sArcForm</SAMP>
<DD>
Suppress processing of the <SAMP>ArcFormA</SAMP> attribute of all
descendants of this element, except for those elements that have a
non-implied <SAMP>ArcSuprA</SAMP> attribute.
<DT>
<SAMP>sArcNone</SAMP>
<DD>
Don't suppress architectural processing for the descendants of
this element.
</DL>
<P>
The value may also be implied, in which case the state of
architectural processing is inherited.
<P>
If an element has an ArcSuprA attribute that was processed, its
ArcFormA attribute will always be processed. Otherwise its ArcFormA
attribute will be processed unless its closest ancestor that has a
non-implied value for the ArcSuprA attribute suppressed processing of
the ArcFormA attribute. An element whose ArcFormA attribute is
processed will not be treated as architectural if it has an implied
value for the ArcFormA attribute.
<DT>
<SAMP>ArcSuprF</SAMP>
<DD>
The name of the element type in the meta-DTD that suppresses
architectural processing in the same manner as does the
<SAMP>sHyTime</SAMP> form in HyTime. By default, no element type
does. This behaves like an element with an
<SAMP>ArcSuprA</SAMP> attribute of <SAMP>sArcForm</SAMP>. The element
type should be declared in the meta-DTD. You should not specify a
value for this attribute if you specified a value for the
<SAMP>ArcSuprA</SAMP> attribute.
<P>
This is a non-standardized extension.
<DT>
<SAMP>ArcIgnDA</SAMP>
<DD>
The name of an attribute that elements may use to control whether
data is ignored.
The value of the attribute must be one of the following values:
<DL>
<DT>
<SAMP>nArcIgnD</SAMP>
<DD>
Data is not ignored.
It is an error if data occurs where not allowed by the meta-DTD.
<DT>
<SAMP>cArcIgnD</SAMP>
<DD>
Data is conditionally ignored.
Data will be ignored only when it occurs where the meta-DTD
does not allow it.
<DT>
<SAMP>ArcIgnD</SAMP>
<DD>
Data is always ignored.
</DL>
<P>
The value may also be implied, in which case the state of
architectural processing is inherited.
If the document element has no value specified,
<SAMP>cArcIgnD</SAMP> will be used.
<DT>
<SAMP>ArcBridF</SAMP>
<DD>
The name of a default element type declared in a meta-DTD,
to which elements in the document should be automatically mapped
if they have an ID and would not otherwise be considered
architectural.
This would be <SAMP>HyBrid</SAMP> for HyTime.
If your meta-DTD declares IDREF attributes, it will
usually be appropriate to specify a value for
<SAMP>ArcBridF</SAMP>, and to declare an ID attribute
for that form in your meta-DTD.
<DT>
<SAMP>ArcDataF</SAMP>
<DD>
The name of a default notation declared in the meta-DTD,
to which the external data entities in the document
should be automatically mapped if they would
not otherwise be considered architectural.
If this attribute is defined,
then general entities will be automatically architectural:
any external data entity whose notation cannot otherwise be mapped
into a notation in the meta-DTD will be automatically treated
as an instance of the <SAMP>ArcDataF</SAMP> notation.
This would be <SAMP>data</SAMP> for HyTime.
If your meta-DTD declares entity attributes, it will usually
be appropriate to specify a value for <SAMP>ArcDataF</SAMP>
even if your meta-DTD declares no data attributes for the
notation.
<DT>
<SAMP>ArcAuto</SAMP>
<DD>
This must have one of the following values:
<DL>
<DT>
<SAMP>ArcAuto</SAMP>
<DD>
If an element does not have an <SAMP>ArcFormA</SAMP> attribute and the
meta-DTD defines an element type with the same name as the element's
type, the element will be automatically treated as being an instance
of the meta-type. This rule does not apply to the
document element type; this is automatically treated as being an
instance of the meta-DTD's document element type.
Note that this automatic mapping is prevented if
the element has an <SAMP>ArcFormA</SAMP> attribute with an implied
value. It is also prevented if processing of the
<SAMP>ArcFormA</SAMP> attribute is suppressed. This applies equally
to the notations of external data entities.
The default element or notation specified with the
<SAMP>ArcBridF</SAMP> or <SAMP>ArcDataF</SAMP> attribute
is only considered after the mapping specified by <SAMP>ArcAuto</SAMP>.
<DT>
<SAMP>nArcAuto</SAMP>
<DD>
Automatic mapping is not performed.
</DL>
<P>
The default value is <SAMP>ArcAuto</SAMP>.
<DT>
<SAMP>ArcOptSA</SAMP>
<DD>
A list of names of architectural support attributes,
each of which is interpreted as a list of parameter entities
to be defined with a replacement text of <SAMP>INCLUDE</SAMP>
when parsing the meta-DTD.
The default value is <SAMP>ArcOpt</SAMP>.
</DL>
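<P>
As a rough illustration of how these declarations fit together, the
following sketch shows a document type declaration that declares a
hypothetical architecture called <SAMP>myarch</SAMP>. The architecture
name, the public identifier, the entity name <SAMP>myarchdtd</SAMP>, the
file name <SAMP>myarch.dtd</SAMP>, and the element, form and attribute
names are all assumptions made up for this example; only the names of
the architectural support attributes are taken from the list above.
<PRE>
&lt;!DOCTYPE memo [
&lt;?ArcBase myarch&gt;
&lt;!-- external parameter entity containing the meta-DTD (hypothetical) --&gt;
&lt;!ENTITY % myarchdtd SYSTEM "myarch.dtd"&gt;
&lt;!-- architecture notation declaration and its support attributes --&gt;
&lt;!NOTATION myarch PUBLIC "-//Example//NOTATION My Architecture//EN"&gt;
&lt;!ATTLIST #NOTATION myarch
  ArcDTD   CDATA "%myarchdtd"
  ArcDocF  NAME  "mydoc"
  ArcFormA NAME  "myarch"
  ArcSuprA NAME  "myarchsupr"
&gt;
&lt;!ELEMENT memo - - (para+)&gt;
&lt;!ELEMENT para - O (#PCDATA)&gt;
&lt;!ATTLIST para
  myarch     NAME #FIXED "block"
  myarchsupr (sArcAll|sArcForm|sArcNone) #IMPLIED
&gt;
]&gt;
</PRE>
<P>
Under these assumptions, <SAMP>memo</SAMP> would be treated as an
instance of the meta-DTD's document element type <SAMP>mydoc</SAMP>,
and each <SAMP>para</SAMP> would be mapped to a <SAMP>block</SAMP>
element in the architectural document. Every support attribute is given
a default value and none is declared <SAMP>#REQUIRED</SAMP>.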
<H2>Meta-DTDs</H2>
<P>
A meta-DTD is allowed to use the following extensions:
<UL>
<LI>
a single element type or notation is allowed to be an associated
element type or associated notation name for multiple attribute
definition lists.
<LI>
<SAMP>#ALL</SAMP> can be used as an associated element type
or associated notation name in an attribute definition list
to define attributes for all element types or notations
in the meta-DTD
</UL>
<P>
Before any of these extensions can be used, the meta-DTD must include a
declaration
<PRE>
&lt;!AFDR "ISO/IEC 10744:1992"&gt;
</PRE>
<P>
This declaration should only be included if the extensions are used.
<P>
In all other respects a meta-DTD must be a valid SGML DTD.
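<P>
A minimal sketch of a meta-DTD that uses both extensions might look
like the following. The element type names <SAMP>mydoc</SAMP> and
<SAMP>block</SAMP> and the attribute names are hypothetical, and the
exact placement of <SAMP>#ALL</SAMP> is an assumption based on the
description above rather than a tested example.
<PRE>
&lt;!AFDR "ISO/IEC 10744:1992"&gt;
&lt;!ELEMENT mydoc - O (block+)&gt;
&lt;!ELEMENT block - O (#PCDATA)&gt;
&lt;!-- #ALL defines attributes for every element type in the meta-DTD --&gt;
&lt;!ATTLIST #ALL  id   ID   #IMPLIED&gt;
&lt;!-- block is now associated with a second attribute definition list --&gt;
&lt;!ATTLIST block kind NAME #IMPLIED&gt;
</PRE>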
<P>
A declared value of ENTITY for an attribute in a meta-DTD means that
the value of the attribute must be an entity declared in
the (non-meta) DTD that is architectural.
An external data entity is architectural only if its notation can be
mapped into a notation in the meta-DTD.
All other kinds of data entities and subdoc entities are automatically
architectural.
<P>
An IDREF attribute in the meta-document must have a corresponding ID
in the meta-document. An attribute with a declared value of ID in the
document will be automatically mapped to an attribute with a declared
value of ID in the meta-DTD.
<P>
A declared value of NOTATION in the meta-DTD means that the value of
the attribute must have one of the values specified in the name group and
that it must be a notation in the meta-DTD.
(Perhaps if the attribute also has a declared value of NOTATION
in the non-meta-DTD, the value should be mapped in a similar
way to the notation of an external data entity.)
<H2>Differences from HyTime</H2>
<P>
There are a number of differences from how architectural processing is
defined in the pre-Corrigendum version of the HyTime standard.
<UL>
<LI>
The <SAMP>ArcNamrA</SAMP> and <SAMP>ArcFormA</SAMP> attributes are not
part of the meta-DTD. Rather they are used by the architecture engine
in deriving the meta-document that is validated against the meta-DTD.
<LI>
The <SAMP>use:</SAMP> conventional comment is not recognized. Instead
a single element type is allowed to be an associated element type for
multiple attribute definition lists.
<LI>
The notation and data attributes of an external data entity are
treated just like the element type and attributes of an element. The
notation of an external data entity is mapped into a notation in the
meta-DTD and the data attributes of the entity are mapped onto
attributes defined for the meta-DTD notation.
<LI>
<SAMP>#FIXED</SAMP> has the same meaning in a meta-DTD that it does in
a regular DTD: the value of the attribute must be the same as the
default value of the attribute specified in the meta-DTD.
</UL>
<H2>Specifying architectural processing with an LPD</H2>
<P>
Link attributes defined by an implicit link process are treated in the
same way as non-link attributes. The only complication is that SGML
allows link attributes to have the same name as non-link attributes.
If there is a link attribute and a non-link attribute with the same
name, the architecture engine will only look at the link attribute,
even if the value of the link attribute is implied. The only
exception is the <SAMP>ArcNamrA</SAMP> attribute: the architecture
engine will use both the link attribute and the non-link attribute,
but the substitute names in the value of the non-link attribute cannot
refer to link attribute names.
<P>
The <SAMP>-A <VAR>archname</VAR></SAMP> option automatically activates
any link type <SAMP><VAR>archname</VAR></SAMP>.
<P>
The architecture notation declaration and associated attribute
definition list declaration are allowed in the LPD. Although the
productions of ISO 8879 do not allow a notation declaration in a link
type declaration subset, it is clearly the intent of the standard that
they be allowed. You can use a <SAMP>-wlpd-notation</SAMP> option to
disallow them.
<H2>Notation set architecture</H2>
<P>
An architecture for which <VAR>archname</VAR> is declared
as a notation with a public identifier of
<PRE>
"ISO/IEC 10744//NOTATION AFDR ARCBASE
Notation Set Architecture Definition Document//EN"
</PRE>
<P>
is special. The element types in the meta-DTD for this architecture
are the notations of the document DTD and the attributes defined for
the element types in the meta-DTD are the data attributes defined for
the notations in the document DTD. For each element, the attribute
with a declared value of NOTATION performs the function that the
ArcFormA attribute performs for normal architectures. Only the
<SAMP>ArcNamrA</SAMP> and <SAMP>ArcSuprA</SAMP> architectural support
attributes can be used with this architecture.
<P>
The notation set architecture can also be declared using
an architecture base declaration of the form:
<PRE>
&lt;?ArcBase #NOTATION&gt;
</PRE>
<P>
In this case, no architecture support attributes can be declared;
<SAMP>ArcNamrA</SAMP> will be defaulted to <SAMP>notnames</SAMP>,
and <SAMP>ArcSuprA</SAMP> to <SAMP>notsupr</SAMP>.
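<P>
As an illustration only, a document DTD that relies on the notation set
architecture might contain declarations along these lines. The notation
<SAMP>tex</SAMP>, the element type <SAMP>formula</SAMP> and the
attribute names are invented for this sketch.
<PRE>
&lt;?ArcBase #NOTATION&gt;
&lt;!NOTATION tex SYSTEM&gt;
&lt;!-- data attribute of the notation; acts as an architectural attribute --&gt;
&lt;!ATTLIST #NOTATION tex
  version CDATA #IMPLIED
&gt;
&lt;!ELEMENT formula - - CDATA&gt;
&lt;!-- the NOTATION attribute plays the role that ArcFormA normally plays --&gt;
&lt;!ATTLIST formula
  type    NOTATION (tex) #REQUIRED
  version CDATA          #IMPLIED
&gt;
</PRE>
<P>
Here a <SAMP>formula</SAMP> element with <SAMP>type=tex</SAMP> would
correspond to the element type <SAMP>tex</SAMP> in the implied meta-DTD,
with <SAMP>version</SAMP> as one of its attributes.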
<H2>Derived architectures</H2>
<P>
A meta-DTD can have one or more base architectures in the same way as
a normal DTD. Multiple <SAMP>-A</SAMP> options can be used to exploit
this. For example,
<PRE>
-A <VAR>arch1</VAR> -A <VAR>arch2</VAR>
</PRE>
<P>
will perform architectural processing on the source document to
produce an architectural document conforming to the architecture
<SAMP><VAR>arch1</VAR></SAMP> declared in the source document, and
will then perform architectural processing on this architectural
document to produce an architectural document conforming to the
<SAMP><VAR>arch2</VAR></SAMP> architecture declared in
<SAMP><VAR>arch1</VAR></SAMP>'s meta-DTD.
<P>
A document that is validated against a meta-DTD will automatically
be validated against any base architectures of that meta-DTD.
<H2>Not implemented</H2>
<P>
The following features in the current AFDR draft are not implemented:
<UL>
<LI>
<SAMP>ArcIndr</SAMP> architectural support attribute with value
other than <SAMP>nArcIndr</SAMP>.
</UL>
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>

@@ -0,0 +1,104 @@
<!-- $XConsortium: build.htm /main/1 1996/09/22 18:14:41 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>Building SP</TITLE>
</HEAD>
<BODY>
<H1>Building SP</H1>
<P>
You will need a C++ compiler with good template support to build this.
Support for exceptions is not required.
<P>
In most cases you should be able to port to a new compiler just by
editing <code>include/config.h</code>.
<H2>Unix</H2>
<P>
To build on Unix, edit the Makefile, and do a make. You can also
build in a different directory. This requires GNU make or another
make that implements VPATH. Copy or link the top-level Makefile to
the build directory, change srcdir in the Makefile to point to the
original directory, and do a make in the build directory.
<P>
<SAMP>make check</SAMP> runs some tests. You shouldn't get any reports
of differences.
<P>
<SAMP>make install</SAMP> installs the programs; <SAMP>make install-man</SAMP>
installs the man pages.
<P>
You can use the following compilers:
<DL>
<DT>
gcc
<DD>
gcc 2.7.2 works (gcc 2.7.0 won't work at least on the sparc). You
will also need an iostream library (e.g. as provided by libg++ 2.7). This
distribution builds on Solaris 2.3 and on Linux 1.2. I expect it will
build on SunOS 4 as well with little difficulty.
<P>
With gcc 2.6.3/SunOS 4, you'll need to compile with
<CODE>-Dsig_atomic_t=int</CODE>, and, if you want to compile with
-DSP_HAVE_SOCKET, you'll need to make netdb.h and arpa/inet.h C++
compatible.
<DT>
Sun C++
<DD>
To compile with Sun C++ 4.0.1, first run sunfix.sh. Also, in the
top-level Makefile, set libMakefile to Makefile.lib.sun.
This makes the library build use the -xar option.
</DL>
<P>
Nelson Beebe has ported SP to a variety of other Unix systems and has
produced some <A
HREF="http://www.math.utah.edu/~beebe/sp-notes-1.0.1.html">notes</A>
about his experiences.
<H2>DOS/Windows</H2>
<P>
You must use a compiler that generates 32-bit code.
<P>
The following compilers have been tested:
<DL>
<DT>
Visual C++ 4.1
<DD>
Open SP.mak as a Makefile in the Developer Studio and build whatever
you want.
Don't use <SAMP>Batch Build</SAMP> or <SAMP>Rebuild All</SAMP>: these
rebuild the library repeatedly.
You can build all the targets in a particular configuration by
building the all target.
The <SAMP>sp-generate.mak</SAMP> makefile can be used to make
all the .cxx and .h files that are automatically generated.
(These are included in the distribution, so you don't need to do this
unless you want to modify SP.)
<P>
To create a new program, make a new project in the SP project
workspace using the <SAMP>Build&gt;Subprojects</SAMP> command, and
include <SAMP>lib</SAMP> and maybe <SAMP>generic</SAMP> as
subprojects. You may also want to add your project as a subproject to
<SAMP>all</SAMP>.
Then, in <SAMP>Build&gt;Settings</SAMP> under the <SAMP>C/C++</SAMP>
tab in the <SAMP>Preprocessor</SAMP> category, copy the
<SAMP>Preprocessor definitions</SAMP> and <SAMP>Additional include
directories</SAMP> entries from the nsgmls subproject.
In the <SAMP>Code Generation</SAMP> category make sure you've selected
the same run-time library as that used by the corresponding configuration
of <SAMP>lib</SAMP>.
<DT>
Watcom C++ 10.5a
<DD>
Use Makefile.wat.
<P>
You must compile on a platform that supports long filenames.
</DL>
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>

@@ -0,0 +1,166 @@
<!-- $XConsortium: catalog.htm /main/1 1996/09/22 18:14:58 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SP - Catalogs</TITLE>
</HEAD>
<BODY>
<H1>Catalogs</H1>
<P>
The entity manager generates a system identifier for every external
entity using catalog entry files in the format defined by <A
HREF="http://www.sgmlopen.org/sgml/docs/library/9401.htm">SGML Open
Technical Resolution TR9401:1995</A>. The entity manager will give an
error if it is unable to generate a system identifier for an external
entity. Normally if the external identifier for an entity includes a
system identifier then the entity manager will use that as the
effective system identifier for the entity; this behaviour can be
changed using <CODE>OVERRIDE</CODE> or <CODE>SYSTEM</CODE> entries in
a catalog entry file.
<P>
A catalog entry file contains a sequence of entries in one of the
following forms:
<DL>
<DT>
<SAMP>PUBLIC <VAR>pubid</VAR> <VAR>sysid</VAR></SAMP>
<DD>
This specifies that <SAMP><VAR>sysid</VAR></SAMP> should be used as
the effective system identifier if the public identifier is
<SAMP><VAR>pubid</VAR></SAMP>. <SAMP><VAR>Sysid</VAR></SAMP> is a
system identifier as defined in ISO 8879 and
<SAMP><VAR>pubid</VAR></SAMP> is a public identifier as defined in ISO
8879.
<DT>
<SAMP>ENTITY <VAR>name</VAR> <VAR>sysid</VAR></SAMP>
<DD>
This specifies that <VAR>sysid</VAR> should be used as the effective
system identifier if the entity is a general entity whose name is
<VAR>name</VAR>.
<DT>
<SAMP>ENTITY %<VAR>name</VAR> <VAR>sysid</VAR></SAMP>
<DD>
This specifies that <SAMP><VAR>sysid</VAR></SAMP> should be used as
the effective system identifier if the entity is a parameter entity
whose name is <VAR>name</VAR>. Note that there is no space between
the <SAMP>%</SAMP> and the <SAMP><VAR>name</VAR></SAMP>.
<DT>
<SAMP>DOCTYPE <VAR>name</VAR> <VAR>sysid</VAR></SAMP>
<DD>
This specifies that <SAMP><VAR>sysid</VAR></SAMP> should be used as
the effective system identifier if the entity is an entity declared in
a document type declaration whose document type name is <VAR>name</VAR>.
<DT>
<SAMP>LINKTYPE <VAR>name</VAR> <VAR>sysid</VAR></SAMP>
<DD>
This specifies that <SAMP><VAR>sysid</VAR></SAMP> should be used as the
effective system identifier if the entity is an entity declared in a
link type declaration whose link type name is <VAR>name</VAR>.
<DT>
<SAMP>NOTATION <VAR>name</VAR> <VAR>sysid</VAR></SAMP>
<DD>
This specifies that <SAMP><VAR>sysid</VAR></SAMP> should be used as
the effective system identifier for a notation whose name is
<SAMP><VAR>name</VAR></SAMP>. This is an extension to the SGML Open
format. This is relevant only with the <SAMP>-n</SAMP> option.
<DT>
<SAMP>OVERRIDE <VAR>bool</VAR></SAMP>
<DD>
<SAMP><VAR>bool</VAR></SAMP> may be <SAMP>YES</SAMP> or
<SAMP>NO</SAMP>. This sets the overriding mode for entries up to the
next occurrence of OVERRIDE or the end of the catalog entry file. At
the beginning of a catalog entry file the overriding mode will be NO.
A PUBLIC, ENTITY, DOCTYPE, LINKTYPE or NOTATION entry with an
overriding mode of YES will be used whether or not the external
identifier has an explicit system identifier; those with an overriding
mode of NO will be ignored if the external identifier has an explicit
system identifier. This is an extension to the SGML Open format.
<DT>
<SAMP>SYSTEM <VAR>sysid1</VAR> <VAR>sysid2</VAR></SAMP>
<DD>
This specifies that <VAR>sysid2</VAR> should be used as the effective
system identifier if the system identifier specified in the external
identifier was <SAMP><VAR>sysid1</VAR></SAMP>. This is an extension
to the SGML Open format. <VAR>sysid2</VAR> should always be quoted to
ensure that it is not misinterpreted when parsed by a system that does
not support this extension.
<DT>
<A NAME="sgmldecl"><SAMP>SGMLDECL <VAR>sysid</VAR></SAMP></A>
<DD>
This specifies that if the document does not contain an SGML declaration,
the SGML declaration in <SAMP><VAR>sysid</VAR></SAMP> should be implied.
<DT>
<SAMP>DOCUMENT <VAR>sysid</VAR></SAMP>
<DD>
This specifies that the document entity is <SAMP><VAR>sysid</VAR></SAMP>.
This entry is used only with the <SAMP>-C</SAMP> option.
<DT>
<SAMP>CATALOG <VAR>sysid</VAR></SAMP>
<DD>
This specifies that <SAMP><VAR>sysid</VAR></SAMP> is the system
identifier of an additional catalog entry file to be read after this
one. Multiple <SAMP>CATALOG</SAMP> entries are allowed and will be
read in order. This is an extension to the SGML Open format.
<DT>
<SAMP>BASE <VAR>sysid</VAR></SAMP>
<DD>
This specifies that relative storage object identifiers in system
identifiers in the catalog entry file following this entry should be
resolved using the first storage object identifier in
<SAMP><VAR>sysid</VAR></SAMP> as the base, instead of the storage
object identifiers of the storage objects comprising the catalog entry
file. This is an extension to the SGML Open format. This extension
is proposed in <A HREF=
"ftp://ftp.internic.net/internet-drafts/draft-ietf-mimesgml-exch-02.txt">Using
SGML Open Catalogs and MIME to Exchange SGML Documents</A>.
Note that the <CODE><VAR>sysid</VAR></CODE> must exist.
<DT>
<SAMP>DELEGATE <VAR>pubid-prefix</VAR> <VAR>sysid</VAR></SAMP>
<DD>
This specifies that entities with a public identifier that has
<SAMP><VAR>pubid-prefix</VAR></SAMP> as a prefix should be resolved
using a catalog whose system identifier is
<SAMP><VAR>sysid</VAR></SAMP>. For more details, see <A
HREF="http://www.entmp.org/fpi-urn/delegate.html">A Proposal for
Delegating SGML Open Catalogs</A>. This is an extension to the SGML
Open format.
</DL>
<P>
The delimiters can be omitted from the <SAMP><VAR>sysid</VAR></SAMP>
provided it does not contain any white space. Comments are allowed
between parameters delimited by <SAMP>--</SAMP> as in SGML.
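<P>
To make the entry forms above more concrete, here is a small sketch of
a catalog entry file. All public identifiers, entity names and file
names in it are hypothetical and chosen purely for illustration.
<PRE>
-- hypothetical catalog entry file --
PUBLIC   "-//Example//DTD Memo//EN"  "memo.dtd"
ENTITY   logo                        "logo.gif"
ENTITY   %common                     "common.ent"
DOCTYPE  memo                        "memo.dtd"
SGMLDECL "memo.dcl"

-- entries in this part apply even when a system identifier is given --
OVERRIDE YES
PUBLIC   "-//Example//ENTITIES Symbols//EN"  "local/symbols.ent"
OVERRIDE NO

SYSTEM   "http://www.example.org/memo.dtd"   "memo.dtd"
CATALOG  "more/catalog"
</PRE>
<P>
In this sketch the parameter entity entry is written with no space
between <SAMP>%</SAMP> and the name, and the second system identifier
of the <SAMP>SYSTEM</SAMP> entry is quoted, as recommended above.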
<P>
The environment variable <SAMP>SGML_CATALOG_FILES</SAMP> contains a
list of catalog entry files. The list is separated by colons under
Unix and by semi-colons under MS-DOS and Windows. These will be
searched after any catalog entry files specified using the
<SAMP>-m</SAMP> option, and after the catalog entry file called
<SAMP>catalog</SAMP> in the same place as the document entity. If
this environment variable is not set, then a system dependent list of
catalog entry files will be used. In fact catalog entry files are not
restricted to being files: the name of a catalog entry file is
interpreted as a system identifier.
<P>
A match in one catalog entry file will take precedence over any match
in a later catalog entry file. A more specific matching entry in one
catalog entry file will take priority over a less specific matching
entry in the same catalog entry file. For this purpose, the order of
specificity is (most specific first):
<UL>
<LI>
<SAMP>SYSTEM</SAMP> entries;
<LI>
<SAMP>PUBLIC</SAMP> entries;
<LI>
<SAMP>DELEGATE</SAMP> entries ordered by the length of the prefix,
longest first;
<LI>
<SAMP>ENTITY</SAMP>, <SAMP>DOCTYPE</SAMP>, <SAMP>LINKTYPE</SAMP> and
<SAMP>NOTATION</SAMP> entries.
</UL>
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>

@@ -0,0 +1,139 @@
<!-- $XConsortium: features.htm /main/1 1996/09/22 18:15:17 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SP - Features Summary</TITLE>
</HEAD>
<BODY>
<H1>
SP
</H1>
<H3>
A free, object-oriented toolkit for SGML parsing and entity management
</H3>
<H2>
Features summary
</H2>
<UL>
<LI>
Includes nsgmls
<UL>
<LI>
Compatible with sgmls
<LI>
Also generates RAST (ISO/IEC 13673)
</UL>
<LI>
Provides access to all information about an SGML document
<UL>
<LI>
Access to DTD and SGML declaration as well as document instance
<LI>
Access to markup as well as abstract document
<LI>
Sufficient to recreate character-for-character identical
copy of any SGML document
</UL>
<LI>
Supports almost all optional SGML features
<UL>
<LI>
Arbitrary concrete syntaxes
<LI>
SHORTTAG, OMITTAG, RANK
<LI>
SUBDOC
<LI>
LINK (SIMPLE, IMPLICIT and EXPLICIT)
<LI>
Only DATATAG and CONCUR not supported
</UL>
<LI>
Sophisticated entity manager
<UL>
<LI>
Supports ISO/IEC 10744 Formal System Identifiers
<LI>
Supports SGML Open catalogs
<LI>
Supports WWW
<LI>
Can be used independently of parser
</UL>
<LI>
Supports multi-byte character sets
<UL>
<LI>
Parser can use 16-bit characters internally
<LI>
16-bit characters can be used in tag names and other markup
<LI>
Supports ISO/IEC 10646 (Unicode) using both UCS-2 and UTF-8
<LI>
Supports Japanese character sets (Shift-JIS, EUC)
</UL>
<LI>
Object-oriented
<LI>
Written in C++ from scratch
<UL>
<LI>
Not a modified version of a parser originally written in C
<LI>
Reentrant
<LI>
Sophisticated architecture
</UL>
<LI>
Fast
<UL>
<LI>
Up to twice as fast as sgmls on large documents
</UL>
<LI>
Portable
<UL>
<LI>
All major Unix variants
<LI>
MS-DOS
<LI>
Win32: Windows 95/Windows NT
<LI>
OS/2
</UL>
<LI>
Production quality
<UL>
<LI>
Version 1.0 recently released, after a year of test releases
<LI>
Tested using several SGML test suites
<LI>
Already used in several new commercial products
<LI>
Written by James Clark, previously responsible for turning arcsgml into sgmls
</UL>
<LI>
Free
<UL>
<LI>
Includes source code
<LI>
No restrictions on commercial use
</UL>
<LI>
Disadvantages
<UL>
<LI>
Programmer-level documentation only for generic API
and not for native API.
</UL>
</UL>
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>

File diff suppressed because it is too large

@@ -0,0 +1,489 @@
<!-- $XConsortium: ideas.htm /main/1 1996/09/22 18:15:57 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>Ideas for improving SP</TITLE>
</HEAD>
<BODY>
<H1>Ideas for improving SP</H1>
<H2>
Parser
</H2>
<P>
Have option (fixedDocCharset) in which document character set cannot
be changed by SGML declaration; declared document character set used
for character references, and to determine which characters are
non-SGML. Would need separate event for non-SGML character.
In Text would need separate TextItem for non-SGML data.
Disallow non-SGML characters in internal entities.
<P>
Support caching across multiple runs of parser in single
process.
<P>
Make Dtd copiable.
<P>
?Subdoc parser needs character set for system id (should be system
character set).
<P>
Recover better from non-existent documents or subdocuments.
<P>
Think about entity declarations/references in inactive LPDs.
<P>
Don't allow name groups in parameter entity references in document
type specifications in start-/end-tags.
<P>
With link, don't do a pass 2 unless we replace a referenced entity
(what about default entity?).
<P>
Options to warn about things that HTML disallows: marked sections in
instance, explicit subsets.
<P>
Option to warn about MDCs in comments in comment declarations.
<P>
Option to warn about omitted REFC.
<P>
Check that names of added functions are valid names in concrete syntax
(both characters and lengths). Also need to do upper-case
substitution on them?
<P>
Recover from nested doctype declaration intelligently.
<P>
Recover from missing doctype declaration intelligently.
<P>
Could optimize parsing of attribute literals using technique similar
to extendData().
<P>
attributeValueLength error should give actual length of value.
<P>
Recover better from entity reference with name group in literal.
<P>
At start of pass 2 clear everything in pass1LPDs except entity sets.
<P>
Give an error if EXPLICIT > 1 and LPDs don't chain as required by
436:5-7 and 436:18-20.
<P>
Handle quantity errors by reporting at the end of the prolog and the
end of the instance any quantities that need to be increased.
<P>
Make noSuchReservedName error more helpful.
<P>
Function characters should perform their function even when markup
recognition is suppressed. (I think I've handled this.)
<P>
Give a warning for notation attribute that is #CONREF.
<P>
Try to separate out Parser::compileModes().
<P>
In CompiledModelGroup have vector that gives an index for each element type
that occurs in the model group. Then in each leaf content token have a
vector that maps this index to a LeafContentToken *, if there
is a simple transition (no and groups involved) to that element type.
<P>
MatchState::minAndDepth and MatchState::AndInfo should be separated
off info object pointed to from MatchState; pointer would be null for
elements with no AND groups.
<P>
What to do if we encounter USELINK or USEMAP declaration after DTD in
prolog? Should stop prolog and start DTD. If we have SCOPE INSTANCE
then if we get an unknown declaration type in prolog, don't give
error, but unget token and start instance.
<P>
?Have separate version of reportNonSgml() for case where datachar is allowed.
<P>
Implement CONCUR.
<P>
AttributeDefinition constructors should have Owner&lt;DeclaredValue&gt; &amp;
arguments to avoid storage leaks when exceptions are thrown.
<P>
Create a list like IList but which keeps track of length. Then
combine tagLevel into openElement stack, and inputLevel into
inputStack.
<P>
AttributeDefinition::makeValue should return
ConstResourcePointer&lt;AttributeValue>.
<P>
Syntax member functions should use reference for result.
<P>
Have a LocationKey data structure that can be used to determine the
relative order of locations in possibly different concurrent
instances. Contains: offset in document instance; is it a replacement
of named character reference; for each entity and numeric character
reference: location in entity and index of dtd in which instance is
declared.
<P>
On systems with fixed stacks, avoid unlimited stack growth: hard
limits on number of SUBDOCS and GRPLVL.
<P>
With extendData and extendS don't extend more than some fixed amount
(eg 1024), otherwise could overrun InputSource buffer on 16-bit
system.
<P>
Have a location in ElementType saying where the first mention of the
element name was. Useful for giving warnings about undefined
elements.
<P>
How to detect 310:8-10?
<P>
AttributeSemantics should return const pointers rather than ResourcePointers.
<P>
Rename Parser -> ParserImpl; SgmlParser -> Parser;
Syntax::isB -> Syntax::isBlank.
<P>
What mode should be used for parsing other prolog after document element?
<P>
Flag out of context data.
<P>
Provide mechanism to allow character names to be mapped onto universal
character numbers.
<P>
Provide mechanism to allow specification of what characters are
control characters (for the purposes of SHUNCHAR controls).
<P>
With SCOPE INSTANCE, which syntax should be used for delimiters in
bracketed text entities?
<P>
Better error messages for ambiguous delimiters.
<P>
Do we need both EndLpd and ComplexLink/SimpleLink events?
<P>
What to do about 457:19-21?
<P>
Rename lpd_ to activeLpd_; allLpd_ to lpd_.
<P>
Test for validity of character numbers in syn ref charset (perhaps
unnecessary, because bad numbers won't be translateable into doc
charset).
<P>
Option to read bootstrap character set from entity.
<P>
In AttributeDefinitionList have a flag that is true if any checking of
unspecified values in attribute list is needed (ie CURRENT, REQUIRED,
non-implied ENTITY, non-implied NOTATION). In this case can avoid
running over attributes in AttributeList::finish, by computing value
only when user calls Attribute::value().
<P>
Construct link attributes from definition if no applicable link rule.
(RAST maybe doesn't want this. Make it a separate method in LinkProcess and
use in SgmlsEventHandler. Very useful with ArcEngine.)
<P>
Shouldn't have OpenElementInfo in Message. Instead use RTTI.
<P>
noSuchAttribute: include gi in message; if element is undefined, don't
give error at all
<P>
noSuchAttributeToken: say what element or entity
<P>
nonExistentEntityRef should say document/link type
<P>
Distinguish errors that are totally recoverable.
<P>
Find better way to unpack entity information in entity attribute.
<H2>
Entity Manager
</H2>
<P>
Avoid requiring that BASE sysid exist.
<P>
When FSI has only a single storage manager and that is a literal,
return an InternalInputSource.
<P>
Allow user of InputSource to specify what bit combinations they
want to see for RS and RE.
<P>
Have environment variable SP_INPUT_BCTF that overrides SP_BCTF for
input.
<P>
Avoid using numeric character references for all characters in storage
object identifier of literal storage manager in effective system
identifier.
<P>
Instead of registering coding systems, pass a CodingSystemKit that can
create coding systems.
<P>
Need BCTF entry in catalog that specifies default BCTF.
<P>
Have catalog entry that describes internet charset as BCTF plus PUBLIC
identifier of SGML character set; then have charset= storage attribute
that does the translation.
<P>
An SOEntityCatalog should consist of a
Vector&lt;ConstPtr&lt;EntityCatalog> > which can be shared between several
catalogs. This would facilitate caching.
<P>
Maybe need to be able to specify two types of catalog entry file: one
used for all documents; one used for this document alone.
<P>
Allow end-tags in FSIs. Support alternative SOSs.
<P>
Character sets in the catalog need rethinking. Also character set of
ParsedSystemId::Map::publicId.
<P>
Allow for HTTP proxy.
<P>
Cache catalogs.
<P>
Use Microsoft ActiveX (formerly Sweeper) DLL on Win95 or NT.
<P>
Implement DTDDECL catalog entry.
<P>
Support FILE URLs.
<P>
Perhaps don't want to do searching for catalog files (and perhaps
command line files).
<P>
Provide mechanism for specifying when (if at all) base dir is searched
relative to other dirs.
<P>
Provide extension to catalog format to distinguish entities declared
in non-base DTDs. Perhaps precede entity name by document type name
surrounded by GRPO/GRPC delimiters.
<P>
URLStorageManager should use a DescriptorManager shared with
PosixStorageManager.
<P>
URLStorageManager::resolveRelative should delete "xxx/../" and "./"
components. Might also be a good idea to resolve host names.
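<P>
A minimal sketch of the component deletion described above, assuming
it is applied to the slash-separated path part of a URL only. The
function name <SAMP>remove_dot_segments</SAMP> is hypothetical and is
not part of SP; empty components ("//") are also collapsed, which is a
simplification.

```python
def remove_dot_segments(path):
    """Collapse "./" and "xxx/../" components in a slash-separated path."""
    absolute = path.startswith("/")
    out = []
    for part in path.split("/"):
        if part == "." or part == "":
            # "./" components (and empty ones) contribute nothing
            continue
        if part == "..":
            if out and out[-1] != "..":
                out.pop()            # "xxx/../" cancels out
            elif not absolute:
                out.append("..")     # keep leading ".." in relative paths
            # a ".." above the root of an absolute path is dropped
            continue
        out.append(part)
    result = "/".join(out)
    if absolute:
        result = "/" + result
    # keep a trailing slash so directory references stay directories
    if path.endswith(("/", "/.", "/..")) and result and not result.endswith("/"):
        result += "/"
    return result
```

Resolving host names, the second suggestion above, is not covered by
this sketch.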
<P>
Implement JIS encoding system (what should be done with half-width yen
and overbar in JIS-Roman? translate to Unicode).
<P>
ExternalInfoImpl::convertOffset: when the position is the character
past the last character and the last character was a newline, line
number should be number of lines + 1.
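<P>
The intended behaviour can be sketched as follows. This is an
illustration only, not the code of ExternalInfoImpl::convertOffset;
the function names are hypothetical and a newline is assumed to be a
single "\n" character.

```python
from bisect import bisect_right

def line_starts(text):
    """Offsets at which each line begins; line 1 always starts at 0."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def line_number(text, offset):
    """1-based line number of offset; offset may equal len(text)."""
    if offset < 0 or offset > len(text):
        raise ValueError("offset out of range")
    # The number of line starts at or before offset is the line number.
    return bisect_right(line_starts(text), offset)
```

With the text "ab\ncd\n" (two lines), the position just past the final
newline is offset 6, and the sketch reports line 3, that is, number of
lines + 1.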
<P>
Try harder to rewind in StdioStorageObject.
<P>
charset= storage attribute that infers BCTF from MIME charset assuming
10646 document character set.
<H2>
Generic
</H2>
<P>
Provide mechanism to access data entities using generated system id.
<P>
Support IMPLICIT/SIMPLE LINK.
<P>
Character set information.
<P>
Need to know space character that separates tokens. Alternatively
provide broken down view of tokens.
<P>
Need to know IDREF (and other declared values)?
<H2>
nsgmls
</H2>
<P>
Problem with "\#n;" escape sequence is that it might get used other
than in data. Probably should get rid of this feature, and give
a warning when there's an unencodable character.
<H2>
Internal
</H2>
<P>
Make all macros that occur in headers begin with SP.
<P>
Make sure all files use #pragma i/i.
<P>
Get rid of assumption that Vector&lt;T>::size_type, String&lt;T>::size_type
is size_t.
<P>
Maybe align Owner with auto_ptr.
<P>
Get rid of uses of string as identifier.
<P>
?Maybe support non-const copy constructors for NCVector/Owner.
<P>
Get rid of asEntityOrigin (as far as possible). Make
InputSourceOrigin::defLocation virtual on origin. Avoid excessive use
of asInputSourceOrigin.
<P>
Hash should define Hash(String&lt;unsigned char>),
Hash(String&lt;unsigned short>) etc.
<P>
Invert sense of SP_HAVE_BOOL define.
<P>
Get rid of OutputCharStream::open. Instead have
OutputCharStream::setEncoding. (Perhaps make a friend so we can use
ostream if we're not interested in encodings.) Allow use of ostream
instead of OutputCharStream. Change ParserToolkit::errorStream_'s coding
system when we change the coding system.
<P>
Support 32-bit Char. Need to fix XcharMap and SubstTable.
Detemplatize SubstTable. Then support UTF-16.
<P>
Have a common version of Ptr for things that have a virtual
destructor.
<P>
Have a common version of Owner for all things that have a virtual
destructor.
<P>
Inheritance in AttributeSemantics unnecessary.
<P>
Rename ISet -> RangeSet.
<P>
ISet and RangeMap should use binary search.
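<P>
A sketch of the idea, assuming the set is stored as sorted,
non-overlapping inclusive ranges. The class below is hypothetical and
does not reflect the actual layout of ISet or RangeMap.

```python
from bisect import bisect_right

class RangeSet:
    """Set of integers stored as sorted, non-overlapping inclusive ranges."""

    def __init__(self, ranges):
        merged = []
        for lo, hi in sorted(ranges):
            if merged and lo <= merged[-1][1] + 1:
                # overlapping or adjacent: extend the previous range
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        self._los = [r[0] for r in merged]
        self._his = [r[1] for r in merged]

    def __contains__(self, value):
        # Binary search for the last range starting at or before value.
        i = bisect_right(self._los, value) - 1
        return i >= 0 and value <= self._his[i]
```

Membership tests then take time logarithmic in the number of ranges
rather than linear.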
<P>
Better hash function for wide characters.
<P>
OutputCharStream should canonically use RS/RE and translate to system
newline char with raw option that prevents this.
<P>
Avoid having Entity.h depend on ParserState, perhaps by double
dispatching.
<P>
Add uses of explicit keyword.
<P>
When generating message.h file; if we don't have .cxx file and
namespaces are supported, use anonymous namespace.
<H2>
Application framework
</H2>
<P>
Only use static programName for outOfMemory message.
<P>
Need to use AppChar *const * not AppChar ** in CmdLineApp.
<P>
When reporting message with MessageEventHandler need to be able to
update error count.
<P>
Option argument names need to be internationalized.
<P>
Support response files for DOS.
<P>
Sort options in usage message.
<P>
StringMessageArg should be associated with a character set (in
particular, need to distinguish parser character sets from
StorageManager character sets).
<P>
Should translate StringMessageArg from document character set to
system character set. Have MessageReporter::setDocumentCharacter
function.
<P>
In MessageReporter, maybe distinguish messages coming from the parser.
<P>
Don't ever give a non-existent file as a location in an error message.
<P>
Text of messages should be able to specify that an open quote or close
quote should be inserted at a particular point.
<P>
When outputting a StringMessageArg translate \r to \n.
<P>
Make sure wild cards work in VC++ and MS-DOS.
<H2>
Win32
</H2>
<P>
Compilers can typically eliminate unused templates. Reengineer Vector
to reduce code size with such compilers.
<P>
Store messages in resources; requires numeric tags for messages.
<P>
Should automatically register all available code pages.
<P>
Make use of IsTextUnicode() API.
<P>
Have StorageManager that uses Win32 API directly. Would avoid limits
on number of open files. Also use flag that says file is being
accessed sequentially.
<P>
Allow DTDs to be compiled into binary by having storage manager that
uses resource ids.
<H2>
Architecture engine
</H2>
<P>
Should give an error with -A if the specified arch does not exist.
<P>
Interpret APPINFO parameter, and automatically enable architectural
processing based on this.
<P>
Handle derived architecture support attributes.
<P>
When doing architectural processing in link type, not possible to have
notation declaration, so need some other way to specify public
identifier for architecture.
<P>
Allow DOCTYPE to be declared inline (as with CONCUR or EXPLICIT LINK).
<P>
Grok conventional comments.
<P>
Make work automatically with EventHandlers that process subdoc. Make
references to subdocs architectural.
<P>
Support different SGML declaration for meta-DTD.
<P>
Maybe should map internal sdata/cdata entities to copies in meta-DTD.
<P>
Perhaps when getting open element info should indicate that gis are
architectural.
<P>
Think about references to SDATA entities in default values in meta-DTD.
<P>
Add default entity from real DTD to meta-DTD.
<P>
Tokenize ArcForm attribute appropriately.
<P>
Make special case for parsing DTD when entity can't be accessed.
<P>
Try to provide extension that would allow architecture elements to be
asynchronous with actual elements? This would provide CONCUR
functionality.
<H2>
sgmlnorm
</H2>
<P>
Avoid bogus newline from invalid empty document.
<P>
Avoid always escaping >.
<P>
Option to say whether to use character references for 8-bit characters.
<P>
Option to output implied attributes.
<P>
Option to output all non-implied attributes.
<P>
Option to omit attribute name with name tokens.
<P>
Protect against recognition of short references.
<P>
Option to preserve CDATA entity references.
<P>
Option to output general entity declarations in DTD subset
(but what about data attributes)?
<H2>
spam
</H2>
<P>
Option to normalize names.
<P>
Add comments round expanded entities to prevent false delimiter
recognition.
<P>
Add newline at the end if last thing was omitted tag.
<P>
Option to warn about changes in internal entities when not expanding.
<H2>
Documentation
</H2>
<P>
Error message format.
<P>
&lt;catalog&gt; FSI tag.
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>

@@ -0,0 +1,94 @@
<!-- $XConsortium: index.htm /main/1 1996/09/22 18:16:18 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SP</TITLE>
</HEAD>
<BODY>
<H1>SP</H1>
<H3>
An SGML System Conforming to International Standard ISO 8879 --
Standard Generalized Markup Language
</H3>
<P>
The following documents are available:
<P>
<UL>
<LI>
<A HREF="features.htm">Summary of SP's features</A>
<LI>
<A HREF="http://www.jclark.com/sp/howtoget.htm">How to get SP</A>
<LI>
<A HREF="build.htm">How to build and install SP from source</A>
<LI>
Using SP
<UL>
<LI>
<A HREF="new.htm">What's new in SP?</A>
<LI>
<A HREF="nsgmls.htm">nsgmls</A>, a replacement for sgmls
<LI>
<A HREF="sgmlsout.htm">nsgmls output format</A>,
an extension to the output format of sgmls
<LI>
<A HREF="spam.htm">spam</A>, a sophisticated normalizer,
perhaps better thought of as a markup stream editor
<LI>
<A HREF="sgmlnorm.htm">sgmlnorm</A>, a simpler normalizer
that focuses on producing the same ESIS rather than
preserving details of the markup
<LI>
<A HREF="spent.htm">spent</A>, a program providing access
to SP's entity manager
<LI>
<A HREF="sysdecl.htm">System declaration</A>
<LI>
<A HREF="sgmldecl.htm">Handling of SGML declarations</A>
<LI>
<A HREF="sysid.htm">System identifiers</A>
<LI>
<A HREF="catalog.htm">Using SGML Open catalogs to generate
system identifiers</A>
<LI>
<A HREF="archform.htm">Architectural form support</A>
<LI>
<A HREF="winntu.htm">Notes on SP Unicode support under Windows NT</A>
</UL>
<LI>
Programming with SP
<UL>
<LI>
<A HREF="generic.htm">Generic API to SP</A>
<LI>
<A HREF="ideas.htm">Ideas for improving SP</A>
</UL>
</UL>
<P>
There is a mailing list for programmer-level discussions of SP. Mail
subscription requests to <A
HREF="mailto:sp-prog-request@jclark.com">sp-prog-request@jclark.com</A>.
Messages for the list should go to <A
HREF="mailto:sp-prog@jclark.com">sp-prog@jclark.com</A>.
<P>
For information about SGML, see
<UL>
<LI>
<A
HREF="http://www.sil.org/sgml/sgml.html">The SGML Web Page</A>.
<LI>
<A HREF="http://www.iso.ch/cate/d16387.html">ISO 8879:1986</A>
<LI>
The SGML Handbook, Charles F. Goldfarb
</UL>
<P>
I would like to hear about any bugs you find in SP. When reporting a
bug, please always include a complete self-contained file that will
enable me to reproduce the bug. I am also interested in receiving
suggestions for improvements to SP no matter how small.
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>

@@ -0,0 +1,166 @@
<!-- $XConsortium: new.htm /main/1 1996/09/22 18:16:37 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>What's new in SP?</TITLE>
</HEAD>
<BODY>
<H1>What's new?</H1>
<P>
This document describes recent user-visible changes in SP. Bug fixes
are not described.
<H2>Version 1.1</H2>
<P>
There is now generalized support for <A
HREF="archform.htm">architectural form processing</A>.
<P>
Documentation is now in HTML format.
<P>
A BASE catalog entry can be used to specify a base system identifier
for resolving relative storage object identifiers occurring in the
catalog.
<P>
A LITERAL storage manager is now provided.
<P>
Programs have a -E option that sets the maximum number of errors.
<P>
A DELEGATE catalog entry allows distributed resolution of public
identifiers.
<P>
nsgmls has a -B (batch mode) option that allows you to parse multiple
documents with a single invocation of nsgmls.
<P>
In nsgmls the -c option now specifies a catalog as it does in spam and
sgmlnorm, in addition to the -m option that previously did this.
<P>
The <SAMP>-n</SAMP> option has been replaced by
<SAMP>-onotation-sysid</SAMP>, which applies to nsgmls only, and
<SAMP>-wnotation-sysid</SAMP>, which applies generally.
<P>
SP can be built as a DLL under Win32.
<H2>Version 1.0</H2>
<P>
The syntax of system identifiers has completely changed. The new
syntax is based on the syntax of formal system identifiers defined in
ISO/IEC 10744 (HyTime) Technical Corrigendum 1, Annex D.
<P>
The NSGMLS_CODE environment variable has been renamed to SP_BCTF.
nsgmls has a -b option to specify the bit combination transformation
format to be used for output.
<P>
A list of directories in which files specified in system identifiers
should be searched for can be specified using the environment variable
SGML_SEARCH_PATH or the option -D.
<P>
Individual SYSTEM identifiers in external identifiers can be
overridden using SYSTEM entries in the catalog.
<P>
The OVERRIDE catalog entry now takes a YES/NO argument. (This change
was required for conformance to the SGML Open TR.) It applies to each
entry individually rather than to the entire catalog.
<P>
The -w options of nsgmls and spam have been enhanced. In spam, the -w
option takes an argument as with nsgmls. There are new warnings for
minimized start and end tags (-wunclosed, -wempty, -wnet and
-wmin-tag); for unused short reference maps (-wunused-map); for
unused parameter entities (-wunused-param). -wall now doesn't include
those warnings that are about conditions that, in the opinion of the
author, there is no reason to avoid. A warning can be turned off by
using its name prefixed by no-; thus -wmin-tag -wno-net is equivalent
to -wunclosed -wempty. The -w option is also used to turn off errors:
-wno-idref replaces the -x option; -wno-significant replaces the -X
option.
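<P>
The way <SAMP>min-tag</SAMP> and the <SAMP>no-</SAMP> prefix combine
can be sketched as below. This is an illustration, not SP's option
parser: the function name is hypothetical, options are assumed to be
applied left to right, and the error-disabling values such as
<SAMP>no-idref</SAMP> are not modelled.

```python
# Group names that stand for several individual warnings.
GROUPS = {
    "min-tag": {"unclosed", "empty", "net"},
}

def enabled_warnings(options):
    """Return the set of warnings enabled by a list of -w arguments."""
    enabled = set()
    for opt in options:
        turn_off = opt.startswith("no-")
        name = opt[3:] if turn_off else opt
        members = GROUPS.get(name, {name})
        if turn_off:
            enabled -= members
        else:
            enabled |= members
    return enabled
```

Under these assumptions,
<SAMP>enabled_warnings(["min-tag", "no-net"])</SAMP> and
<SAMP>enabled_warnings(["unclosed", "empty"])</SAMP> give the same set.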
<P>
In the output of nsgmls, characters that cannot be represented in the
encoding translation specified by the SP_BCTF environment variable
are represented using an escape sequence of the form \#N; where N is a
decimal integer.
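<P>
A minimal sketch of this fallback, using a Python codec as a stand-in
for the output BCTF. The function name is hypothetical, and taking N
to be the Unicode code point of the character is an assumption made
for the illustration.

```python
def encode_with_escapes(text, encoding):
    """Encode text, writing \\#N; for characters the encoding lacks."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # Fall back to a decimal escape sequence of the form \#N;
            out += ("\\#%d;" % ord(ch)).encode("ascii")
    return bytes(out)
```

For example, a euro sign (code point 8364) cannot be encoded in
ISO 8859-1, so it would be written as the seven bytes
<SAMP>\#8364;</SAMP>.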
<P>
In the multi-byte versions of nsgmls there are new BCTFs is8859-N
for N = 1,...,9.
<P>
There is a -o option to nsgmls which makes it output additional
information: -oentity outputs information about all entities; -oid
distinguishes attributes with a declared value of id; -oincluded
distinguishes included subelements.
<P>
nsgmls now automatically searches for a catalog entry file called
"catalog" in the same place as the document entity. Note that when
the document entity is specified with a URL, this matches the
behaviour of Panorama.
<P>
A catalog entry file can contain CATALOG entries specifying additional
catalog entry files. This matches the behaviour of Panorama.
<P>
The parser can now make available to an application complete
information about the markup of prologs and SGML declarations. It
would now be possible, for example, to use SP to write a DTD editor.
spam exploits this to a limited extent: if the -p option is specified
twice, then parameter entity references between declarations will be
expanded; the -mreserved option puts all reserved names in upper-case;
with the -mshortref option short reference use declarations and short
reference mapping declarations will be removed; attribute
specification lists in data attribute specifications in entity
declarations can be normalized like attribute specification lists in
start-tags; with -mms it resolves IGNORE/INCLUDE marked sections.
<P>
nsgmls has a -C option which causes the command line filenames to be
treated as a catalog whose DOCUMENT entry specifies the document
entity.
<P>
nsgmls has a -n option which causes it to generate system identifiers
for notations in the same way as it does for entities.
<P>
spam now has a -f option like nsgmls.
<P>
The interface between the parser and entity manager has been
redesigned so that the entity manager can be used independently of the
parser. This is exploited by a new program called spent that prints
an entity with a specified system identifier on the standard output.
<P>
In most cases, a Control-Z occurring as the last byte in a file will
be stripped. This is controlled by the zapeof attribute in formal
system identifiers.
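<P>
The effect can be sketched as follows. The function is hypothetical;
only the name <SAMP>zapeof</SAMP> is taken from the text above, and
the sketch covers only the case of a Control-Z that is the very last
byte.

```python
CTRL_Z = b"\x1a"

def strip_trailing_ctrl_z(data, zapeof=True):
    """Drop a single Control-Z if it is the last byte of the file."""
    if zapeof and data.endswith(CTRL_Z):
        return data[:-1]
    return data
```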
<H2>Version 0.4</H2>
<P>
External concrete syntaxes, character sets and capacity sets are
supported using PUBLIC entries in catalog files. The multicode
core and reference syntaxes are no longer built-in. Only a few
character sets are now built-in.
<P>
Within external concrete syntaxes, various useful extensions are
permitted. In particular, an ellipsis syntax is allowed for the
specification of name characters and single character short
references. It is now practical to specify tens of thousands of
additional name characters.
<P>
The default SGML declaration is more permissive.
<P>
nsgmls has a -x option that inhibits checking of idrefs.
<P>
nsgmls has a -w option that can enable additional warnings. In
particular, -wmixed will warn about mixed content models that do not
allow #pcdata everywhere.
<P>
The meaning of the f command in the output of nsgmls has changed
slightly. It now gives the effective system identifier of the entity.
<P>
The functionality of the rast program has been merged into the nsgmls
program and the rast program has been removed. The -t option makes
nsgmls generate a RAST result.
<P>
spam has a -l option that uses lower-case for added names that were
subject to upper-case substitution.
<P>
spam has a -mcurrent option that adds omitted attribute specifications
for current attributes.
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>

@@ -0,0 +1,448 @@
<!-- $XConsortium: nsgmls.htm /main/1 1996/09/22 18:16:58 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>NSGMLS</TITLE>
</HEAD>
<BODY>
<H1>NSGMLS</H1>
<H4>
An SGML System Conforming to
International Standard ISO 8879 --<BR>
Standard Generalized Markup Language
</H4>
<H2>
SYNOPSIS
</H2>
<P>
<SAMP>nsgmls</SAMP>
[
<SAMP>-BCdeglprsuv</SAMP>
]
[
<SAMP>-a<VAR>linktype</VAR></SAMP>
]
[
<SAMP>-b<VAR>bctf</VAR></SAMP>
]
[
<SAMP>-c<VAR>sysid</VAR></SAMP>
]
[
<SAMP>-D<VAR>directory</VAR></SAMP>
]
[
<SAMP>-E<VAR>max_errors</VAR></SAMP>
]
[
<SAMP>-f<VAR>file</VAR></SAMP>
]
[
<SAMP>-i<VAR>name</VAR></SAMP>
]
[
<SAMP>-o<VAR>output_option</VAR></SAMP>
]
[
<SAMP>-t<VAR>file</VAR></SAMP>
]
[
<SAMP>-w<VAR>warning_type</VAR></SAMP>
]
[
<SAMP><VAR>sysid</VAR>...</SAMP>
]
<H2>DESCRIPTION</H2>
<P>
Nsgmls parses and validates
the SGML document whose document entity is specified by the
<A HREF="sysid.htm">system identifiers</A>
<SAMP><VAR>sysid</VAR>...</SAMP>
and prints on the standard output a simple text representation of its
Element Structure Information Set.
(This is the information set which a structure-controlled
conforming SGML application should act upon.)
If more than one system identifier is specified,
then the corresponding entities will be concatenated to form
the document entity.
Thus the document entity may be spread amongst several files;
for example, the SGML declaration, prolog and document
instance set could each be in a separate file.
If no system identifiers are specified, then
nsgmls
will read the document entity from the standard input.
A command line system identifier of
<SAMP>-</SAMP>
can be used to refer to the standard input.
(Normally in a system identifier,
<SAMP>&lt;osfd>0</SAMP>
is used to refer to standard input.)
<H2>OPTIONS</H2>
<P>
The following options are available:
<DL>
<DT>
<SAMP>-a<VAR>linktype</VAR></SAMP>
<DD>
Make link type
<SAMP><VAR>linktype</VAR></SAMP>
active.
Not all ESIS information is output in this case:
the active LPDs are not explicitly reported,
although each link attribute is qualified with
its link type name;
there is no information about result elements;
when there are multiple link rules applicable to the
current element,
nsgmls
always chooses the first.
<DT>
<SAMP>-b<VAR>bctf</VAR></SAMP>
<DD>
Use the <A HREF="sysid.htm#bctf">BCTF</A> named
<SAMP><VAR>bctf</VAR></SAMP>
for output.
<DT>
<SAMP>-B</SAMP>
<DD>
Batch mode.
Parse each <SAMP><VAR>sysid</VAR>...</SAMP> specified on the command
line separately, rather than concatenating them.
This is useful mainly with <SAMP>-s</SAMP>.
<P>
If <SAMP>-t<VAR>filename</VAR></SAMP> is also specified, then
the specified <SAMP><VAR>filename</VAR></SAMP> will be prefixed
to the <SAMP><VAR>sysid</VAR></SAMP> to make the filename
for the RAST result for each <SAMP><VAR>sysid</VAR></SAMP>.
<DT>
<SAMP>-c<VAR>sysid</VAR></SAMP>
<DD>
Map public identifiers and entity names to system identifiers
using the catalog entry file whose system identifier is
<SAMP><VAR>sysid</VAR></SAMP>.
Multiple
<SAMP>-c</SAMP>
options are allowed.
If there is a catalog entry file called
<SAMP>catalog</SAMP>
in the same place as the document entity,
it will be searched for immediately after those specified by
<SAMP>-c</SAMP>.
<DT>
<A NAME="optC"><SAMP>-C</SAMP></A>
<DD>
The
<SAMP><VAR>filename</VAR>...</SAMP>
arguments specify catalog files rather than the document entity.
The document entity is specified by the first
<SAMP>DOCUMENT</SAMP>
entry in the catalog files.
<DT>
<A NAME="optD"><SAMP>-D<VAR>directory</VAR></SAMP></A>
<DD>
Search
<SAMP><VAR>directory</VAR></SAMP>
for files specified in system identifiers.
Multiple
<SAMP>-D</SAMP> options
are allowed.
See the description of the
<SAMP>osfile</SAMP>
storage manager for more information about file searching.
<DT>
<SAMP>-e</SAMP>
<DD>
Describe open entities in error messages.
Error messages always include the position of the most recently
opened external entity.
<DT>
<SAMP>-E<VAR>max_errors</VAR></SAMP>
<DD>
Nsgmls
will exit after
<SAMP><VAR>max_errors</VAR></SAMP>
errors.
If
<SAMP><VAR>max_errors</VAR></SAMP>
is 0, there is no limit on the number of errors.
The default is 200.
<DT>
<SAMP>-f<VAR>file</VAR></SAMP>
<DD>
Redirect errors to
<SAMP><VAR>file</VAR></SAMP>.
This is useful mainly with shells that do not support redirection
of stderr.
<DT>
<SAMP>-g</SAMP>
<DD>
Show the generic identifiers of open elements in error messages.
<DT>
<A NAME="opti"><SAMP>-i<VAR>name</VAR></SAMP></A>
<DD>
Pretend that
<PRE>
&lt;!ENTITY % <VAR>name</VAR> "INCLUDE">
</PRE>
<P>
occurs at the start of the document type declaration subset
in the SGML document entity.
Since repeated definitions of an entity are ignored,
this definition will take precedence over any other definitions
of this entity in the document type declaration.
Multiple
<SAMP>-i</SAMP>
options are allowed.
If the SGML declaration replaces the reserved name
<SAMP>INCLUDE</SAMP>
then the new reserved name will be the replacement text of the entity.
Typically the document type declaration will contain
<PRE>
&lt;!ENTITY % <VAR>name</VAR> "IGNORE">
</PRE>
<P>
and will use
<SAMP>%<VAR>name</VAR>;</SAMP>
in the status keyword specification of a marked section declaration.
In this case the effect of the option will be to cause the marked
section not to be ignored.
<DT>
<SAMP>-o<VAR>output_option</VAR></SAMP>
<DD>
Output additional information according to
<SAMP><VAR>output_option</VAR></SAMP>:
<DL>
<DT>
<SAMP>entity</SAMP>
<DD>
Output definitions of all general entities,
not just of data or subdoc entities that are referenced or named in an
ENTITY or ENTITIES attribute.
<DT>
<SAMP>id</SAMP>
<DD>
Distinguish attributes whose declared value is ID.
<DT>
<SAMP>line</SAMP>
<DD>
Output
<SAMP>L</SAMP>
commands giving the current line number and filename.
<DT>
<SAMP>included</SAMP>
<DD>
Output an
<SAMP>i</SAMP>
command for included subelements.
<DT>
<SAMP>notation-sysid</SAMP>
<DD>
Output an <SAMP>f</SAMP> command before an <SAMP>N</SAMP> command,
if a system identifier could be generated for that notation.
</DL>
<P>
Multiple
<SAMP>-o</SAMP>
options are allowed.
<DT>
<SAMP>-p</SAMP>
<DD>
Parse only the prolog.
Nsgmls
will exit after parsing the document type declaration.
Implies
<SAMP>-s</SAMP>.
<DT>
<SAMP>-s</SAMP>
<DD>
Suppress output.
Error messages will still be printed.
<DT>
<SAMP>-t<VAR>file</VAR></SAMP>
<DD>
Output to
<SAMP><VAR>file</VAR></SAMP>
the RAST result as defined by
ISO/IEC 13673:1995 (actually this isn't quite an IS yet;
this implements the Intermediate Editor's Draft of 1994/08/29,
with changes to implement ISO/IEC JTC1/SC18/WG8 N1777).
The normal output is not produced.
<DT>
<SAMP>-v</SAMP>
<DD>
Print the version number.
<DT>
<A NAME="optw"><SAMP>-w<VAR>warning_type</VAR></SAMP></A>
<DD>
Control warnings and errors.
Multiple
<SAMP>-w</SAMP>
options are allowed.
The following values of
<SAMP><VAR>warning_type</VAR></SAMP>
enable warnings:
<DL>
<DT>
<SAMP>mixed</SAMP>
<DD>
Warn about mixed content models that do not allow #pcdata everywhere.
<DT>
<SAMP>sgmldecl</SAMP>
<DD>
Warn about various dubious constructions in the SGML declaration.
<DT>
<SAMP>should</SAMP>
<DD>
Warn about various recommendations made in ISO 8879 that the document
does not comply with.
(Recommendations are expressed with ``should'', as distinct from
requirements which are usually expressed with ``shall''.)
<DT>
<SAMP>default</SAMP>
<DD>
Warn about defaulted references.
<DT>
<SAMP>duplicate</SAMP>
<DD>
Warn about duplicate entity declarations.
<DT>
<SAMP>undefined</SAMP>
<DD>
Warn about undefined elements: elements used in the DTD but not defined.
<DT>
<SAMP>unclosed</SAMP>
<DD>
Warn about unclosed start and end-tags.
<DT>
<SAMP>empty</SAMP>
<DD>
Warn about empty start and end-tags.
<DT>
<SAMP>net</SAMP>
<DD>
Warn about net-enabling start-tags and null end-tags.
<DT>
<SAMP>min-tag</SAMP>
<DD>
Warn about minimized start and end-tags.
Equivalent to combination of
<SAMP>unclosed</SAMP>,
<SAMP>empty</SAMP>
and
<SAMP>net</SAMP>
warnings.
<DT>
<SAMP>unused-map</SAMP>
<DD>
Warn about unused short reference maps: maps that are declared with a
short reference mapping declaration but never used in a short
reference use declaration in the DTD.
<DT>
<SAMP>unused-param</SAMP>
<DD>
Warn about parameter entities that are defined but not used in a DTD.
Unused internal parameter entities whose text is
<SAMP>INCLUDE</SAMP>
or
<SAMP>IGNORE</SAMP>
won't get the warning.
<DT>
<SAMP>notation-sysid</SAMP>
<DD>
Warn about notations for which no system identifier could be generated.
<DT>
<SAMP>all</SAMP>
<DD>
Warn about conditions that should usually be avoided
(in the opinion of the author).
Equivalent to:
<SAMP>mixed</SAMP>,
<SAMP>should</SAMP>,
<SAMP>default</SAMP>,
<SAMP>undefined</SAMP>,
<SAMP>sgmldecl</SAMP>,
<SAMP>unused-map</SAMP>,
<SAMP>unused-param</SAMP>,
<SAMP>empty</SAMP>
and
<SAMP>unclosed</SAMP>.
</DL>
<P>
A warning can be disabled by using its name prefixed with
<SAMP>no-</SAMP>.
Thus
<SAMP>-wall -wno-duplicate</SAMP>
will enable all warnings except those about duplicate entity
declarations.
<P>
The following values for
<SAMP><VAR>warning_type</VAR></SAMP>
disable errors:
<DL>
<DT>
<SAMP>no-idref</SAMP>
<DD>
Do not give an error for an ID reference value
which no element has as its ID.
The effect will be as if each attribute declared as
an ID reference value had been declared as a name.
<DT>
<SAMP>no-significant</SAMP>
<DD>
Do not give an error when a character that is not a significant
character in the reference concrete syntax occurs in a literal in the
SGML declaration. This may be useful in conjunction with certain
buggy test suites.
</DL>
</DL>
<P>
The following options are also supported for backwards compatibility
with sgmls:
<DL>
<DT>
<SAMP>-d</SAMP>
<DD>
Same as
<SAMP>-wduplicate</SAMP>.
<DT>
<SAMP>-l</SAMP>
<DD>
Same as
<SAMP>-oline</SAMP>.
<DT>
<SAMP>-m<VAR>sysid</VAR></SAMP>
<DD>
Same as <SAMP>-c</SAMP>.
<DT>
<SAMP>-r</SAMP>
<DD>
Same as
<SAMP>-wdefault</SAMP>.
<DT>
<SAMP>-u</SAMP>
<DD>
Same as
<SAMP>-wundefined</SAMP>.
</DL>
<H2>ENVIRONMENT</H2>
<DL>
<DT>
<SAMP>SP_BCTF</SAMP>
<DD>
If this is set to one of
<SAMP>identity</SAMP>,
<SAMP>utf-8</SAMP>,
<SAMP>euc-jp</SAMP> and <SAMP>sjis</SAMP>, then that BCTF will be used as the
default BCTF for everything (including file input, file output,
message output, filenames, environment variable names, environment
variable values and command line arguments). Note that setting
<SAMP>SP_BCTF</SAMP> to <SAMP>unicode</SAMP>
will not work.
</DL>
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>

@@ -0,0 +1,273 @@
<!-- $XConsortium: sgmldecl.htm /main/1 1996/09/22 18:17:17 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SP - SGML declaration</TITLE>
</HEAD>
<BODY>
<H1>Handling of the SGML declaration in SP</H1>
<H2>Default SGML declaration</H2>
<P>
If the SGML declaration is omitted
and there is no applicable
<A HREF="catalog.htm#sgmldecl"><SAMP>SGMLDECL</SAMP></A>
entry in a catalog,
the following declaration will be implied:
<PRE>
&lt;!SGML "ISO 8879:1986"
CHARSET
BASESET "ISO 646-1983//CHARSET
International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET 0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 95 32
127 1 UNUSED
CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
SCOPE DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
BASESET "ISO 646-1983//CHARSET International Reference Version
(IRV)//ESC 2/5 4/0"
DESCSET 0 128 0
FUNCTION RE 13
RS 10
SPACE 32
TAB SEPCHAR 9
NAMING LCNMSTRT ""
UCNMSTRT ""
LCNMCHAR "-."
UCNMCHAR "-."
NAMECASE GENERAL YES
ENTITY NO
DELIM GENERAL SGMLREF
SHORTREF SGMLREF
NAMES SGMLREF
QUANTITY SGMLREF
ATTCNT 99999999
ATTSPLEN 99999999
DTEMPLEN 24000
ENTLVL 99999999
GRPCNT 99999999
GRPGTCNT 99999999
GRPLVL 99999999
LITLEN 24000
NAMELEN 99999999
PILEN 24000
TAGLEN 99999999
TAGLVL 99999999
FEATURES
MINIMIZE DATATAG NO
OMITTAG YES
RANK YES
SHORTTAG YES
LINK SIMPLE YES 1000
IMPLICIT YES
EXPLICIT YES 1
OTHER CONCUR NO
SUBDOC YES 99999999
FORMAL YES
APPINFO NONE>
</PRE>
<P>
with the exception that all characters that are neither significant
nor shunned will be assigned to DATACHAR.
<H2>Character sets</H2>
<P>
A character in a base character set is described either by giving its
number in a universal character set, or by specifying a minimum
literal. The constraints on the choice of universal character set are
that characters that are significant in the SGML reference concrete
syntax must be in the universal character set and must have the same
number in the universal character set as in ISO 646; that each
character in the character set must be represented by exactly one
number; and that character numbers in the range 0 to 31 and 127 to 159 are
control characters (for the purpose of enforcing SHUNCHAR CONTROLS).
It is recommended that ISO 10646 (Unicode) be used as the universal
character set, except in environments where the normal document
character sets are large character sets which cannot be compactly
described in terms of ISO 10646.
The public identifier of a base character set can be associated
with an entity that describes it by using a
<SAMP>PUBLIC</SAMP>
entry in the catalog entry file.
The entity must be a fragment
of an SGML declaration
consisting of the
portion of a character set description
that follows the DESCSET keyword;
that is, it must be a sequence of character descriptions,
where each character description specifies a described character
number, the number of characters, and
either a character number in the universal character set, a minimum literal,
or the keyword
<SAMP>UNUSED</SAMP>.
Character numbers in the universal character set can be as big as
99999999.
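<P>
As a rough illustration of how a sequence of character descriptions can
be read, the following Python sketch expands the document character set
DESCSET of the default declaration above into a table of described
character numbers. It is not how SP itself does this: the name
<CODE>expand_descset</CODE> is hypothetical, and minimum literals are
left out for brevity.

```python
def expand_descset(tokens):
    """Map described character numbers to base character numbers.

    tokens is a flat list of triples: described number, count, and
    either a base character number or the keyword UNUSED.
    Minimum literals are not handled in this sketch.
    """
    mapping = {}
    for i in range(0, len(tokens), 3):
        start, count, target = tokens[i:i + 3]
        if target == "UNUSED":
            continue  # these numbers describe no character
        for offset in range(int(count)):
            mapping[int(start) + offset] = int(target) + offset
    return mapping


# DESCSET of the default SGML declaration shown earlier.
DEFAULT_DESCSET = """
    0 9 UNUSED   9 2 9   11 2 UNUSED   13 1 13
    14 18 UNUSED   32 95 32   127 1 UNUSED
""".split()

table = expand_descset(DEFAULT_DESCSET)
print(sorted(table)[:4])  # the first few described numbers in use
```

<P>
Numbers that fall in an <SAMP>UNUSED</SAMP> range simply have no entry
in the table, which is one simple way to model them.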
<P>
In addition, SP has built-in knowledge of a few character sets.
These are identified using the designating sequence in the
public identifier. The following designating sequences are
recognized:
<DL>
<DT>
<SAMP>ESC 2/5 4/0</SAMP>
<DD>
The full set of ISO 646 IRV.
This is not a registered character set,
but is recommended by ISO 8879 (clause 10.2.2.4).
<DT>
<SAMP>ESC 2/8 4/0</SAMP>
<DD>
G0 set of ISO 646 IRV,
ISO Registration Number 2.
<DT>
<SAMP>ESC 2/8 4/2</SAMP>
<DD>
G0 set of ASCII,
ISO Registration Number 6.
<DT>
<SAMP>ESC 2/1 4/0</SAMP>
<DD>
C0 set of ISO 646,
ISO Registration Number 1.
</DL>
<P>
All the above character sets will be treated as mapping character numbers
0 to 127 inclusive as in ISO 646.
<P>
It is not necessary for every character set used in the SGML
declaration to be known to SP
provided that characters in the document character set that are
significant both in the reference concrete syntax and in the described
concrete syntax are described using known base character sets and that
characters that are significant in the described concrete syntax are
described using the same base character sets or the same minimum
literals in both the document character set description and the syntax
reference character set description.
<H2>Concrete syntaxes</H2>
<P>
The public identifier for a public concrete syntax can be associated
with an entity that describes it by using a
<SAMP>PUBLIC</SAMP>
entry in the catalog entry file.
The entity must be a fragment of an SGML declaration
consisting of a concrete syntax description
starting with the
<SAMP>SHUNCHAR</SAMP>
keyword
as in an SGML declaration.
The entity can also make use of the following extensions:
<UL>
<LI>
An
<I>added function</I>
can be expressed as a parameter literal
instead of a name.
<LI>
The replacement for a reference reserved name
can be expressed as a parameter literal instead of a name.
<LI>
The
<SAMP>LCNMSTRT</SAMP>,
<SAMP>UCNMSTRT</SAMP>,
<SAMP>LCNMCHAR</SAMP>
and
<SAMP>UCNMCHAR</SAMP>
keywords may each be followed by more than one parameter literal. A
sequence of parameter literals has the same meaning as a single
parameter literal whose content is the concatenation of the content of
each of the literals in the sequence. This extension is useful
because of the restriction on the length of a parameter literal in the
SGML declaration to 240 characters.
<LI>
The total number of characters specified for
<SAMP>UCNMCHAR</SAMP>
or
<SAMP>UCNMSTRT</SAMP>
may exceed the total number of characters specified for
<SAMP>LCNMCHAR</SAMP>
or
<SAMP>LCNMSTRT</SAMP>
respectively.
Each character in
<SAMP>UCNMCHAR</SAMP>
or
<SAMP>UCNMSTRT</SAMP>
which does not have a corresponding character in the same position in
<SAMP>LCNMCHAR</SAMP>
or
<SAMP>LCNMSTRT</SAMP>
is simply assigned to <SAMP>UCNMCHAR</SAMP> or <SAMP>UCNMSTRT</SAMP>
without making it the upper-case form of any character.
<LI>
A parameter literal following any of the
<SAMP>LCNMSTRT</SAMP>,
<SAMP>UCNMSTRT</SAMP>,
<SAMP>LCNMCHAR</SAMP>
and
<SAMP>UCNMCHAR</SAMP>
keywords may be followed by
the name token <SAMP>...</SAMP>
(three periods) and another parameter literal.
This has the same meaning as the two parameter literals
with a parameter literal in between
containing in order each character whose number
is greater than the number of the last character in
the first parameter literal and less than the
number of the first character in the second
parameter literal.
A parameter literal must contain at least one character for each
<SAMP>...</SAMP>
to which it is adjacent.
<LI>
A number may be used as a parameter following the
<SAMP>LCNMSTRT</SAMP>,
<SAMP>UCNMSTRT</SAMP>,
<SAMP>LCNMCHAR</SAMP>
and
<SAMP>UCNMCHAR</SAMP>
keywords or as a delimiter in the
<SAMP>DELIM</SAMP>
section with the same meaning as a parameter literal
containing just a numeric character reference with that number.
<LI>
The parameters following the
<SAMP>LCNMSTRT</SAMP>,
<SAMP>UCNMSTRT</SAMP>,
<SAMP>LCNMCHAR</SAMP>
and
<SAMP>UCNMCHAR</SAMP>
keywords may be omitted.
This has the same meaning as specifying
an empty parameter literal.
<LI>
Within the specification of the short reference delimiters,
a parameter literal containing exactly one character
may be followed by the name token <SAMP>...</SAMP>
and another parameter literal containing exactly one character.
This has the same meaning as a sequence of parameter literals
one for each character number that is greater than or equal
to the number of the character in the first parameter literal
and less than or equal to the number of the character in the
second parameter literal.
</UL>
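<P>
The two <SAMP>...</SAMP> range extensions above can be sketched in
Python as follows. This is only an illustration under stated
assumptions: the function names are hypothetical, and Python code
points stand in for character numbers. Note that a naming range fills
in only the characters strictly between the two literals, while a
short reference range includes both ends.

```python
def expand_name_range(first, second):
    """Expand 'first ... second' into the content of a single literal.

    Characters strictly between the last character of first and the
    first character of second are filled in, in order.
    """
    if not first or not second:
        raise ValueError("each literal next to ... needs a character")
    low = ord(first[-1])
    high = ord(second[0])
    middle = "".join(chr(n) for n in range(low + 1, high))
    return first + middle + second


def expand_shortref_range(first, second):
    """Expand a short reference range into one literal per character.

    Both literals hold exactly one character, and both ends are included.
    """
    if len(first) != 1 or len(second) != 1:
        raise ValueError("short reference ranges take single characters")
    return [chr(n) for n in range(ord(first), ord(second) + 1)]


print(expand_name_range("a", "e"))
print(expand_shortref_range("a", "c"))
```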
<H2>Capacity sets</H2>
<P>
The public identifier for a public capacity set can be associated
with an entity that describes it by using a
<SAMP>PUBLIC</SAMP>
entry in the catalog entry file.
The entity must be a fragment of an SGML declaration
consisting of a sequence of capacity names and numbers.
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>


@@ -0,0 +1,151 @@
<!-- $XConsortium: sgmlnorm.htm /main/1 1996/09/22 18:17:36 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SGMLNORM</TITLE>
</HEAD>
<BODY>
<H1>SGMLNORM</H1>
<H4>
An SGML System Conforming to
International Standard ISO 8879 --<BR>
Standard Generalized Markup Language
</H4>
<H2>SYNOPSIS</H2>
<P>
<SAMP>sgmlnorm</SAMP>
[
<SAMP>-Cdemnrv</SAMP>
]
[
<SAMP>-b<VAR>bctf</VAR></SAMP>
]
[
<SAMP>-c<VAR>catalog</VAR></SAMP>
]
[
<SAMP>-D<VAR>dir</VAR></SAMP>
]
[
<SAMP>-i<VAR>name</VAR></SAMP>
]
[
<SAMP>-w<VAR>warning</VAR></SAMP>
]
<SAMP><VAR>sysid...</VAR></SAMP>
<H2>DESCRIPTION</H2>
<P>
Sgmlnorm prints on the standard output a <I>normalized</I> document instance
for the SGML document contained in the concatenation of the entities
with <A HREF="sysid.htm">system identifiers</A>
<SAMP><VAR>sysid...</VAR></SAMP>.
<P>
When the normalized instance is prefixed with the original SGML declaration
and prolog, it will have the same ESIS as the original SGML document,
with the following exceptions:
<UL>
<LI>
The output of sgmlnorm does not protect against the recognition of
short reference delimiters, so any <SAMP>USEMAP</SAMP> declarations
must be removed from the DTD.
<LI>
The normalized instance will use the reference delimiters, even if the
original instance did not.
<LI>
If marked sections are included in the output using the
<SAMP>-m</SAMP> option, the reference reserved names will be used for
the status keywords even if the original instance did not.
<LI>
Any ESIS information relating to the SGML LINK feature will be lost.
</UL>
<P>
The normalized instance will not use any markup minimization features
except that:
<UL>
<LI>
Any attributes that were not specified in the original instance
will not be included in the normalized instance.
(Current attributes will be included.)
<LI>
If the declared value of an attribute was a name token group,
and a value was specified that was the same as the name of
the attribute, then the attribute name and value indicator will be
omitted.
For example, with HTML, sgmlnorm would output <CODE>&lt;DL COMPACT&gt;</CODE>
rather than <CODE>&lt;DL COMPACT="COMPACT"&gt;</CODE>.
</UL>
<P>
The following options are available:
<DL>
<DT>
<SAMP>-b<VAR>bctf</VAR></SAMP>
<DD>
Use the <A HREF="sysid.htm#bctf">BCTF</A> with name
<SAMP><VAR>bctf</VAR></SAMP>
for output.
<DT>
<SAMP>-c<VAR>file</VAR></SAMP>
<DD>
Use the catalog entry file
<SAMP><VAR>file</VAR></SAMP>.
<DT>
<SAMP>-C</SAMP>
<DD>
This has the same effect as in <A HREF="nsgmls.htm#optC">nsgmls</A>.
<DT>
<SAMP>-d</SAMP>
<DD>
Output a document type declaration with the same external
identifier as the input document, and with no
internal declaration subset.
No check is performed that the document instance is valid
with respect to this DTD.
<DT>
<SAMP>-D<VAR>directory</VAR></SAMP>
<DD>
Search
<SAMP><VAR>directory</VAR></SAMP>
for files specified in system identifiers.
This has the same effect as in <A HREF="nsgmls.htm#optD">nsgmls</A>.
<DT>
<SAMP>-e</SAMP>
<DD>
Describe open entities in error messages.
<DT>
<SAMP>-i<VAR>name</VAR></SAMP>
<DD>
This has the same effect as in <A HREF="nsgmls.htm#opti">nsgmls</A>.
<DT>
<SAMP>-m</SAMP>
<DD>
Output any marked sections that were in the input document instance.
<DT>
<SAMP>-n</SAMP>
<DD>
Output any comments that were in the input document instance.
<DT>
<SAMP>-r</SAMP>
<DD>
Raw output.
Don't perform any conversion on RSs and REs when printing the entity.
The entity would typically have the storage manager attribute
<SAMP>records=asis</SAMP>.
<DT>
<SAMP>-v</SAMP>
<DD>
Print the version number.
<DT>
<SAMP>-w<VAR>type</VAR></SAMP>
<DD>
Control warnings and errors according to
<SAMP><VAR>type</VAR></SAMP>.
This has the same effect as in <A HREF="nsgmls.htm#optw">nsgmls</A>.
</DL>
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>


@@ -0,0 +1,418 @@
<!-- $XConsortium: sgmlsout.htm /main/1 1996/09/22 18:17:55 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>Nsgmls Output Format</TITLE>
</HEAD>
<BODY>
<H1>Nsgmls Output Format</H1>
<P>
The output is a series of lines.
Lines can be arbitrarily long.
Each line consists of an initial command character
and one or more arguments.
Arguments are separated by a single space,
but when a command takes a fixed number of arguments
the last argument can contain spaces.
There is no space between the command character and the first argument.
Arguments can contain the following escape sequences:
<DL>
<DT>
<CODE>\\</CODE>
<DD>
A
<CODE>\</CODE>.
<DT>
<CODE>\n</CODE>
<DD>
A record end character.
<DT>
<CODE>\|</CODE>
<DD>
Internal SDATA entities are bracketed by these.
<DT>
<CODE>\<VAR>nnn</VAR></CODE>
<DD>
The character whose code is
<CODE><VAR>nnn</VAR></CODE>
octal.
<P>
A record start character will be represented by
<CODE>\012</CODE>.
Most applications will need to ignore
<CODE>\012</CODE>
and translate
<CODE>\n</CODE>
into newline.
<DT>
<CODE>\#<VAR>n</VAR>;</CODE>
<DD>
The character whose number is
<CODE><VAR>n</VAR></CODE>
in decimal.
<CODE><VAR>n</VAR></CODE>
can have any number of digits.
This is used for characters that are not representable by the
encoding translation used for output
(as specified by the
<CODE>SP_BCTF</CODE>
environment variable).
This will only occur with the multibyte version of nsgmls.
</DL>
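<P>
A consumer of this format has to undo the escape sequences before using
an argument. The following Python sketch shows one way this might be
done. The name <CODE>decode_arg</CODE> and its
<CODE>sdata_marker</CODE> parameter are hypothetical. The sketch
assumes that character codes can be mapped straight to Python code
points, and it drops record starts as suggested above.

```python
import re

# One escape: a backslash followed by \, n, |, three octal digits, or #n;
_ESCAPE = re.compile(r"\\(\\|n|\||[0-7]{3}|#[0-9]+;)")


def decode_arg(text, sdata_marker=""):
    """Decode the escape sequences in one nsgmls output argument."""

    def replace(match):
        token = match.group(1)
        if token == "\\":
            return "\\"
        if token == "n":
            return "\n"  # record end becomes a newline
        if token == "|":
            return sdata_marker  # bracket around an internal SDATA entity
        if token.startswith("#"):
            return chr(int(token[1:-1]))  # decimal character number
        if token == "012":
            return ""  # record start is ignored
        return chr(int(token, 8))  # octal character code

    return _ESCAPE.sub(replace, text)


print(decode_arg(r"first line\n\012second line"))
```

<P>
An application that needs to tell SDATA text apart from ordinary data
could pass a marker of its own as <CODE>sdata_marker</CODE> instead of
discarding the brackets.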
<P>
The possible command characters and arguments are as follows:
<DL>
<DT>
<CODE>(<VAR>gi</VAR></CODE>
<DD>
The start of an element whose generic identifier is
<CODE><VAR>gi</VAR></CODE>.
Any attributes for this element
will have been specified with
<CODE>A</CODE>
commands.
<DT>
<CODE>)<VAR>gi</VAR></CODE>
<DD>
The end of an element whose generic identifier is
<CODE><VAR>gi</VAR></CODE>.
<DT>
<CODE>-<VAR>data</VAR></CODE>
<DD>
Data.
<DT>
<CODE>&amp;<VAR>name</VAR></CODE>
<DD>
A reference to an external data entity
<CODE><VAR>name</VAR></CODE>;
<CODE><VAR>name</VAR></CODE>
will have been defined using an
<CODE>E</CODE>
command.
<DT>
<CODE>?<VAR>pi</VAR></CODE>
<DD>
A processing instruction with data
<CODE><VAR>pi</VAR></CODE>.
<DT>
<CODE>A<VAR>name</VAR> <VAR>val</VAR></CODE>
<DD>
The next element to start has an attribute
<CODE><VAR>name</VAR></CODE>
with value
<CODE><VAR>val</VAR></CODE>
which takes one of the following forms:
<DL>
<DT>
<CODE>IMPLIED</CODE>
<DD>
The value of the attribute is implied.
<DT>
<CODE>CDATA <VAR>data</VAR></CODE>
<DD>
The attribute is character data.
This is used for attributes whose declared value is
<CODE>CDATA</CODE>.
<DT>
<CODE>NOTATION <VAR>nname</VAR></CODE>
<DD>
The attribute is a notation name;
<CODE><VAR>nname</VAR></CODE>
will have been defined using a
<CODE>N</CODE>
command.
This is used for attributes whose declared value is
<CODE>NOTATION</CODE>.
<DT>
<CODE>ENTITY <VAR>name...</VAR></CODE>
<DD>
The attribute is a list of general entity names.
Each entity name will have been defined using an
<CODE>I</CODE>,
<CODE>E</CODE>
or
<CODE>S</CODE>
command.
This is used for attributes whose declared value is
<CODE>ENTITY</CODE>
or
<CODE>ENTITIES</CODE>.
<DT>
<CODE>TOKEN <VAR>token...</VAR></CODE>
<DD>
The attribute is a list of tokens.
This is used for attributes whose declared value is anything else.
<DT>
<CODE>ID <VAR>token</VAR></CODE>
<DD>
The attribute is an ID value.
This will be output only if the
<CODE>-oid</CODE>
option is specified.
Otherwise
<CODE>TOKEN</CODE>
will be used for ID values.
</DL>
<DT>
<CODE>D<VAR>ename</VAR> <VAR>name</VAR> <VAR>val</VAR></CODE>
<DD>
This is the same as the
<CODE>A</CODE>
command, except that it specifies a data attribute for an
external entity named
<CODE><VAR>ename</VAR></CODE>.
Any
<CODE>D</CODE>
commands will come after the
<CODE>E</CODE>
command that defines the entity to which they apply, but
before any
<CODE>&amp;</CODE>
or
<CODE>A</CODE>
commands that reference the entity.
<DT>
<CODE>a<VAR>type</VAR> <VAR>name</VAR> <VAR>val</VAR></CODE>
<DD>
The next element to start has a link attribute with link type
<CODE><VAR>type</VAR></CODE>,
name
<CODE><VAR>name</VAR></CODE>,
and value
<CODE><VAR>val</VAR></CODE>,
which takes the same form as with the
<CODE>A</CODE>
command.
<DT>
<CODE>N<VAR>nname</VAR></CODE>
<DD>
Define a notation <CODE><VAR>nname</VAR></CODE>.
This command will be preceded by a
<CODE>p</CODE>
command if the notation was declared with a public identifier,
and by a
<CODE>s</CODE>
command if the notation was declared with a system identifier.
If the
<CODE>-onotation-sysid</CODE>
option was specified,
this command will also be preceded by an
<CODE>f</CODE>
command giving the system identifier generated by the entity manager
(unless it was unable to generate one).
A notation will only be defined if it is to be referenced
in an
<CODE>E</CODE>
command or in an
<CODE>A</CODE>
command for an attribute with a declared value of
<CODE>NOTATION</CODE>.
<DT>
<CODE>E<VAR>ename</VAR> <VAR>typ</VAR> <VAR>nname</VAR></CODE>
<DD>
Define an external data entity named
<CODE><VAR>ename</VAR></CODE>
with type
<CODE><VAR>typ</VAR></CODE>
(<CODE>CDATA</CODE>, <CODE>NDATA</CODE> or <CODE>SDATA</CODE>)
and notation <CODE><VAR>nname</VAR></CODE>.
This command will be preceded by an
<CODE>f</CODE>
command giving the system identifier generated by the entity manager
(unless it was unable to generate one),
by a
<CODE>p</CODE>
command if a public identifier was declared for the entity,
and by a
<CODE>s</CODE>
command if a system identifier was declared for the entity.
<CODE><VAR>nname</VAR></CODE>
will have been defined using a
<CODE>N</CODE>
command.
Data attributes may be specified for the entity using
<CODE>D</CODE>
commands.
If the
<CODE>-oentity</CODE>
option is not specified,
an external data entity will only be defined if it is to be referenced in a
<CODE>&amp;</CODE>
command or in an
<CODE>A</CODE>
command for an attribute whose declared value is
<CODE>ENTITY</CODE>
or
<CODE>ENTITIES</CODE>.
<DT>
<CODE>I<VAR>ename</VAR> <VAR>typ</VAR> <VAR>text</VAR></CODE>
<DD>
Define an internal data entity named
<CODE><VAR>ename</VAR></CODE>
with type
<CODE><VAR>typ</VAR></CODE>
and entity text
<CODE><VAR>text</VAR></CODE>.
The
<CODE><VAR>typ</VAR></CODE>
will be
<CODE>CDATA</CODE>
or
<CODE>SDATA</CODE>
unless the
<CODE>-oentity</CODE>
option was specified,
in which case it can also be
<CODE>PI</CODE>
or
<CODE>TEXT</CODE>
(for an SGML text entity).
If the
<CODE>-oentity</CODE>
option is not specified,
an internal data entity will only be defined if it is referenced in an
<CODE>A</CODE>
command for an attribute whose declared value is
<CODE>ENTITY</CODE>
or
<CODE>ENTITIES</CODE>.
<DT>
<CODE>S<VAR>ename</VAR></CODE>
<DD>
Define a subdocument entity named
<CODE><VAR>ename</VAR></CODE>.
This command will be preceded by an
<CODE>f</CODE>
command giving the system identifier generated by the entity manager
(unless it was unable to generate one),
by a
<CODE>p</CODE>
command if a public identifier was declared for the entity,
and by a
<CODE>s</CODE>
command if a system identifier was declared for the entity.
If the
<CODE>-oentity</CODE>
option is not specified,
a subdocument entity will only be defined if it is referenced
in a
<CODE>{</CODE>
command
or in an
<CODE>A</CODE>
command for an attribute whose declared value is
<CODE>ENTITY</CODE>
or
<CODE>ENTITIES</CODE>.
<DT>
<CODE>T<VAR>ename</VAR></CODE>
<DD>
Define an external SGML text entity named
<CODE><VAR>ename</VAR></CODE>.
This command will be preceded by an
<CODE>f</CODE>
command giving the system identifier generated by the entity manager
(unless it was unable to generate one),
by a
<CODE>p</CODE>
command if a public identifier was declared for the entity,
and by a
<CODE>s</CODE>
command if a system identifier was declared for the entity.
This command will be output only if the
<CODE>-oentity</CODE>
option is specified.
<DT>
<CODE>s<VAR>sysid</VAR></CODE>
<DD>
This command applies to the next
<CODE>E</CODE>,
<CODE>S</CODE>,
<CODE>T</CODE>
or
<CODE>N</CODE>
command and specifies the associated system identifier.
<DT>
<CODE>p<VAR>pubid</VAR></CODE>
<DD>
This command applies to the next
<CODE>E</CODE>,
<CODE>S</CODE>,
<CODE>T</CODE>
or
<CODE>N</CODE>
command and specifies the associated public identifier.
<DT>
<CODE>f<VAR>sysid</VAR></CODE>
<DD>
This command applies to the next
<CODE>E</CODE>,
<CODE>S</CODE>,
<CODE>T</CODE>
or, if the
<CODE>-onotation-sysid</CODE>
option was specified,
<CODE>N</CODE>
command and specifies the system identifier
generated by the entity manager from the specified external identifier
and other information about the entity or notation.
<DT>
<CODE>{<VAR>ename</VAR></CODE>
<DD>
The start of the SGML subdocument entity
<CODE><VAR>ename</VAR></CODE>;
<CODE><VAR>ename</VAR></CODE>
will have been defined using a
<CODE>S</CODE>
command.
<DT>
<CODE>}<VAR>ename</VAR></CODE>
<DD>
The end of the SGML subdocument entity
<CODE><VAR>ename</VAR></CODE>.
<DT>
<CODE>L<VAR>lineno</VAR> <VAR>file</VAR></CODE>
<DT>
<CODE>L<VAR>lineno</VAR></CODE>
<DD>
Set the current line number and filename.
The
<CODE><VAR>file</VAR></CODE>
argument will be omitted if only the line number has changed.
This will be output only if the
<CODE>-l</CODE>
option has been given.
<DT>
<CODE>#<VAR>text</VAR></CODE>
<DD>
An APPINFO parameter of
<CODE><VAR>text</VAR></CODE>
was specified in the SGML declaration.
This is not strictly part of the ESIS, but a structure-controlled
application is permitted to act on it.
No
<CODE>#</CODE>
command will be output if
<CODE>APPINFO NONE</CODE>
was specified.
A
<CODE>#</CODE>
command will occur at most once,
and may be preceded only by a single
<CODE>L</CODE>
command.
<DT>
<CODE>C</CODE>
<DD>
This command indicates that the document was a conforming SGML document.
If this command is output, it will be the last command.
An SGML document is not conforming if it references a subdocument entity
that is not conforming.
</DL>
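<P>
To show how the commands fit together, here is a minimal Python sketch
of a reader that splits each line into its command character and
arguments and attaches pending <CODE>A</CODE> commands to the next
element. It is an illustration only: the name
<CODE>parse_events</CODE> and the event tuples are hypothetical, the
sample lines are hand-written and were not captured from a real run,
escape sequences are left undecoded, and every command not listed in
the code is skipped.

```python
def parse_events(lines):
    """Yield (kind, name_or_data, attributes) tuples from nsgmls output."""
    pending = {}  # attributes from A commands, waiting for the next element
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        command, rest = line[0], line[1:]
        if command == "A":
            # "Aname val": the value keeps its spaces, so split only once.
            name, _, value = rest.partition(" ")
            pending[name] = value
        elif command == "(":
            yield ("start", rest, pending)
            pending = {}
        elif command == ")":
            yield ("end", rest, {})
        elif command == "-":
            yield ("data", rest, {})
        elif command == "C":
            yield ("conforming", "", {})
        # Every other command is skipped in this sketch.


# Hand-written sample lines, not captured from a real run.
SAMPLE = [
    "ATITLE CDATA A short note",
    "(NOTE",
    "-Hello",
    ")NOTE",
    "C",
]

for event in parse_events(SAMPLE):
    print(event)
```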
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>


@@ -0,0 +1,272 @@
<!-- $XConsortium: spam.htm /main/1 1996/09/22 18:18:13 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SPAM</TITLE>
</HEAD>
<BODY>
<H1>SPAM</H1>
<H4>
An SGML System Conforming to
International Standard ISO 8879 --<BR>
Standard Generalized Markup Language
</H4>
<H2>
SYNOPSIS
</H2>
<P>
<CODE>spam</CODE>
[
<CODE>-Cehilprvx</CODE>
]
[
<CODE>-c<VAR>catalog_file</VAR></CODE>
]
[
<CODE>-D<VAR>directory</VAR></CODE>
]
[
<CODE>-f<VAR>file</VAR></CODE>
]
[
<CODE>-m<VAR>markup_option</VAR></CODE>
]
[
<CODE>-o<VAR>entity_name</VAR></CODE>
]
[
<CODE>-w<VAR>warning_type</VAR></CODE>
]
<CODE><VAR>sysid...</VAR></CODE>
<H2>DESCRIPTION</H2>
<P>
Spam (SP Add Markup)
is an SGML markup stream editor implemented using the SP parser.
Spam parses the SGML document contained in
<CODE><VAR>sysid...</VAR></CODE>
and copies to the standard output
the portion of the document entity containing the document
instance, adding or changing markup as specified by the
<CODE>-m</CODE> options.
The <CODE>-p</CODE>
option can be used to include the SGML declaration and prolog
in the output.
The <CODE>-o</CODE>
option can be used to output other entities.
The
<CODE>-x</CODE>
option can be used to expand entity references.
<P>
The following options are available:
<DL>
<DT>
<CODE>-c<VAR>file</VAR></CODE>
<DD>
Use the catalog entry file
<CODE><VAR>file</VAR></CODE>.
<DT>
<CODE>-C</CODE>
<DD>
This has the same effect as in <A HREF="nsgmls.htm#optC">nsgmls</A>.
<DT>
<CODE>-D<VAR>directory</VAR></CODE>
<DD>
Search
<CODE><VAR>directory</VAR></CODE>
for files specified in system identifiers.
This has the same effect as in <A HREF="nsgmls.htm#optD">nsgmls</A>.
<DT>
<CODE>-e</CODE>
<DD>
Describe open entities in error messages.
<DT>
<CODE>-f<VAR>file</VAR></CODE>
<DD>
Redirect errors to
<CODE><VAR>file</VAR></CODE>.
This is useful mainly with shells that do not support redirection
of stderr.
<DT>
<CODE>-h</CODE>
<DD>
Hoist omitted tags out from the start of internal entities.
If the text at the beginning of an internal entity causes
a tag to be implied,
the tag will usually be treated as being in that internal entity;
this option will instead cause it to be treated as being in the entity
that referenced the internal entity.
This option makes a difference in conjunction with
<CODE>-momittag</CODE>
or
<CODE>-x -x</CODE>.
<DT>
<CODE>-i<VAR>name</VAR></CODE>
<DD>
This has the same effect as in <A HREF="nsgmls.htm#opti">nsgmls</A>.
<DT>
<CODE>-l</CODE>
<DD>
Prefer lower-case.
Added names that were subject to upper-case substitution
will be converted to lower-case.
<DT>
<CODE>-m<VAR>markup_option</VAR></CODE>
<DD>
Change the markup in the output according to the value
of
<CODE><VAR>markup_option</VAR></CODE>
as follows:
<DL>
<DT>
<CODE>omittag</CODE>
<DD>
Add tags that were omitted using omitted tag minimization.
End tags that were omitted because the element has
a declared content of <SAMP>EMPTY</SAMP>
or an explicit content reference
will not be added.
<DT>
<CODE>shortref</CODE>
<DD>
Replace short references by named entity references.
<DT>
<CODE>net</CODE>
<DD>
Change null end-tags
into unminimized end-tags,
and change net-enabling start-tags
into unminimized start-tags.
<DT>
<CODE>emptytag</CODE>
<DD>
Change empty tags into unminimized tags.
<DT>
<CODE>unclosed</CODE>
<DD>
Change unclosed tags into unminimized tags.
<DT>
<CODE>attname</CODE>
<DD>
Add omitted attribute names and
<CODE>vi</CODE>s.
<DT>
<CODE>attvalue</CODE>
<DD>
Add literal delimiters omitted from attribute values.
<DT>
<CODE>attspec</CODE>
<DD>
Add omitted attribute specifications.
<DT>
<CODE>current</CODE>
<DD>
Add omitted attribute specifications for current attributes.
This option is implied by the
<CODE>attspec</CODE>
option.
<DT>
<CODE>shorttag</CODE>
<DD>
Equivalent to the combination of the
<CODE>net</CODE>,
<CODE>emptytag</CODE>,
<CODE>unclosed</CODE>,
<CODE>attname</CODE>,
<CODE>attvalue</CODE>
and
<CODE>attspec</CODE>
options.
<DT>
<CODE>rank</CODE>
<DD>
Add omitted rank suffixes.
<DT>
<CODE>reserved</CODE>
<DD>
Put reserved names in upper-case.
<DT>
<CODE>ms</CODE>
<DD>
Remove marked section declarations whose effective status
is IGNORE, and replace each marked section declaration
whose effective status is INCLUDE by its marked section.
In the document instance, empty comments will be added
before or after the marked section declaration to ensure
that ignored record ends remain ignored.
</DL>
<P>
Multiple
<CODE>-m</CODE>
options are allowed.
<DT>
<CODE>-o<VAR>name</VAR></CODE>
<DD>
Output the general entity
<CODE><VAR>name</VAR></CODE>
instead of the document entity.
The output will correspond to the first time
that the entity is referenced in content.
<DT>
<CODE>-p</CODE>
<DD>
Output the part of the document entity containing the SGML declaration
(if it was explicitly present in the document entity)
and the prolog before anything else.
If this option is specified two or more times,
then all entity references occurring between declarations
in the prolog will be expanded;
this includes the implicit reference to the entity
containing the external subset of the DTD, if there is one.
Note that the SGML declaration will not be included if it was
specified by an SGMLDECL entry in a catalog.
<DT>
<CODE>-r</CODE>
<DD>
Don't perform any conversion on RSs and REs when outputting the entity.
The entity would typically have the storage manager attribute
<CODE>records=asis</CODE>.
<DT>
<CODE>-v</CODE>
<DD>
Print the version number.
<DT>
<CODE>-w<VAR>type</VAR></CODE>
<DD>
Control warnings and errors according to
<CODE><VAR>type</VAR></CODE>.
This has the same effect as in <A HREF="nsgmls.htm#optw">nsgmls</A>.
<DT>
<CODE>-x</CODE>
<DD>
Expand references to entities that are changed.
If this option is specified two or more times,
then all references to entities that contain tags
will be expanded.
</DL>
<H2>BUGS</H2>
<P>
Omitted tags are added at the point where they are
implied by the SGML parser (except as modified
by the
<CODE>-h</CODE>
option); this is often not quite where they are wanted.
<P>
The case of general delimiters is not preserved.
<P>
Incorrect results may be produced if a variant concrete syntax is used
which is such that there are delimiters in markup to be added that have a
prefix that is a proper suffix of some other delimiter.
<P>
If an entity reference in a default value uses the default entity and
an entity with that name is subsequently defined and that default
value is added to the document instance, then the resulting document
may not be equivalent to the original document.
Spam will give a warning when the first two conditions are met.
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>


@@ -0,0 +1,68 @@
<!-- $XConsortium: spent.htm /main/1 1996/09/22 18:18:33 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SPENT</TITLE>
</HEAD>
<BODY>
<H1>SPENT</H1>
<H2>SYNOPSIS</H2>
<P>
<CODE>spent</CODE>
[
<CODE>-Crv</CODE>
]
[
<CODE>-b<VAR>bctf</VAR></CODE>
]
[
<CODE>-D<VAR>directory</VAR></CODE>
]
<CODE><VAR>sysid...</VAR></CODE>
<H2>DESCRIPTION</H2>
<P>
Spent (SGML print entity)
prints the concatenation of the entities with
<A HREF="sysid.htm">system identifiers</A>
<CODE><VAR>sysid...</VAR></CODE>
on the standard output.
<P>
The following options are available:
<DL>
<DT>
<CODE>-b<VAR>bctf</VAR></CODE>
<DD>
Use the <A HREF="sysid.htm#bctf">BCTF</A> with name
<CODE><VAR>bctf</VAR></CODE>
for output.
<DT>
<CODE>-C</CODE>
<DD>
This has the same effect as in <A HREF="nsgmls.htm#optC">nsgmls</A>.
<DT>
<CODE>-D<VAR>directory</VAR></CODE>
<DD>
Search
<CODE><VAR>directory</VAR></CODE>
for files specified in system identifiers.
This has the same effect as in <A HREF="nsgmls.htm#optD">nsgmls</A>.
<DT>
<CODE>-r</CODE>
<DD>
Raw output.
Don't perform any conversion on RSs and REs when printing the entity.
The entity would typically have the storage manager attribute
<CODE>records=asis</CODE>.
<DT>
<CODE>-v</CODE>
<DD>
Print the version number.
</DL>
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>


@@ -0,0 +1,41 @@
<!-- $XConsortium: sysdecl.htm /main/1 1996/09/22 18:18:52 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SP - System declaration</TITLE>
</HEAD>
<BODY>
<H1>SP System Declaration</H1>
<P>
The system declaration for SP is as follows:
<PRE>
&lt;!SYSTEM "ISO 8879:1986"
CHARSET
BASESET "ISO 646-1983//CHARSET
International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET 0 128 0
CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
FEATURES
MINIMIZE DATATAG NO OMITTAG YES RANK YES SHORTTAG YES
LINK SIMPLE YES 65535 IMPLICIT YES EXPLICIT YES 1
OTHER CONCUR NO SUBDOC YES 100 FORMAL YES
SCOPE DOCUMENT
SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Reference//EN"
SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Core//EN"
VALIDATE
GENERAL YES MODEL YES EXCLUDE YES CAPACITY NO
NONSGML YES SGML YES FORMAL YES
SDIF
PACK NO UNPACK NO&gt;
</PRE>
<P>
The limit for the SUBDOC parameter is memory dependent.
<P>
Any legal concrete syntax may be used.
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>


@@ -0,0 +1,305 @@
<!-- $XConsortium: sysid.htm /main/1 1996/09/22 18:19:13 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SP - System identifiers</TITLE>
</HEAD>
<BODY>
<H1>System identifiers</H1>
<P>
There are two kinds of system identifier: formal system identifiers
and simple system identifiers. A system identifier that does not
start with <SAMP>&lt;</SAMP> will always be interpreted as a simple
system identifier. A simple system identifier will always be
interpreted either as a filename or as a URL.
<H2>Formal system identifiers</H2>
<P>
Formal system identifiers are based on the
System Identifier facility defined in ISO/IEC 10744 (HyTime) Technical
Corrigendum 1, Annex D.
A system identifier that is a formal system
identifier consists of a sequence of one or more storage object
specifications. The objects specified by the storage object
specifications are concatenated to form the entity. A storage object
specification consists of an SGML start-tag in the reference concrete
syntax followed by character data content. The generic identifier of
the start-tag is the name of a storage manager. The content is a
storage object identifier which identifies the storage object in a
manner dependent on the storage manager. The start-tag can also
specify attributes giving additional information about the storage
object. Numeric character references are recognized in storage object
identifiers and attribute value literals in the start-tag. Record
ends are ignored in the storage object identifier as with SGML. A
system identifier will be interpreted as a formal system identifier if
it starts with a <SAMP>&lt;</SAMP> followed by a storage manager name,
followed by either <SAMP>&gt;</SAMP> or white-space; otherwise it will be
interpreted as a simple system identifier. A storage object
identifier extends until the end of the system identifier or until the
first occurrence of <SAMP>&lt;</SAMP> followed by a storage manager
name, followed by either <SAMP>&gt;</SAMP> or white-space.
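<P>
The recognition rule above can be illustrated with a short Python
sketch. It is not SP's implementation: the function names
<SAMP>is_formal</SAMP> and <SAMP>split_specs</SAMP> are hypothetical,
the list of storage manager names is simply the one given in this
document, names are matched case-insensitively on the assumption that
the reference concrete syntax folds case, and numeric character
references and record ends are left untouched.

```python
import re

# Storage manager names as listed in this document.
STORAGE_MANAGERS = ("osfile", "osfd", "url", "neutral", "literal")

# "<", a storage manager name, then ">" or white-space (checked, not consumed).
_START = re.compile(r"<(%s)(?=[>\s])" % "|".join(STORAGE_MANAGERS), re.IGNORECASE)


def is_formal(sysid):
    """Return True if sysid starts like a storage object specification."""
    return _START.match(sysid) is not None


def _tag_end(sysid, pos):
    """Return the index of the ">" closing the start-tag, skipping quoted literals."""
    quote = None
    for i in range(pos, len(sysid)):
        ch = sysid[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            return i
    raise ValueError("unterminated start-tag")


def split_specs(sysid):
    """Split a formal system identifier into
    (storage manager, raw attribute text, storage object identifier) tuples."""
    if not is_formal(sysid):
        raise ValueError("not a formal system identifier")
    specs = []
    pos = 0
    while pos < len(sysid):
        m = _START.match(sysid, pos)
        end = _tag_end(sysid, m.end())
        # The identifier runs to the next specification start or to the end.
        nxt = _START.search(sysid, end + 1)
        stop = nxt.start() if nxt else len(sysid)
        specs.append((m.group(1).lower(), sysid[m.end():end].strip(), sysid[end + 1:stop]))
        pos = stop
    return specs
```

<P>
With this sketch, <SAMP>split_specs("&lt;osfile&gt;a.sgm&lt;osfd&gt;0")</SAMP>
yields two specifications, one for <SAMP>osfile</SAMP> with identifier
<SAMP>a.sgm</SAMP> and one for <SAMP>osfd</SAMP> with identifier
<SAMP>0</SAMP>; the objects they name would then be concatenated to
form the entity.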
<P>
The following storage managers are available:
<DL>
<DT>
<A NAME="osfile"><SAMP>osfile</SAMP></A>
<DD>
The storage object identifier is a filename. If the filename is
relative it is resolved using a base filename. Normally the base
filename is the name of the file in which the storage object
identifier was specified, but this can be changed using the
<SAMP>base</SAMP> attribute. The filename will be searched for first
in the directory of the base filename. If it is not found there, then
it will be searched for in directories specified with the
<SAMP>-D</SAMP> option in the order in which they were specified on
the command line, and then in the list of directories specified by the
environment variable <SAMP>SGML_SEARCH_PATH</SAMP>. The list
is separated by colons under Unix and by semicolons under MS-DOS.
<DT>
<SAMP>osfd</SAMP>
<DD>
The storage object identifier is an integer specifying a file
descriptor. Thus a system identifier of <SAMP>&lt;osfd>0</SAMP> will
refer to the standard input.
<DT>
<SAMP>url</SAMP>
<DD>
The storage object identifier is a URL. Only the <SAMP>http</SAMP>
scheme is currently supported, and not on all systems.
<DT>
<SAMP>neutral</SAMP>
<DD>
The storage manager is the storage manager of the storage object in which
the system identifier was specified (the <I>underlying storage
manager</I>). However, if the underlying storage manager does not
support named storage objects (i.e. it is <SAMP>osfd</SAMP>), then the
storage manager will be <SAMP>osfile</SAMP>. The storage object
identifier is treated as a relative, hierarchical name separated by
slashes (<SAMP>/</SAMP>) and will be transformed as appropriate for
the underlying storage manager.
<DT>
<SAMP>literal</SAMP>
<DD>
The bit combinations of the storage object identifier are
the contents of the storage object.
</DL>
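<P>
The search order described above for relative <SAMP>osfile</SAMP>
filenames can be sketched in Python as follows. This is only an
illustration under stated assumptions: the function and parameter
names (<SAMP>search_dirs</SAMP>, <SAMP>resolve_osfile</SAMP>,
<SAMP>d_options</SAMP>) are hypothetical, and
<SAMP>os.pathsep</SAMP> is used to stand in for the colon or
semicolon separator.

```python
import os


def search_dirs(base_filename, d_options, environ=None):
    """Return the directories to search for a relative filename, in order."""
    environ = os.environ if environ is None else environ
    # 1. The directory of the base filename.
    dirs = [os.path.dirname(base_filename) or os.curdir]
    # 2. Directories given with -D, in command-line order.
    dirs.extend(d_options)
    # 3. Directories listed in SGML_SEARCH_PATH.
    path = environ.get("SGML_SEARCH_PATH", "")
    dirs.extend(d for d in path.split(os.pathsep) if d)
    return dirs


def resolve_osfile(filename, base_filename, d_options=(), environ=None):
    """Return the first existing candidate for filename, or None if none exists."""
    if os.path.isabs(filename):
        return filename
    for d in search_dirs(base_filename, d_options, environ):
        candidate = os.path.join(d, filename)
        if os.path.exists(candidate):
            return candidate
    return None
```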
<P>
The following attributes are supported:
<DL>
<DT>
<SAMP>records</SAMP>
<DD>
This describes how records are delimited in the storage object:
<DL>
<DT><SAMP>cr</SAMP>
<DD>
Records are terminated by a carriage return.
<DT>
<SAMP>lf</SAMP>
<DD>
Records are terminated by a line feed.
<DT>
<SAMP>crlf</SAMP>
<DD>
Records are terminated by a carriage return followed by a line feed.
<DT>
<SAMP>find</SAMP>
<DD>
Records are terminated by whichever of
<SAMP>cr</SAMP>,
<SAMP>lf</SAMP>
or
<SAMP>crlf</SAMP>
is first encountered in the storage object.
<DT>
<SAMP>asis</SAMP>
<DD>
No recognition of records is performed.
</DL>
<P>
The default is <SAMP>find</SAMP> except for NDATA entities for which
the default is <SAMP>asis</SAMP>. This attribute is not applicable to
the <SAMP>literal</SAMP> storage manager.
<P>
When records are recognized in a storage object, a record start is
inserted at the beginning of each record, and a record end at the end
of each record. If there is a partial record (a record that doesn't
end with the record terminator) at the end of the entity, then a
record start will be inserted before it but no record end will be
inserted after it.
<P>
The attribute name and <SAMP>=</SAMP> can be omitted for this attribute.
<DT>
<SAMP>zapeof</SAMP>
<DD>
This specifies whether a Control-Z character that occurs as the final byte
in the storage object should be stripped.
The following values are allowed:
<DL>
<DT><SAMP>zapeof</SAMP>
<DD>
A final Control-Z should be stripped.
<DT><SAMP>nozapeof</SAMP>
<DD>
A final Control-Z should not be stripped.
</DL>
<P>
The default is <SAMP>zapeof</SAMP> except for NDATA entities, entities
declared in storage objects with <SAMP>zapeof=nozapeof</SAMP> and
storage objects with <SAMP>records=asis</SAMP>. This attribute is not
applicable to the <SAMP>literal</SAMP> storage manager.
<P>
The attribute name and <SAMP>=</SAMP> can be omitted for this
attribute.
<DT>
<A NAME="bctf"><SAMP>bctf</SAMP></A>
<DD>
The bctf (bit combination transformation format) attribute describes
how the bit combinations of the storage object are transformed into
the sequence of bytes that are contained in the object identified by
the storage object identifier. The inverse of this transformation is
performed when the entity manager reads the storage object. It has
one of the following values:
<DL>
<DT>
<SAMP>identity</SAMP>
<DD>
Each bit combination is represented by a single byte.
<DT>
<SAMP>fixed-2</SAMP>
<DD>
Each bit combination is represented by exactly 2
bytes, with the more significant byte first.
<DT>
<SAMP>utf-8</SAMP>
<DD>
Each bit combination is represented by a variable number of bytes
according to UCS Transformation Format 8 defined in Annex P to be
added by the first proposed draft amendment (PDAM 1) to ISO/IEC
10646-1:1993.
<DT>
<SAMP>euc-jp</SAMP>
<DD>
Each bit combination is treated as a pair of bytes, most significant
byte first, encoding a character using the
Extended_UNIX_Code_Fixed_Width_for_Japanese Internet charset, and is
transformed into the variable length sequence of octets that would
encode that character using the
Extended_UNIX_Code_Packed_Format_for_Japanese Internet charset.
<DT>
<SAMP>sjis</SAMP>
<DD>
Each bit combination is treated as a pair of bytes, most significant
byte first, encoding a character using the
Extended_UNIX_Code_Fixed_Width_for_Japanese Internet charset, and is
transformed into the variable length sequence of bytes that would
encode that character using the Shift_JIS Internet charset.
<DT>
<SAMP>unicode</SAMP>
<DD>
Each bit combination is represented by 2 bytes. The bytes
representing the entire storage object may be preceded by a pair of
bytes representing the byte order mark character (0xFEFF). The bytes
representing each bit combination are in the system byte order, unless
the byte order mark character is present, in which case the order of
its bytes determines the byte order. When the storage object is read,
any byte order mark character is discarded.
<DT>
<SAMP>is8859-<VAR>n</VAR></SAMP>
<DD>
<SAMP><VAR>n</VAR></SAMP> can be any single digit other than 0. Each
bit combination is interpreted as the number of a character in ISO/IEC
10646 and is represented by the single byte that would encode that
character in ISO 8859-<VAR>n</VAR>. These values are not supported
with the <SAMP>-b</SAMP> option.
</DL>
<P>
Values other than <SAMP>identity</SAMP> are supported only with the
multi-byte version of nsgmls. This attribute is not applicable to the
<SAMP>literal</SAMP> storage manager.
<DT>
<SAMP>tracking</SAMP>
<DD>
This specifies whether line boundaries should be tracked for this
object: a value of <SAMP>track</SAMP> specifies that they should; a
value of <SAMP>notrack</SAMP> specifies that they should not. The
default value is <SAMP>track</SAMP>. Keeping track of where line
boundaries occur in a storage object requires approximately one byte
of storage per line and it may be desirable to disable this for very
large storage objects.
<P>
The attribute name and
<SAMP>=</SAMP>
can be omitted for this attribute.
<DT>
<SAMP>base</SAMP>
<DD>
When the storage object identifier specified in the content of the
storage object specification is relative, this specifies the base
storage object identifier relative to which that storage object
identifier should be resolved.
When not specified, a storage object identifier is interpreted
relative to the storage object in which it is specified,
provided that this has the same storage manager.
This applies both to system identifiers specified in SGML
documents and to system identifiers specified in the catalog entry
files.
<DT>
<SAMP>smcrd</SAMP>
<DD>
The value is a single character that will be recognized in storage
object identifiers (both in the content of storage object
specifications and in the value of <SAMP>base</SAMP> attributes) as a
storage manager character reference delimiter when followed by a
digit. A storage manager character reference is like an SGML numeric
character reference except that the number is interpreted as a
character number in the inherent character set of the storage manager
rather than the document character set. The default is for no
character to be recognized as a storage manager character reference
delimiter. Numeric character references cannot be used to prevent
recognition of storage manager character reference delimiters.
<DT>
<SAMP>fold</SAMP>
<DD>
This applies only to the <SAMP>neutral</SAMP> storage manager. It
specifies whether the storage object identifier should be folded to
the customary case of the underlying storage manager if storage object
identifiers for the underlying storage manager are case sensitive.
The following values are allowed:
<DL>
<DT><SAMP>fold</SAMP>
<DD>
The storage object identifier will be folded.
<DT>
<SAMP>nofold</SAMP>
<DD>
The storage object identifier will not be folded.
</DL>
<P>
The default value is <SAMP>fold</SAMP>. The attribute name and
<SAMP>=</SAMP> can be omitted for this attribute.
<P>
For example, on Unix, filenames are case-sensitive and the customary
case is lower-case. So if the underlying storage manager were
<SAMP>osfile</SAMP> and the system was a Unix system, then
<SAMP>&lt;neutral>FOO.SGM</SAMP> would be equivalent to
<SAMP>&lt;osfile>foo.sgm</SAMP>.
</DL>
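<P>
The record handling described above for the <SAMP>records</SAMP>
attribute can be sketched in Python. This is a rough illustration, not
SP's code: the function names are hypothetical, and the record start
and record end characters are assumed to be those of the reference
concrete syntax (character numbers 10 and 13).

```python
RS = b"\n"  # record start: character 10 in the reference concrete syntax
RE = b"\r"  # record end: character 13 in the reference concrete syntax

TERMINATORS = {"cr": b"\r", "lf": b"\n", "crlf": b"\r\n"}


def find_terminator(data):
    """Return whichever of cr, lf or crlf occurs first in data, or None."""
    for i, byte in enumerate(data):
        if byte == 0x0A:
            return b"\n"
        if byte == 0x0D:
            return b"\r\n" if data[i + 1:i + 2] == b"\n" else b"\r"
    return None


def recognize_records(data, records="find"):
    """Wrap each record in RS ... RE; a partial final record gets only RS."""
    if records == "asis":
        return data  # no recognition of records
    term = find_terminator(data) if records == "find" else TERMINATORS[records]
    if term is None:
        # No terminator anywhere: the whole object is one partial record.
        return RS + data if data else data
    parts = data.split(term)
    out = bytearray()
    for part in parts[:-1]:
        out += RS + part + RE
    if parts[-1]:
        out += RS + parts[-1]  # partial record: record start but no record end
    return bytes(out)
```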
<H2>Simple system identifiers</H2>
<P>
A simple system identifier is interpreted as a storage object
identifier with a storage manager that depends on where the system
identifier was specified: if it was specified in a storage object
whose storage manager was <SAMP>url</SAMP> or if the system identifier
looks like an absolute URL in a supported scheme, the storage manager
will be <SAMP>url</SAMP>; otherwise the storage manager will be
<SAMP>osfile</SAMP>. The storage manager attributes are defaulted as
for a formal system identifier. Numeric character references are not
recognized in simple system identifiers.
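<P>
The choice of storage manager for a simple system identifier can be
sketched as below. The function name and the test for "looks like an
absolute URL" (a supported scheme followed by <SAMP>://</SAMP>) are
assumptions made for illustration; SP's actual test may differ.

```python
# Only the http scheme is described as supported in this document.
SUPPORTED_SCHEMES = ("http",)


def simple_storage_manager(sysid, containing_manager=None):
    """Pick the storage manager for a simple system identifier.

    containing_manager is the storage manager of the storage object in
    which the system identifier was specified, if any.
    """
    if containing_manager == "url":
        return "url"
    scheme, sep, _rest = sysid.partition("://")
    if sep and scheme.lower() in SUPPORTED_SCHEMES:
        return "url"
    return "osfile"
```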
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>

@@ -0,0 +1,53 @@
<!-- $XConsortium: winntu.htm /main/1 1996/09/22 18:19:32 rws $ -->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SP Unicode support under Windows NT</TITLE>
</HEAD>
<BODY>
<H1>Notes on SP Unicode support under Windows NT</H1>
<P>
When compiled with the appropriate preprocessor definition
(<CODE>UNICODE</CODE>), SP now uses Unicode interfaces to NT. This
means that the <SAMP>SP_BCTF</SAMP> environment variable applies only
to file input and output, and so <CODE>unicode</CODE> is allowed as
the value of <SAMP>SP_BCTF</SAMP>.
<P>
In order for non-ASCII characters to be correctly displayed on your
console you must select a TrueType font, such as Lucida Console, as your
console font.
<P>
If you define your own public character sets, you should use Unicode
(or a superset of Unicode) as your universal character set.
<P>
The following additional BCTFs are supported:
<DL>
<DT>
<SAMP>windows</SAMP>
<DD>
Specify this BCTF when a storage object is encoded using your
system's default Windows character set, and your document character
set is declared as Unicode. This uses the so-called ANSI code page.
<DT>
<SAMP>wunicode</SAMP>
<DD>
This uses the <SAMP>unicode</SAMP> BCTF if the storage object starts
with a byte order mark and otherwise the <SAMP>windows</SAMP> BCTF.
If you are working with Unicode, this is probably the best value
for <SAMP>SP_BCTF</SAMP>.
<DT>
<SAMP>ms-dos</SAMP>
<DD>
Specify this BCTF when a storage object (file) uses the OEM code page,
and your document character set is declared as Unicode.
The OEM code page for a particular
machine is the code page used by FAT file systems on that machine and
is the default code page for MS-DOS consoles.
</DL>
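<P>
The selection made by the <SAMP>wunicode</SAMP> BCTF can be
approximated in Python as follows. This is a sketch under stated
assumptions: the function name is hypothetical, Python's
<SAMP>utf-16</SAMP> codec stands in for the <SAMP>unicode</SAMP> BCTF,
and <SAMP>cp1252</SAMP> is only an example of an ANSI code page; the
real default depends on the system.

```python
import codecs


def decode_wunicode(data, ansi_codec="cp1252"):
    """Decode as Unicode if data starts with a byte order mark, else as ANSI.

    ansi_codec is an assumed stand-in for the system's default Windows
    (ANSI) code page.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The utf-16 codec uses the byte order mark to choose the byte
        # order and discards it from the result.
        return data.decode("utf-16")
    return data.decode(ansi_codec)
```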
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>