662 lines
23 KiB
Plaintext
662 lines
23 KiB
Plaintext
# $XConsortium: sgmls.txt /main/2 1996/11/11 11:24:56 drk $
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
NAME
|
|
sgmls - a validating SGML parser
|
|
|
|
An SGML System Conforming to
|
|
International Standard ISO 8879 --
|
|
Standard Generalized Markup Language
|
|
|
|
SYNOPSIS
|
|
sgmls [ -deglprsuv ] [ -cfile ] [ -iname ] [ filename...
|
|
]
|
|
|
|
DESCRIPTION
|
|
Sgmls parses and validates the SGML document entity in
|
|
filename... and prints on the standard output a simple
|
|
ASCII representation of its Element Structure Information
|
|
Set. (This is the information set which a structure-
|
|
controlled conforming SGML application should act upon.)
|
|
Note that the document entity may be spread amongst sev-
|
|
eral files; for example, the SGML declaration, document
|
|
type declaration and document instance set could each be
|
|
in a separate file. If no filenames are specified, then
|
|
sgmls will read the document entity from the standard
|
|
input. A filename of - can also be used to refer to the
|
|
standard input.
|
|
|
|
The following options are available:
|
|
|
|
-cfile Write a report of capacity usage to file. The
|
|
report is in the format of a RACT result. RACT is
|
|
the Reference Application for Capacity Testing
|
|
defined in the Proposed American National Standard
|
|
Conformance Testing for Standard Generalized Markup
|
|
Language (SGL) Systems (X3.190-199X), Draft July
|
|
1991.
|
|
|
|
-d Warn about duplicate entity declarations.
|
|
|
|
-e Describe open entities in error messages. Error
|
|
messages always include the position of the most
|
|
recently opened external entity.
|
|
|
|
-g Show the GIs of open elements in error messages.
|
|
|
|
-iname Pretend that
|
|
|
|
<!ENTITY % name "INCLUDE">
|
|
|
|
occurs at the start of the document type declara-
|
|
tion subset in the SGML document entity. Since
|
|
repeated definitions of an entity are ignored, this
|
|
definition will take precedence over any other def-
|
|
initions of this entity in the document type decla-
|
|
ration. Multiple -i options are allowed. If the
|
|
SGML declaration replaces the reserved name INCLUDE
|
|
|
|
|
|
|
|
1
|
|
|
|
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
then the new reserved name will be the replacement
|
|
text of the entity. Typically the document type
|
|
declaration will contain
|
|
|
|
<!ENTITY % name "IGNORE">
|
|
|
|
and will use %name; in the status keyword specifi-
|
|
cation of a marked section declaration. In this
|
|
case the effect of the option will be to cause the
|
|
marked section not to be ignored.
|
|
|
|
-l Output L commands giving the current line number
|
|
and filename.
|
|
|
|
-p Parse only the prolog. Sgmls will exit after pars-
|
|
ing the document type declaration. Implies -s.
|
|
|
|
-r Warn about defaulted references.
|
|
|
|
-s Suppress output. Error messages will still be
|
|
printed.
|
|
|
|
-u Warn about undefined elements: elements used in the
|
|
DTD but not defined. Also warn about undefined
|
|
short reference maps.
|
|
|
|
-v Print the version number.
|
|
|
|
Entity Manager
|
|
An external entity resides in one or more files. The
|
|
entity manager component of sgmls maps a sequence of files
|
|
into an entity in three sequential stages:
|
|
|
|
1. each carriage return character is turned into a
|
|
non-SGML character;
|
|
|
|
2. each newline character is turned into a record end
|
|
character, and at the same time a record start
|
|
character is inserted at the beginning of each
|
|
line;
|
|
|
|
3. the files are concatenated.
|
|
|
|
A system identifier is interpreted as a list of filenames
|
|
separated by colons. A filename of - can be used to refer
|
|
to the standard input. If no system identifier is sup-
|
|
plied, then the entity manager will attempt to generate a
|
|
filename using the public identifier (if there is one) and
|
|
other information available to it. Notation identifiers
|
|
are not subject to this treatment. This process is con-
|
|
trolled by the environment variable SGML_PATH; this con-
|
|
tains a colon-separated list of filename templates. A
|
|
filename template is a filename that may contain substitu-
|
|
tion fields; a substitution field is a % character
|
|
|
|
|
|
|
|
2
|
|
|
|
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
followed by a single letter that indicates the value of
|
|
the substitution. If SGML_PATH uses the %S field (the
|
|
value of which is the system identifier), then the entity
|
|
manager will also use SGML_PATH to generate a filename
|
|
when a system identifier that does not contain any colons
|
|
is supplied. The value of a substitution can either be a
|
|
string or it can be null. The entity manager transforms
|
|
the list of filename templates into a list of filenames by
|
|
substituting for each substitution field and discarding
|
|
any template that contained a substitution field whose
|
|
value was null. It then uses the first resulting filename
|
|
that exists and is readable. Substitution values are
|
|
transformed before being used for substitution: firstly,
|
|
any names that were subject to upper case substitution are
|
|
folded to lower case; secondly, space characters are
|
|
mapped to underscores and slashes are mapped to percents.
|
|
The value of the %S field is not transformed. The values
|
|
of substitution fields are as follows:
|
|
|
|
%% A single %.
|
|
|
|
%D The entity's data content notation. This substitu-
|
|
tion will succeed only for external data entities.
|
|
|
|
%N The entity, notation or document type name.
|
|
|
|
%P The public identifier if there was a public identi-
|
|
fier, otherwise null.
|
|
|
|
%S The system identifier if there was a system identi-
|
|
fier otherwise null.
|
|
|
|
%X (This is provided mainly for compatibility with
|
|
ARCSGML.) A three-letter string chosen as follows:
|
|
| |
|
|
| | With public identifier
|
|
| +-------------+-----------
|
|
| No public | Device | Device
|
|
| identifier | independent | dependent
|
|
---------------------------+------------+-------------+-----------
|
|
Data or subdocument entity | nsd | pns | vns
|
|
General SGML text entity | gml | pge | vge
|
|
Parameter entity | spe | ppe | vpe
|
|
Document type definition | dtd | pdt | vdt
|
|
Link process definition | lpd | plp | vlp
|
|
|
|
The device dependent version is selected if the
|
|
public text class allows a public text display ver-
|
|
sion but no public text display version was speci-
|
|
fied.
|
|
|
|
%Y The type of thing for which the filename is being
|
|
generated:
|
|
|
|
|
|
|
|
|
|
3
|
|
|
|
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
SGML subdocument entity sgml
|
|
Data entity data
|
|
General text entity text
|
|
Parameter entity parm
|
|
Document type definition dtd
|
|
Link process definition lpd
|
|
|
|
The value of the following substitution fields will be
|
|
null unless a valid formal public identifier was supplied.
|
|
|
|
%A Null if the text identifier in the formal public
|
|
identifier contains an unavailable text indicator,
|
|
otherwise the empty string.
|
|
|
|
%C The public text class, mapped to lower case.
|
|
|
|
%E The public text designating sequence (escape
|
|
sequence) if the public text class is CHARSET, oth-
|
|
erwise null.
|
|
|
|
%I The empty string if the owner identifier in the
|
|
formal public identifier is an ISO owner identi-
|
|
fier, otherwise null.
|
|
|
|
%L The public text language, mapped to lower case,
|
|
unless the public text class is CHARSET, in which
|
|
case null.
|
|
|
|
%O The owner identifier (with the +// or -// prefix
|
|
stripped.)
|
|
|
|
%R The empty string if the owner identifier in the
|
|
formal public identifier is a registered owner
|
|
identifier, otherwise null.
|
|
|
|
%T The public text description.
|
|
|
|
%U The empty string if the owner identifier in the
|
|
formal public identifier is an unregistered owner
|
|
identifier, otherwise null.
|
|
|
|
%V The public text display version. This substitution
|
|
will be null if the public text class does not
|
|
allow a display version or if no version was speci-
|
|
fied. If an empty version was specified, a value
|
|
of default will be used.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
4
|
|
|
|
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
System declaration
|
|
The system declaration for sgmls is as follows:
|
|
|
|
SYSTEM "ISO 8879:1986"
|
|
CHARSET
|
|
BASESET "ISO 646-1983//CHARSET
|
|
International Reference Version (IRV)//ESC 2/5 4/0"
|
|
DESCSET 0 128 0
|
|
CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
|
|
FEATURES
|
|
MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
|
|
LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
|
|
OTHER CONCUR NO SUBDOC YES 1 FORMAL YES
|
|
SCOPE DOCUMENT
|
|
SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Reference//EN"
|
|
SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Core//EN"
|
|
VALIDATE
|
|
GENERAL YES MODEL YES EXCLUDE YES CAPACITY YES
|
|
NONSGML YES SGML YES FORMAL YES
|
|
SDIF
|
|
PACK NO UNPACK NO
|
|
|
|
The memory usage of sgmls is not a function of the capac-
|
|
ity points used by a document; however, sgmls can handle
|
|
capacities significantly greater than the reference capac-
|
|
ity set.
|
|
|
|
In some environments, higher values may be supported for
|
|
the SUBDOC parameter.
|
|
|
|
Documents that do not use optional features are also sup-
|
|
ported. For example, if FORMAL NO is specified in the
|
|
SGML declaration, public identifiers will not be required
|
|
to be valid formal public identifiers.
|
|
|
|
Certain parts of the concrete syntax may be changed:
|
|
|
|
The shunned character numbers can be changed.
|
|
|
|
Eight bit characters can be assigned to LCNMSTRT,
|
|
UCNMSTRT, LCNMCHAR and UCNMCHAR. Declaring this
|
|
requires that the syntax reference character set be
|
|
declared like this:
|
|
BASESET "ISO Registration Number 100//CHARSET
|
|
ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
|
|
DESCSET 0 256 0
|
|
|
|
Uppercase substitution can be performed or not per-
|
|
formed both for entity names and for other names.
|
|
|
|
Either short reference delimiters assigned by the
|
|
reference delimiter set or no short reference
|
|
delimiters are supported.
|
|
|
|
|
|
|
|
|
|
5
|
|
|
|
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
The reserved names can be changed.
|
|
|
|
The quantity set can be increased within certain
|
|
limits subject to there being sufficient memory
|
|
available. The upper limit on NAMELEN is 239. The
|
|
upper limits on ATTCNT, ATTSPLEN, BSEQLEN, ENTLVL,
|
|
LITLEN, PILEN, TAGLEN, and TAGLVL are more than
|
|
thirty times greater than the reference limits.
|
|
The upper limit on GRPCNT, GRPGTCNT, and GRPLVL is
|
|
253. NORMSEP cannot be changed. DTAGLEN are
|
|
DTEMPLEN irrelevant since sgmls does not support
|
|
the DATATAG feature.
|
|
|
|
SGML declaration
|
|
The SGML declaration may be omitted, the following decla-
|
|
ration will be implied:
|
|
<!SGML "ISO 8879:1986"
|
|
CHARSET
|
|
BASESET "ISO 646-1983//CHARSET
|
|
International Reference Version (IRV)//ESC 2/5 4/0"
|
|
DESCSET 0 9 UNUSED
|
|
9 2 9
|
|
11 2 UNUSED
|
|
13 1 13
|
|
14 18 UNUSED
|
|
32 95 32
|
|
127 1 UNUSED
|
|
CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
|
|
SCOPE DOCUMENT
|
|
SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Reference//EN"
|
|
FEATURES
|
|
MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
|
|
LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
|
|
OTHER CONCUR NO SUBDOC YES 99999999 FORMAL YES
|
|
APPINFO NONE>
|
|
with the exception that characters 128 through 254 will be
|
|
assigned to DATACHAR. When exporting documents that use
|
|
characters in this range, an accurate description of the
|
|
upper half of the document character set should be added
|
|
to this declaration. For ISO Latin-1, an appropriate
|
|
description would be:
|
|
BASESET "ISO Registration Number 100//CHARSET
|
|
ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
|
|
DESCSET 128 32 UNUSED
|
|
160 95 32
|
|
255 1 UNUSED
|
|
|
|
Output format
|
|
The output is a series of lines. Lines can be arbitrarily
|
|
long. Each line consists of an initial command character
|
|
and one or more arguments. Arguments are separated by a
|
|
single space, but when a command takes a fixed number of
|
|
arguments the last argument can contain spaces. There is
|
|
no space between the command character and the first
|
|
|
|
|
|
|
|
6
|
|
|
|
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
argument. Arguments can contain the following escape
|
|
sequences.
|
|
|
|
\\ A \.
|
|
|
|
\n A record end character.
|
|
|
|
\| Internal SDATA entities are bracketed by these.
|
|
|
|
\nnn The character whose code is nnn octal.
|
|
|
|
A record start character will be represented by \012.
|
|
Most applications will need to ignore \012 and translate
|
|
\n into newline.
|
|
|
|
The possible command characters and arguments are as fol-
|
|
lows:
|
|
|
|
(gi The start of an element whose generic identifier is
|
|
gi. Any attributes for this element will have been
|
|
specified with A commands.
|
|
|
|
)gi The end an element whose generic identifier is gi.
|
|
|
|
-data Data.
|
|
|
|
&name A reference to an external data entity name; name
|
|
will have been defined using an E command.
|
|
|
|
?pi A processing instruction with data pi.
|
|
|
|
Aname val
|
|
The next element to start has an attribute name
|
|
with value val which takes one of the following
|
|
forms:
|
|
|
|
IMPLIED
|
|
The value of the attribute is implied.
|
|
|
|
CDATA data
|
|
The attribute is character data. This is
|
|
used for attributes whose declared value is
|
|
CDATA.
|
|
|
|
NOTATION nname
|
|
The attribute is a notation name; nname will
|
|
have been defined using a N command. This
|
|
is used for attributes whose declared value
|
|
is NOTATION.
|
|
|
|
ENTITY name...
|
|
The attribute is a list of general entity
|
|
names. Each entity name will have been
|
|
defined using an I, E or S command. This is
|
|
|
|
|
|
|
|
7
|
|
|
|
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
used for attributes whose declared value is
|
|
ENTITY or ENTITIES.
|
|
|
|
TOKEN token...
|
|
The attribute is a list of tokens. This is
|
|
used for attributes whose declared value is
|
|
anything else.
|
|
|
|
Dename name val
|
|
This is the same as the A command, except that it
|
|
specifies a data attribute for an external entity
|
|
named ename. Any D commands will come after the E
|
|
command that defines the entity to which they
|
|
apply, but before any & or A commands that refer-
|
|
ence the entity.
|
|
|
|
Nnname nname. Define a notation This command will be pre-
|
|
ceded by a p command if the notation was declared
|
|
with a public identifier, and by a s command if the
|
|
notation was declared with a system identifier. A
|
|
notation will only be defined if it is to be refer-
|
|
enced in an E command or in an A command for an
|
|
attribute with a declared value of NOTATION.
|
|
|
|
Eename typ nname
|
|
Define an external data entity named ename with
|
|
type typ (CDATA, NDATA or SDATA) and notation not.
|
|
This command will be preceded by one or more f com-
|
|
mands giving the filenames generated by the entity
|
|
manager from the system and public identifiers, by
|
|
a p command if a public identifier was declared for
|
|
the entity, and by a s command if a system identi-
|
|
fier was declared for the entity. not will have
|
|
been defined using a N command. Data attributes
|
|
may be specified for the entity using D commands.
|
|
An external data entity will only be defined if it
|
|
is to be referenced in a & command or in an A com-
|
|
mand for an attribute whose declared value is
|
|
ENTITY or ENTITIES.
|
|
|
|
Iename typ text
|
|
Define an internal data entity named ename with
|
|
type typ (CDATA or SDATA) and entity text text. An
|
|
internal data entity will only be defined if it is
|
|
referenced in an A command for an attribute whose
|
|
declared value is ENTITY or ENTITIES.
|
|
|
|
Sename Define a subdocument entity named ename. This com-
|
|
mand will be preceded by one or more f commands
|
|
giving the filenames generated by the entity man-
|
|
ager from the system and public identifiers, by a p
|
|
command if a public identifier was declared for the
|
|
entity, and by a s command if a system identifier
|
|
was declared for the entity. A subdocument entity
|
|
|
|
|
|
|
|
8
|
|
|
|
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
will only be defined if it is referenced in a {
|
|
command or in an A command for an attribute whose
|
|
declared value is ENTITY or ENTITIES.
|
|
|
|
ssysid This command applies to the next E, S or N command
|
|
and specifies the associated system identifier.
|
|
|
|
ppubid This command applies to the next E, S or N command
|
|
and specifies the associated public identifier.
|
|
|
|
ffilename
|
|
This command applies to the next E or S command and
|
|
specifies an associated filename. There will be
|
|
more than one f command for a single E or S command
|
|
if the system identifier used a colon.
|
|
|
|
{ename The start of the SGML subdocument entity ename;
|
|
ename will have been defined using a S command.
|
|
|
|
}ename The end of the SGML subdocument entity ename.
|
|
|
|
Llineno file
|
|
Llineno
|
|
Set the current line number and filename. The
|
|
filename argument will be omitted if only the line
|
|
number has changed. This will be output only if
|
|
the -l option has been given.
|
|
|
|
#text An APPINFO parameter of text was specified in the
|
|
SGML declaration. This is not strictly part of the
|
|
ESIS, but a structure-controlled application is
|
|
permitted to act on it. No # command will be out-
|
|
put if APPINFO NONE was specified. A # command
|
|
will occur at most once, and may be preceded only
|
|
by a single L command.
|
|
|
|
C This command indicates that the document was a con-
|
|
forming SGML document. If this command is output,
|
|
it will be the last command. An SGML document is
|
|
not conforming if it references a subdocument
|
|
entity that is not conforming.
|
|
|
|
BUGS
|
|
Some non-SGML characters in literals are counted as two
|
|
characters for the purposes of quantity and capacity cal-
|
|
culations.
|
|
|
|
SEE ALSO
|
|
The SGML Handbook, Charles F. Goldfarb
|
|
ISO 8879 (Standard Generalized Markup Language), Interna-
|
|
tional Organization for Standardization
|
|
|
|
ORIGIN
|
|
ARCSGML was written by Charles F. Goldfarb.
|
|
|
|
|
|
|
|
9
|
|
|
|
|
|
|
|
|
|
|
|
SGMLS(1) SGMLS(1)
|
|
|
|
|
|
Sgmls was derived from ARCSGML by James Clark
|
|
(jjc@jclark.com), to whom bugs should be reported.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
10
|
|
|
|
|