doc2sdl: use POSIX regex functions.
This commit is contained in:
@@ -1,151 +0,0 @@
|
||||
/* $XConsortium: README /main/2 1996/07/15 14:09:31 drk $ */
|
||||
#
|
||||
# Copyright (c) 1994
|
||||
# Open Software Foundation, Inc.
|
||||
#
|
||||
# Permission is hereby granted to use, copy, modify and freely distribute
|
||||
# the software in this file and its documentation for any purpose without
|
||||
# fee, provided that the above copyright notice appears in all copies and
|
||||
# that both the copyright notice and this permission notice appear in
|
||||
# supporting documentation. Further, provided that the name of Open
|
||||
# Software Foundation, Inc. ("OSF") not be used in advertising or
|
||||
# publicity pertaining to distribution of the software without prior
|
||||
# written permission from OSF. OSF makes no representations about the
|
||||
# suitability of this software for any purpose. It is provided "as is"
|
||||
# without express or implied warranty.
|
||||
#
|
||||
|
||||
instant - a formatting application for OSF SGML instances
|
||||
____________________________________________________________________________
|
||||
|
||||
Requirements
|
||||
|
||||
ANSI C compiler (gcc is one)
|
||||
|
||||
sgmls 1.1 -- sgml parser from James Clark. Based on Goldfarb's ARC parser.
|
||||
|
||||
Vanilla unix make
|
||||
|
||||
POSIX C libraries
|
||||
|
||||
|
||||
Files for instant program
|
||||
|
||||
Module Function
|
||||
------ --------
|
||||
browse.c interactive browser
|
||||
general.h general definitions
|
||||
info.c print information about the instances
|
||||
main.c main entry, arg parsing, instance reading
|
||||
tables.c table-specific formatting routines (TeX and tbl)
|
||||
traninit.c translator initialization (read spec, etc.)
|
||||
translate.c main translator
|
||||
translate.h structure definitions for translation code
|
||||
tranvar.c routines for handling "special variables"
|
||||
util.c general utilities
|
||||
|
||||
|
||||
Also required
|
||||
|
||||
1. Translation spec (transpec) files. (../transpecs/*.ts)
|
||||
2. SDATA mapping files for mapping sdata entities. (../transpecs/*.sdata)
|
||||
3. Character mapping files for mapping characters. (../transpecs/*.cmap)
|
||||
|
||||
|
||||
Platforms tried on
|
||||
|
||||
OSF1 1.3 (i486)
|
||||
Ultrix 4.2 (mips)
|
||||
HP-UX 9.01 (hp 9000/700)
|
||||
AIX 3.2 (rs6000)
|
||||
SunOS [missing strerror()]
|
||||
|
||||
____________________________________________________________________________
|
||||
|
||||
General outline of program
|
||||
------- ------- -- -------
|
||||
|
||||
To summarize in a sentence, instant reads the output of sgmls, builds a tree
|
||||
of the instnace in memory, then traverses the tree in in-order, processing
|
||||
the nodes according to a translation spec.
|
||||
|
||||
Element tree storage
|
||||
------- ---- -------
|
||||
|
||||
The first thing instant must do is read the ESIS (output of sgmls) from the
|
||||
specified file or stdin, forming a tree in memory. (Nothing makes sense
|
||||
without an instance...) Each element of the instance is a node in the tree,
|
||||
stored as a structure called Element_t. Elements contain content (what
|
||||
else?), which is a mixture of data (#PCDATA, #CDATA, #RCDATA - all the same
|
||||
in the ESIS), child elements, and PIs. Each 'chunk' of content is referred
|
||||
to by a Content_t structure. A Content_t contains an enum that can point to
|
||||
a string (data or PI), another Element_t. For example, if a <p> element
|
||||
contains some characters, an <emphasis> element, some more characters,
|
||||
a <function> element, then some more characters, it has 5 Content_t children
|
||||
as an array.
|
||||
|
||||
Element_t's have pointers to their parents, and a next element in a linked
|
||||
list (they're stored as a linked list, for cases when you'd want to quickly
|
||||
travers all the nodes, in no particular order).
|
||||
For convenience, Element_t's have an array of pointers to it's child
|
||||
Element_t's. These are just pointers to the same thing contained in the
|
||||
Content_t array, without the intervening data or PIs. This makes it easier
|
||||
for the program to traverse the elements of the tree (it does not have to
|
||||
be concerned with skipping data, etc.). There is an analagous array of
|
||||
pointers for the data content, an array of (char *)'s. This makes it easier
|
||||
to consider the immediate character content of an element.
|
||||
|
||||
Attributes are kept as an array of name-value mappings (using the typedef
|
||||
Mapping_t). ID attributes are also stored in a separate list of ID value -
|
||||
element pointer pairs so that it is quick to find an element by ID.
|
||||
|
||||
Other information kept about each element (in the Element_t struct) includes
|
||||
the line number in the EISI where the element occurs, the input filename.
|
||||
(These depend on sgmls being run with the "-l" option.) Also stored is
|
||||
an element's order in its parent's collection of children and an element's
|
||||
depth in the tree.
|
||||
|
||||
Translation specs
|
||||
----------- -----
|
||||
|
||||
A translation spec is read into a linked list in memory. As the instance
|
||||
tree is traversed, the transpecs are searched in order for a match. As a
|
||||
rule, one should position the less specific transpecs later in the file.
|
||||
Also, specs for seldom-used element are best placed later in the file, since
|
||||
it takes cpu cycles to skip over them for each of the more-used elements.
|
||||
|
||||
During translation of a particular element, the list of Content_t structures
|
||||
are processed in order. If a content 'chunk' is data, it is printed to
|
||||
the output stream. If it is an element, the translation routine is called
|
||||
for that elemen, recursively. Hence, in-order traversal.
|
||||
|
||||
Miscellaneous information displays
|
||||
------------- ----------- --------
|
||||
|
||||
There are several informational display options available. They include:
|
||||
- a list of element usage (-u) -- lists each element in the instance,
|
||||
it's attributes, number of children, parent, data content 'nodes'.
|
||||
- statistics about elements (-S) -- lists elements, number of times
|
||||
each is used, percent of elements that this is, total char content
|
||||
in that element, average number of characters in they element.
|
||||
- show context of each element (-x) -- lists each element and its
|
||||
context, looking up the tree (this is the same context that
|
||||
would match the Context field of a transpec).
|
||||
- show the hierarchy of the instance (-h) -- show an ascii 'tree' of
|
||||
the instance, where elements deeper in the tree are indented more.
|
||||
Numbers after the element name in parentheses are the number of
|
||||
element content nodes and the number of data content nodes.
|
||||
|
||||
Interactive browser
|
||||
----------- -------
|
||||
|
||||
Originally, the interactive browser was intended as a debugging aid for
|
||||
the code developer. It was a way to examine a particular node of the
|
||||
instance tree or process a subtree without being distracted by the rest
|
||||
of the instance. Many of the commands test functionality of the query
|
||||
and search code (such as testing whether a certain element has a given
|
||||
relationship to the current position in the tree).
|
||||
|
||||
|
||||
____________________________________________________________________________
|
||||
|
||||
Reference in New Issue
Block a user