Initial import of the CDE 2.1.30 sources from the Open Group.
This commit is contained in:
489
cde/programs/nsgmls/doc/ideas.htm
Normal file
489
cde/programs/nsgmls/doc/ideas.htm
Normal file
@@ -0,0 +1,489 @@
|
||||
<!-- $XConsortium: ideas.htm /main/1 1996/09/22 18:15:57 rws $ -->
|
||||
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<TITLE>Ideas for improving SP</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<H1>Ideas for improving SP</H1>
|
||||
<H2>
|
||||
Parser
|
||||
</H2>
|
||||
<P>
|
||||
Have option (fixedDocCharset) in which document charcater set cannot
|
||||
be changed by SGML declaration; declared document character set used
|
||||
for character references, and to determine which characters are
|
||||
non-SGML. Would need separate event for non-SGML character.
|
||||
In Text would need separate TextItem for non-SGML data.
|
||||
Disallow non-SGML charcters in internal entities.
|
||||
<P>
|
||||
Supporting caching across multiple runs of parser in single
|
||||
process.
|
||||
<P>
|
||||
Make Dtd copiable.
|
||||
<P>
|
||||
?Subdoc parser needs character set for system id (should be system
|
||||
character set).
|
||||
<P>
|
||||
Recover better from non-existent documents or subdocuments.
|
||||
<P>
|
||||
Think about entity declarations/references in inactive LPDs.
|
||||
<P>
|
||||
Don't allow name groups in parameter entity references in document
|
||||
type specifications in start-/end-tags.
|
||||
<P>
|
||||
With link, don't do a pass 2 unless we replace a referenced entity
|
||||
(what about default entity?).
|
||||
<P>
|
||||
Options to warn about things that HTML disallows: marked sections in
|
||||
instance, explicit subsets.
|
||||
<P>
|
||||
Option to warn about MDCs in comments in comment declarations.
|
||||
<P>
|
||||
Option to warn about omitted REFC.
|
||||
<P>
|
||||
Check that names of added functions are valid names in concrete syntax
|
||||
(both characters and lengths). Also need to do upper-case
|
||||
substitution on them?
|
||||
<P>
|
||||
Recover from nested doctype declaration intelligently.
|
||||
<P>
|
||||
Recover from missing doctype declaration intelligently.
|
||||
<P>
|
||||
Could optimize parsing of attribute literals using technique similar
|
||||
to extendData().
|
||||
<P>
|
||||
attributeValueLength error should give actual length of value.
|
||||
<P>
|
||||
Recover better from entity reference with name group in literal.
|
||||
<P>
|
||||
At start of pass 2 clear everything in pass1LPDs except entity sets.
|
||||
<P>
|
||||
Give an error if EXPLICIT > 1 and LPDs don't chain as required by
|
||||
436:5-7 and 436:18-20.
|
||||
<P>
|
||||
Handle quantity errors by reporting at the end of the prolog and the
|
||||
end of the instance any quantities that need to be increased.
|
||||
<P>
|
||||
Make noSuchReservedName error more helpful.
|
||||
<P>
|
||||
Function characters should perform their function even when markup
|
||||
recognition is suppressed. (I think I've handled this.)
|
||||
<P>
|
||||
Give a warning for notation attribute that is #CONREF.
|
||||
<P>
|
||||
Try to separate out Parser::compileModes().
|
||||
<P>
|
||||
In CompiledModelGroup have vector that gives an index for each element type
|
||||
that occurs in the model group. Then in each leaf content token have a
|
||||
vector that maps this index to a LeafContentToken *, if there
|
||||
is a simple transition (no and groups involved) to that element type.
|
||||
<P>
|
||||
MatchState::minAndDepth and MatchState::AndInfo should be separated
|
||||
off info object pointed to from MatchState; pointer would be null for
|
||||
elements with no AND groups.
|
||||
<P>
|
||||
What to do if we encounter USELINK or USEMAP declaration after DTD in
|
||||
prolog? Should stop prolog and start DTD. If we have SCOPE INSTANCE
|
||||
then if we get an unknown declaration type in prolog, don't give
|
||||
error, but unget token and start instance.
|
||||
<P>
|
||||
?Have separate version of reportNonSgml() for case where datachar is allowed.
|
||||
<P>
|
||||
Implement CONCUR.
|
||||
<P>
|
||||
AttributeDefinition constructors should have Owner<DeclaredValue> &,
|
||||
arguments to avoid storage leaks when exceptions are thrown.
|
||||
<P>
|
||||
Create a list like IList but which keeps track of length. Then
|
||||
combine tagLevel into openElement stack, and inputLevel into
|
||||
inputStack.
|
||||
<P>
|
||||
AttributeDefinition::makeValue should return
|
||||
ConstResourcePointer<AttributeValue>.
|
||||
<P>
|
||||
Syntax member functions should use reference for result.
|
||||
<P>
|
||||
Have a LocationKey data structure that can be used to determine the
|
||||
relative order of locations in possibly different concurrent
|
||||
instances. Contains: offset in document instance; is it a replacement
|
||||
of named character reference; for each entity and numeric character
|
||||
reference: location in entity and index of dtd in which instance is
|
||||
declared.
|
||||
<P>
|
||||
On systems with fixed stacks, avoid unlimited stack growth: hard
|
||||
limits on number of SUBDOCS and GRPLVL.
|
||||
<P>
|
||||
With extendData and extendS don't extend more than some fixed amount
|
||||
(eg 1024), otherwise could overrun InputSource buffer on 16-bit
|
||||
system.
|
||||
<P>
|
||||
Have a location in ElementType saying where the first mention of the
|
||||
element name was. Useful for giving warnings about undefined
|
||||
elements.
|
||||
<P>
|
||||
How to detect 310:8-10?
|
||||
<P>
|
||||
AttributeSemantics should return const pointers rather than ResourcePointer's
|
||||
<P>
|
||||
Rename Parser -> ParserImpl SgmlParser -> Parser
|
||||
Syntax::isB -> Syntax::isBlank
|
||||
<P>
|
||||
What mode should be used for parsing other prolog after document element?
|
||||
<P>
|
||||
Flag out of context data.
|
||||
<P>
|
||||
Provide mechanism to allow character names to be mapped onto universal
|
||||
character numbers.
|
||||
<P>
|
||||
Provide mechanism to allow specification of wbat characters are
|
||||
control characters (for the purposes of SHUNCHAR controls).
|
||||
<P>
|
||||
With SCOPE INSTANCE, which syntax should be used for delimiters in
|
||||
bracketed text entities?
|
||||
<P>
|
||||
Better error messages for ambiguous delimiters.
|
||||
<P>
|
||||
Do we need both EndLpd and ComplexLink/SimpleLink events?
|
||||
<P>
|
||||
What to do about 457:19-21?
|
||||
<P>
|
||||
Rename lpd_ to activeLpd_; allLpd_ to lpd_.
|
||||
<P>
|
||||
Test for validity of character numbers in syn ref charset (perhaps
|
||||
unnecessary, because bad numbers won't be translateable into doc
|
||||
charset).
|
||||
<P>
|
||||
Option to read bootstrap character set from entity.
|
||||
<P>
|
||||
In AttributeDefinitionList have a flag that is true if any checking of
|
||||
unspecified values in attribute list is needed (ie CURRENT, REQUIRED,
|
||||
non-implied ENTITY, non-implied NOTATION). In this case can avoid
|
||||
running over attributes in AttributeList::finish, by computing value
|
||||
only when user calls Attribute::value().
|
||||
<P>
|
||||
Construct link attributes from definition if no applicable link rule.
|
||||
(RAST maybe doesn't want this. Make it a separate method in LinkProcess and
|
||||
use in SgmlsEventHandler. Very useful with ArcEngine.)
|
||||
<P>
|
||||
Shouldn't have OpenElementInfo in Message. Instead use RTTI.
|
||||
<P>
|
||||
noSuchAttribute: include gi in message; if element is undefined, don't
|
||||
give error at all
|
||||
<P>
|
||||
noSuchAttributeToken: say what element or entity
|
||||
<P>
|
||||
nonExistentEntityRef should say document/link type
|
||||
<P>
|
||||
Distinguish errors that are totally recoverable.
|
||||
<P>
|
||||
Find better way to unpack entity information in entity attribute.
|
||||
|
||||
<H2>
|
||||
Entity Manager
|
||||
</H2>
|
||||
<P>
|
||||
Avoid requiring that BASE sysid exist.
|
||||
<P>
|
||||
When FSI has only a single storage manager and that is a literal,
|
||||
return an InternalInputSource.
|
||||
<P>
|
||||
Allow user of InputSource to specify what bit combinations they
|
||||
want to see for RS and RE.
|
||||
<P>
|
||||
Have environment variable SP_INPUT_BCTF that overrides SP_BCTF for
|
||||
input.
|
||||
<P>
|
||||
Avoid using numeric character references for all characters in storage
|
||||
object identifier of literal storage manager in effective system
|
||||
identifier.
|
||||
<P>
|
||||
Instead of registering coding system pass CodingSystemKit that can create
|
||||
that can create coding systems.
|
||||
<P>
|
||||
Need BCTF entry in catalog that specifies default BCTF.
|
||||
<P>
|
||||
Have catalog entry that describes internet charset as BCTF plus PUBLIC
|
||||
identifier of SGML character set; then have charset= storage attribute
|
||||
that does the translation.
|
||||
<P>
|
||||
An SOEntityCatalog should consist of a Vector<ConstPtr<EntityCatalog>
|
||||
> which can be shared between several catalogs. This would facilitate
|
||||
> caching.
|
||||
<P>
|
||||
Maybe need to be able to specify two types of catalog entry file: one
|
||||
used for all documents; one used for this document alone.
|
||||
<P>
|
||||
Allow end-tags in FSIs. Support alternative SOSs.
|
||||
<P>
|
||||
Character sets in the catalog need rethinking. Also character set of
|
||||
ParsedSystemId::Map::publicId.
|
||||
<P>
|
||||
Allow for HTTP proxy.
|
||||
<P>
|
||||
Cache catalogs.
|
||||
<P>
|
||||
Use Microsoft ActiveX (formerly Sweeper) DLL on Win95 or NT.
|
||||
<P>
|
||||
Implement DTDDECL catalog entry.
|
||||
<P>
|
||||
Support FILE URLs.
|
||||
<P>
|
||||
Perhaps don't want to do searching for catalog files (and perhaps
|
||||
command line files).
|
||||
<P>
|
||||
Provide mechanism for specifying when (if at all) base dir is searched
|
||||
relative to other dirs.
|
||||
<P>
|
||||
Provide extension to catalog format to distinguish entities declared
|
||||
in non-base DTDs. Perhaps precede entity name by document type name
|
||||
surrounded by GRPO/GRPC delimiters.
|
||||
<P>
|
||||
URLStorageManager should use a DescriptorManager shared with
|
||||
PosixStorageManager.
|
||||
<P>
|
||||
URLStorageManager::resolveRelative should delete "xxx/../" and "./"
|
||||
components. Might also be a good idea to resolve host names.
|
||||
<P>
|
||||
Implement JIS encoding system (what should be done with half-width yen
|
||||
and overbar in JIS-Roman? translate to Unicode).
|
||||
<P>
|
||||
ExternalInfoImpl::convertOffset: when the position is the character
|
||||
past the last character and the last character was a newline, line
|
||||
number should be number of lines + 1.
|
||||
<P>
|
||||
Try harder to rewind in StdioStorageObject.
|
||||
<P>
|
||||
charset= storage attribute that infers BCTF from MIME charset assuming
|
||||
10646 document character set.
|
||||
|
||||
<H2>
|
||||
Generic
|
||||
</H2>
|
||||
<P>
|
||||
Provide mechanism to access data entities using generated system id.
|
||||
<P>
|
||||
Support IMPLICIT/SIMPLE LINK.
|
||||
<P>
|
||||
Character set information.
|
||||
<P>
|
||||
Need to know space character that separates token. Alternatively
|
||||
provide broken down view of tokens.
|
||||
<P>
|
||||
Need to know IDREF (and other declared values)?
|
||||
|
||||
<H2>
|
||||
nsgmls
|
||||
</H2>
|
||||
<P>
|
||||
Problem with "\#n;" escape sequence is that it might get used other
|
||||
than in data. Probably should get rid of this feature, and give
|
||||
a warning when there's an unencodable character.
|
||||
|
||||
<H2>
|
||||
Internal
|
||||
</H2>
|
||||
<P>
|
||||
Make all macros that occur in headers begin with SP.
|
||||
<P>
|
||||
Make sure all files use #pragma i/i.
|
||||
<P>
|
||||
Get rid of assumption that Vector<T>::size_type, String<T>::size_type
|
||||
is size_t.
|
||||
<P>
|
||||
Maybe align Owner with auto_ptr.
|
||||
<P>
|
||||
Get rid of uses of string as identifier.
|
||||
<P>
|
||||
?Maybe support non-const copy constructors for NCVector/Owner.
|
||||
<P>
|
||||
Get rid of asEntityOrigin (as far as possible). Make
|
||||
InputSourceOrigin::defLocation virtual on origin. Avoid excessive use
|
||||
of asInputSourceOrigin.
|
||||
<P>
|
||||
Hash should define Hash(String<unsigned char>),
|
||||
Hash(String<unsigned short>) etc.
|
||||
<P>
|
||||
Invert sense of SP_HAVE_BOOL define.
|
||||
<P>
|
||||
Get rid of OutputCharStream::open. Instead have
|
||||
OutputCharStream::setEncoding. (Perhaps make a friend so we can use
|
||||
ostream if we're not interested in encodings.) Allow use of ostream
|
||||
instead of OutputCharStream. Change ParserToolkit::errorStream_'s coding
|
||||
system when we change the coding system.
|
||||
<P>
|
||||
Support 32-bit Char. Need to fix XcharMap and SubstTable.
|
||||
Detemplatize SubstTable. Then support UTF-16.
|
||||
<P>
|
||||
Have a common version of Ptr for things that have a virtual
|
||||
destructor.
|
||||
<P>
|
||||
Have a common version of Owner for all things that have a virtual
|
||||
destructor.
|
||||
<P>
|
||||
Inheritance in AttributeSemantics unnecesary.
|
||||
<P>
|
||||
Rename ISet -> RangeSet.
|
||||
<P>
|
||||
ISet and RangeMap should use binary search.
|
||||
<P>
|
||||
Better hash function for wide characters.
|
||||
<P>
|
||||
OutputCharStream should canonically use RS/RE and translate to system
|
||||
newline char with raw option that prevents this.
|
||||
<P>
|
||||
Avoid having Entity.h depend on ParserState, perhaps by double
|
||||
dispatching.
|
||||
<P>
|
||||
Add uses of explicit keyword.
|
||||
<P>
|
||||
When generating message.h file; if we don't have .cxx file and
|
||||
namespaces are supported, use anonymous namespace.
|
||||
|
||||
<H2>
|
||||
Application framework
|
||||
</H2>
|
||||
<P>
|
||||
Only use static programName for outOfMemory message.
|
||||
<P>
|
||||
Need to use AppChar *const * not AppChar ** in CmdLineApp.
|
||||
<P>
|
||||
When reporting message with MessageEventHandler need to be able to
|
||||
update error count.
|
||||
<P>
|
||||
Option argument names need to be internationalized.
|
||||
<P>
|
||||
Support response files for DOS.
|
||||
<P>
|
||||
Sort options in usage message.
|
||||
<P>
|
||||
StringMessageArg should be associated with a character set (in
|
||||
particular, need to distinguish parser character sets from
|
||||
StorageManager character sets).
|
||||
<P>
|
||||
Should translate StringMessageArg from document character set to
|
||||
system character set. Have MessageReporter::setDocumentCharacter
|
||||
function.
|
||||
<P>
|
||||
In MessageReporter, maybe distinguish messages coming from the parser.
|
||||
<P>
|
||||
Don't ever give a non-existent file as a location in a error message.
|
||||
<P>
|
||||
Text of messages should be able to specify that an open quote or close
|
||||
quote should be inserted at a particular point.
|
||||
<P>
|
||||
When outputting a StringMessageArg translate \r to \n.
|
||||
<P>
|
||||
Make sure wild cards work in VC++ and MS-DOS.
|
||||
|
||||
<H2>
|
||||
Win32
|
||||
</H2>
|
||||
<P>
|
||||
Compilers can typically eliminate unused templates. Reengineer Vector
|
||||
to reduce code size with such compilers.
|
||||
<P>
|
||||
Store messages in resources; requires numeric tags for messages.
|
||||
<P>
|
||||
Should automatically register all available code pages.
|
||||
<P>
|
||||
Make use of IsTextUnicode() API.
|
||||
<P>
|
||||
Have StorageManager that uses Win32 API directly. Would avoid limits
|
||||
on number of open files. Also use flag that says file is being
|
||||
accessed sequentially.
|
||||
<P>
|
||||
Allow DTDs to be compiled into binary by having storage manager that
|
||||
uses resource ids.
|
||||
|
||||
<H2>
|
||||
Architecture engine
|
||||
</H2>
|
||||
<P>
|
||||
Should give an error with -A if the specified arch does not exist.
|
||||
<P>
|
||||
Interpret APPINFO parameter, and automatically enable architectural
|
||||
processing based on this.
|
||||
<P>
|
||||
Handle derived architecture support attributes.
|
||||
<P>
|
||||
When doing architectural processing in link type, not possible to have
|
||||
notation declaration, so need some other way to specify public
|
||||
identifier for architecture.
|
||||
<P>
|
||||
Allow DOCTYPE to be declared inline (as with CONCUR or EXPLICIT LINK).
|
||||
<P>
|
||||
Grok conventional comments.
|
||||
<P>
|
||||
Make work automatically with EventHandlers that process subdoc. Make
|
||||
references to subdocs architectural.
|
||||
<P>
|
||||
Support different SGML declaration for meta-DTD.
|
||||
<P>
|
||||
Maybe should map internal sdata/cdata entities to copies in meta-DTD.
|
||||
<P>
|
||||
Perhaps when getting open element info should indicate that gis are
|
||||
architectural.
|
||||
<P>
|
||||
Think about references to SDATA entities in default values in meta-DTD.
|
||||
<P>
|
||||
Add default entity from real DTD to meta-DTD.
|
||||
<P>
|
||||
Tokenize ArcForm attribute appropriately.
|
||||
<P>
|
||||
Make special case for parsing DTD when entity can't be accessed.
|
||||
<P>
|
||||
Try to provide extension that would allow architecture elements be
|
||||
asynchronous with actual elements? This would provide CONCUR
|
||||
functionality.
|
||||
|
||||
<H2>
|
||||
sgmlnorm
|
||||
</H2>
|
||||
<P>
|
||||
Avoid bogus newline from invalid empty document.
|
||||
<P>
|
||||
Avoid always escaping >.
|
||||
<P>
|
||||
Option to say whether to use character references for 8-bit characters.
|
||||
<P>
|
||||
Option to output implied attributes.
|
||||
<P>
|
||||
Option to output all non-implied attributes.
|
||||
<P>
|
||||
Option to omit attribute name with name tokens.
|
||||
<P>
|
||||
Protect against recognition of short references.
|
||||
<P>
|
||||
Option to preserve CDATA entity references.
|
||||
<P>
|
||||
Option to output general entity declarations in DTD subset
|
||||
(but what about data attributes)?
|
||||
|
||||
<H2>
|
||||
spam
|
||||
</H2>
|
||||
<P>
|
||||
Option to normalize names.
|
||||
<P>
|
||||
Add comments round expanded entities to prevent false delimiter
|
||||
recognition.
|
||||
<P>
|
||||
Add newline at the end if last thing was omitted tag.
|
||||
<P>
|
||||
Option to warn about changes in internal entities when not expanding.
|
||||
|
||||
<H2>
|
||||
Documentation
|
||||
</H2>
|
||||
<P>
|
||||
Error message format.
|
||||
<P>
|
||||
<catalog> FSI tag.
|
||||
<P>
|
||||
<ADDRESS>
|
||||
James Clark<BR>
|
||||
jjc@jclark.com
|
||||
</ADDRESS>
|
||||
</BODY>
|
||||
</HTML>
|
||||
Reference in New Issue
Block a user