This glossary provides definitions for SGML technical terms, as well as terms and concepts that we introduce as part of our methodology and techniques. For terms that have an ISO 8879 definition, we supply that definition with its clause reference (though without any notes that accompany the definition in the standard), along with additional explanation as appropriate.
The functional roles of pieces of SGML markup, for example, a “start-tag open” (STAGO). A concrete syntax maps actual character strings to the functional roles; for example, the reference concrete syntax maps an STAGO to the left angle bracket ( < ).
The ISO 8879 definition is as follows:
Rules that define how markup is added to the data of a document, without regard to the specific characters used to represent the markup. (4.1)
See Also concrete syntax.
An element that contains another element, directly or indirectly; the first is said to be an ancestor of the second.
Named set of rules for and constraints on the declaration and processing of an element or an attribute definition list, usually expressed as a markup declaration and accompanying documentation. A declaration conforming to an architectural form references it by supplying the form's name as the value of a certain attribute.
Markup that allows further description of an element. If you think of an element as a noun, you can think of an attribute as an adjective modifying that noun. Attribute information for an element is stored in its start-tag.
The ISO 8879 definition is as follows:
A characteristic quality, other than type or content. (4.9)
A label for an attribute value. Attribute information in an element's start-tag is not positionally sensitive; the attribute name helps to distinguish between values for different attributes.
A string that provides additional description for an element. An attribute's declared value determines the rules an attribute value must follow to be valid, for example, indicating that the value must be a NUMBER (a string made only of the characters 0–9).
See Also declared value.
The ISO 8879 definition is as follows:
A member of an attribute specification list; it specifies the value of a single attribute. (4.15)
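As a minimal sketch of how an attribute, its name, and its value fit together, the following hypothetical document declares a minutes attribute for a step element and then supplies a value for it. The element names, the attribute name, and the content are illustrative assumptions, not taken from any particular DTD.

```sgml
<!DOCTYPE recipe [
<!ELEMENT recipe - - (step+)>
<!ELEMENT step   - - (#PCDATA)>
<!-- "minutes" is the attribute name; NUMBER is its declared value -->
<!ATTLIST step   minutes NUMBER #IMPLIED>
]>
<recipe>
<step minutes="15">Simmer the sauce.</step>
<step>Serve.</step>
</recipe>
```

A specification such as minutes="soon" would be reported as invalid, because the value does not satisfy the NUMBER declared value.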
A variant of a reference DTD whose markup model has been optimized for use in authoring, editing, and modifying documents. Authoring DTDs are sometimes created to solve problems in specific software environments or to simplify the markup process.
See Also conversion DTD, interchange DTD, presentational DTD, reference DTD.
A file that maps public identifiers (primarily used in entity declarations) to objects (such as files) on a computer system, so that the contents of each object can be substituted. The format of the most commonly used catalog file was standardized by SGML Open in its Technical Resolution 9401.
An element that is directly contained by another element; the first is said to be a child of the second.
A “palette” of elements from which authors can choose freely (possibly along with data characters) in a particular context, without restriction on number or order, other than potentially requiring a single element to be supplied.
An element declaration achieves this effect by using a repeatable OR group in its content model. If the optional-repeatable indicator is used or if the collection allows #PCDATA, the content model can be satisfied by an absence of any content. If the required-repeatable indicator is used and the collection specifies only elements, at least one of the elements must be present to satisfy the content model.
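The two cases can be sketched with a pair of hypothetical element declarations. The names para, emph, command, body, list, and note are assumptions chosen for illustration, and the element types named inside the groups are assumed to be declared elsewhere in the DTD.

```sgml
<!-- Optional-repeatable OR group that allows #PCDATA:
     an empty para satisfies the content model -->
<!ELEMENT para - - (#PCDATA | emph | command)*>

<!-- Required-repeatable OR group of elements only:
     at least one para, list, or note must be present -->
<!ELEMENT body - - (para | list | note)+>
```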
Special markup and content that is solely for the eyes of readers of the “source” files. In a document instance, the comment is usually in its own comment declaration, surrounded with <!-- --> characters. In a DTD, comments are sometimes interspersed throughout other markup declarations.
The ISO 8879 definition is as follows:
A portion of a markup declaration that contains explanations or remarks intended to aid persons working with the document. (4.46)
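Both placements can be sketched as follows, using a hypothetical note element. The first comment stands alone in a comment declaration; the second sits inside an element declaration, after the content model.

```sgml
<!-- A comment in its own comment declaration -->
<!ELEMENT note - - (#PCDATA) -- a comment inside another declaration -->
```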
See semantic component.
A semantic component that is primarily descriptive of information content, rather than structure or presentation. For example, a “mailing address” component is content-based.
See Also presentational component, structural component.
The expression of functional roles of pieces of SGML markup in terms of character strings. For example, a “start-tag open” (STAGO) in the abstract syntax is mapped to a left angle bracket ( < ) in the reference concrete syntax.
The ISO 8879 definition is as follows:
A binding of the abstract syntax to particular delimiter characters, quantities, markup declaration names, etc. (4.48)
See Also abstract syntax.
The rules for the configuration of element and/or data content allowable in instances of an element type.
The ISO 8879 definition is as follows:
Parameter of an element declaration that specifies the model group and exceptions that define the allowed content of the element. (4.55)
The specific arrangement of document text in which a particular kind of markup or content is found (or can be found, if you are examining a markup model rather than a document instance).
In a document instance, context is usually understood to mean the list of element ancestors of a certain element. For example, the context of a “recipe instruction step” element might be represented as “recipe→instruction-list→step.” However, other factors, such as the values of particular attributes, can also be examined. Most kinds of document utilization, such as searching and formatting, involve locating material in a certain bounded context.
A markup system for which not all individual pieces of markup are allowable in all locations in a document. SGML is contextual, whereas most word-processing systems are not.
See Also noncontextual markup.
The process of changing a document's system-specific markup, usually permanently, to conform to an SGML DTD.
See Also transformation.
A variant of a reference DTD that is optimized for receiving the results of converting non-SGML document sources to SGML form. Typically, conversion DTDs relax the content models and attribute rules of the reference DTD.
See Also authoring DTD, interchange DTD, presentational DTD, reference DTD.
The ISO 8879 definition is as follows:
The characters of a document that represent the inherent information content; characters that are not recognized as markup. (4.72)
SGML distinguishes between data and markup, calling the combination of the two text.
See Also text, data-level component, data-level element.
A component or element that represents a small piece of information that needs to be processed or handled specially. A data-level element usually has a simple internal structure and would be meaningless without its surrounding context, which almost always consists of character data, usually in prose form.
See Also information unit (IU).
A markup system that describes the document content rather than describing how a computer system should process that content. Markup that effectively says, “This is a paragraph” is declarative, while markup that says, “Wrap this region of text to fit a line length of 26 picas using 10–point Times font on 11–point leading” is procedural.
See Also procedural markup.
Instructions for the content of an element type that are represented with a single keyword. The three choices of element declared content are CDATA, RCDATA, and EMPTY.
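As an illustration, here is one hypothetical element declaration for each keyword. The element names code, title, and pgbreak are assumptions made for this sketch.

```sgml
<!-- CDATA: character data only; no markup is recognized except the
     end-tag that closes the element -->
<!ELEMENT code    - - CDATA>

<!-- RCDATA: replaceable character data; entity references and
     character references are also recognized -->
<!ELEMENT title   - - RCDATA>

<!-- EMPTY: no content and no end-tag -->
<!ELEMENT pgbreak - O EMPTY>
```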
The constraints imposed by the attribute definition list declaration, which any value for that attribute must follow in a document. The declared value of an attribute serves as a kind of “data type” for the value. Table A.1, “Attribute Declared Values,” describes the available declared values.
An element that is contained by another element, directly or indirectly; the first is said to be a descendant of the second.
See declarative markup.
A goal arising from the overall SGML project goals, stated specifically and unambiguously, that should be used by the document type design team and the DTD implementor in their work.
The ISO 8879 definition is as follows:
A collection of information that is processed as a unit. A document is classified as being of a particular document type. (4.96)
The formal, written results of the needs analysis and document type design work performed by the document type design team. This report, along with the project documents, is the main source of information from which the DTD implementor works.
The ISO 8879 definition is as follows:
The element that is the outermost element of an instance of a document type; that is, the element whose generic identifier is the document type name. (4.99)
The overall structure of a document type; the highest levels of markup that dictate the characteristic “shape” of the documents.
See Also information pool.
The ISO 8879 definition is as follows:
Instance of a document type. (4.100)
The ISO 8879 definition of an “instance of a document type” is as follows:
The data and markup for a hierarchy of elements that conforms to a document type definition. (4.160)
See Also presentation instance.
The ISO 8879 definition is as follows:
A class of documents having similar characteristics; for example, journal, article, technical manual, or memo. (4.102)
The declaration at the top of an SGML document (after the SGML declaration, if one is present) that indicates the DTD rules to which the document instance is intended to conform.
The ISO 8879 definition is as follows:
A markup declaration that formally specifies a portion of a document type definition. (4.103)
A formal expression of the SGML-based rules that a document's markup must follow.
The ISO 8879 definition is as follows:
Rules, determined by an application, that apply SGML to the markup of documents of a particular type. (4.105)
See Also markup model.
A named collection of document content. Most such collections can contain and/or be contained in other collections.
The ISO 8879 definition is as follows:
A component of the hierarchical structure defined by a document type definition; it is identified in a document instance by descriptive markup, usually a start-tag and end-tag. (4.110)
See Also element type.
The markup declaration that specifies the rules for an element type.
The ISO 8879 definition is as follows:
A markup declaration that contains the formal specification of the part of an element type definition that deals with the content and markup minimization. (4.111)
A portion of a DTD, usually containing element declarations, that “travels together” and can be used easily in multiple DTDs. An element set is stored in its own parameter entity.
The ISO 8879 definition is as follows:
A set of element, attribute definition list, and notation declarations that are used together. (4.112)
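One way a DTD might pull in an element set is sketched below. The entity name tblset and the public identifier are hypothetical; a real DTD would use the identifier published for the set it wants to include.

```sgml
<!-- Hypothetical public identifier; a catalog or a system identifier
     would tell the parser where the declarations are stored -->
<!ENTITY % tblset PUBLIC "-//Example//ELEMENTS Table Model//EN">

<!-- The parameter entity reference pulls the whole set into this DTD -->
%tblset;
```

Because the set is reached through a single parameter entity, several DTDs can reference the same declarations without copying them.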
The definition of an element; an element in the abstract sense, as opposed to any instances of that element type in an actual document. Any one element declaration, even if it specifies multiple generic identifiers, defines a single element type.
The ISO 8879 definition is as follows:
A class of elements having similar characteristics; for example, paragraph, chapter, abstract, footnote, or bibliography. (4.114)
See Also element.
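A declaration that names more than one generic identifier might look like the following sketch, in which the names ol, ul, and li are assumptions chosen for illustration.

```sgml
<!-- One declaration, two generic identifiers, one element type -->
<!ELEMENT (ol | ul) - - (li+)>
<!ELEMENT li        - - (#PCDATA)>
```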
A graphically based description of the desired markup model for part or all of a document type being designed, or a similar description of the model for an existing DTD, using the notation explained in Appendix B, Tree Diagram Reference. “Elm” is an acronym for “enables lucid models.”
The ISO 8879 definition is as follows:
Descriptive markup that identifies the end of an element. (4.119)
A named fragment of document content that is stored separately from other fragments and that can be included in a document one or more times by reference to its name.
The ISO 8879 definition is as follows:
A collection of characters that can be referenced as a unit. (4.120)
Markup that indicates a location in a document where the content of an entity should be included.
The ISO 8879 definition is as follows:
A reference that is replaced by an entity. (4.124)
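A small hypothetical document shows an entity declaration and a reference to that entity. The entity name prodname and its replacement text are assumptions made for this sketch.

```sgml
<!DOCTYPE para [
<!ELEMENT para - - (#PCDATA)>
<!ENTITY prodname "Widget Pro">
]>
<para>Install &prodname; before you begin.</para>
```

When the document is parsed, the reference &prodname; is replaced by the string Widget Pro.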
A DTD whose markup model has been modified from that of an original (usually standard) DTD, such that some or all instances conforming to the modified one can potentially be invalid according to the original one.
See Also renamed DTD, subsetted DTD.
The ISO 8879 definition is as follows:
A parameter that identifies an external entity or data content notation. (4.135)
The ISO 8879 definition is as follows:
A name that identifies the element type of an element. (4.145)
A markup system that is not specific to a single vendor, document producer, or computer hardware or software configuration.
See Also system-specific markup.
Arranged by means of successive levels of containment, where “lower” (or “inner”) units are nested entirely within “higher” (or “outer”) ones. Elements in an SGML document are arranged hierarchically.
The Hypermedia/Time-based Structuring Language; ISO standard 10744. HyTime is a language, defined largely by means of architectural forms, for representing hypertext links and the scheduling and synchronization of events. To use HyTime-based processing applications, you map the relevant markup in your DTD to the architectural forms specified in the HyTime standard, following the constraints set forth by the forms.
See Also architectural form.
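A mapping of this kind might be sketched as follows, assuming the HyTime convention of an attribute named HyTime whose fixed value is the name of the architectural form. The xref element is hypothetical, and the use of the clink (contextual link) form with a linkend attribute is an assumption that should be checked against ISO/IEC 10744; any further requirements of the form are not shown.

```sgml
<!ELEMENT xref - - (#PCDATA)>
<!ATTLIST xref
          HyTime  NAME  #FIXED "clink"  -- names the architectural form --
          linkend IDREF #REQUIRED>
```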
The body of markup available to authors in the contexts where they supply the “main content” of a document. These contexts typically offer great discretion in choosing and applying markup. The information pool is a kind of “supercollection” encompassing all the information units and data-level elements.
See Also document hierarchy.
A high-level component or element that can, to some degree, “stand alone” in order to be understood by a reader, such that it must “travel together” during information processing and assembly. An information unit typically has a complex internal structure. However, the most common information unit, the paragraph, often has a very simple content model.
See Also data-level component, data-level element.
See document instance.
A DTD that has been agreed on as the standard form for document interchange by the senders and recipients of SGML documents. For example, DocBook Version 2.2.1 was the interchange DTD agreed on by the authors and publisher of this book. Reference DTDs often must use an interchange DTD as their design base.
See Also authoring DTD, conversion DTD, presentational DTD, reference DTD.
The portion of a DTD's markup declarations that are provided directly inside the document type declaration, between square brackets.
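The following sketch shows an internal subset that adds one entity declaration to an externally stored DTD. The memo document type, its public identifier, and the draftno entity are hypothetical, and the external DTD is assumed to declare the memo and para elements.

```sgml
<!DOCTYPE memo PUBLIC "-//Example//DTD Memo//EN" [
<!-- Internal subset: declarations local to this document -->
<!ENTITY draftno "3">
]>
<memo>
<para>This is draft &draftno; of the memo.</para>
</memo>
```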
ISO describes itself and explains its work as follows:
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
SGML was created under the auspices of ISO/IEC JTC1/SC18/WG8—Working Group 8 of Subcommittee 18 of Joint Technical Committee 1 of the combined effort of ISO and the International Electrotechnical Commission.
See Also SGML (Standard Generalized Markup Language) .
A data-level component or element that is highly content-based and specifically related to the information domain of the document type under discussion. For example, in software documentation, a “command name” would be key data.
A component that records the relationship of two or more pieces of information. Two common kinds of links are those that join document content to locations where that content should be reproduced, and those that constitute a suggestion to the reader to seek out additional information.
The ISO 8879 definition of markup is as follows:
Text that is added to the data of a document in order to convey information about it. (4.183)
To mark up data is to add markup to it.
A “statement” in the SGML language that defines a portion of a markup model or other markup characteristics of a document. Most markup declarations appear in DTDs, but a few (such as comment declarations) can appear in document instances.
The ISO 8879 definition is as follows:
Markup that controls how other markup of a document is to be interpreted. (4.186)
The markup “vocabulary” and “grammar” defined by a DTD (or some part of a DTD), which serve as the rules of the language “spoken” by documents conforming to that DTD. Many people simply use the term “DTD” for this concept, but we use a unique term for it because of the need to distinguish between the actual markup characteristics defined in a DTD and the various implementation techniques used to make the design readable, maintainable, and so on.
Information about information; facts about a document (or smaller piece of information) as a body of information. For example, a document's publication date is metainformation.
A language that is used to create or define other languages. SGML is a metalanguage used to define DTDs that specify markup models; these models function as unique document markup “languages.”
The act of designing markup requirements in a way that makes the results suitable for expression in SGML markup declarations.
A markup system that places no formal restrictions on the appearance or order of the individual pieces of markup. Most word-processing systems are noncontextual, whereas SGML is contextual.
See Also contextual markup.
An element that directly contains another element; the first is said to be a parent of the second.
The ISO 8879 definition of “SGML parser” is as follows:
A program (or portion of a program or a combination of programs) that recognizes markup in SGML documents. (4.285)
An oval containing an element collection or any-order group. Also, an herb of the nightshade family that is widely cultivated as a vegetable crop.
One form of an SGML document as presented to a user, possibly with some content changed, added, or removed compared to other presentations.
See Also document instance.
A semantic component that is primarily descriptive of information appearance, rather than structure or meaning. For example, a “bold font” component is presentational.
See Also content-based component, structural component.
A variant of a reference DTD that is optimized to assist the process of transforming SGML documents into presented or otherwise processed form. Typically, presentational DTDs allow for the “augmenting” of the original document to contain generated material such as tables of contents and to contain formatting-related information.
See Also authoring DTD, conversion DTD, interchange DTD, reference DTD.
See design principle.
A markup system that describes how a computer system should process the document content rather than describing what the content means. Markup that effectively says, “Wrap this region of text to fit a line length of 26 picas using 10–point Times font on 11–point leading” is procedural, while markup that says, “This is a paragraph” is declarative.
See Also declarative markup.
The assumptions about markup that constrain and inform its use in document authoring, management, and processing. For example, the processing expectations about a cross-reference element might include the requirement that it be replaced with generated text when it is formatted for printing. Some people use the term “semantics” or “processing semantics” for this meaning—which accounts for our use of the terms semantic component and semantic extension—but as a noun, semantics is too ambiguous for our taste.
See record end (RE).
An invisible character that occurs at the end of units of stored data that are known as records or, sometimes, “lines.”
The ISO 8879 definition is as follows:
A function character, assigned by the concrete syntax, that represents the end of a record. (4.254)
The default concrete syntax for SGML documents, and the one used in SGML declarations.
The ISO 8879 definition is as follows:
A concrete syntax, defined in this International Standard, that is used in all SGML declarations. (4.258)
A DTD that encodes the “ideal” markup model for complete documents of a specified type. A reference DTD may be based on (that is, a variant of) an interchange DTD, but otherwise it typically provides the design base for the other variant DTDs, such as an authoring DTD.
See Also authoring DTD, conversion DTD, interchange DTD, presentational DTD.
A DTD that is identical to another DTD, except that some or all of the element names and other markup names have been changed to be more suitable for use with authors who use a different jargon or write in a different language.
See Also extended DTD, subsetted DTD.
A unit of specification representing a requirement for the design of a document type model, which corresponds to a kind of information that must be distinguished from all others. A semantic component often results in the DTD having a new element type, but can also result in other kinds of markup distinctions.
See Also processing expectations.
A technique for markup model design that allows a DTD's markup to be used for making novel distinctions among kinds of information, even if the markup didn't previously recognize the distinction. The technique is useful for DTDs that cannot be updated frequently enough to satisfy new requirements at the rate at which they are created.
The ISO 8879 definition of Standard Generalized Markup Language is as follows:
A language for document representation that formalizes markup and frees it of system and processing dependencies. (4.305)
SGML was published in 1986 as ISO standard 8879. Amendment 1 to the standard was published in 1988.
See document.
An element that occurs at the same level as another element that has the same parent; the two are said to be siblings of each other.
The ISO 8879 definition is as follows:
Descriptive markup that identifies the start of an element and specifies its generic identifier and attributes. (4.306)
A semantic component that is primarily descriptive of information structure, rather than meaning or appearance. For example, a “list” component is structural.
See Also content-based component, presentational component.
A DTD whose markup model has been modified from that of an original (usually standard) DTD, such that all instances conforming to the modified one are still valid according to the original one. Note that a subsetted DTD is unrelated to a DTD internal subset, which is the portion of a DTD that is “local” to a document by virtue of being supplied inside the DOCTYPE declaration's square brackets ( [ ] ).
See Also extended DTD, renamed DTD.
A markup system that is specific to a single vendor, document producer, or computer hardware or software configuration.
See Also generic markup.
The ISO 8879 definition is as follows:
Descriptive markup. (4.314)
A condition that afflicts authors who choose inappropriate markup to get a certain formatting effect or choose markup that isn't as precise or accurate as possible. A poor DTD design often exacerbates the problem.
Data and markup making up a document.
The ISO 8879 definition is as follows:
Characters. (4.316)
Where we use the term for its colloquial meaning—the main content in the flow of a document, exclusive of the document hierarchy—we use quotation marks around it.
See Also data.
The process of changing the data and markup within SGML documents to make them conform to a different DTD or to another kind of markup, typically one that can be directly interpreted by printers, display devices, or further transformation software.
See Also conversion.
See elm tree diagram.
The ISO 8879 definition is as follows:
A conforming SGML parser that can find and report a reportable markup error if (and only if) one exists. (4.329)
See attribute value.
A DTD whose design is based closely on the markup model of another DTD.
See Also reference DTD.
“What you see is what you get.” A description of word processing systems and desktop publishing systems that let authors see a representation of the formatted appearance of a document on the computer screen as they work. Most such systems also allow authors to customize the formatted appearance by manipulating the screen display dynamically.