Message from: Ian Feldman, the Current Setext Oracle
Date: Sun, 16 Aug 92 08:19:00 +0100 (CET)
Reply-To: setext-list@random.se (Keepers of The Setext Flame[tm])
Replaces: setext_concepts_Mar92.etx
Lines: 240
Subject: setext_concepts_Aug92.etx

Thank you for your interest in the setext format. Enclosed is an advance sheet that will remain in effect until the first public release of the setext format package (originally planned for around March 1st, 1992, now delayed).
If you recognize some of the arguments presented here, that is the price you pay for having been an early bird. ;-)) Please note that my email address may change in the near future; consult the trailer of the weekly TidBITS issues for the most current one.
As first explained in TidBITS#100, and noted in every issue since, that publication now comes "wrapped as a setext." The noun itself stands both for a method of wrapping (formatting) texts according to specific layout rules and for a single structure-enhanced text. The latter is a text that has been formatted so that it contains clues to the typographical and logical structure of its source (word-processed) document(s), if any. Those clues, which I call "typotags," make it possible to detect that structure automatically later on, so that it can be validated and extracted/ processed/ transformed/ enhanced as needed.
It follows that setexts, being nothing but pure text (albeit with a special layout), are eminently readable with ANY editor or word processor in existence today or tomorrow, and not only on the Macintosh either. ANY computer, and any program capable of opening and reading text files, can be used to read setexts. By default all properly setext-ized files will have an ".etx" or ".ETX" suffix. This stands for "emailable/ enhanced text", the ExtraTerrestrial overtones notwithstanding ;-))
Unlike other forms of text encoding that use explicit, visible tag elements such as <this> and </that>, the setext format relies solely on implicit typotags, carefully chosen to be as visually unobtrusive as possible. The underlined word above is one such instance of this de facto "invisible" coding. At worst, inserted typotags will appear as mere "typos" in the text.
By way of example, here is a short description of the four types of word-emphasis typotags that setexts MAY contain, limited to ONLY one emphasis type per word or word group:
------------------- ---------------------------- --------------
**aBoldWord**       **multiple bold words**      ; bold-tt
_anUnderlinedWord_  _multiple underlined words_  ; underline-tt
~anItalicWord~                                   ; italic-tt
aHotWord_           multiple_hot_words_          ; hot-tt
-----------------------------------------------------------------
the 'hot-tt' is synonymous with the 'grouped' style of HyperCard
only single ~italic~ words are allowed for visual-clarity reasons
Please note, however, that the <end> strings previously found in
TidBITS #100-110 were not part of the format as such, but were
added by Adam Engst for a specific setext-raterrestrial purpose.
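As a rough illustration of how a decoder might recognize the four emphasis typotags in the table above, here is a minimal Python sketch. It is not part of the setext definition: the boundary rules (a typotag must start and end at a word boundary, and tagged words consist of ASCII letters and digits) are assumptions, and the names TYPOTAG and decode_emphasis are hypothetical.

```python
import re

# Assumption: typotagged words are plain ASCII letters and digits.
WORD = r"[A-Za-z0-9]"

# One alternative per emphasis typotag from the table. The lookarounds are an
# assumed validation rule: a typotag must begin and end at a word boundary,
# so identifiers such as snake_case are not mistaken for typotags.
TYPOTAG = re.compile(
    rf"\*\*(?P<bold>{WORD}[^*]*?)\*\*"                                    # **bold words**
    rf"|(?<![A-Za-z0-9_])_(?P<underline>{WORD}[^_]*?)_(?![A-Za-z0-9_])"   # _underlined words_
    rf"|(?<![A-Za-z0-9~])~(?P<italic>{WORD}+)~(?![A-Za-z0-9~])"           # ~italic~, one word only
    rf"|(?<![A-Za-z0-9_])(?P<hot>{WORD}+(?:_{WORD}+)*)_(?![A-Za-z0-9_])"  # hot_words_
)


def decode_emphasis(line: str) -> list[tuple[str, str]]:
    """Split a line into (style, text) runs; untagged text gets style 'plain'."""
    runs = []
    pos = 0
    for m in TYPOTAG.finditer(line):
        if m.start() > pos:
            runs.append(("plain", line[pos:m.start()]))
        style = m.lastgroup  # name of the alternative that matched
        text = m.group(style)
        if style == "hot":
            text = text.replace("_", " ")  # multiple_hot_words_ -> "multiple hot words"
        runs.append((style, text))
        pos = m.end()
    if pos < len(line):
        runs.append(("plain", line[pos:]))
    return runs


print(decode_emphasis("see **two words** and multiple_hot_words_ here"))
# [('plain', 'see '), ('bold', 'two words'), ('plain', ' and '), ('hot', 'multiple hot words'), ('plain', ' here')]
```

Because each match comes from exactly one alternative, a word or word group receives at most one emphasis type, and a multi-word group wrapped in tildes is simply left as plain text, in keeping with the single-word ~italic~ rule.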
I need to state explicitly that although TidBITS is currently the only setext publication in wide distribution, the setext format is NOT synonymous with the TidBITS layout. Many other distinctive layouts are possible. TidBITS is therefore just one instance of the format, not THE setext format. More specifically, this means that any of you thinking of writing a "TidBITS browser" should really be considering a "setext browser." Otherwise your program will in all probability recognize only today's specifically formatted TidBITS and no other future setext publications (which are in the making), including a possibly changed or modified future TidBITS.
As can be seen from the above, setext is not some quickie project thought up and finalized in a few afternoons. A lot of thought has gone into it, and some of it has survived to the present day. Needless to say, the format definition will be placed in the public domain, and its use actively promoted by the many parties that have expressed an interest in adopting it for their own use.
While setext does, indeed, allow the preservation of a source text's structure, it does not, by definition, guarantee that this structure can be recreated 100% at the destination. Any word originally styled as bold may in effect end up as Yellow-On-Black, be set in a different font, be considered a candidate for a cumulative keywords list, or be deemphasized at will. There are not now, and never will be, any rules governing how decoded setexts should be presented at the receiving end. It will be up to each front-end's author to ensure that decoded (no-longer-)setexts are presented in a fashion that's agreeable to his/ her end users. There is plenty of sound advice on how to achieve that, but that's an entirely different matter.
Those principles also apply to decoding a setext's logical, rather than merely its typographical, structure. The format does not rely on some large set of predefined, unambiguous, mutually exclusive rules. Rather, it "knows of" just the barest set of typotags (1 required, 12 optional), knows their symbolic purpose, and knows what criteria to use when looking for and validating them in a setext. This approach differs somewhat from the commonly heard programmers' wish for clearly delimited data patterns that can be scanned for quickly, with their positions used as offsets into the text to be displayed.
Setext has those patterns too but, since it relies primarily on de facto "invisible" elements that could also be part of the text itself, it must validate them before proceeding with any enhancements. Writing a real setext decoder is therefore conceptually much closer to (though nowhere near as hard as) writing an SGML application than to writing a macro routine that munges some data in one predefined fashion. In spite of all that, setext tools should be easy to implement as typical HyperTalk, sed, awk and perl scripts, and be no more complex than those. The barest minimum required for such an attempt is an intelligent search/ replace function in a programmable macro editor. Though this is yet to be proven, conceptually there is nothing in the format to prevent implementing real-time setext browsers in, say, the advanced pattern-matching macro language of some terminal emulator program.
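To make the "validate first" point concrete, the minimal Python sketch below contrasts a naive search/ replace for the underline typotag with one that checks word boundaries before enhancing. The <u>...</u> output and the boundary test are assumptions chosen for illustration only; setext itself prescribes no presentation, and its actual validation criteria may differ. The function names are hypothetical.

```python
import re


def naive_underline(line: str) -> str:
    # Treats any pair of underscores as an underline typotag, unchecked.
    return re.sub(r"_([^_]+)_", r"<u>\1</u>", line)


# Assumed validation rule: the opening "_" must not be preceded by a letter,
# digit or underscore, the closing "_" must not be followed by one, and the
# tagged content must start with a letter or digit.
VALID_UNDERLINE = re.compile(r"(?<![A-Za-z0-9_])_([A-Za-z0-9][^_]*?)_(?![A-Za-z0-9_])")


def validated_underline(line: str) -> str:
    # Only candidates that pass the boundary checks are enhanced.
    return VALID_UNDERLINE.sub(r"<u>\1</u>", line)


sample = "edit my_config_file and read _this_ first"
print(naive_underline(sample))      # edit my<u>config</u>file and read <u>this</u> first
print(validated_underline(sample))  # edit my_config_file and read <u>this</u> first
```

On the sample line the naive version mangles the identifier my_config_file, while the validated one leaves it alone and enhances only _this_, which is the kind of check a real decoder has to make before touching the text.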
Beyond that, I have a working prototype of a setext front-end, which has been "not far from completion" for the last half year or so (draw your own conclusions). A paging macro routine for rn, a popular Unix newsreader, allowing forward jumps to the next topic of any TidBITS issue read online in the comp.sys.mac.digest group, has been published in TidBITS#110/09-Mar-92. On top of that there is a mailing list for developers and future setext publishers: <setext-list@random.se>. If interested, please send me a short note stating the degree of your future involvement (wants to write a setext tool, or 'just an observer/ future user') and your Internet-accessible email address, and I will put you on the list and/ or reply as soon as possible.
If you're among those who have already written a prototype based mainly on a reverse-engineered layout of the current TidBITS, you'd be well advised not to release it without first having me validate it. Please do not call your product a "setext browser" (or whatever) UNLESS it is truly capable of parsing all (future) setext-ized docs, not solely TidBITS.