Plain Text

One of the themes you will find in this blog is the use of plain text to create notes, articles, documents, and other information files. Plain text is simple, portable, and durable. 

Almost every application can import and use plain text. By contrast, the publisher of an application that uses a proprietary format can change that format at any time. Take Microsoft Word as an example. For the 2007 version of Word, Microsoft changed the document format to one based on XML. The change is an improvement, but older versions of Word cannot open the new format without additional software. In other words, the older versions are not forward compatible with the new format. Right now, the additional software to work around the incompatibility is readily available. But consider the situation 10 years from now. As this article from Macworld magazine says, sometimes you can't even open your older documents. You never have to worry about that situation with plain text, or about what happens when you change to different software. As noted, almost all software can import plain text, so it is about as portable as a format can be.

If that's the case, why doesn't all software simply use plain text as its format? Because it's "plain": no formatting, no bold, no italics, and no spacing control other than blank lines, spaces, and tabs. The "plain" in plain text comes from its foundation in ASCII encoding, which includes only the upper- and lowercase English alphabet, the digits 0-9, basic punctuation, and a few control characters. An encoding is a code that maps characters to numbers ('cause computers don't speak English). For a full definition, read the Linux Information Project's Plain Text Definition.
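
To make the character-to-number mapping concrete, here is a minimal Python sketch. The function name ascii_codes is made up for this illustration; the only real API it relies on is Python's built-in str.encode.

```python
def ascii_codes(text: str) -> list[int]:
    """Return the number ASCII assigns to each character in text."""
    # encode() raises UnicodeEncodeError if text holds a non-ASCII character,
    # which is a handy reminder of how small the ASCII repertoire is.
    return list(text.encode("ascii"))


if __name__ == "__main__":
    for ch, code in zip("Hi!", ascii_codes("Hi!")):
        print(repr(ch), code)
```

Running it prints one line per character, pairing each with its ASCII number (72 for "H", 105 for "i", 33 for "!").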

ASCII is one of the earliest encodings, having been established in 1963. Because ASCII is limited to the English alphabet, more recent encodings have been established to encompass more languages. The current trend is toward Unicode, which has the ambitious goal of encoding every character in every language in the world. Unfortunately, there are several encodings of Unicode (e.g., UTF-8, UTF-16) as well as other encodings (e.g., ISO-8859, Windows-1252, and many more). Fortunately, the most frequently used ones, including UTF-8, ISO-8859, and Windows-1252, are supersets of ASCII (UTF-16 is a notable exception). In other words, even the most modern of these encodings, some fifty years after ASCII was established, can still display ASCII files correctly. Thus, from a durability point of view, it's a good bet that files created in plain text today will be accessible over a lifetime at least.
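
The superset claim is easy to check for yourself. The sketch below is only an illustration: the helper name decode_with_all and the particular list of encodings are my own choices, and the encoding names are the ones Python's standard codecs use.

```python
# Encodings that keep ASCII's byte values for the ASCII characters.
ASCII_COMPATIBLE = ("ascii", "utf-8", "iso-8859-1", "cp1252")


def decode_with_all(data: bytes, encodings=ASCII_COMPATIBLE) -> dict[str, str]:
    """Decode the same bytes with each encoding and return every result."""
    return {enc: data.decode(enc) for enc in encodings}


if __name__ == "__main__":
    sample = "Plain text is durable.".encode("ascii")
    for enc, text in decode_with_all(sample).items():
        print(f"{enc:>10}: {text}")
    # UTF-16 stores two bytes per character here, so it is not a superset.
    print("A".encode("utf-16-le"))
```

Every ASCII-compatible encoding gives back the identical sentence, while the last line shows that UTF-16 stores even a plain "A" differently.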

That does not mean there are never any issues. For example, web pages are plain text, with markup used to control the display of the pages: fonts, colors, etc. Almost all of us have seen a web page display a few characters that do not make any sense. For example, quotation marks may appear as gibberish or a question mark rather than an actual quotation mark. This is almost always the result of the browser using one encoding to display the page when the text is actually stored in a different one. Web pages are supposed to inform the browser what encoding is used. Many, however, don't, and the browser has to guess. If it guesses wrong, the user suffers by having to figure out what those gibberish characters are.
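
You can reproduce the gibberish quotation marks in a few lines of Python. This is a sketch under assumptions: it supposes the page was saved as UTF-8 and the browser guessed Windows-1252 (one common mismatch among many), and the function name misread is invented for the example.

```python
def misread(text: str, actual: str = "utf-8", assumed: str = "cp1252") -> str:
    """Show how text stored in one encoding looks when read as another."""
    raw = text.encode(actual)
    # errors="replace" swaps undecodable bytes for a placeholder character,
    # much like the stray question marks a browser may show.
    return raw.decode(assumed, errors="replace")


if __name__ == "__main__":
    curly = "\u201cplain\u2019s\u201d"  # curly quotes and a curly apostrophe
    print(misread(curly))    # curly punctuation turns into gibberish
    print(misread("plain"))  # pure ASCII survives the wrong guess
```

Note that the ASCII letters come through untouched. Only the characters outside ASCII are garbled, which is why mostly-English pages usually stay readable even when the browser guesses wrong.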

You can create your notes and documents in plain text and still display or print them with formatting such as bold, italics, title size, etc. There are many ways to do this. I will get into some detail about some of these ways in future posts. But in general, the plain text file would incorporate markup to instruct a formatting program, such as a web browser or Pandoc, to produce the formatted output, whether that is a web page, a Rich Text Format (.rtf) file, or something else.
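
As a rough illustration of the idea, the sketch below turns a line of marked-up plain text into HTML. The markup convention (asterisks for bold, underscores for italics) and the function name to_html are assumptions made up for this example. It is not how Pandoc or any particular tool works, and a real formatter handles far more cases.

```python
import re
from html import escape


def to_html(line: str) -> str:
    """Convert *bold* and _italic_ markers in one line of text to HTML."""
    html = escape(line)  # keep characters like < and & from being read as HTML
    html = re.sub(r"\*(.+?)\*", r"<b>\1</b>", html)
    html = re.sub(r"_(.+?)_", r"<i>\1</i>", html)
    return f"<p>{html}</p>"


if __name__ == "__main__":
    print(to_html("Plain text is *simple* and _durable_."))
```

The source line stays perfectly readable as plain text, and the formatting only appears when a program interprets the markers.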

The bottom line is that if you predominantly use the English alphabet for your documents, keeping to plain text will ensure that you can access and use them in the future.
