Weblog:   It's the Data, Stupid
Subject:   A hierarchy of textual formats?
Date:   2004-05-05 14:04:37
From:   Paddy3118
I find that there is a hierarchy to data stored as human-readable text. It runs from a series of notes in a file, made without regard to programmatic filtering/searching, where I just try to keep spellings consistent and may leave text marker strings around to help when searching in vi
- through more structured text - tabular data that is easily read by awk (and so by most other scripting languages)
- and on to text with yet more structure, where I write it so that it can be parsed directly by a scripting language (I have used this technique to format hand-written data as Lisp and Python data structures). Personally I find XML syntax very verbose for typing by hand, and since I rarely use other tools that read or write XML, I survive without writing XML.
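
To make the last two levels concrete, here is a minimal Python sketch. The sample data and the names `TABLE`, `RECORDS`, `read_table` and `read_records` are made up for illustration; the only assumption is that the tabular text is whitespace-separated with a header line, and that the structured text is a valid Python literal.

```python
import ast

# Tabular level: whitespace-separated columns, the kind awk splits into fields.
TABLE = """\
name    qty  price
widget  3    2.50
gadget  10   0.75
"""

def read_table(text):
    """Split each non-blank line on whitespace; the first line is the header."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Values stay as strings, just as awk fields would.
    return [dict(zip(header, row)) for row in rows]

# Most structured level: text typed by hand as a Python data structure.
RECORDS = """\
[
    {"name": "widget", "qty": 3, "tags": ["small", "blue"]},
    {"name": "gadget", "qty": 10, "tags": []},
]
"""

def read_records(text):
    """Parse a Python literal without executing arbitrary code."""
    return ast.literal_eval(text)
```

`ast.literal_eval` only accepts literals (strings, numbers, lists, dicts and so on), so it is a safer choice than `eval` for data files that other people may edit.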

For your data repository you might want to restrict the characters used in file and directory names, as some tools and OSs don't like spaces, exclamation marks, etc. in names, or have problems manipulating them in command-line shells. Keep file and directory names reasonably short, and don't have two file or directory names that differ only by case: that will cause problems on case-insensitive systems.
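
A rough Python sketch of such a naming check follows. The allowed character set (letters, digits, dot, underscore, hyphen) and the 64-character limit are my own assumptions, not established rules, and `check_names` is a hypothetical helper name; adjust both to suit the tools you actually use.

```python
import re
from collections import defaultdict

# Assumed "safe" set: ASCII letters, digits, dot, underscore and hyphen.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")
MAX_LEN = 64  # assumed limit, pick whatever suits your systems

def check_names(names, max_len=MAX_LEN):
    """Return (name, problem) pairs for names that break the conventions."""
    problems = []
    by_folded = defaultdict(list)
    for name in names:
        if not SAFE_NAME.fullmatch(name):
            problems.append((name, "unsafe characters"))
        if len(name) > max_len:
            problems.append((name, "too long"))
        # Group names that would collide on a case-insensitive file system.
        by_folded[name.casefold()].append(name)
    for group in by_folded.values():
        if len(group) > 1:
            for name in group:
                problems.append((name, "differs from another name only by case"))
    return problems
```

You could feed it the result of `os.listdir(some_directory)` for each directory in the repository, since case clashes only matter between names in the same directory.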
You might also like to have handy a utility to convert the line endings of text files between the conventions used by different OSs.
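
If you don't have such a utility to hand, something like the following Python sketch would do. The function names are hypothetical, and it assumes the files are plain text small enough to read into memory; it works on bytes so the text encoding is left untouched.

```python
def convert_line_endings(data: bytes, newline: bytes = b"\n") -> bytes:
    """Rewrite CRLF (Windows), lone CR (classic Mac OS) and LF (Unix) as `newline`."""
    # Normalise everything to LF first so a CRLF pair is not converted twice.
    normalised = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalised.replace(b"\n", newline)

def convert_file(path, newline: bytes = b"\n") -> None:
    """Convert a file in place; binary mode stops Python translating newlines itself."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(convert_line_endings(data, newline))
```

For example, `convert_file("notes.txt", b"\r\n")` would rewrite a hypothetical `notes.txt` with Windows-style endings.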