I see two trends in progress. In one, we're continuing movement towards application-independent data storage. In the other, we're witnessing a proliferation of devices that each store the same data in a unique and incompatible way. I believe it's a time to watch developments carefully, and to be ready to move our advocacy efforts to a new arena.

Copyright notice: All reader-contributed material on freshmeat.net is the property and responsibility of its author; for reprint rights, please contact the author directly.

A brief tour of my history with computers

In 1983, Texas Instruments acknowledged that their home computer business was beyond hope. The TI 99/4A, originally sold for $1,150, was discontinued, and stores (needing to dump their stockpiles during the Christmas shopping season) put the machines on sale for $50-$100 each. Lines formed in the parking lots in scenes presaging the crowds of hopefuls waiting all night for PlayStations (or products of more questionable quality, such as Windows 95 and Star Wars I).

For some time leading up to this, I had been left in the electronics section of the department store whenever the family went shopping (I had Pong at home, but it was still unimaginable magic to hit keys on a keyboard and see letters appear on the screen), so there was no question what to get me for Christmas. My 99/4A was connected to a TV on Christmas morning, and everyone knew where to find me for the next few years. Eventually, we even got a cable that connected the computer to a cassette recorder, and it was possible to write programs to a cassette so you didn't have to type them in again every time you turned the machine on! If the volume was set just right and the wind was blowing south-southwest, the computer just might understand the contents of the tape and load your program every fifth time. This was the system I had through the first couple of years of high school.
I spent a geek-appropriate amount of time playing with "TI Extended BASIC" -- a lot of graphics and sound programming, a text adventure in which you explored the sunken city of Atlantis, a program that did virtual dice rolls to create random Dungeons & Dragons characters, one that plotted the Cartesian graphs we were studying in math, etc.

At the start of my junior year, I transferred to a school that had a lab in the basement filled with Apple ][s. For the first time, I started to use a computer medium (the 5.25" floppy disk) to store information that mattered to me -- my school papers, letters, etc. When I went to college, I took my newly-purchased Apple ][e clone, and that was the machine that handled all my writing chores, balanced my checkbook, etc. for several years. My geek power points rating was extremely low during this period; I just used the computer as a tool, and was no more involved with it than was the average secretary.

It wasn't until I started living with someone who had a PC that I began to be interested in computers for their own sake again. We decided that the 286 with GeoWorks was not going to be sufficient for surfing the Web, so I bought some books and learned to build a 486. After two years of struggling with Windows 95, I put my first Linux CD in the drive, and everything was better. Mostly.

What it takes to retrieve old data

I now own information scattered across a variety of media -- audio cassettes, 5.25" floppies, 3.5" floppies, hard drives, and Zip disks -- all at different levels of accessibility.
The good news is that the difficulty of retrieving the data varies inversely with how important it is to me. I don't care much about what's on my TI cassettes (though I would be curious to see how I wrote the engine for my adventure game). The Apple disks are filled with school papers, short stories, and really bad poetry that I have in hard copy and should burn someday anyway.

Why I'm boring you with this

The good news is deceptive; I believe things have been getting progressively better, but we've turned a corner, and now they're getting worse, from both software and hardware perspectives.

As the good news says, it's become easier for personal/home computers to share information. My first computer, that TI 99/4A, had no way to share data with a Commodore 64, a Timex Sinclair, an Atari ST, a Tandy TRS-80, or any of the other computers around at the time. Say what you will about the near monopoly of IBM PC-compatible hardware, but it gave us a de facto standard that made it easier for our machines to share information. Today, the worst problem I can imagine is that someone would hand me a Mac floppy, and Internet use is now widespread enough that I could get away with asking her to email the files to me instead.

The new problems are:
The software problem

Let's look at how I handle these important pieces of information:
I only started keeping this information on a computer when I got online, so I only have to trace back what I've done to the time of using a terminal program under GeoWorks:
The heart of the software problem is this question: How hard was it to move data from each application to the next?

PINE stored messages in mbox format. Pegasus used binary folder files. IIRC, I didn't have many saved messages at the time, and I just forwarded them all to myself. Going from Pegasus back to mbox for VM and mutt required something mildly unpleasant like getting Pegasus to write all the messages to separate files and then coercing them into one. I don't remember exactly what I had to do, but it wasn't too bad.

My first collection of email addresses was kept in my PINE address book. I downloaded an application from somewhere that converted PINE address books to Pegasus ones. When I moved to BBDB, I believe I entered everything again by hand while I was adding street addresses and phone numbers.

Transferring notes took some time, but went smoothly. notes-mode keeps a separate note file for each day, stored like ~/NOTES/199909/9909. note keeps everything in ~/.notedb (by default; it can also use MySQL, etc.). Luckily, it can read notes from STDIN, so, after sending Perl in to change the syntax of the topics in my notes to match that used by note, I could just use find to locate and cat each file and feed it to note. (Oh, how I love Unix.) A little fine tuning, and I was done.

My schedule was recreated from scratch each time. Adding all the birthdays again was no fun, but I survived.

These experiences indicate that the software problem is not very great, at least for Unix users. People who have to deal with word processor files on Windows are in bad shape, but the rest of us can usually just look at a pair of file formats and fire up Emacs, vi, or Perl to make the necessary changes.

In spite of that, when I switch applications, experience has taught me to think carefully about the long view. My calendar, for instance, is not exactly locked in to Yahoo!, but transferring it somewhere would not be as trivial as it should be. Yahoo! gives me two options for creating backup files of my calendar. One is the Palm format, Date Book Archive, which stores the info in a binary file. If I looked, I would probably find tools for handling these files, but it still doesn't feel as secure to me as having the data in a text file. The other format is "Outlook format" (sic), or Comma Separated Values, which is quite ridiculous. (For example, instead of saying, "This birthday occurs on this date every year", it creates 37 copies of the birthday record, one for each of the next 37 years. How is the application that imports that supposed to know what was intended?) The best I can hope for is that DBA turns out to be a reasonable representation of my data, or another export format becomes available.

Take BBDB as another example. Now that I'm no longer using Emacs for everything (I've switched to using dedicated programs that can call an external editor, and call XEmacs with gnuclient), BBDB's VM and GNUS features don't matter to me, and there's no reason I couldn't move to using another address book, perhaps one that has good mutt support. I have to think about this two steps ahead; not only is there the problem of converting my ~/.bbdb to the format used by whichever application I pick, there's the problem of considering what I might have to do to move from that application to the next one I decide to use.

There's the crux of the software problem -- all the way down the line, my data doesn't change, but the ways in which my data is stored do. I still want to track the same information, so why shouldn't it be stored the same way whether I'm using an AppleWorks mail merge function, a Windows GUI address book, an Emacs lisp program interacting with mail and news readers, or a Web interface?

If we could hop back in time and declare, "This is how address information will be stored. This is how a schedule will be stored. This is how notes and their cross-referencing information will be stored," I could have used the same files for the last 15 years. My Apple, Windows, Linux, and Web applications would have all read and written the same files. There would have been no need to convert from one format to another. I could have switched back and forth between applications at will without a worry.

At this point, the XML alarms may be ringing in your head, and you may be eager to point out that, although it's coming to the game late, we're about to move into that happy situation. Well, maybe, assuming everyone can agree on DTDs and actually uses them properly instead of adding proprietary extensions at every turn. (Look at what happened to HTML; "This page best viewed with browser x" could evolve to "This calendar best viewed with calendar application y".) I certainly hope it works out. The problem lies in the unspoken assumption that you can upgrade your software to take advantage of the new, and hopefully final, format, and this takes us to my real worry:

The hardware problem

A few months ago, I joined the cellphone age. One of the initiation rituals consisted of an entire evening spent punching numbers from my address book into the phone. This is where the unspoken assumption fails. I can't upgrade my phone to new software capable of using the format used by the address book on my computer. Even if I could, I don't have a way of making my computer talk to the phone to pass my collection of numbers to it. When a number changes, I have to change it in BBDB and on the phone. I have over 80 records in the phone, many with multiple numbers attached to them. If I drop it in the Baltimore harbor tomorrow, those numbers -- and the time spent entering them -- are gone. When I buy a replacement phone, I'll have to enter it all again. There are no backups, because there's no way to create a backup.
This is only going to expand as we move from using desktop computers to using more and more dedicated information appliances. It's a bad situation turning worse at the moment; we're going back to the days of incompatibility, but now with a wide variety of devices instead of just with computers. My TI couldn't talk to your Tandy; neither can my cellphone talk to yours.

This is what I meant by my ominous "Mostly." earlier. I can mostly be happy with the present situation. When I need to convert data on my computer, I have the tools at hand to do it. The problem is that that doesn't help me when I buy a phone and I don't have a shell on it. Even though all the data is sitting on my box waiting to be transferred, I'm stuck using the phone's only interface, the numeric keypad, punching "7" four times to get an "s".

Again, XML is put forward as the way out of this mess, and it holds great promise. When I mark a note as urgent on my laptop, XML should make it possible that my desktop machine and my Pilot will note the change and do whatever I've told them to do about it -- mark it off with a different color, beep me to remind me about it, or whatever. When I change Joe's phone number on my cellphone, it should be changed on the speed dial of all the phones in my house, in my address book, and in Joe-related events on my calendar.

It sounds wonderful, but I'm going to permit myself a dose of skepticism because the implementation requires the cooperation of a large number of people who prefer competition to cooperation. First, they have to reconfigure their devices to take advantage of XML. Then they have to agree on DTDs for the data they're using. Then they have to stick to the agreed-upon format and find other ways to distinguish their products now that they can no longer lock their customer base into their proprietary way of storing data.

I'm not saying it's impossible; the Internet has proven that it can be done. Proprietary protocols have been forgotten in the face of TCP/IP, HTTP, SMTP, etc. because software makers have to conform or die in all these areas. What I'm saying is that we need to be aware of the issue and keep manufacturers honest. Standards come into being in two ways -- people decide on a standard and implement it, or, more commonly, something becomes so widely used that it becomes the standard, even if it's unbearably awful. XML authors are trying to do it right the first time, but they're going to be outmaneuvered if manufacturers are allowed to implement the standards only in the ways and to the extent that suit them. It will eventually sort itself out -- a toaster that doesn't work with the other appliances on your home network is just not going to sell -- but there will be an initial competitive period that could be dangerously similar to the early days of personal computers, when nothing worked with anything else.

You can help shorten this period by being aware of the standards as they are created and checking that the products you buy are in compliance. If your new cellphone is supposed to use the new name & number storage format but you find that you can't share numbers from your address book with your friends' phones, take it back to the store, and let the manufacturer know that you exchanged their product for someone else's.

How long will it take for the old formats to go away? Will all our devices really be speaking the same language, and how soon? I don't know, but I do know that it will happen faster if we demand it. It's worth the effort, because it extends the ideals of the Internet into all the electronic accessories of our lives. When we can get there, there won't be TI information, Apple information, Windows information, Unix information, or Web information. There won't be information known only to your phone, your car, your Pilot, or your workstation. There will just be information, freely shared everywhere.
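
As a footnote to the software section: the step of taking messages that a mailer has written out to separate files and "coercing them into one" mbox can be sketched in a few lines. This is only an illustration, not the procedure actually used at the time. It assumes each exported file holds exactly one plain RFC 822 message (an assumption about the export, not something Pegasus is known to guarantee), and the function name files_to_mbox is made up for the example. It uses Python's standard mailbox module.

```python
import mailbox
from email import message_from_bytes
from pathlib import Path


def files_to_mbox(message_dir, mbox_path):
    """Append every file in message_dir to the mbox at mbox_path.

    Assumes one RFC 822 message per file (hypothetical export layout).
    Keep mbox_path outside message_dir so the output is not read as input.
    Returns the number of messages written.
    """
    box = mailbox.mbox(str(mbox_path))  # the file is created if missing
    count = 0
    try:
        for path in sorted(Path(message_dir).iterdir()):
            if not path.is_file():
                continue
            # Parse the raw bytes so headers and body are kept as written.
            message = message_from_bytes(path.read_bytes())
            box.add(message)  # the mailbox module writes the "From " separator
            count += 1
        box.flush()
    finally:
        box.close()
    return count
```

Because the result is plain text in a widely documented format, any mbox-aware mail reader should be able to open it, which is the point of the exercise.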
Jeff Covey received his degree in classical guitar performance but
spent so much time with his computer that he fell in with a bad crowd
and ended up working for
[»]
Nice editorial :)
Hah, and I thought I was the only one who faced the same problem!
[»]
Setting data free with LDAP aks Internet task
Setting data free with LDAP aks Internet task committee!
[»]
More standards for data exchange
What articles like this (and comments from people such as my boss who is a
"typical" computer user) make me realise is that there is
definitely a need for open standards that are built upon "lowest
common denominator" technologies such as ASCII text. By this I mean
markup languages such as HTML or even better, standards built upon XML (as
many have already noted in these comments).
[»]
Tarballs
How is a tarball not a single file? That is the way Siag Office has been storing structured documents (documents containing other documents) for years. It works great, the contents can be examined with standard tools and so on.
[»]
a tarball is NOT the answer
Sorry, a tarball does not qualify as a single file, and therefore neither does an XML document bundled with the images inside a tarball. One, single binary file, please.
[»]
storing XML and complex binary data in one file
This already works in KOffice. The document is pure XML, but if there is
binary data it is treated as a separate file or files appended to the xml
(not interfering with it) and decoded or encoded afterward. This is
necessary with image data, as it's not feasible to store many megabytes of
binary image data in text format with tags. It works beautifully. The
whole thing is tarred and gzipped into one file, internally preserving the
directory structure that the extra data, usually binary, uses.
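
A minimal sketch of the general idea, using Python's standard tarfile module: one gzipped tar archive holding the XML text plus any binary attachments under their own paths. This is not KOffice's actual file layout. The member name document.xml and the helper names pack_document and unpack_document are invented for the illustration.

```python
import io
import tarfile


def pack_document(xml_text, attachments):
    """Bundle XML text and binary attachments into one .tar.gz byte string.

    attachments maps archive paths (e.g. "pictures/logo.png") to bytes.
    """
    buf = io.BytesIO()
    members = {"document.xml": xml_text.encode("utf-8"), **attachments}
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # the size must be set before adding
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_document(blob):
    """Return (xml_text, attachments) from bytes made by pack_document."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    xml_text = files.pop("document.xml").decode("utf-8")
    return xml_text, files
```

The application sees a single file on disk, while the XML part stays readable with ordinary tools once the archive is unpacked.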
[»]
XSLT solves DTD agreement problem
You are right that manufacturers are not always going to use the same DTDs.
Yes, it's trying to lock out the competition and reap monopoly profits,
but it's also natural evolution. It takes effort to stick to a
standard.
[»]
TEXT ONLY PLEASE
I use pine on my own personal Linux server (for mail, notes, and
anything else of importance). I can read my mail from pretty much
anywhere.
[»]
re: against proprietary formats
Regarding images and text not being able to be stored in a single file... this is not a complex problem to overcome. Good ol' .tar.gz can do the trick, can it not? *KEEP* it separate, that's fine. But treat a document as a tar.gz. If the application can handle the tar/compression on the fly when opening/closing, then is this not as good as one large binary solution? It is even better in one respect: the text portions of the document stay readable from a text editor.
[»]
against proprietary formats
Every time a potential employer e-mails me something in Microsoft's crappy Word format, they have the unpleasant surprise of receiving a reply in Papyrus format. An eye for an eye, a proprietary format for another; everybody loses. Lately, proprietary document formats are an issue I have examined repeatedly, because while e-mail messages, HTML markup and JPEG/PNG images are well-documented and highly portable formats, Excel and Word are not (no matter what acceptance they have found in offices of Wall Street), and neither is PDF, yet those are imposed upon the scholarly and technical crowd (not to mention the casual home user) by management staff. My main conclusion has been that those formats have one undeniable advantage: text and pictures are binarized and result in a single file that can easily be e-mailed and printed out. By contrast, markup languages (including XML) are very good at document structuring and generation (e.g. using XSLT schemes), but simply do not have the convenience of a single file whose appearance and layout can be controlled in a predictable manner. The answer is therefore obvious: This world needs open-source document formats that match the possibilities of Word and PDF, but whose specs are determined by a panel similar to W3C. Until such formats exist, Microsoft will be able to shove their crap down everyone's throat, without anyone offering an alternative.
[»]
Why distribute data over various devices?
What initially came into my mind, as I left WinWorld, was: "Why did I
use 1 calendar + 1 address book + 1 todo list etc...?" Why not store
all your personal data in one DBMS. I currently work on my personal
Postgres DB storing all those things. When I need data in the future I use
my nifty (not yet written) web interface to access it from all 'round the
globe.
[»]
varying data formats
Just my $.02:
[»]
Setting data free
Contrary to what your title suggests, you did not talk a lot about the antagonism between free and proprietary data formats. I think it is a shame that it is possible to copyright or patent data formats. It is even more of a shame that users are not warned about this: how many software packages are there which issue a message like "attention, the GIF format is proprietary and can only be used by ..." each time you try to save an image using the GIF format? In other words, after buying a program, the customer might not be the sole owner of the documents he produces with it, and he is not even aware of this...
[»]
Re: Setting data free with LDAP aks Internet task committee!
Hello all, all these concerns were addressed by the LDAP protocol. Check openldap.org, MS AC, Oracle Internet Directory, iPlanet (Netscape) LDAP, Novell NDS, etc. Stay cool, Alan