Copyright notice: All reader-contributed material on freshmeat.net is the property and responsibility of its author; for reprint rights, please contact the author directly.

A more defined process is needed for development, distribution, and deployment of software. Specifically, we need to revise the current process which makes the end product of software development an archive file (gzipped tarball, Debian package, zip file, etc.) which is distributed on a CDROM or downloaded through the Internet via FTP or the Web and finally installed and configured.

Software development, distribution, and deployment is a group activity carried out through collaboration over the Internet; it should include application developers, component developers, software users, and software testers, auditors, and reviewers, among others. Currently, the interaction between these collaborators is ad hoc, carried out via email, FTP, CVS, newsgroups, and Web sites such as freshmeat and SourceForge. This ad hoc process is both a symptom of, and reinforces, some of the unwanted features of existing software development practices. These include:
I contend that formalization of this communication, particularly across the software development-distribution-deployment process, would grease the wheel of productivity by removing communication barriers and simplifying tasks that are frustrating and unnecessary.

Problems

The existing process has unnecessary manual steps, loses useful information, and undermines the dynamic nature of software development and deployment.

Using application software

The difficulties that can arise when attempting to use a perfectly good piece of software are outrageous. The most heinous case arises when the software fails due to an implicit dependency on a feature that is present in the development environment but not the deployment environment (such as a file location, a system library version, or a system call). Dependencies on software components are also a common and major source of pain. While the mechanisms exist to describe the dependencies and automatically resolve them, the current development process allows them to be incorrectly modeled.

Why don't the tools that we use to develop software automatically capture the environment and software dependencies that are being used? There are many reasons, but no compelling ones.

The instability of deployed software is accentuated not only by the lack of modeling of the deployment environment and the incomplete capturing of dependencies, but also by the lack of versioning on all the artifacts involved in the process. Development tools do not support the ability to test the correctness of the deployment of software into a defined environment. Software is tested to work in the environment in which it is developed, and a great deal of extra work is required to ensure that it will work in any other instance of the deployment environment.

Reusing software

Similarly, finding a component that is suitable for use when developing an application is problematic.
Although you may find a component that has the functionality you want, will it be compatible? Is it designed to run on the deployment environment you are programming for? Will it cause a software dependency conflict? The result of these issues is that there are significant barriers to using components that are published by other developers for sharing. This leads to unnecessary software duplication and encourages monolithic applications.

Developer/user collaboration

Communication between the user of software and its developer (new feature requests, feedback, defect reports) is ad hoc at best and often not bothered with because it requires work on the user's part. Instead, it should be automatic, standard, and part of the user interface.

The proposal

The protocol

A protocol should be designed for the purpose of communication in the collaborative development-distribution-deployment process. It should formalize:
The infrastructure
The infrastructure components that use this protocol are:
How is this different from existing practice?
Clarifications

Where is my software?

The biggest departure from existing practice is the concept that software is not stored on the deployment environment. Consequently, instantiation of a piece of software is dependent on:
Can anyone seriously consider not storing software in the deployment environment? Yes and No.

From a practical implementation perspective, No. There must be a mechanism which provides fast and reliable software instantiation. Given a transient network connection, there must be a local copy of the software.

From a conceptual perspective, Yes. Absolutely Yes. This is a deliberate and central idea. Software should not be viewed as something that is copied and hoarded in an unconstrained and ill-formalized manner. In fact, it is the practical outcomes of this attitude that lead to many of the problems identified. The prevailing attitude fails to recognize a reality: The deployer does depend on the provider/maintainer already, whether you choose to recognize it or not. The failure to recognize this reality serves to magnify the timeframe and extent of problems when software ceases to be actively maintained.

The flip side of the software hoarding attitude is the "throw it over the wall" developer attitude. With this approach, software is developed and tested on a sole instance of the deployment environment, and when finished (and only then) is packaged and made available for installation on many instances of the deployment environment. It is precisely because it's hard to deploy software to multiple instances of the target deployment environment that it's necessary to overcome the difficulty. This can be achieved by automating the process, doing it frequently, and improving proficiency, rather than leaving it as an unpalatable task at the "end" of the software development process.

The crucial point is to force the issue. That is, to force reliable instantiation in the generalized deployment environment as part of the default development process. If this isn't done, it creates dormant problems that show themselves in complex, obscure, and unresolvable ways, leading to ridiculous solutions, such as "Just reinstall the operating system".

Design principles
Infrastructure

The component infrastructure proposed is analogous to the Web in that it is a client/server architecture. This analogy equates Web servers, browsers, and HTML editors to software repositories, software deployment environments, and software development tools.

Software repositories/software publishing service

The software repository stores the software. There are multiple physical software repositories that create a distributed global software repository, in the same way that Web servers create a distributed global document repository. Software dependencies can cross physical software repositories (just as hyperlinks can cross Web servers).

Software deployment service

The deployment service resides with (but is conceptually outside) the deployment environment. The software deployment service maintains the information required so that it can instantiate software.

Software development tools

Software development tools search software repositories for useful components. They automatically capture the deployment environment and software dependencies and communicate them to software repositories. It is part of the software development process to publish software prior to instantiation.

Features

While this essay does not intend to be prescriptive, but rather to communicate an idea, there are some key features that this mechanism would need in order to make the proposal workable.

Well-defined deployment environments

Having well-defined deployment environments is crucial to this mechanism of reliable software deployment. The features of the deployment environment must be defined. Software to be deployed in that environment can assume only those features and no others. Examples of deployment environments may include interpreters such as the Java Virtual Machine, Perl, or a particular version of an operating system distribution.

Verifiability of deployment environments

A deployment environment must be verifiable against its definition.
For example, the Linux Standard Base (LSB) provides programs that can verify that a deployment environment conforms to LSB 1.3.

Verifiability of software instantiation

Software must be verified as having no implicit dependencies. This can be achieved through automated testing of the deployment on a vanilla environment with only the explicit dependencies available.

Registration of artifact use

The deployment environment must register use of software so the software repository/software publishing service will continue to store that software (or a version of it) while it is needed.

Version control

Version control of all artifacts is a central design necessity if reliable software instantiation is required. Any change to an artifact must be reflected as a change in the version number of that software.

Support data management across software versions

It is the responsibility of the software to manage the data it requires on the deployment environment. However, support for the ability to automatically transform data or support old versions of interfaces needs to be factored into the protocol.

Authentication

Authentication must be part of the protocol, so that the software that is being used can be verified as being from a trusted (or at least known) source.

Details

The unreliable Internet

To implement this protocol using the Internet, a reliable mechanism will be required, probably involving caching on the local host or at least the local network segment. This is a crucial (and non-trivial) implementation detail because of the stance this proposal takes on the primary location of software storage. Many approaches are possible to achieve the necessary reliability and speed, such as the approach used by the domain name system. It may be worth considering using Freenet as the central infrastructure component to deliver this requirement.

Protocol technology

The most obvious technologies to use for protocol definition are CORBA or XML/HTTP.
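
As an illustration only, the features described above (a defined deployment environment, verification of that environment, detection of implicit dependencies, and versioned explicit dependencies) might be modeled along the following lines. This is a minimal sketch, not part of the proposal: the names Environment, Artifact, verify_environment, undeclared_features, and can_instantiate are all hypothetical, features are reduced to plain strings, and the repository is assumed to be a simple mapping keyed by (name, version).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """A deployment environment definition: a name plus the features it guarantees."""
    name: str
    features: frozenset


@dataclass(frozen=True)
class Artifact:
    """A published piece of software and its explicitly declared needs (hypothetical model)."""
    name: str
    version: str
    requires_features: frozenset      # features assumed from the environment
    requires_artifacts: tuple = ()    # (name, version) pairs of other artifacts


def verify_environment(definition, observed):
    """Return the features the definition promises but the observed host lacks."""
    return set(definition.features) - set(observed)


def undeclared_features(artifact, used):
    """Return features used during a test run that were never declared,
    i.e. the implicit dependencies."""
    return set(used) - set(artifact.requires_features)


def can_instantiate(artifact, env, repository):
    """True if every declared feature is guaranteed by the environment and
    every declared artifact dependency is present in the repository."""
    if not artifact.requires_features <= env.features:
        return False
    return all(dep in repository for dep in artifact.requires_artifacts)


# Example data; every name and version here is invented for illustration.
env = Environment("example-env-1.0", frozenset({"libc", "sh"}))
app = Artifact("app", "1.0", frozenset({"libc"}), (("libfoo", "2.1"),))
repo = {("libfoo", "2.1"): "stored artifact"}
```

With these definitions, verify_environment(env, {"libc"}) reports that "sh" is missing from the host, and undeclared_features(app, {"libc", "/usr/local/bin/perl"}) flags the file location as an implicit dependency. A real implementation would need a much richer notion of "feature" than a string.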
Information stored and published by the protocol

Information that is required by the deployment environment should be supplied by software developers. Examples of information stored on the software repository and made available via the software publishing service are:
The communication of the user back to the software developer

Examples of information created by users and stored on the software repository are:
Development workflow

During development, the artifacts that make up any given software application or component are produced over time. The protocol must recognize and support software that is only partially complete, but also classify it as partially complete. For example, a piece of software may have source code and an executable, but no unit test case. It would form part of the classification that it was incomplete but usable (assuming the slightly controversial nature of the example is accepted). Further, the status of the software should be maintained during its lifetime. For example, if the software has a known defect (in this version), this must be made explicit.

The path

To implement this idea, support for the existing methods of software distribution and deployment must be part of the implementation. The protocol must support and provide a migration path for the existing archive files that are in use.

Stepping stones

Many of the features of the proposed deployment mechanism are transformations of features already available. Much of the information required by the deployment environments is defined as fields in the RPM and Debian package file formats. Much of the communication between developer and user has been decomposed by the SourceForge (Alexandria project) user interface. An implementation approach could start with creating a protocol (rather than a user interface) to access SourceForge functionality.

There is a correlation between the idea being put forward in this paper and the "Software Dock" work that has been done at the University of Colorado. There is plenty of analysis, published papers, and working code available from this project.

Summary

This essay essentially suggests the application of workflow automation and knowledge management disciplines to the software development-deployment-distribution process.
The main concrete outcome would be the creation of a protocol which would act as both a human interface and a machine interface between the developer and deployer of software. This approach would result in less manual, more collaborative, and ultimately more productive software creation, leading to more reliable software.

Author's bio: Michael Free is a software developer and strategist with 20 years' experience, primarily in the financial industry, which has given him a chance to run a small software development company, run an ISP, and perform most of the roles available in information technology, from system administrator to test analyst. When not hacking, he is hang gliding or running.
[»]
The problem I see

I don't see a problem with getting the latest package, running configure for it, running make && make install... The problem I see is that of maintaining configuration. First of all, if you want to have your piece of software installed in /u123/misc/ you have to specify the path each time you run configure. This is by far the easiest configure option to remember. Unfortunately, the harder-to-remember options are also sometimes subject to change. What worked last time doesn't necessarily work now, which means you can't really use a static script.

Another thing is "make install", which is supposed to install the compiled version. However, often this means your configuration files get overwritten with defaults, and sometimes old modules from the previous version are left floating in the filesystem, where they might cause problems when they don't work with the new version.

If I could have two things, I'd take a configuration repository for configure that remembers how it was called for the last version, and the other thing would be that NO program EVER overwrites a config file on install if one is already present. If you have to do these by hand every time you update something, you start reading release notes and weighing the bugs against the pain of upgrading. No good.
[»]
Re: The problem I see
[»]
Re: The problem I see Good reply
[»]
division of tasks in a small/medium/large software deployment
> Why not let the host environment decide how IT wants things to be done, rather than doing this at the application level?

A fine point... but you're hiding a deep truth under a veneer of obviousness... see below

development vs packaging vs integration

> "... lack of modeling of the deployment environment..." Posix not good enough? That's why I write configure scripts (or rather have autoconf generate them for me). Why not let the host environment decide how IT wants things to be done, rather than doing this at the application level?

ALL the software you use lets IT decide where to put that config file? or that odd logfile? You are very lucky... *remembering bad experiences with write-only partitions for /etc, or nosuid flags for /var in some contexts*

As for POSIX, it's a very general standard... while the article was about something MUCH more detailed, and quite possibly, much larger in scope... such as the deployment of a multi-module financial system in a client-server environment comprising 10 servers and over 40 workstations (at least that was my read of it, see the comment below about Sun).

The LSB is closer... in terms of standardizing the environment... but even the LSB doesn't say exactly what to do if the software needs to be in /var and /var was full before you installed the application
http://developers.slashdot.org/article.pl?sid=03/02/09/1347215&mode=flat&tid=108
http://www.internalmemos.com/memos/memodetails.php?memo_id=1321
which mentions some of the updatability issues when dealing with a monolithic development "runtime engine" and its usability to developers

As for the comment I saw earlier on the thread about "updating if there's no security issue", that's missing the point... if you've got to reinstall the software or even the machine each time a new version comes out that fixes a bug, the packaging was wrong... it doesn't mean the software was worse for it though, at least, not worse than any other software with the same bug...

All in all I thought it was a nice editorial... Trying to add a process without making the developers understand the division of work it implies sounds a bit dangerous to me though... It smacks of "people got to make it better without admitting what's better first and why".

The article's greatest and least noticed point seemed to me that VERY FEW components and component-based software are actually packaged and installed... Probably due to a similar problem noticed by Sun: some components can depend on some versions of some software, which means you end up having to choose between software A based on component Andromeda version Arcturus or software B based on component Andromeda version Orion because the two components are incompatible.
[»]
Deployment to Broad User Population

As someone who is trying to move up the learning curve into Linux-OS but finding it tough going, this article hits the nail on the head. I also think the developer population has every interest in making Linux easier to install, configure and use.
[»]
Deployment and configuration

Might I recommend to people interested in the ideas put forward above that they read http://www.infrastructures.org.
[»]
what's wrong with "old" versions?

I'd like to amend/bark on the thesis regarding deployed old versions as a problem. It is *not* a problem as long as it doesn't have known sec holes or production critical bugs. "Upgrade mania" is, well, a mania that's promoted by commercial vendors. Writing free software, you shouldn't be counting on easy and free updates as an excuse for buggy code. Solid free software tends to have each stable version just work and not cause any trouble, and that's best for production -- when you leave it in place mumbling "if it ain't broken, don't fix it" when management runs around asking stupid questions. (Mike, an IT manager and ALT Linux Team member)
[»]
or not. admittedly, i breezed through the essay, but: --
[»]
Re: or not.
[»]
Software deployment on Windows

To be honest, I am very much a rookie at software deployment on the Linux platform. On Windows, I prefer policy-based software deployment/software distribution using Group Policy and, for example, the tool Specops Deploy. A question: if you wanted to do policy-based software deployment on the Linux platform, is there a way to do that today? Best, Rob