Outside of the Windows world (which is slowly moving towards very simple "packages" with Microsoft Installer, ".msi", files), the majority of operating system (OS) distributions (i.e., various, useful software as a collective whole) come in two forms:
- Ports
- Packages
EARLY CONTEMPORARY SOFTWARE DEVELOPMENT
In the golden days of community software development on the original ARPANET, programmer source code -- typically C code from the '70s onward (through today, long story) -- was always used. One of the hallmarks of the UNIX from AT&T's UNIX System Laboratories (USL) was freely available C source code. The same held for the UNIX that followed from the involvement of academia and research (such as the University of California at Berkeley, UCB, and its Berkeley Software Distribution, BSD), which came about because the US government would not let AT&T (who had a monopoly on the telephone) enter the computer/OS market. Most everyone used the same C compiler (compilers turn source code, like C .h header and .c code, into .o object code aka "binary code" for specific computers so they can run -- NOTE: that is a mega-oversimplification) and other development tools, so "building" software for a particular UNIX platform was not too terribly difficult.
While AT&T was left outside the market, various commercial entities started cornering the commercial software market. The most well known were Bill Gates and Microsoft, who had ported Digital Basic to most fledgling consumer systems (e.g., Altair, Apple, etc...) without a license. Back in the early days of computing, code was still fairly simple across platforms, so coders shared source code freely. Gates was the first one, in 1976, to suggest no one would write good software if no one paid for it. Ironically and hypocritically enough, Gates' "anti-piracy letter" was actually written after he had swiped code from someone else, then modified it, because the original developers swiped his changes back.
Software quickly turned from having available source code to virtually no source code in the late '70s and early '80s. At first, this was manageable. Most utilities were simple, standalone and relied on little more than a kernel (typically the first, main and single, monolithic program that controls everything and is always loaded -- NOTE: again, that's an oversimplification) and maybe a C library (a C library is typically a set of object code that other programs can use, either by "linking dynamically" at run-time, or by "linking statically" into the program itself, no longer requiring the library -- long story). They were usually one program and maybe a few support files other than the data files created by the program.
INSTALLERS
Once programs became more complex, with lots of different support files, installing software was not so simple. To add more issues, some programs only worked on specific versions of an OS. This is because unlike software that is released as source code, and can typically be built against different components and versions of an OS (with little or no effort), object code is linked with other object code into an OS-specific version on a specific hardware platform. Even the same hardware platform and OS, like the Disk Operating System (DOS) developed by Seattle Computer Products as an unlicensed version of CP/M (which Microsoft bought all rights to for $50,000 and IBM would later settle out of court with CP/M's creator for $800,000), had incompatibilities between versions.
One quick fix in the MS-DOS world was to use an "installer," which was a specific program that was only used to test the system to see how to install the program. This was done because DOS itself did not provide any formal mechanisms to provide any concise, relevant information to the program. Installing programs was arbitrary, and the details (and headaches) were left to any installer (or lack thereof), with shortcomings being left as "an exercise for the user." Microsoft provided virtually no mechanisms for formal archiving (files inside of one file), tracking of installed files on a system, modification of core system resources, etc... until the mid-to-late '90s, and only then after another company solved it for them.
Today, Windows still suffers from "conflicts" in software, largely because it does not allow multiple files of the same use, but different versions, to be installed or accommodated easily. This is known as "DLL Hell." The only mitigation that Windows continues to offer is "protection" of some core system files, like core Windows DLLs. Microsoft has been working with InstallShield and proliferating their Microsoft Installer (.msi) "archive/package," which most installers are now based on, but it still has massive and inherent weaknesses compared to most "real" package systems, as we'll see. In fact, one of the major reasons for the .msi format is not to combat configuration management issues (more on that later), but to prevent the installation of trojan horses, which are commonly found in .exe or other executables that appear to be harmless installers.
PACKAGED DISTRIBUTION
Unlike the DOS world, UNIX included a wealth of utilities developed by USL, BSD and the academia/enthusiast community in the '70s through early '80s. Two of the most well-known and used are the tape archiver (tar) and copy I/O (cpio) programs. Many systems came with software in tar or cpio formats. Although tar may be more well-known to most Linux users today, cpio has explicit modes like "-i" (literally "copy-in," which extracts files from an archive onto the system) and is still popular with some UNIX flavors. After the shackles were lifted in the AT&T breakup of 1984, AT&T began a standardization effort on their new System V (often abbreviated as SysV or SV) Version 1 UNIX. Most tar and cpio programs today can actually read and write the same streaming format (known as "USTAR"; tar uses 10KiB blocking by default, while cpio uses 5KiB blocking with its "-B" option -- NOTE: POSIX2001+/SUSv3+ augment the streaming formats and introduce a replacement known as "pax").
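To make the archive format a little more concrete, here is a minimal sketch using the tarfile module from Python's standard library. The file name and contents are made up for illustration. It writes a one-file USTAR archive into memory, then checks the "ustar" magic in the header and the padding out to tar's 10KiB record size.

```python
import io
import tarfile


def build_ustar_archive(name: str, payload: bytes) -> bytes:
    """Return an in-memory USTAR archive that holds a single file."""
    buffer = io.BytesIO()
    # USTAR_FORMAT selects the POSIX.1-1988 (ustar) header layout.
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Hypothetical file name and contents, purely for illustration.
archive_bytes = build_ustar_archive("hello.txt", b"hello, tar\n")

# The ustar magic string sits at byte offset 257 of the first header block.
magic = archive_bytes[257:262]

# The finished archive is padded to a whole number of 10KiB records.
is_record_aligned = len(archive_bytes) % tarfile.RECORDSIZE == 0

print(magic, tarfile.RECORDSIZE, is_record_aligned)
```

Nothing here is specific to any one distribution. It only shows that a tar "package" is a flat stream of 512-byte headers and data blocks, with no dependency information of its own.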
UNIX, unlike DOS until 1993 with Windows NT (a true, Protected386 operating system with its own NTFS and VFAT filesystem extensions for FAT12/16 -- although VFAT was not popular until DOS7.x/Windows4.x in Windows 95/98/Me), typically has very generous filename length limits. 31 characters were a typical minimum, and 255+ was almost standard in most implementations -- especially after the Institute of Electrical and Electronics Engineers (IEEE) began their Portable Operating Systems Interface (POSIX) committee to document UNIX standards. So it was commonplace to put version numbers on libraries and other system utilities, removing the possible conflicts that some programs could have -- something Windows still does not do today (with many ill effects). POSIX systems also have filesystems with symbolic links (symlinks), which allow one file to be referenced by another name, so a version like mylibrary-1.1.1.so (Shared Object, .so -- the common System-V format for libraries in the UNIX world -- somewhat the equivalent of a "DLL" in Windows) could also appear as mylibrary-1.so or even mylibrary.so.
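The versioned-name-plus-symlink arrangement can be sketched in a few lines of Python. This is only an illustration, assuming a POSIX filesystem that supports symlinks. The function name and the placeholder file contents are hypothetical, and real systems have their own naming conventions.

```python
import os
import tempfile


def install_versioned_library(directory: str, base: str, version: str) -> str:
    """Write base-<version>.so and point base-<major>.so and base.so at it."""
    major = version.split(".")[0]
    real_name = f"{base}-{version}.so"
    major_name = f"{base}-{major}.so"
    plain_name = f"{base}.so"

    real_path = os.path.join(directory, real_name)
    with open(real_path, "wb") as handle:
        handle.write(b"placeholder for object code\n")

    # Chain the links: mylibrary.so -> mylibrary-1.so -> mylibrary-1.1.1.so
    for link_name, target in ((major_name, real_name), (plain_name, major_name)):
        link_path = os.path.join(directory, link_name)
        if os.path.lexists(link_path):
            os.remove(link_path)  # re-point the link when a newer version arrives
        os.symlink(target, link_path)
    return real_path


demo_dir = tempfile.mkdtemp()
installed_path = install_versioned_library(demo_dir, "mylibrary", "1.1.1")
resolved = os.path.realpath(os.path.join(demo_dir, "mylibrary.so"))
print(os.path.basename(resolved))
```

Installing a 1.1.2 release later would add a second real file and simply re-point the two links, leaving the 1.1.1 file in place for any program that asked for it by its full name.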
As software grew increasingly complex, UNIX systems found themselves having packages that conflicted -- some libraries not working with others, or requiring a different (typically newer) standard C library (the foundation of "binary compatibility" in any POSIX/UNIX system). This gave rise to new packages with dependency checking -- packaging systems that would check to see if any other packages are required or could conflict. Although some installers could do the same, their arbitrary nature meant there were not only no guarantees, but older installers were often totally unaware of newer changes. Packaged distribution moves the "logic" of package management from the installer/program itself to the operating system, removing the issue of older installers with "legacy" information that may be ignorant of new programs and interfaces.
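As a rough illustration of what "dependency checking" means when the logic lives in the packaging system rather than in each installer, here is a minimal Python sketch. The Package structure, the check_install function and the package names are all hypothetical. Real systems like DPKG and RPM track far richer meta-data (versions, provides, file lists).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Package:
    name: str
    requires: frozenset = frozenset()
    conflicts: frozenset = frozenset()


def check_install(candidate: Package, installed: Dict[str, Package]) -> List[str]:
    """Return a list of problems; an empty list means the install may proceed."""
    problems = []
    for dependency in sorted(candidate.requires):
        if dependency not in installed:
            problems.append(f"missing dependency: {dependency}")
    for name in sorted(candidate.conflicts):
        if name in installed:
            problems.append(f"conflicts with installed package: {name}")
    # An already-installed package may also declare a conflict with the newcomer.
    for name in sorted(installed):
        if candidate.name in installed[name].conflicts:
            problems.append(f"installed package {name} conflicts with {candidate.name}")
    return problems


# Hypothetical system state and candidate packages.
installed = {
    "libc": Package("libc"),
    "oldmail": Package("oldmail", requires=frozenset({"libc"})),
}
newmail = Package(
    "newmail",
    requires=frozenset({"libc", "libssl"}),
    conflicts=frozenset({"oldmail"}),
)
problems = check_install(newmail, installed)
print(problems)
```

The point of the design is that the check runs against the system's own database of installed packages, so it stays current no matter how old the package being installed is.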
If you've ever installed an older Windows program only to completely trash a newer Windows program or vice-versa -- Microsoft's own Office product is a regular offender -- you can see why this is a major advantage of packaged OSes.
CONFIGURATION MANAGEMENT
The typical goal of packaged distribution, at least in an enterprise, is configuration management. The idea is that if you can test software against other software on specific systems, and package it in a way that it can be pushed out to systems, installed, configured and have any conflicts resolved (possibly uninstalling or interrupting its own install) automatically, you get a major reduction in IT overhead and cost. Even if software is stable and reliable on its own, there is no guarantee there will not be an issue once the software is integrated with other software on a system -- and systems can vary. Such "integration testing" is a crucial part of engineering, and information technology (IT) continues to be a practical extension of engineering principles (at least when addressed).
This is especially the case in the Windows world, where Windows itself lacks a good package management system, but there are mitigating systems. Even Microsoft has its own Systems Management Server (SMS) product, and has included elementary package management since Windows [NT 5.0] (2000). Systems can have software "pushed down" to the unit and automatically installed without operator intervention, with conflicts being handled gracefully. After the SQL Slammer worm of January 2003, Microsoft finally started taking security seriously (despite patronizing claims to the contrary 10 months earlier), and released more patch management tools.
There are countless patch and package management systems for Windows, including some enterprise quality solutions that Microsoft itself relies on more than their own -- such as Altiris' line of solutions. But in the Linux world, package and configuration management has been an all-in-one solution for some time, including cryptographic signatures on packages to avoid trojan horse software.
LINUX PACKAGE MANAGEMENT SYSTEMS
The evolution of a sprawling GNU platform (GNU = GNU's Not UNIX, a "clean room," UNIX-like project founded after AT&T's USL started asserting copyrights and closing up source code post-AT&T breakup) known as Linux brought forth some extremely innovative approaches in package management. One of the earliest distributions of Linux, Slackware, uses its own Tar format, although it has added package management in more recent versions (NOTE: it's actually .tar.gz, an LZ77-compressed aka "Gzip" tar stream archive -- for those unfamiliar, PKZip is actually both an LZ77 compressor and block archiver in one).
Two of the most well-known among Linux users since the mid-'90s have been the Debian Package (DPKG) and the RPM Package Manager (RPM, fka Red Hat Package Manager) systems. A little known fact is that both wrap classic UNIX archive formats internally -- a Debian package is an "ar" archive holding tar streams, while an RPM carries a cpio payload that is unpacked much like "cpio -i" would -- plus their own set of rich meta-data, which is at the heart of their respective debates.
Debian's DPKG was actually a very capable implementation from initial creation, with RPM gaining more features as Red Hat developed the system. A hallmark of Debian's DPKG system was founder Ian Murdock's strong attention to proper software packaging in general, avoidance of unnecessary dependencies (e.g., not allowing packagers to require little used programs, like scripting languages, just to install) and other "standards" in Debian's formal guidelines. Red Hat, on the other hand, developed RPM for more of an immediate need and has added more functions since.
Red Hat Linux v8 included RPM v4, which is considered to be the foundation of Linux Standard Base (LSB) package management, and is very similar to DPKG in capabilities including use of alternatives, multiple versions and other LSB-compliant features. Unfortunately for package management systems, especially the ever-popular Red Hat, the wealth of "protection" in dependency checking gives rise to countless situations where the system will prevent you from installing software. This quickly came to be known as "dependency hell" -- often personalized as "RPM hell" in reference to Red Hat (or other Linux distributions that used RPM).
Debian's early proliferation came about due to a set of features that allowed it to avoid "dependency hell." In addition to good packaging standards combined with a "maintainer Democracy," which helped address unnecessary dependency resolution, Debian had the Advanced Package Tool (APT).
PACKAGE MANAGEMENT FRONT-ENDS
APT is a package management front-end designed for the distributed development nature of the Debian Project. It automates package fetching (as well as package source code fetching, if a user wishes to rebuild) from Debian's extensive repository of packages, dependency resolution (including automatic fetching of any additional packages required) and interactive (or even automated "fix") conflict resolution. Literally overnight many people found themselves pleased with the avoidance of "dependency hell."
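The "automatic fetching of any additional packages required" boils down to walking the dependency graph and installing dependencies before the things that need them. Below is a minimal Python sketch of that idea using a depth-first walk. It is not APT's actual algorithm, and the repository index and package names are hypothetical.

```python
from typing import Dict, List, Set


def resolve_install_order(target: str,
                          repository: Dict[str, Set[str]],
                          installed: Set[str]) -> List[str]:
    """Return the packages to fetch and install, dependencies first."""
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in installed or name in order:
            return  # already on the system or already scheduled
        if name not in repository:
            raise KeyError(f"package not found in repository: {name}")
        if name in visiting:
            raise ValueError(f"dependency cycle involving: {name}")
        visiting.add(name)
        for dependency in sorted(repository[name]):
            visit(dependency)
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order


# Hypothetical repository index: package name -> names it depends on.
repository = {
    "editor": {"libgui", "libc"},
    "libgui": {"libc", "libfont"},
    "libfont": {"libc"},
    "libc": set(),
}
install_order = resolve_install_order("editor", repository, installed={"libc"})
print(install_order)  # ['libfont', 'libgui', 'editor']
```

A real front-end also has to weigh versions, conflicts and alternatives, which is where the "mature logic" people credit APT with comes in.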
Most RPM distributions, to the contrary, merely focused on their own service issues. I.e., they introduced package management systems that would only handle their own packages and updates for them. The Red Hat Network (RHN) developed the Up2Date service and client software, SuSE created its own services and clients for its YaST (Yet another Setup Tool) and other RPM distributions did likewise. Although they all used RPM for the actual package management back-end, the front-ends were vendor-specific and some were subscription-based.
Contrary to popular opinion, while APT was designed for DPKG and the Debian repositories, including source code fetching, the APT approach is applicable to any package management system "back-end." E.g., slapt-get is the APT-like implementation for Slackware. Conectiva, a distribution popular with Spanish/Portuguese-speaking (South American) users, ported the APT system to RPM and used it as its base for updates around the turn of the millennium. By Red Hat Linux 7, many independent repositories had sprung up and started to use this implementation, often referred to as APT-RPM, to download updates, new software as well as base software not installed.
SIDE-BAR: FROM RED HAT LINUX TO FEDORA CORE
Probably the most significant repository was the University of Hawaii's Fedora Project, started in 2002, whose APT-RPM showed that a completely Red Hat aligned repository and RPM-specific modifications could be made to solve community distribution issues -- both for the core OS distribution as well as add-on software. Red Hat was facing increased scrutiny that its trademark was public domain due to lack of past enforcement. It had also introduced its new, almost competing "enterprise" product after its prior attempt to bundle Service Level Agreements (SLAs) with its community-released Red Hat Linux 6.2 "E" product fell short (at least in comparison of sales to SuSE Linux Enterprise Server, SLES, at the time). So Red Hat decided to "unproduct" its Red Hat Linux into a community project. The resulting Fedora Project, taking the name from U of Hawaii, now comprised Fedora Core (fka Red Hat Linux), Fedora Extras (community add-ons) and Fedora Legacy (post-end-of-life support -- including all the way back to Red Hat Linux 7.2).
One of the first and underappreciated moves was the fact that Red Hat halted development on Red Hat Linux 10, now Fedora Core 1, and started addressing a lot of "inter-dependency" among other nonsense (e.g., setuid root on too many binaries) in its core packages over the span of 2 months. In a nutshell, since Red Hat Linux, now Fedora Core, was no longer a "product," there was no longer the attitude of "what we ship we have to support" that had plagued Red Hat Linux with criticism for so long -- let alone that it was redundant with Red Hat Enterprise Linux (RHEL) existing (which is far more anal, because Service Level Agreements are sold on it, something that failed to sell as the unified Red Hat Linux 6.2 "E" offering prior).
The second move for Red Hat was to integrate the two most popular front-end systems, APT and Yellow Dog (a Red Hat Linux fork for PowerPC systems) Updater, Modified (YUM), into its Up2Date tools. Although Red Hat now provides formerly subscription Red Hat Network (RHN) access for free for Fedora Core, Extras, etc..., use of APT and YUM is encouraged. Although the formal Fedora Core project bundles only YUM and downplays the role of APT support in Up2Date itself, many believed APT-RPM to be superior due to its mature logic and availability of a GUI in Synaptic. But as of RPM v4.3 and YUM in Fedora Core 3, additional capabilities have been added with support for multiple architectures (e.g., running x86-64 and i386 binaries simultaneously on x86-64 systems -- NOTE: Debian solves this by using a chroot environment for i386 under x86-64, an innovative approach), 3-tier packaging sets (which the Anaconda installer for Red Hat has had for a long, long time, but not in the package system itself) and other capabilities.
YUM still seems to have some growing pains, as GUIs are not forthcoming. E.g., YUM Extender (yumex) is not developed by and for Fedora, and it has caused major issues in the past. Fortunately, a new front-end solution from the creator of APT-RPM is on the horizon -- SmartPM, which we'll discuss a bit later.
TAKING A BREATH ...
Now at this point, you're probably wondering what distribution I think is best. In reality, I don't think any are "best." What distribution is for you? It really has nothing to do with brand name (unless there are some "commercial misconceptions," which I'll get to later), and I'm only providing this as a "foundation" of "technologies." I know a lot of people talk about Linux and "choice," but many times they find themselves arguing about "brand name choice," instead of actual "technical choice."
The reality is that if something works well in one distribution, it is typically adopted by another. In any democracy of infinite choice, most people will standardize around 2-3 implementations -- something that the natural laws of sigma statistics (1 ~ 68%, 2 ~ 95%, 3 ~ 99.7%, etc...) seem to support. While some might argue that "foundations" of one distro might be better than another -- and I'll be the first to admit, as an engineer, that I find the combination of the Debian Project as a community and Ian Murdock's Progeny endeavor as a commercial services company to be most ideal (configuration management is a corporate-specific detail, and you can't get that out of a "shrink wrapped" box or even fixed product SLAs that may not address all your needs) -- all distributions have offered many other things to others, and they all leech off each other. Linux is about a choice of "technologies," not "brands," although our history and exposure to "marketing" in the commercial world still tends to distract our focus.
Which brings me to our next focus, "Ports" ... that's right, all we've talked about to this point is OS distributions that use "Packages."
GETTING BACK TO BASICS: PORTS
The aftermath of the original 4.3BSD-based 386BSD project, and the resulting 4.4BSD-Lite codebase agreed by both AT&T USL and UCB to be under the copyright/ownership of UCB, would eventually see 3 new, community BSD UNIX flavors: FreeBSD, NetBSD and, later from NetBSD, OpenBSD. This was at about the same time as the founding of the Debian Project and Red Hat, Inc. Instead of focusing on the traditional realm of package management to solve system configuration management, most community BSD implementations have stuck with the basic premise that UNIX is C source code that can be built against any OS implementation it has been "ported" to. GNU system tools like Autoconf are designed so software can be easily ported and built on different GNU, POSIX-like platforms, depending on how well the software developer wrote the software (including using Autoconf and other tools, as well as following GNU coding or other portability guidelines).
To start with a specific example, FreeBSD's "ports" system is a different approach to distributed software installation, updates and other software. Instead of a "package maintainer" taking software, building a package configuration file (e.g., a "SPEC" file in the case of RPM), and building it from the "source package" (e.g., a .src.rpm aka SRPM) on one or more systems for one or more "packages" distro releases, the "ports maintainer" merely includes the small configuration so the "ports system" can fetch the software, as well as any additional developer files (e.g., a modified Makefile, Autoconf configuration output, etc...) needed to build the software. That way, instead of having to make packages for different system configurations -- a redundant and disk space bloating option -- the end system fetches all the files it needs to build the software for just itself.
In other words, "ports" distributions are a front-end for automating software building from source code. "Ports" distributions maintain a centralized repository where "ports maintainers" can collaborate and release new support files (if any are required) for new software releases. In many cases of various ports repositories, the software itself is actually not stored in the "ports tree/repository," but is fetched by the client. This is quite unlike "packages" distros, which include not only the software in the binary/usable form, but the source packages as well.
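To picture how little a "port" needs to carry, here is a minimal Python sketch. The Port structure, the prepare_build function, the URL and the build commands are all hypothetical and are not the format of any real ports tree. The idea is that the tree stores only a pointer to the upstream source, a checksum and the build recipe. The client does the fetching and verifies what it got before building.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Port:
    name: str
    version: str
    source_url: str          # where the client fetches the upstream source
    sha256: str              # expected checksum recorded by the ports maintainer
    build_steps: Tuple[str, ...]


def prepare_build(port: Port, fetch: Callable[[str], bytes]) -> Tuple[bytes, List[str]]:
    """Fetch the source, verify it against the port's checksum, return it with the recipe."""
    data = fetch(port.source_url)
    digest = hashlib.sha256(data).hexdigest()
    if digest != port.sha256:
        raise ValueError(f"checksum mismatch for {port.name}-{port.version}")
    return data, list(port.build_steps)


# Stand-in for a real download so the sketch runs without network access.
FAKE_TARBALL = b"pretend this is hello-1.0.tar.gz"


def fake_fetch(url: str) -> bytes:
    return FAKE_TARBALL


hello_port = Port(
    name="hello",
    version="1.0",
    source_url="https://example.org/dist/hello-1.0.tar.gz",  # hypothetical URL
    sha256=hashlib.sha256(FAKE_TARBALL).hexdigest(),
    build_steps=("./configure --prefix=/usr/local", "make", "make install"),
)
source, steps = prepare_build(hello_port, fake_fetch)
print(len(source), steps)
```

Notice that the ports tree never holds the software itself, only the recipe.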
The Linux world was without a good "ports" distribution for a long time until Daniel Robbins created the Gentoo Project.
PORTS V. PACKAGES: ADVANTAGES
Although opinions vary, especially those who focus on "branding" instead of "technology," here are some key and distinct advantages/disadvantages of Ports v. Packages.
- Always Current -- ADVANTAGE: PORTS
Ports are almost always current. Unless a software release requires a significant change to the build process, ports trees/repositories are updated almost immediately. There may be some testing done by the "ports maintainer" for the software, but he/she has a lot less effort to deal with than a "packages maintainer," which brings us to the next advantage ...
- Distribution Release Effort -- ADVANTAGE: PORTS
By far the build effort is greatly reduced on the distribution maintainers themselves in a ports distro. Other than making sure newer versions can be built with the existing support files in the ports tree/repository, the tree remains "current" without much effort. This, of course, means that end-user systems have more effort in actually building the software (which can introduce delays). But with a good configuration management roll-out with distributed builds and binary distribution, this can be mitigated -- even in an SMB organization.
The effort at the packages distribution gets heavy, especially when the distributor is maintaining several different, simultaneous versions (which is what Red Hat was doing when Red Hat Linux 9 was released -- supporting 6-7 community versions, not including enterprise!). This is why most packages distros don't support more than 2-3 releases simultaneously, or don't guarantee timely or well tested updates for older releases (e.g., Fedora Legacy). The only main advantage is that once the software gets to the end-user system, assuming the version is supported (and assuming compatibility -- see the next point), the effort is next to nothing.
- Update/software compatibility/availability/customization -- ADVANTAGE: PORTS
Now this is where Ports really shine. Because the software is built at the end system (even if distributed in binary form to all other systems of the same configuration), it can be built by and for a significant number of different configurations. Packages distros often only have 2-3 maintained releases (going back to the "release effort" required), and while the dependency resolution is typically sound on major distributions, and the front-ends very accommodating, it can get very messy when "package maintainers" of the official, distribution's packages do not match their build processes/assumptions with the "package maintainers" of other, 3rd party packages. In a "ports" distribution, the build processes/assumptions are unified at the end building system.
Compatibility of updates and software availability is also extensive, and "ports" distributions like Gentoo outstrip Debian in sheer number of available packages (which goes back to the "release effort"). Small changes do not have a significant impact on "ports" distros, or they can be accommodated by supporting different, core system "profiles" (using the Gentoo term), whereas many "packages" distros don't. I.e., a "packages" version may dictate whether /dev has actual files in it, or uses a virtual /dev filesystem like devfs or udev -- and an "older" version is left behind in available software. A "ports" distribution could be built for either, yet still run and update the same, latest user-space software.
Now you _could_ use the source code fetching and rebuilding features of a "packages" distribution to rebuild the entire distribution from source, solving many issues. Indeed, although both Debian and, now that it is completely open, Fedora have extensive build frameworks for rebuilding each system with many, many options that rival "ports" distros, they still make a lot of assumptions and do not offer many inherent features of ports. E.g., referring back to the previous statements, the "profiles" tend to be pre-determined for each "packages" distribution (although Debian is clearly less of an issue here than Red Hat, more for reasons of approach than anything, each having merit).
- System footprint -- ADVANTAGE: PORTS
Now this one is very misunderstood.
First off, "ports" distros are very nice when disk space is limited. "Ports" distros make it the easiest to build a system with just what the system needs in software, whereas "packages" distros take a set of software and often build the whole kitchen in case different systems need to use different parts of the program. This makes "ports" distros the most ideal for application-specific systems such as appliances and other solutions where disk space is at a premium. Although some embedded work will require a more "elementary" level than the assumptions the "port maintainers" make, "ports" distros still make a far better "starting point" than common "build from scratch" approaches. Only when you reach a level where support tools are necessary for the development/targeting of the platform do "ports" distros fall short (e.g., targets where something like the embedded/loader/tool support in Monta Vista Hard Hat Linux might be more useful).
Secondly, but differing, there is a common belief that "ports" distros are significantly faster on newer hardware than "packages" distros. It really depends on how the "packages" are optimized, but leading edge "packages" distros tend to build all software as optimized for the most common platform. In the case of "extensions," such optimizations are not typically a compile-time function, but the design of the software itself. E.g., using SSE units in a processor instead of ALU or FPU is a decision made by the software, not the compiler, because SSE is not as precise and can significantly and adversely affect calculations (negligible for games, detrimental for sci/eng and sometimes system calculations). "Packages maintainers" and "ports maintainers" are in the same boat -- without re-writing the software, the former typically finds itself building the packages configuration to support _all_ extensions and optimal performance while the latter allows end-systems to build as optimally for one system.
[ TECHNICAL NOTE: I recommend against throwing the -O3 switch system wide. On Intel P3/P4 with 1/2 SSE pipes, respectively, you will see significant ALU/FPU calculation errors. And even on the AMD Athlon-Opteron, which uses its precise, pipe-abundant 3+3 issue ALU/FPU to do SSE, -O3 often causes risky optimizations that interfere with its built-in out-of-order execution and register renaming. GCC defines -O3 to explicitly try optimizations that are "risky." -O2 is the "safest, recommended" setting. As much as "ports" advocates might claim that a ports distro has an advantage by using -O3, they are not only incorrect on the advantage due to lost stability, but _all_ major "packages" distros can be completely rebuilt with -O3 (let alone different -march/-mtune settings) with their respective build systems (e.g., the dist-tools for Fedora). ]
But reversing that discussion, the previous assumes you are talking popular, modern systems. If you are talking early hardware platforms that are not optimized like today's systems, definitely, "ports" distributions will perform much better. E.g., Red Hat currently builds its Fedora Core 2+ x86 releases for i486 ISA (instruction set architecture, essentially i386 + FPU + TLB -- the TLB being the major boost over i386), optimized (scheduler, registers, units, etc...) for Pentium 4 (which also is optimized for Athlon), and these will perform less than optimally on less than a Pentium 4 or Athlon (although they will still boot). Even Red Hat Linux 8 (7.3 too?) was built for i486 ISA and prior was i686 (Pentium Pro and K6, but _not_ Pentium) optimized.
This is especially so if you have a true Pentium or Pentium MMX (and not Pentium Pro/II/III/4, K6, Athlon, etc...), which has optimizations that are actually "de-optimizations" on other processors, even i686+. A "ports" distro like Gentoo is probably the highest performing distribution for true i586 platforms. Same deal with an "embedded," 500MHz Pentium II/III "class" processor that only has an i486 ISA and does not support the i686 ISA, or is at least not designed to be i686-class superscalar. AMD SCLAN, Cyrix 6x86/M1, IDT Centaur WinChip/2, SGS-Thomson, and several other processors, available in even 500MHz+, are actually only i486 ISA. And while even the AMD/Cyrix/NS M2/Geode, IDT Centaur WinChip4 and the Cyrix-Centaur evolution into the ViA C3 might support the i686 ISA, they are clearly not i686 class superscalar designs like the Intel Pentium Pro-P3 or AMD Athlon-Opteron that will do well with -march=i686 (let alone -mtune=pentium4/athlon).
These latter points with older/embedded hardware are clearly cases where compile-time ISA/optimizations will have a significant impact on performance, and "ports" _do_ offer advantages -- at least advantages that are inherent to the process (whereas a "packages" distro requires their build tools, and might "break" if i686 is required for some things).
- Legal Redistribution -- ADVANTAGE: PORTS
Another obvious advantage for ports from a legal standpoint is the fact that some legal issues are bypassed. For example, while some software is free, much of it is not freely redistributable. Packages maintainers may not have permission to repackage and redistribute software, and by providing another entity's software in a package repository, that repository could be liable. Not so with ports trees, because in the same capacity that an individual ports distribution system can fetch the build information from the ports tree/repository, it could also fetch the software from the vendor's site. With the exception of any click-thru agreement requirements, there is no redistribution involved, other than the support files (which are typically not from the vendor anyway).
Now it may seem there are no negatives to "ports" distros other than the build-time effort. Depending on your viewpoint, this may be correct. But let's re-visit an old friend of engineering and key to configuration management, "integration testing."
PORTS v. PACKAGES: INTEGRATION AND REGRESSION TESTING
Many Open Source advocates claim that all bugs are shallow with the availability of code, and this results in better code. Furthermore, they claim that the availability of code makes it easier to test software with other software, and change either software to accommodate each other. For the sake of this article, even if you don't agree, assume these are true facts about Open Source. Why do I ask this? Because Open Source projects on their own still don't solve the final detail of building an OS distribution: integration testing.
Integration testing is usually the primary reason why engineering projects slip. This is no different in software than in anything else. You could have teams designing the best software components for use in a system, and even sharing interfaces and standards with each other (which certainly helps), but until you start actually integrating the different components as a single unit, there is no guarantee they work with each other. While one project may involve many others, because of the assumptions of thousands of packages in a single distribution may bring to the table, and differ on, integration testing is necessary.
There is absolutely no guarantee that a "packages" distro will offer any integration testing. Packages distros can just plunk out packages from different Open Source projects, hoping the developers have built their software to work together with other software the "packages" distribution has chosen -- possibly something the original software developer didn't think of. There are countless "packages" distributions out there, many forks of others, or partially based on the prior works of others. Many are innovative, many are offering new packages as standard, but they all fall under the same need for integration testing.
Some "packages" distros do more than others. Some "packages" distros are known not only for their integration testing, but for an integration plus regression testing approach. For example, both Debian and Red Hat have a 3-tier development approach for every new distribution. Now Debian and Red Hat's cycles differ greatly, but they use the same set of package regression, distribution integration and distribution regression testing ...
Package Regression: Done by maintainers, then put to ...
Debian Unstable, Fedora Development (fka Red Hat Rawhide), etc...
"Package maintainers" do their own package-level regression testing. Sometimes they do it privately or as part of their team, other times they release it into the "new package repository" of their respective distros where others can regression test along with them. Many times they are doing regression testing and building upon prior regression testing on each revision or patch level of software. E.g., Red Hat keeps up to date with kernel developments, keeps applying its patches to new kernels, and releases its continuing regression tests of those patched kernels, typically within days after kernels are released from kernel.org.
Distribution Integration: Done with "new package repository," eventually becoming ...
Debian Testing and Fedora Test (fka Red Hat Beta)
As packages are dropped into the new package area, both the distro and the users who are running those packages as a whole (e.g., Debian Unstable or Fedora Development) will quickly, and most definitely, run into integration issues between packages. As such, integration testing will occur, the changes that need to be made to those packages get made, and the packages are put back out in the "new package area" to be downloaded, tested, etc... again. Eventually, at some point there are freezes on new package submissions as most integration issues have been worked out. And thus, a formal integration test begins, such as Debian Testing and Fedora Test. Whether the changes and releases are formally iterated or just done over time, the integration test quickly turns into a series of ...
Distribution Regression: Done by distribution release and community testers
How well this is done, I am not here to tell you. I am merely pointing out that when a "packages" distribution goes to build a release of end-user usable binary packages, this is the model they use. Many people debate whether or not Fedora/Red Hat's 2-2-2 (Development/Rawhide-Test/Beta-Release) @ 6-6-6 (0-1-2 -> Enterprise) month model is better than Debian's more direct 6-6-6 model. But it is clear that one of the reasons why Red Hat "pushed adoption" of things like GLibC 2, GCC 3 (and now GCC 4), kernel 2.4, 2.6 (and backports) is because they push out community revisions every 6 months, with the first of a new 18 month cycle being a ".0" release (in old Red Hat Linux speak) with a lot of things changed.
In most "ports" distributions, there are no formal binary releases, typically because most are an "always current" release. Now cases can be made to show that many "ports maintainers" take the time to regression test a software release before adding it into the ports tree/repository, and there can be, and often is, even ports distro-wide, formalized integration testing at times -- especially for major changes. And some may argue that with small changes, the 3-phase approach of the more popular "packages" distros is overkill and inefficient, and this holds true in the eyes of many.
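For those who think better in code, the three tiers described above can be sketched as a simple gated pipeline. This is only a toy model in Python: the stage and gate names are taken from the description above, but the `PIPELINE` table and the `advance` function are my own hypothetical simplification, not how Debian's or Red Hat's actual tooling works.

```python
# Each row: (where the package is now, gate it must pass, where it goes next).
PIPELINE = [
    ("maintainer", "package regression", "development"),       # e.g., Unstable/Rawhide
    ("development", "distribution integration", "testing"),    # e.g., Testing/Test
    ("testing", "distribution regression", "release"),
]


def advance(location, passed_checks):
    """Move a package one stage forward only if it passed that stage's gate."""
    for src, gate, dst in PIPELINE:
        if src == location:
            return dst if gate in passed_checks else location
    return location  # "release" has no further stage


# Walk one package through the pipeline as its checks come in.
where = "maintainer"
history = [where]
for check in ("package regression",
              "distribution integration",
              "distribution regression"):
    where = advance(where, {check})
    history.append(where)
```

A package that fails a gate simply stays where it is, which is the whole idea: nothing reaches "release" without passing all three kinds of testing in order.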
And lastly, some would argue that "packages" distros are trying to solve a problem that was introduced by the commercial software model, whereas "ports" are just a return to the foundation of UNIX ... source code.
ENTERPRISE CONFIGURATION MANAGEMENT
SuSE was the first vendor to introduce an "enterprise" specific distribution. Before SuSE Linux Enterprise Server (SLES), Red Hat was the first to offer a Service Level Agreement (SLA) with Red Hat Linux 6.2 "E" -- the last 6 month revision (a total of 18 months of releases) in the 6 release. The corporate Linux world responded by awarding SuSE with sales for its "separate" product, and Red Hat was left scrambling to introduce the same (which they did with Red Hat Linux Advanced Server 2.1 based on Red Hat Linux 7.2, and refreshes of different products with Red Hat Enterprise Linux 2 based on Red Hat Linux 7.2/7.3).
In a typical enterprise, there is this common tendency to trust something that is fixed and shrink-wrapped. After all, with any arbitrary Linux, how do I know if it will run Oracle -- at least without doing much research? But despite all the belief in how "out-of-the-box usable" supposedly fixed and shrink-wrapped software is, organizations still do formal configuration management whereby they install software, test it for their applications, etc... before rolling it out. So not only is a "packages" (or other binary) distribution doing its own configuration management before and after release, but the end user is as well.
So at what point are the duplications not worth making?
This is really a question left to organizations who will believe they can answer it best for themselves, and this is probably an understatement. Organizations who believe they can do a better, more relevant and custom job of assembling, building and supporting Linux will see little value in an "enterprise" Linux. In fact, there is much proof of this in the fact that BSD UNIX is still far more popular than people believe, and it's only the marketing and product availability and resulting perceptions that say otherwise -- just like more community endeavors including Debian and Gentoo versus Novell and Red Hat.
But what I can tell you is the strategy of each project or organization, which then leaves the decision up to you. First off are Novell and Red Hat, who have seemingly taken turns mirroring each other.
SPLIT PACKAGE DISTRIBUTIONS: SuSE and Red Hat
Both SuSE and Red Hat's core models have been more similar than different. Each maintains 2-4 "revisions" of a largely "binary compatible" series over 18-24 months. The last release is typically the most compatible with the ones before it, and the most used and tested and therefore stable. Both "push the envelope" in their ".0" releases, purposely changing the core kernel, GLibC, GCC and/or other components, sometimes even adopting "beta/pre-release" or "backporting" software for the first revision, because the release version of the software will most likely be out for later revisions.
As mentioned previously, Red Hat attempted to keep its product unified by offering Service Level Agreements (SLAs) on its Red Hat Linux 6.2 release, known as 6.2 "E" for enterprise. Red Hat has always maintained the status that "what we ship, we support" and would not entertain extra packages.
SuSE, on the other hand, often included the kitchen sink, whether they supported it or not. Starting with SuSE Linux 7, SuSE released a subset package release built for enterprises and offered SLAs with its contents and, unlike Red Hat, called it a different distribution: SuSE Linux Enterprise Server (SLES).
Red Hat quickly followed SuSE's lead as the corporate world was willing to believe that a separate enterprise product was better. Red Hat still maintains its 6-6-6 release model in Red Hat Enterprise Linux, but they hide it more. This release model, and the sub-2-2-2 model dev-test-release, has not changed. And the same developers who work on Red Hat Enterprise Linux will work on Fedora Core, because there is a 1:1 package relationship, just like with Red Hat Linux prior.
Despite belief to the contrary, Red Hat never supported releases long term (typically only the last ".2" release and the current 1-2 -- about 18 months), and only when Red Hat Linux 6.2-7.1 became popular did Red Hat indulge to support up to 6-7 revisions simultaneously -- something they eventually found was a great waste of resources. As of Red Hat Linux 8, Red Hat officially declared they would only support revisions for 1 year (basically only 2 back), which was really just a clarification of their past model since Red Hat Linux 4.2 (4.x/5.x were the first "modern" releases under the 2-2-2/6-6-6 approach -- "Rawhide" being formally introduced around the 5.x release cycle). Red Hat never offered Service Level Agreements for Red Hat Linux (except 6.2 "E", their "RHELv1" retroactively if you will), and the RHN access wasn't dropped; direct updates via Up2Date were just opened up for free.
COMMUNITY AND SERVICE: DEBIAN AND PROGENY
I have already discussed many of the advantages of Ian Murdock's Debian Project, many of which continue to today. About the only note I can make from a political standpoint is that some of the same "advantages" that were argued by many Debian proponents against RPM distributions: more packages, better/automatic updates, no dependency hell, etc... are now the same arguments that Gentoo users are making (thanx to its "ports" model). In reality, as always, technologies are key to understanding differences in Linux, and not making them a marketing stick because many technologies do overlap distributions.
E.g., there are even some "packages" distributions that are starting to use some collections of software via "ports" (e.g., Perl, Java, etc...) repositories where formal configuration management is less of an issue (e.g., especially during development, hence development software), as well as more "pre-packaged" binaries for "ports" distributions that are good for specific (e.g., modern) hardware to reduce build-times.
Debian is still a single project and set of releases. Although many independent, commercial distributions are based on it -- from Xandros (fka Corel) to Lindows (actually based on Xandros), Ian Murdock did not create a commercial distribution from Debian, but a commercial endeavor, Progeny. Progeny markets itself as the "Linux Platform Company," and clearly sees Linux configuration management as a service it can provide to organizations to help themselves best. After all, even when most organizations buy "shrink-wrapped" or "enterprise" versions of software, they still tend to do their own integration testing (if not regression testing as well).
So it's not surprising that larger organizations like the City of Munich picked Debian over its initial evaluation of Windows and Linux, with Novell-SuSE and IBM Global Services as the Linux option. Now even Murdock has confirmed in his blog that, between the initial OS choice and the final solution, the decision to go with Debian was swayed by the fact that SuSE was no longer SuSE AG, its own German corporation, but now a division of Novell, Inc., an American corporation. But that on its own would not and could not sustain the City of Munich's endeavor (with ~70,000 desktops) with Debian.
The reality is that many believe that you do not need an "enterprise" distribution to roll out Linux, because you're often going to be doing "configuration management" anyway that involves integration testing. After all, even if the "enterprise" distribution goes through a lot of regression testing and offers formal Service Level Agreements (SLAs) to 4 hours (and even lower), they still only offer this on their fixed, limited set of software, and most corporations run more than just what the vendor supports.
[ SIDE NOTE: Which is why I often say if you choose Microsoft solutions, stick with only Microsoft software -- across-the-board. But if you find it is limiting, you shouldn't just limit yourself to software that only runs on Windows, and consider a more "portable/open" future. ]
At the same time, people still do purchase SLES and RHEL because it runs the few, certified applications. Which is why it is not surprising that one of the new endeavors for Murdock's Progeny is to ensure Debian Linux is binary compatible with Oracle and other commercial software that normally runs on RPM. The idea here is not to create an "enterprise" distribution of Debian, but a Debian configuration that is ready for select applications that are certified against select, marketed "enterprise" releases like SLES and RHEL.
As I have said repeatedly, Ian Murdock is ahead of his time and the engineer in me completely appreciates the unleashing of his foresight onto the corporate Linux world.
FOOTNOTE: SMARTPM AND EMERGE FOR EVERYONE?
This blog ended up being a lot longer than just a simple discussion of packages v. ports, eh?
The reality is that with any innovative, technical approach, everyone has to accommodate. "Packages" distros are already starting to use "ports" mechanisms for some portions of their distribution (e.g., development software like Java, Perl, etc...). Gentoo's excellent "emerge" is commonly used now for such, even atop of Debian and Fedora (and even found in repositories for both now!). And "ports" distributions, including Gentoo, also offer good portions of their system in "binary" form that is tested to various levels.
With that said, even the "packages" world is undergoing a unification of sorts. Progeny already has its "Componentized Linux" which makes support of Debian and Fedora systems uniform (even if package exchange/support is not quite). And many endeavors to port APT-RPM to other platforms, like Solaris, for unified configuration management have actual, major usage at universities like Rutgers. And even the "front-end" for package management systems is starting to unify if Mandriva has anything to say about it.
Mandriva, formerly RPM-based Mandrake, gobbled up both RPM-based Conectiva and DPKG-based Lycoris. The same people at Conectiva that first adopted APT-RPM have now introduced SmartPM -- a near-universal front-end for package management systems (e.g., DPKG, RPM, Slack, etc...) that set out to solve at least three major issues from the get-go:
- Advanced dependency resolution (improvements over both APT and YUM)
- Multiple repositories (beyond what APT's pinning can do, and can even support multiple repository formats) and
- Comes with a variety of interfaces as standard (CLI, CLI+shell, GUI like APT's Synaptic, etc...).
It's still in testing but many people are switching, including DAG which is one of the largest APT/YUM-RPM repositories for Red Hat Linux, Fedora Core and Red Hat Enterprise Linux (as well as CentOS). The FAQ is extremely enlightening, including more information on all the support, as well as case studies on how it approaches issues with the APT and YUM systems differently (and more completely) while supporting their repository formats so it can be used today without any change:
http://www.smartpm.org/
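To make the first two goals in the list above a little more concrete, here is a small Python sketch of dependency resolution across several repositories in priority order. To be clear, this is NOT SmartPM's algorithm or API; the `resolve` function, the repository names and the package names are all hypothetical, and a real resolver also has to deal with versions, conflicts and alternatives. It only shows the basic idea: pick each package from the highest-priority repository that carries it, and install dependencies before the things that need them.

```python
def resolve(target, repos):
    """Return (package, repo_name) pairs in install order for `target`.

    `repos` is a list of (repo_name, {package: [dependencies]}) in priority
    order; the first repository that carries a package wins.
    """
    order, done, in_progress = [], set(), set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg in in_progress:
            raise ValueError(f"dependency cycle at {pkg}")
        # Find the highest-priority repository that carries this package.
        for repo_name, index in repos:
            if pkg in index:
                break
        else:
            raise LookupError(f"{pkg} not found in any repository")
        in_progress.add(pkg)
        for dep in index[pkg]:
            visit(dep)              # dependencies go in before the package
        in_progress.discard(pkg)
        done.add(pkg)
        order.append((pkg, repo_name))

    visit(target)
    return order


# Hypothetical repositories: "vendor" has priority over "third-party".
repos = [
    ("vendor", {"app": ["libfoo", "libbar"], "libfoo": []}),
    ("third-party", {"libbar": ["libfoo"], "libfoo": []}),
]
plan = resolve("app", repos)
```

Here "libfoo" exists in both repositories but is taken from "vendor" because it is listed first, while "libbar" can only come from "third-party".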