3.6. Package management tools

In any distribution, the packages are the basic item for handling the tasks of installing new software, updating existing software or eliminating unused software.


Basically, a package is a set of files that form an application or the combination of several related applications, normally forming a single file (known as a package), with its own format, normally compressed, which is distributed via CD/DVD or downloading service (ftp or http repositories).

The use of packages is helpful for adding or removing software, because it considers it as a unit instead of having to work with the individual files.

In the distribution's content (its CD/DVDs) the packages tend to be grouped into categories such as: a) base: essential packages for the system's functioning (tools, start-up programs, system libraries); b) system: administration tools, utility commands; c) development: programming tools: editors, compilers, debuggers... d) graphics: graphics controllers and interfaces, desktops, windows managers... e) other categories.

Normally, to install a package we will need to follow a series of steps:

1) Preliminary steps (pre-installation): check that the required software exists (and with the correct versions) for its functioning (dependencies), whether system libraries or other applications used by the software.

2) Decompress the package content, copying the files to their definitive locations, whether absolute (with a fixed position) or can be relocated to other directories.

3) Post-installation: retouching the necessary files, configuring possible software parameters, adjusting it to the system...

Depending on the types of packages, these steps may be mostly automatic (this is the case in RPM [Bai03] and DEB [Deb02]) or they may all be needed to be done by hand (.tgz case) depending on the tools provided by the distribution.

Next, let's see perhaps the three most classical packages of most distributions. Each distribution has one as standard and supports one of the others.

3.6.1. TGZ package

TGZ packages are perhaps those that have been used for longest. The first GNU/Linux distributions used them for installing the software, and several distributions still use it (for example, Slackware) and some commercial UNIX. They are a combination of files joined by the tar command in a single .tar file that has then been compressed using the gzip utility, and that tends to appear with the .tgz or .tar.gz extension. At the same time, nowadays it is common to find tar.bz2 which instead of gzip use another utility called bzip2, which in some cases obtains greater file compression.

Example 3-8. Note

TGZ packages are a basic tool when it comes to installing unorganised software. Besides, they are a useful tool for backup processes and restoring files.

Contrary to what it may seem, it is a commonly used format especially by the creators or distributors of software external to the distribution. Many software creators that work for various platforms, such as various commercial UNIX and different distributions of GNU/Linux prefer it as a simpler and more portable system.

An example of this case is the GNU project, which distributes its software in this format (in the form of source code), since it can be used in any UNIX, whether a proprietary system, a BSD variant or a GNU/Linux distribution.

If in binary format, we will have to bear in mind that it is suitable for our system, for example a denomination such as the following one is common (in this case, version 1.4 of the Mozilla web navigator):


where we have the package name, as Mozilla, designed for i686 architecture (Pentium II or above or compatible), it could be i386, i586, i686, k6 (amd k6), k7 (amd athlon), amd64 u x86_64 (for AMD64 and some 64bit intels with em64t), o ia64 (intel Itaniums) others for the architectures of other machines such as sparc, powerpc, mips, hppa, alpha... then it tells us that it is for Linux, on a PC machine, software version 1.4.

If it were in source format, it could appear as:


where we are shown the word source; in this case it does not mention the machine's architecture version, this tells us that it is ready for compiling on different architectures.

Otherwise, there would be different codes for every operating system or source: GNU/Linux, Solaris, Irix, bsd...

The basic process with these packages consists of:

1) Decompressing the package (they do not tend to use absolute path, meaning that they can be decompressed anywhere):

tar -zxvf file.tar.gz (or .tgz file)

With the tar command we use z options: decompress, x: extract files, v: view process, f: name the file to be treated.

It can also be done separately (without the tar's z):

gunzip file.tar.gz

(leaves us with a tar file)

tar -xvf file.tar

2) Once we have decompressed the tgz, we will have the files it contained, normally the software should include some file of the readme or install type, which specifies the installation options step by step, and also possible software dependencies.

In the first place, we should check the dependencies to see if we have the right software, and if not, look for it and install it.

If it is a binary package, the installation is usually quite easy, since it will be either directly executable from wherever we have left it or it will carry its own installer. Another possibility is that we may have to do it manually, meaning that it will be enough to copy it (cp -r, recursive copy) or to move it (mv command) to the desired position.

Another case is the source code format. Then, before installing the software we will first have to do a compilation. For this we will need to read the instruction that the program carries in some detail. But most developers use a GNU system called autoconf (from autoconfiguration), which normally uses the following steps (if no errors appear):

• ./configure: it is a script that configures the code so that it can be compiled on our machine and that verifies that the right tools exist. The --prefix = directory option makes it possible to specify where the software will be installed.

make: compilation itself.

make install: installing the software in the right place, normally previously specified as an option to configure or assumed by default.

This is a general process, but it depends on the software whether it follows it or not, there are fairly worse cases where the entire process needs to be carried out by hand, retouching configuration files or the makefile, and/or compiling the files one by one, but luckily this is becoming less and less common.

In the case of wanting to delete all of the installed software, we will have to use the uninstaller if provided or, otherwise, directly delete the directory or installed files, looking out for potential dependencies.

The tgz packages are fairly common as a backup mechanism for administration tasks, for example, for saving copies of important data, making backups of user accounts or saving old copies of data that we do not know if we will need again. The following process tends to be used: let's suppose that we want to save a copy of the directory "dir", we can type: tar -cvf dir.tar dir (c: compact dir in the file dir.tar) gzip dir.tar (compress) or in a single instruction like:

tar -cvzf dir.tgz dir

The result will be a dir.tgz file. We need to be careful if we are interested in conserving the file attributes and user permissions, as well as possibly links that may exist (we must examine the tar options so that it adjusts to the required backup options).

3.6.2. Fedora/Red Hat: RPM packages


The RPM packages system [Bai03] created by Red Hat represents a step forward, since it includes the management of software configuration tasks and dependencies. Also, the system stores a small database with the already installed packages, which can be consulted and updated with new installations.

Conventionally, RPM packages use a name such as:


where package is the name of the software, version is the numbering of the software version, rev normally indicates the revision of the RPM package, which indicates the number of times it has been built and arq refers to the architecture that it is designed for, whether Intel/AMD (i386, i586, i686, x86_64, em64t, ia64) or others such as alpha, sparc, PPC... The noarch "architecture" is normally used when it is independent, for example, a Set of scripts and src in the case of dealing with source code packages. A typical execution will include running rpm, the options of the operation to be performed, together with one or more names of packages to be processed together.

Example 3-9. Note

The package: apache-1.3.19-23.i686.rpm would indicate that it is Apache software (the web server), in its version 1.3.19, package revision RPM 23, for Pentium II architectures or above.

Typical operations with RPM packages include:

Example 3-10. Example

For a remote case:

rpm -i ftp://site/directory/package.rpm

would allow us to download the package from the provided FTP or web site, with its directory location, and proceed in this case to install the package.

We need to control where the packages come from and only use known and reliable package sources, such as the distribution's own manufacturer or trustworthy sites. Normally, together with the packages, we are offered a digital signature for them, so that we can check their authenticity. The sums md5 are normally used for checking that the package has not been altered and other systems, such as GPG (GNU version of PGP), for checking the authenticity of the package issuer. Similarly, we can find different RPM package stores on Internet, where they are available for different distributions that use or allow the RPM format.

Example 3-11. Note

View the site: www.rpmfind.net.

For a secure use of the packages, official and some third party repositories currently sign the packages electronically, for example, using the abovementioned GPG; this helps us to make sure (if we have the signatures) that the packages come from a reliable source. Normally, every provider (the repository) will include some PGP signature files with the key for its site. From official repositories they are normally already installed, if they come from third parties we will need to obtain the key file and include it in RPM, typically:

$ rpm –import GPG-KEY-FILE

With GPP-KEY-FILE being the GPG key file or URL of the file, normally this file will also have sum md5 to check its integrity. And we can find the keys in the system with:

$ rpm -qa | grep ^gpg-pubkey

we can observe more details on the basis of the obtained key:

$ rpm -qi gpg-key-xxxxx-yyyyy

For a specific RPM package we will be able to check whether it has a signature and with which one it has been used:

$ rpm –checksig -v <package>.rpm

And to check that a package is correct based on the available signatures, we can use:

$ rpm -K <package.rpm>

We need to be careful to import just the keys from the sites that we trust. When RPM finds packages with a signature that we do not have on our system or when the package is not signed, it will tell us and, then, we will have to decide on what we do.

Regarding RPM support in the distributions, in Fedora (Red Hat and also in its derivatives), RPM is the default package format and the one used extensively by the distribution for updates and software installation. Debian uses the format called DEB (as we will see), there is support for RPM (the rpm command exists), but only for consulting or package information. If it is essential to install an rpm package in Debian, we advise using the alien utility, which can convert package formats, in this case from RPM to DEB, and proceed to install with the converted package.

In addition to the distribution's basic packaging system, nowadays each one tends to support an intermediate higher level software management system, which adds an upper layer to the basic system, helping with software management tasks, and adding a number of utilities to improve control of the process.

In the case of Fedora (Red Hat and derivatives) it uses the YUM system, which allows as a higher level tool to install and manage packages in rpm systems, as well as automatic management of dependencies between packages. It allows access to various different repositories, centralises their configuration in a file (/etc/yum.conf normally), and has a simple commands interface.

Example 3-12. Note

YUM in: http://yum.baseurl.org

The yum configuration is based on:

A summary of the typical yum operations would be:

Finally, Fedora also offers a couple of graphics utilities for YUM, pup for controlling recently available updates, and pirutas a software management package. There are also others like yumex, with greater control of yum's internal configuration.

3.6.3. Debian: DEB packages

Debian has interactive tools such as tasksel, which makes it possible to select sub-sets of packages grouped into types of tasks: packages for X, for development, for documentation etc., or such as dselect, which allows us to navigate the entire list of available packages (there are thousands) and select those we wish to install or uninstall. In fact, these are only a front-end of the APT mid-level software manager.

At the command line level it has dpkg, which is the lowest level command (would be the equivalent to rpm), for managing the DEB software packages directly [Deb02], typically dpkg -i package.deb to perform the installation. All sorts of tasks related to information, installation, removal or making internal changes to the software packages can be performed.

The intermediary level (as in the case of Yum in Fedora) is presented by the APT tools (most are apt-xxx commands). APT allows us to manage the packages from a list of current and available packages based on various software sources, whether the installation's own CDs, FTP or web (HTTP) sites. This management is conducted transparently, in such a way that the system is independent from the software sources.

The APT system is configured from the files available in /etc/apt, where /etc/apt/sources.list is the list of available sources; an example could be:

deb http://http.us.debian.org/debian stable main contrib non-free debsrc http://http.us.debian.org/debian stable main contrib non-free deb http://security.debian.org stable/updates main contrib non-free

Where various of the "official" sources for a Debian are compiled (etch in this case, which is assumed to be stable), from which we can obtain the software packages in addition to their available updates. Basically, we specify the type of source (web/FTP in this case), the site, the version of the distribution (stable in this example) and categories of software to be searched for (free, third party contributions, non-free or commercial licenses).

The software packages are available for the different versions of the Debian distribution, there are packages for the stable, testing, and unstable versions. The use of one or the others determines the type of distribution (after changing the repository sources in sources.list). It is possible to have mixed package sources, but it is not advisable, because conflicts could arise between the versions of the different distributions.

Example 3-13. Note

Debian's DEB packages are perhaps the most powerful installation system existing in GNU/Linux. A significant benefit is the system's independence from the sources of the packages (through APT).

Once we have configured the software sources, the main tool for handling them in our system is apt-get, which allows us to install, update or remove from the individual package, until the entire distribution is updated. There is also a front-end to apt-get, called aptitude, whose options interface is practically identical (in fact it could be described as an apt-get emulator, since the interface is equivalent); as benefits it manages package dependencies better and allows an interactive interface. In fact it is hoped that aptitude will become the default interface in the command line for package management in Debian.

Some basic functions of apt-get:

Through this last process, we can keep our distribution permanently updated, updating installed packages and verifying dependencies with the new ones. Some useful tools for building this list are apt-spy, which tries to search for the fastest official sites, or netselect, which allows us to test a list of sites. On a separate note, we can search the official sites (we can configure these with apt-setup) or copy an available source file. Additional (third party) software may need to add more other sources (to etc/sources.list); lists of available source sites can be obtained (for example: http://www.apt-get.org).

Updating a system in particular generates a download of a large number of packages (especially in unstable), which makes it advisable to empty the cache, the local repository, with the downloaded packages (they are kept in /var/cache/apt/archive) that will no longer be used, either with apt-get clean to eliminate them all or with apt-get autoclean to eliminate the packages that are not required because there are already new versions and, in principle, they will no longer be needed. We need to consider whether we may need these packages again for the purposes of reinstalling them, since, if so, we will have to download them again.

The APT system also allows what is known as SecureAPT, which is the secure management of packages through verifying sums (md5) and the signatures of package sources (of the GPG type). If the signatures are not available during the download, apt-get reports this and generates a list of unsigned packages, asking whether they will stop being installed or not, leaving the decision to the administrator. The list of current reliable sources is obtained using:

# apt-key list

The GPG keys of the official Debian sites are distributed through a package, and we can install them as follows:

# apt-get install debian-archive-keyring

Obviously, considering that we have the sources.list with the official sites. It is hoped that by default (depending on the version of Debian) these keys will already be installed when the system initiates. For other unofficial sites that do not provide the key in a package, but that we consider trustworthy, we can import their key, obtaining it from the repository (we will have to consult where the key is available, there is no defined standard, although it is usually on the repository's home page). Using apt-key add with the file, to add the key or also:

# gpg –import file.key # gpg –export –armor XXXXXXXX | apt-key add -

With X being a hexadecimal related to the key (see repository instructions for the recommended way of importing the key and the necessary data).

Another important functionality of the APT system is for consulting package information, using the apt-cache tool, which allows us to interact with the lists of Debian software packages.

Example 3-14. Example

The apt-cache tool has commands that allow us to search for information about the packages, for example:

  • Search packages based on an incomplete name:

    apt-cache search name

  • Show package description:

    apt-cache show package

  • What packages it depends on:

    apt-cache depends package

Other interesting apt tools or functionalities:

- apt-show-versions: tells us what packages may be updated (and for what versions, see option -u).

Other more specific tasks will need to be done with the lowest level tool, such as dpkg. For example, obtaining the list of files of a specific installed package:

dpkg -L package

The full list of packages with

dpkg -l

Or searching for what package an element comes from (file for example):

dpkg -S file

This functions for installed packages; apt-file can also search for packages that are not yet installed.

Finally, some graphic tools for APT, such as synaptic, gnome-apt for gnome, and kpackage or adept for KDE are also worth mentioning, as well as the already mentioned text ones such as aptitude or dselect.

Conclusion: we should highlight that the APT management system (in combination with the dpkg base) is very flexible and powerful when it comes to managing updates and is the package management system used by Debian and its derived distributions such as Ubuntu, Kubuntu, Knoppix, Linex etc.