Reading Yum Repository Data

I’ve spent a lot of time working with RPM in the last couple of years, and have had the pleasure of maintaining the IUS Community.

I wanted to share a small utility we use quite often called repodataParser. It is a Python class for working with RPM repositories, and it is used in a few of our Django applications.

The idea is that every RPM repository contains an XML file with details about the packages it provides. Let’s take CentOS’s Vault as an example:

In [1]: from RepoParser.RepoParser import Parser

In [2]: parser = Parser(url='http://vault.centos.org/6.0/os/x86_64/repodata/80c918e87188ac5bba893df689108bb3f43ba2d2a7d36eb3094acdc851025ef7-primary.xml.gz')
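The long hash in that file name changes whenever the repository metadata is regenerated, so hard-coding it is brittle. Yum repositories also publish a repodata/repomd.xml index that lists the current location of the primary file. The following is a minimal sketch of reading that index with the standard library; it is not part of repodataParser, and the function name find_primary_href and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by repomd.xml in yum repositories.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Trimmed, made-up repomd.xml for illustration; real files carry more fields.
SAMPLE_REPOMD = b"""<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="filelists">
    <location href="repodata/aaaa-filelists.xml.gz"/>
  </data>
  <data type="primary">
    <location href="repodata/bbbb-primary.xml.gz"/>
  </data>
</repomd>
"""


def find_primary_href(repomd_xml):
    """Return the relative path of the primary metadata file."""
    root = ET.fromstring(repomd_xml)
    for data in root.findall("repo:data", REPO_NS):
        if data.get("type") == "primary":
            location = data.find("repo:location", REPO_NS)
            if location is not None:
                return location.get("href")
    raise ValueError("no primary metadata entry found in repomd.xml")


print(find_primary_href(SAMPLE_REPOMD))
```

The returned path is relative to the repository root, so it would be joined with the base URL before being handed to Parser.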

The parser now gives us two methods, getList and getPackage. Let’s go over getList first:

In [3]: help(parser.getList)

getList(self) method of RepoParser.RepoParser.Parser instance
    returns a python list of dicts of the nodes in a XML files TagName

According to the docstring, we get back a Python list of dicts. Let’s check:

In [4]: type(parser.getList())
Out[4]: list

In [5]: len(parser.getList())
Out[5]: 6019

Yup, we have a list with 6019 packages. Let’s have a look at the first one:

In [6]: parser.getList()[0]
Out[6]:
{u'arch': (u'i686', None),
 u'checksum': (u'36099439b7dbc9323588f1999bff9b1738bb8b4df56149eb7ebb5b5226107665',
  {u'pkgid': u'YES', u'type': u'sha256'}),
 u'description': (u'The libjpeg package contains a library of functions for manipulating\nJPEG images, as well as simple client programs for accessing the\nlibjpeg functions.  Libjpeg client programs include cjpeg, djpeg,\njpegtran, rdjpgcom and wrjpgcom.  Cjpeg compresses an image file into\nJPEG format.  Djpeg decompresses a JPEG file into a regular image\nfile.  Jpegtran can perform various useful transformations on JPEG\nfiles.  Rdjpgcom displays any text comments included in a JPEG file.\nWrjpgcom inserts text comments into a JPEG file.',
  None),
 u'format': (u'\n    ', None),
 u'location': (None, {u'href': u'Packages/libjpeg-6b-46.el6.i686.rpm'}),
 u'name': (u'libjpeg', None),
 u'packager': (u'CentOS BuildSystem <http://bugs.centos.org>', None),
 u'size': (None,
  {u'archive': u'289416', u'installed': u'287173', u'package': u'135732'}),
 u'summary': (u'A library for manipulating JPEG image format files', None),
 u'time': (None, {u'build': u'1282396975', u'file': u'1309667078'}),
 u'url': (u'http://www.ijg.org/', None),
 u'version': (None, {u'epoch': u'0', u'rel': u'46.el6', u'ver': u'6b'})}
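Each value in that dict is a two-item tuple: the element's text first, then its attributes (or None when there are none). Here is a rough sketch of how such a structure could be produced with the standard library's ElementTree. This is an assumption about the shape of the data, not repodataParser's actual code; package_to_dict, local_name, and the trimmed sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed <package> element modelled on the libjpeg entry above.
SAMPLE_PACKAGE = """
<package xmlns="http://linux.duke.edu/metadata/common" type="rpm">
  <name>libjpeg</name>
  <arch>i686</arch>
  <version epoch="0" ver="6b" rel="46.el6"/>
  <location href="Packages/libjpeg-6b-46.el6.i686.rpm"/>
</package>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def package_to_dict(package):
    """Map each direct child to a (text, attributes-or-None) tuple."""
    result = {}
    for child in package:
        attributes = dict(child.attrib) or None
        result[local_name(child.tag)] = (child.text, attributes)
    return result


pkg = package_to_dict(ET.fromstring(SAMPLE_PACKAGE))
print(pkg["name"])
print(pkg["version"])
```

Elements that only carry attributes, such as version and location, end up with None as their text, which matches the output shown above.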

Now let’s have a look at getPackage:

In [7]: help(parser.getPackage)

getPackage(self, package) method of RepoParser.RepoParser.Parser instance
    return a python list of dicts for a package name

We know a CentOS 6 server should provide a php package, so let’s look it up:

In [8]: parser.getPackage('php')
Out[8]:
[{u'arch': (u'x86_64', None),
  u'checksum': (u'8387996f9876fd0be5ae30845e8bb4c65371d54c4969ebe61c7e6fa771622f5b',
   {u'pkgid': u'YES', u'type': u'sha256'}),
  u'description': (u'PHP is an HTML-embedded scripting language. PHP attempts to make it\neasy for developers to write dynamically generated webpages. PHP also\noffers built-in database integration for several commercial and\nnon-commercial database management systems, so writing a\ndatabase-enabled webpage with PHP is fairly simple. The most common\nuse of PHP coding is probably as a replacement for CGI scripts.\n\nThe php package contains the module which adds support for the PHP\nlanguage to Apache HTTP Server.',
   None),
  u'format': (u'\n    ', None),
  u'location': (None, {u'href': u'Packages/php-5.3.2-6.el6.x86_64.rpm'}),
  u'name': (u'php', None),
  u'packager': (u'CentOS BuildSystem <http://bugs.centos.org>', None),
  u'size': (None,
   {u'archive': u'3648536', u'installed': u'3647853', u'package': u'1169480'}),
  u'summary': (u'PHP scripting language for creating dynamic web sites', None),
  u'time': (None, {u'build': u'1289553183', u'file': u'1309669007'}),
  u'url': (u'http://www.php.net/', None),
  u'version': (None, {u'epoch': u'0', u'rel': u'6.el6', u'ver': u'5.3.2'})}]
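Note that the result is a list even when only one package matches. A repository can hold more than one package with the same name, for example builds for different architectures. A name lookup over the getList() output could be sketched like this; find_by_name and the stand-in data are hypothetical and only illustrate the idea, not how getPackage is actually written.

```python
# Hand-written stand-ins for entries returned by getList(), trimmed to two keys.
packages = [
    {"name": ("libjpeg", None), "arch": ("i686", None)},
    {"name": ("libjpeg", None), "arch": ("x86_64", None)},
    {"name": ("php", None), "arch": ("x86_64", None)},
]


def find_by_name(package_list, name):
    """Return every entry whose <name> text equals the requested name."""
    return [pkg for pkg in package_list if pkg["name"][0] == name]


print(len(find_by_name(packages, "libjpeg")))
print(find_by_name(packages, "nonexistent"))
```

An unknown name yields an empty list in this sketch, so callers should check the length before indexing into the result.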

And here is how we would grab the version of the first php package provided by the repository:

In [9]: php = parser.getPackage('php')

In [10]: php[0]['version'][1]['ver']
Out[10]: u'5.3.2'
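Because every field is a (text, attributes) tuple, the indexing gets noisy quickly. Small helpers can hide it. The sketch below uses a trimmed copy of the php entry above and assumes the time values are Unix timestamps in seconds; full_name and build_time are hypothetical names, not part of repodataParser.

```python
from datetime import datetime, timezone

# Trimmed copy of the php entry shown above.
php_entry = {
    "name": ("php", None),
    "arch": ("x86_64", None),
    "version": (None, {"epoch": "0", "rel": "6.el6", "ver": "5.3.2"}),
    "time": (None, {"build": "1289553183", "file": "1309669007"}),
}


def full_name(pkg):
    """Build 'name-version-release.arch' from the tuple-based dict."""
    version = pkg["version"][1]
    return "{}-{}-{}.{}".format(
        pkg["name"][0], version["ver"], version["rel"], pkg["arch"][0]
    )


def build_time(pkg):
    """Convert the build timestamp (a string of epoch seconds) to UTC."""
    return datetime.fromtimestamp(int(pkg["time"][1]["build"]), tz=timezone.utc)


print(full_name(php_entry))
print(build_time(php_entry).year)
```

The string from full_name lines up with the file name in the location href, minus the Packages/ prefix and the .rpm suffix.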

repodataParser is pretty rough around the edges, but as you can see, it does work. Hopefully this helps someone out there who needs to inspect RPM repository XML data.