Search this site


Metadata

Articles

Projects

Presentations

Python's sad xml business and modules vs packages.

So, I've been reading docs on python's xml stuff, hoping there's something simple or comes-default-with-python that'll let me do xpath. Everyone overcomplicates xml processing. I have no idea why. Python seems to have enough alternatives to make dealing with xml less painful.

Standard python docs will lead you astray:

kenya(...ojects/pimp/pimp/controllers) % pydoc xml.dom | wc -l
643
Clearly, the pydoc for "xml.dom" has some nice things, right? I mean, documentation is clearly an indication that THE THING THAT IS DOCUMENTED BEING AVAILABLE. Right?

Sounds great. Let's try to use this 'xml.dom' module!

kenya(...ojects/pimp/pimp/controllers) % python -c 'import xml; xml.dom'
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: 'module' object has no attribute 'dom'
WHAT. THE. HELL.

Googling around, it turns out that 'xml' is a fake module that only actually works if you have it the 4Suite modules installed? Maybe?

Why include fake modules that provide complete documentation to modules that do not exist in the standard distribution?

Who's running this ship? I want off. I'll swim if necessary.

As it turns out, I made too-strong of an assumption about python's affinity towards java-isms. I roughly equated 'import foo' in python as 'import foo.*' in java. That was incorrect. Importing foo doesn't get you access to things in it's directory, they have to be imported explicity.

In summary, 'import xml' gets you nothing. 'import xml.dom' gets you nothing. If you really want minidom's parser, you'll need 'import xml.dom.minidom' or a 'from import' variant.

On another note, the following surprised me. I had a module, foo/bar.py. I figured 'from foo import *' would grab it. This means 'from xml.dom import *' doesn't get you minidom and friends.

Perhaps I was hoping for too much, but maybe it's better to import explicitly. If that's the case ,then why push exceptions that allow '*' to be imported only from modules, not packages?