treecutter - static docbook website generator
2013-02-05
Revision History
Date | Author | Description
---|---|---
2011-05-03 | FU | Initial version
2012-02-07 | FU | Added detailed usage instructions
treecutter is a tool to publish docbook documents from a directory tree. The structure of the docbook documents has to follow some simple rules.
Docbook rules
Currently only article documents are used.
The tag title must be present.
The tag titleabbrev represents the menu name and must be present.
If an xi:include exists with attribute parse='text', it will be run if it is an executable Python or Perl script. The output is expected to be valid docbook at the place from where it is called. It is a "build-time" execution.
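The last rule can be illustrated with a minimal sketch of such a build-time script. It only shows the general idea: an executable script that writes a well-formed docbook fragment to standard output. The file name buildstamp.py, the function build_fragment and the choice of a para element are assumptions made for this example, and how treecutter actually locates and calls the script (presumably through the href attribute of the xi:include) is not described here.

```python
#!/usr/bin/env python3
"""Hypothetical build-time script (buildstamp.py): prints a docbook fragment."""
import sys
import xml.etree.ElementTree as ET
from datetime import date


def build_fragment(today):
    """Return a docbook <para> (as text) stating when the page was built."""
    para = ET.Element("para")
    para.text = "This page was generated on "
    stamp = ET.SubElement(para, "emphasis")
    stamp.text = today.isoformat()
    stamp.tail = "."
    # ElementTree escapes special characters, so the fragment stays well-formed.
    return ET.tostring(para, encoding="unicode")


if __name__ == "__main__":
    # Assumption: the generator reads the fragment from standard output.
    sys.stdout.write(build_fragment(date.today()) + "\n")
```

Building the fragment with an XML library rather than string concatenation is a deliberate choice here, since the output has to be valid at the place where it is included.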
The Name?
The name treecutter comes from the name of this site (tree.se), and we are trying to form documents, like cookie cutters form and style dough. Tree cutters also form trees (maybe even XML trees) into something useful. The name was also inspired by Charles Ammi Cutter, who invented the Cutter Expansive Classification for libraries. Whether the project will deal with that classification is still open.
Code
The code is available in the git repository https://source.tree.se/git/treecutter.git
The licence of the code is not yet set.
The processing of the XML documents follows several steps detailed below. The image below gives a small overview of what is going on in the code. Important to the site is the sitemap.txt, which tells the program how the site should be linked together. The XML structure is scanned to map this view and adjust sitemap.txt as needed. Once the site structure is determined, each link is traversed and the different language pages per link are scanned and set up. This also creates the menus. Once all pages have been processed, they are published to the server via rsync.
Prerequisite
The project used to use amara as its XML parser, but switched to lxml to get better distribution support. Amara was not packaged for Debian, for example.
I will try to build a Debian package for treecutter once the application has stabilized. As for dependencies, Cheetah is currently used for the HTML templating, so python-cheetah needs to be installed. (It might need to be patched to handle Unicode well; I have no real Cheetah experience, but this Compiler.py.patch works.) I specifically do not want to put encoding in the template, so as to keep it as clean as possible.
The amara XSLT processing was not strong enough to handle docbook's stylesheets, so currently an external application, xsltproc, is used. Lxml might be able to do it better, and that is a next step.
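As a rough illustration of delegating the transformation to an external xsltproc process, the following sketch builds the basic command line "xsltproc stylesheet document" and captures the result. The function names xslt_command and render are hypothetical, and the exact options treecutter passes to xsltproc are not stated here, so none are assumed.

```python
import shutil
import subprocess


def xslt_command(stylesheet, document):
    """Build the basic xsltproc command line: xsltproc STYLESHEET DOCUMENT."""
    return ["xsltproc", stylesheet, document]


def render(stylesheet, document):
    """Run xsltproc and return the transformed document as text."""
    if shutil.which("xsltproc") is None:
        raise RuntimeError("xsltproc is not installed or not on PATH")
    result = subprocess.run(
        xslt_command(stylesheet, document),
        check=True,           # raise CalledProcessError if xsltproc fails
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return result.stdout
```

Calling render("docbook.xsl", "index.en.xml") would then return the HTML produced by the stylesheet wrapper, provided both files exist.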
Make sure that the treecutter.py directory is in your $PATH.
export PYTHONPATH=$HOME/treecutter:$PYTHONPATH
export PATH=$HOME/treecutter/bin:$PATH
Setup
Setting up a proper treecutter structure is not hard, but it requires several steps. This setup example details how to set up a new site for hosting by the Apache web server, but it can easily be adapted to other web servers, as the output is just static pages.
Setting up the structure for the site
It is good practice to keep the data of your site in version control. You do not need to, but you will find the benefits of doing the setup work now, especially if the site grows to include more contributors. Please refer to "Git over apache2 https using git-http-backend with gitweb" on how to set up a git repository.
The recommended structure for a site is
./doc/www/public
./doc/my/secure
./material
./style/www
The doc structure will contain the docbook documents. The extra subdirectory www makes it possible to keep several sites in the structure. ./doc/www/public corresponds to http://www.example.com/ and allows for a future ./doc/my/secure structure that corresponds to https://my.example.com/. If no SSL version of the site is planned, the directory secure can also be removed.
The material directory is a suggestion for where data related to the site can be tracked. For example, the logo of the site can be developed under version control in the material/svg directory, and once it is finalized it can be rendered and included into the doc structure.
The style directory will contain the global stylesheet of the site. Docbook itself does provide some HTML formatting, but not to the extent that it can build site-wide menus. In this directory each subsite (http://www.example.com, https://my.example.com) can have its own style. Individual sections of the site (e.g. http://www.example.com/directory) cannot.
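The recommended layout can be created by hand, but a small helper makes it repeatable. The following is a minimal sketch; the function name scaffold and its with_secure switch are inventions for this example and are not part of treecutter.

```python
from pathlib import Path

# Directories from the recommended structure for a site.
RECOMMENDED = ["doc/www/public", "doc/my/secure", "material", "style/www"]


def scaffold(root, with_secure=True):
    """Create the recommended directories under root and return their paths."""
    created = []
    for rel in RECOMMENDED:
        if not with_secure and rel == "doc/my/secure":
            continue  # no SSL version of the site is planned
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(path)
    return created
```

For example, scaffold("example.com", with_secure=False) would create everything except the doc/my/secure branch.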
Mandatory files in the style directory are docbook.xsl, a stylesheet wrapper where you can add your docbook formatting options, and index.$LANG.html.tmpl, which contains the main HTML template of the site. As the site can have several languages, there needs to be one template per language.
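Before publishing, it can be handy to verify that a style directory contains these mandatory files. The check below is a sketch under the naming rules just described; the function missing_style_files and the language codes used with it are hypothetical.

```python
from pathlib import Path


def missing_style_files(style_dir, languages):
    """Return the mandatory style files that are absent from style_dir."""
    # One stylesheet wrapper plus one HTML template per language.
    required = ["docbook.xsl"]
    required += ["index.%s.html.tmpl" % lang for lang in languages]
    return [name for name in required if not (Path(style_dir) / name).is_file()]
```

An empty list means the style directory is complete for the given languages.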
You can either write both files from scratch or use the predefined styles from treecutter and adapt them to your needs. To use one of the styles in treecutter's style directory, either copy it into the project or track it via git "vendor branches" (subtree merge).
Writing content
Now the setup should be ready and we can start writing content. There are some rules for how an article must look, and there are some possibilities to add "build time dynamic content". Files named index.$LANG.xml will be transformed to index.$LANG.html and hence be the index files in Apache's representation of the site (doc/public/index.$LANG.xml will become http://www.example.com/). Currently language support is required for the full site, so each document published should be available in the languages selected. This is planned to change. The rules for the docbook articles are given above.
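Since every document currently has to exist in all selected languages, a quick scan for missing translations can save a failed build. The sketch below only looks at files named index.$LANG.xml, as described above; the function missing_translations is hypothetical and does not reflect how treecutter itself scans the tree.

```python
from pathlib import Path


def missing_translations(doc_root, languages):
    """Map each directory with index.$LANG.xml files to its missing languages."""
    found = {}
    for path in Path(doc_root).rglob("index.*.xml"):
        parts = path.name.split(".")
        if len(parts) == 3:  # exactly index, language code, xml
            found.setdefault(path.parent, set()).add(parts[1])
    wanted = set(languages)
    return {
        str(directory.relative_to(doc_root)): sorted(wanted - present)
        for directory, present in found.items()
        if wanted - present
    }
```

Directories are reported relative to doc_root, so the root of the site itself shows up as ".".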
Publishing the site
Now the content is written, the style is set up, and everything has been checked in review. Time to publish the site. Make sure that you have set up ssh to use ssh-agent to enable passwordless connection to the site. Go to the directory that contains the root of the site, the ./doc/www directory, and run
treecutter --style $HOME/site/example.com/style/www/ --output example.com:/srv/www/example.com/www/htdocs
Now the site should be up and running!
Test the code
To test the code, unpack test.tar.gz to the /tmp directory of your computer. Decide on what style you would like and run
cd /tmp/root
treecutter --style $HOME/treecutter/style/turquoiseswirls/ --output /tmp/html/
This should execute the script with output like the following:
Prepareing the input
executing address.py [ 2.76 s] (Big Ben, Bridge Street, London, England)
Prepare [ 2.77 s]
Language [ 0.00 s]
Render [ 1.77 s]
Template [ 0.12 s]
Resources[ 0.00 s]
Sitemap [ 0.03 s]
Publish [ 0.20 s]
Total [ 4.96 s]
The output is meant for Content Negotiation, but you can still do simple testing by changing to the output directory
cd /tmp/html
and starting a simple web server
python -m SimpleHTTPServer 8000
(with Python 3 the equivalent is python3 -m http.server 8000). Going to http://localhost:8000/ should display the test site. Rerun the treecutter command with a different style to change the look of the site.
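Because the simple web server does not perform content negotiation, a request for a directory will not automatically pick index.$LANG.html. The following sketch, written for Python 3.7 or later, works around that for local testing by rewriting directory requests to one fixed language. The names localized_path, LanguageHandler and serve, and the default language "en", are assumptions for this example only.

```python
import functools
import http.server


def localized_path(request_path, lang):
    """Map a directory request such as /docs/ to /docs/index.<lang>.html."""
    if request_path.endswith("/"):
        return request_path + "index.%s.html" % lang
    return request_path


class LanguageHandler(http.server.SimpleHTTPRequestHandler):
    lang = "en"  # assumption: the test site was built with English pages

    def do_GET(self):
        # Rewrite the path before the stock handler looks up the file.
        self.path = localized_path(self.path, self.lang)
        super().do_GET()


def serve(directory="/tmp/html", port=8000):
    """Serve the generated site locally until interrupted."""
    handler = functools.partial(LanguageHandler, directory=directory)
    with http.server.ThreadingHTTPServer(("localhost", port), handler) as httpd:
        httpd.serve_forever()
```

Calling serve() and visiting http://localhost:8000/ would then return index.en.html from the output directory, if that file exists.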
The stock styles are not very well developed at this point, as I am their only user and have put more work into the styles of individual sites, but they are a starting point. They mostly come from http://freehtml5templates.com and have just been slightly adapted for treecutter use. For example, the http://tree.se/ template is based on the style freggies. There are limitations and bugs, but it is still useful. If you have ideas, questions or comments, please let me know. <fred@tree.se>