Welcome the company of trees

treecutter - static docbook website generator

2013-02-05

Revision History
Revision 2011-05-03FU
Initial version
Revision 2012-02-07FU
Added detailed usage instructions

treecutter is a tool to publish docbook documents from a directory tree. The structure of the docbook documents has to follow some simple rules.

Docbook rules

  • Currently just using article documents.

  • The tag title must be present

  • The tag titleabbrev represents the menu name and must be present.

  • If an xi:include exists with attribute parse='text' it will be run if it is an executable python or perl script. The output is expected to be valid docbook at the place from where it is called. It is a "build-time" execution.
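The rules above can be checked before a build. The following is a minimal sketch, not part of treecutter: the function name check_article is hypothetical, and it uses the standard library parser rather than lxml so that it runs without extra packages.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so namespaced and plain tags both match."""
    return tag.rsplit("}", 1)[-1]


def check_article(xml_text):
    """Return a list of rule violations for one docbook document.

    Assumption: title and titleabbrev are looked up anywhere in the
    document; a stricter check would only inspect the article header.
    """
    root = ET.fromstring(xml_text)
    problems = []
    if local_name(root.tag) != "article":
        problems.append("root element is not <article>")
    found = {local_name(el.tag) for el in root.iter()}
    for required in ("title", "titleabbrev"):
        if required not in found:
            problems.append("missing <%s>" % required)
    return problems


GOOD = """<article>
  <title>Welcome</title>
  <titleabbrev>Home</titleabbrev>
  <para>Hello.</para>
</article>"""

BAD = "<article><title>No menu name</title></article>"

if __name__ == "__main__":
    print(check_article(GOOD))
    print(check_article(BAD))
```

An empty list means the document satisfies the title and titleabbrev rules; the xi:include rule is not covered by this sketch.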

The Name?

The name treecutter comes from the name of this site (tree.se), and we are trying to form documents the way cookie-cutters form and style dough. Tree cutters also form trees (maybe even xml trees) into something useful. The name was also inspired by Charles Ammi Cutter, who invented the Cutter Expansive Classification for libraries. Whether the project will deal with that classification is still open.

Code

The code is available in the git repository https://source.tree.se/git/treecutter.git. The licence of the code is not yet set.

The processing of the xml documents follows several steps, detailed below. The image below gives a small overview of what is going on in the code. Important to the site is the sitemap.txt, which tells the program how the site should be linked together. The xml structure is scanned to map this view and adjust the sitemap.txt as needed. Once the site structure is determined, each link is traversed and the different language pages per link are scanned and set up. This also creates the menus. Once all pages have been processed, they are published to the server via rsync.

[Image: overview of the processing steps; no description available]
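The scanning step can be illustrated with a small sketch. It is not treecutter's own code: the function names are hypothetical, only files named index.$LANG.xml are considered, and the two or three letter language code is an assumption. The format of sitemap.txt is not described here, so it is left out.

```python
import os
import re
import tempfile

# Naming rule index.$LANG.xml; the shape of $LANG is an assumption.
INDEX_RE = re.compile(r"^index\.([a-z]{2,3})\.xml$")


def scan_tree(root):
    """Map each link ('/', '/about/', ...) to the set of languages found."""
    pages = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        link = "/" if rel == "." else "/" + rel.replace(os.sep, "/") + "/"
        for name in filenames:
            match = INDEX_RE.match(name)
            if match:
                pages.setdefault(link, set()).add(match.group(1))
    return pages


def missing_languages(pages, wanted):
    """List (link, language) pairs that still lack a page."""
    return sorted((link, lang)
                  for link, langs in pages.items()
                  for lang in wanted if lang not in langs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "about"))
        for rel in ("index.en.xml", "index.sv.xml", "about/index.en.xml"):
            open(os.path.join(root, rel), "w").close()
        pages = scan_tree(root)
        print(pages)
        print(missing_languages(pages, ["en", "sv"]))
```

The second function reports pages that are missing a language version.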

Prerequisite

The project used to use amara as xml parser, but switched to lxml to get better distribution support. Amara was, for example, not packaged for Debian.

I will try to build a Debian package for treecutter once the application has stabilized. As for dependencies, Cheetah is currently used for the html templating, so python-cheetah needs to be installed. (It might need a patch to handle unicode well; I have no real Cheetah experience, but this Compiler.py.patch works.) I specifically do not want to put encoding in the template, to keep it as clean as possible.

The amara xslt processing was not strong enough to handle docbook's style-sheets, so currently an external application, xsltproc, is used. Lxml might be able to do it better and that is a next step.
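Calling xsltproc from Python could look roughly like the sketch below. How treecutter actually invokes it is not shown here, so the function names and the choice of options are assumptions. The --nonet and --output options are standard xsltproc options.

```python
import subprocess


def xsltproc_command(stylesheet, document, output=None):
    """Build the argument list for one xsltproc run.

    --nonet keeps xsltproc from fetching DTDs or entities over the network;
    using it here is an assumption of this sketch.
    """
    cmd = ["xsltproc", "--nonet"]
    if output is not None:
        cmd += ["--output", output]
    cmd += [stylesheet, document]
    return cmd


def render(stylesheet, document):
    """Run xsltproc and return the transformed document as bytes.

    Requires xsltproc to be installed; raises CalledProcessError on failure.
    """
    return subprocess.check_output(xsltproc_command(stylesheet, document))


if __name__ == "__main__":
    print(" ".join(xsltproc_command("style/www/docbook.xsl", "index.en.xml")))
```

Building the argument list separately from running it makes the command easy to log and to test without xsltproc installed.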

Make sure that the treecutter.py directory is in your $PATH.


    export PYTHONPATH=$HOME/treecutter:$PYTHONPATH
    export PATH=$HOME/treecutter/bin:$PATH
    

Setup

Setting up a proper treecutter structure is not hard, but it requires several steps. This example details how to set up a new site for hosting by the Apache webserver, but since the output is just static pages it can easily be adapted to other webservers.

Setting up the structure for the site

It is good practice to keep the data of your site in version control. You do not need to, but you will find the benefits of doing the setup work now, especially if the site grows to include more contributors. Please refer to "Git over apache2 https using git-http-backend with gitweb" on how to set up a git repository.

The recommended structure for a site is

    ./doc/www/public
    ./doc/my/secure
    ./material
    ./style/www

The doc structure will contain the docbook documents. The extra subdirectory www is to also be able to keep several sites in the structure. ./doc/www/public corresponds to http://www.example.com/ and allows for a future ./doc/my/secure structure that corresponds to https://my.example.com/. If no SSL version of the site is planned the directory secure can also be removed.
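The recommended layout can be created by hand or with a few lines of Python. This is only a convenience sketch: the function name and the with_secure switch are made up for illustration.

```python
from pathlib import Path
import tempfile

# Directories from the recommended structure.
SITE_DIRS = ("doc/www/public", "doc/my/secure", "material", "style/www")


def create_site_skeleton(root, with_secure=True):
    """Create the recommended directories below root and return their paths."""
    created = []
    for rel in SITE_DIRS:
        if not with_secure and rel == "doc/my/secure":
            continue  # no SSL version of the site planned
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for path in create_site_skeleton(root, with_secure=False):
            print(path.relative_to(root))
```

Calling the function a second time is harmless, since existing directories are left alone.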

The material directory is a suggestion for where data related to the site can be tracked. For example, the logo of the site can be developed under version control in the material/svg directory, and once it is finalized it can be rendered and included into the doc structure.

The style directory will contain the global stylesheet of the site. Docbook itself does provide some html formatting, but not to the extent that it can build site-wide menus. In this directory each subsite (http://www.example.com, https://my.example.com) can have its own style. Individual sections of a site (e.g. http://www.example.com/directory) cannot.

Mandatory files in the style directory are docbook.xsl, a stylesheet wrapper where you can add your docbook formatting options, and index.$LANG.html.tmpl, which contains the main html template of the site. As the site can have several languages, there needs to be one template per language.
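A quick check for the mandatory style files might look like this. It is a sketch under the naming rules given above; the function name is hypothetical and the language list is whatever the site uses.

```python
import os
import tempfile


def missing_style_files(style_dir, languages):
    """Return the mandatory file names that are absent from style_dir."""
    required = ["docbook.xsl"]
    required += ["index.%s.html.tmpl" % lang for lang in languages]
    return [name for name in required
            if not os.path.isfile(os.path.join(style_dir, name))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as style_dir:
        for name in ("docbook.xsl", "index.en.html.tmpl"):
            open(os.path.join(style_dir, name), "w").close()
        # The Swedish template has not been written yet in this example.
        print(missing_style_files(style_dir, ["en", "sv"]))
```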

You can either write both files from scratch or use the predefined styles from treecutter and adapt them to your needs. To use one of the styles in treecutter's style directory, either copy it into the project or track it via git "vendor branches" (subtree merge).

Writing content

Now the setup should be ready and we can start writing content. There are some rules for how an article must look, and there are some possibilities to add "build time dynamic content". Files named index.$LANG.xml will be transformed to index.$LANG.html and hence be the index files in Apache's representation of the site (doc/public/index.$LANG.xml will become http://www.example.com/). Currently language support is required for the full site, so each document published should be available in all the selected languages. This is planned to change.

The rules for the docbook articles are given above.
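As an illustration of "build time dynamic content", here is a sketch of a script that could be referenced from an xi:include element with parse='text'. The file name today.py and its content are hypothetical, and whether treecutter passes any arguments to such scripts is not covered here. The script must be executable and must print valid docbook for the place where it is included.

```python
#!/usr/bin/env python
"""Hypothetical build-time script, e.g. saved as today.py (chmod +x)."""
import datetime
from xml.sax.saxutils import escape


def build_fragment(today=None):
    """Return a docbook <para> stating when the page was generated."""
    today = today or datetime.date.today()
    text = "This page was generated on %s." % today.isoformat()
    # escape() protects against '<', '>' and '&' in the generated text.
    return "<para>%s</para>" % escape(text)


if __name__ == "__main__":
    print(build_fragment())
```

Because the script runs at build time, the date shown on the page is the date of the last publish, not the date of the visit.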

Publishing the site

Now the content is written, the style is set up, and everything has been checked in review. Time to publish the site. Make sure that you have set up ssh to use ssh-agent to enable a passwordless connection to the site. Go to the directory that contains the root of the site, the ./doc/www directory, and run

    treecutter --style $HOME/site/example.com/style/www/ --output example.com:/srv/www/example.com/www/htdocs

Now the site should be up and running!
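The publish step can be wrapped in a small helper, for example to run it from a hook. This is a sketch only: the helper names are hypothetical, and the only treecutter options used are the --style and --output options from the command above. The check relies on ssh-agent exporting the SSH_AUTH_SOCK variable.

```python
import os
import subprocess


def publish_command(style_dir, output):
    """Build the treecutter command line used for publishing."""
    return ["treecutter", "--style", style_dir, "--output", output]


def agent_available(environ=None):
    """Return True if an ssh-agent socket is advertised in the environment."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("SSH_AUTH_SOCK"))


def publish(site_root, style_dir, output):
    """Run treecutter from the root of the site (e.g. ./doc/www)."""
    if not agent_available():
        raise RuntimeError("no ssh-agent found; publishing would ask for a password")
    return subprocess.call(publish_command(style_dir, output), cwd=site_root)


if __name__ == "__main__":
    print(publish_command("style/www/", "example.com:/srv/www/example.com/www/htdocs"))
```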

Test the code

To test the code, unpack the test.tar.gz to the /tmp directory of your computer. Decide on what style you would like and run

    cd /tmp/root
    treecutter --style $HOME/treecutter/style/turquoiseswirls/ --output /tmp/html/

This should execute the script as:

    Prepareing the input
    executing address.py [ 2.76 s] (Big Ben, Bridge Street, London, England)
    Prepare  [ 2.77 s]
    Language [ 0.00 s]
    Render   [ 1.77 s]
    Template [ 0.12 s]
    Resources[ 0.00 s]
    Sitemap  [ 0.03 s]
    Publish  [ 0.20 s]
    Total    [ 4.96 s]

The output is meant for Content Negotiation, but you can still do simple testing by changing to the output directory and starting a simple webserver:

    cd /tmp/html
    python -m SimpleHTTPServer 8000

(With Python 3 the equivalent is python3 -m http.server 8000.) Going to http://localhost:8000/ should display the test site. Rerun the treecutter command with a different style to change the look of the site.
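The simple webserver does no content negotiation, so it will not pick a language version for you. As a rough illustration of what a negotiating server does, the sketch below picks an index.$LANG.html file from an Accept-Language header. It assumes that negotiation is done on language, it is much simpler than what Apache implements, and the function names and the default language are hypothetical.

```python
def parse_accept_language(header):
    """Return (language, quality) pairs from an Accept-Language header, best first."""
    prefs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # unreadable quality: treat as not acceptable
        prefs.append((lang.strip().lower(), quality))
    # sorted() is stable, so entries with equal quality keep header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def pick_index(header, available, default="en"):
    """Choose the index file for the best acceptable language."""
    for lang, quality in parse_accept_language(header):
        primary = lang.split("-")[0]  # 'sv-se' counts as 'sv'
        if quality > 0 and primary in available:
            return "index.%s.html" % primary
    return "index.%s.html" % default


if __name__ == "__main__":
    print(pick_index("sv-SE,sv;q=0.9,en;q=0.8", {"en", "sv"}))
```

For local testing it is usually enough to open the index.$LANG.html file of the language you want directly.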

The stock styles are not very well developed at this point, as I am the only one using them and I have put more work into individual sites, but they are a starting point. They mostly come from http://freehtml5templates.com and have only been slightly adapted for treecutter use. For example, the http://tree.se/ template is based on the style freggies. There are limitations and bugs, but it is still useful. If you have ideas, questions or comments, please let me know.