Monday, January 08, 2007

Periodic Table of Visualization Methods



An extremely cool collection of visualization methods. You can mouse over to get an example of each.

Friday, January 05, 2007

PEP 8 checker

PEP 8 checker for the anal retentive in all of us. :)


/Users/gergely/twp/twp/bot.py:32:16: W291 trailing whitespace
sys.exit(1)
^
JCR: Trailing whitespace is superfluous.

/Users/gergely/twp/twp/bot.py:34:1: E302 expected 2 blank lines, found 1
def refresh_images(limit=dbbot._default_limit):
^
Separate top-level function and class definitions with two blank lines.

Method definitions inside a class are separated by a single blank line.

Extra blank lines may be used (sparingly) to separate groups of related
functions. Blank lines may be omitted between a bunch of related
one-liners (e.g. a set of dummy implementations).

Use blank lines in functions, sparingly, to indicate logical sections.


I'm the kind of masochistic guy who actually enjoys this kind of thing, but I need to figure out a better way to integrate it with PyDev. I wonder what's the easiest way to make Eclipse jump to the line where the error happened. Hmm...
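
Whatever the Eclipse side ends up looking like, a first step would probably be pulling the file, line, and column out of the checker's report. Here is a minimal sketch, assuming the header lines always follow the `path:line:column: CODE message` shape seen in the output above. The name `parse_report_line` is made up for illustration and is not part of the checker.

```python
import re

# Assumed shape of a report header line: "path:line:column: CODE message".
# This pattern is inferred from the sample output above, not from the checker's source.
LOCATION = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): (?P<code>[EW]\d+) (?P<message>.*)$"
)


def parse_report_line(text):
    """Return a dict describing one report header line, or None for any other line."""
    match = LOCATION.match(text)
    if match is None:
        # Source excerpts, caret markers and explanation text end up here.
        return None
    return {
        "path": match.group("path"),
        "line": int(match.group("line")),
        "col": int(match.group("col")),
        "code": match.group("code"),
        "message": match.group("message"),
    }


if __name__ == "__main__":
    sample = "/Users/gergely/twp/twp/bot.py:32:16: W291 trailing whitespace"
    print(parse_report_line(sample))
```

Lines that are not report headers (the quoted source line, the caret, the explanation) come back as `None`, so the whole report can be fed through line by line.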

TurboGears 1.0 and beyond

Kevin Dangoor, TurboGears project lead, announced 1.0 this week on IRC.

I was there and I have this pretty screenshot to prove it! :)



Maybe even more importantly, TurboGears has a new leader: Alberto Valverde.

I was too busy to stay there for the follow-up discussions, but the gist of it seemed to be that a heavily WSGI-based approach (sounded much like Pylons) will solve all problems including world hunger and the conflict in the Middle East.

Another equally important thing was the direction that is planned for TurboGears 2.0: decentralization and modularization. From what I understand, people want to fork off chunks of TurboGears into fairly independent and externally reusable projects and keep TurboGears as a small chunk of glue code that connects them together.

On the one hand, this is not new: TurboGears started out by integrating a bunch of preexisting tools. ToscaWidgets was forked off recently from the TurboGears widget code. I agree that this approach can work to a certain extent. My guess is that in the case of TG the current change of direction (actually returning to its minimalistic roots) was more organizational than architectural. (Not that you can separate the two: see Conway's Law.)

But there are pros and cons to decoupling. Unix command line tools are a good example. They were great, because there were standard interfaces between them which let them develop and be tested independently. But there is also a huge lack of conceptual integrity compared to monolithic frameworks. The naming conventions are inconsistent, different switches are used for the same functionality in different programs, etc.

The big advantage of monolithic frameworks is consistency in design. Modules use the same naming and coding style and have similar layouts. They reuse the low-level utility code, the documentation tool, the testing framework, the bug reporting, the build and packaging system. There is one well-known place to ask questions, to look for documentation, to download the latest stable release.

Linux distributions are a good example of both the strengths and weaknesses of heavily modularized systems. Probably the biggest advantage is that there is a huge amount of code reuse, and you can decentralize work to thousands of volunteers, maintaining the individual packages which can evolve independently.

On the other hand, some combinations of packages are not tested properly; only certain combinations of packages are well supported. If you report a bug that has been fixed in the upstream version, but not in your distro, you're on your own. Firefox on Linux is a good example.

People who want to support your software have a harder time when instead of a standard way, you have an infinite combination of modules. Just think of LSB and desktop Linux vs Mac OS or Windows.

We'll see how loose coupling works out for TurboGears. Interesting times ahead.

Tuesday, January 02, 2007

7zip is amazing

7zip just blew my pants off. Back in the day I thought I was edgy when I used bzip2 instead of gzip, but this is just amazing.

I downloaded the full edit history of the Hungarian Wikipedia to run some analysis on it and 7z compressed it to 1/87th of its original size.

barcika:~/wp/huwiki$ du -k *
11502112 huwiki-20061205-pages-meta-history.xml
131808 huwiki-20061205-pages-meta-history.xml.7z

Of course this was superverbose XML, but the compression ratio is still very impressive. The same original compressed with bz2 is almost 4 times as big. 7zip is going to be my first choice for archiving large log files.
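
For the record, the 1/87th figure falls straight out of the `du -k` numbers above. A quick sketch of the arithmetic, with the two sizes copied from that listing (the variable and function names are just for illustration):

```python
# Sizes in kilobytes, copied from the `du -k` output above.
original_kb = 11502112
compressed_kb = 131808


def compression_ratio(original, compressed):
    """How many times smaller the compressed file is than the original."""
    return original / compressed


ratio = compression_ratio(original_kb, compressed_kb)
print(f"7z archive is about {ratio:.1f} times smaller than the XML dump")
```

The ratio lands a little above 87, which is where the 1/87th comes from.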