Tuesday, July 21, 2009

Python in a web page

"Gestalt is a library released by MIX Online Labs that allows you to write Ruby, Python & XAML code in your (X)HTML pages. It enables you to build richer and more powerful web applications by marrying the benefits of expressive languages, modern compilers, AJAX & RIAs with the write » save » refresh development model of the web."

Python on a web page! In Silverlight .. which means it is time to learn Mono and .Net and all that stuff.

Update: I did nothing to learn any of that stuff then one day I walked into the office and they said "oh, we need you to work on a C# project.. on a Windows box"

Thursday, June 18, 2009

SQLite Gotcha 2

Creating a table with a column of type INT PRIMARY KEY will not result in automatic assignment of a unique value into that column when new rows are added. Why not? In part because INT is not a type in SQLite (but you can use it and the column will have integer affinity but you have to provide the value) and partly because rowid assignment only occurs on columns defined as INTEGER PRIMARY KEY.

Take a look at the documentation on SQLite data types (http://www.sqlite.org/datatype3.html). You may also be surprised to discover that you can declare a column of one type then store data of a different type in it.

Sunday, June 7, 2009

SQLite Gotcha

ROWID is a reserved word in SQLite which seems reasonable.. but it has a synonym OID so if you declare a table with a column called OID then select from that table without using the table name as a prefix, you will not get what you expect.

If you are using an ORM to abstract away the database then you may never realise what is going on..

Unicode Gotcha

What will this give you?

>>> astring = None
>>> print unicode(astring)

If you said None then you will be surprised to find the answer is u"None" which is not at all the same thing. Really messed up my day..

Monday, June 1, 2009

Bing Sucks

Bing, the new search engine from Microsoft, sucks. This conclusion is based on one simple test:
Enter "internet search" as your search term and the results do not contain Yahoo, Google, Bing or LiveSearch. Dogpile is the number one hit.

Remove the word internet and now you get Yahoo listed top.

Try the same test on Yahoo and it lists Bing first in a sponsored results box, followed by Yahoo and Google .. and then Dogpile. Huh?

Sunday, May 10, 2009

New Things

I am trying to be more organized on my Mac. As part of this I have added some new apps: Cyber Duck (a pretty good FTP client) and Syrinx for twitter. They work, they don't get in the way and they look they are part of the system.

Sunday, April 12, 2009

Regex Exploration

The Question

I was asked a question on the way out of the office at the end of the week: what is the difference between a regex match and a regex search?

It didn't seem like a difficult question but it stumped me for a little while. Both a search and a match should use a regular expression and evaluate it against a text. Perhaps a regex search traverses the input string and returns one result at a time and a regex match does.. hm.. or maybe a regex match is used for validation and a regex search is more exploratory?

The individuals asking the question were looking at the Python regex documentation (or rather AMK's very excellent regex howto ) which says "match() function only checks if the RE matches at the beginning of the string while search() will scan forward through the string for a match" which is interesting because it means that the Python regex match effectively inserts a leading anchor before attempting evaluation. Both functions return a MatchObject.

Wait a minute, you may say, you have the documentation to answer the original question! Not exactly because this question was not inspired by an abstract search for knowledge but a performance problem. The other interesting part was that Python documentation was being studied but the code is in C++ with no Python involved.

If you need a regex library for C++ then a good place to look is at the Boost home page which is “...one of the most highly regarded and expertly designed C++ library projects in the world” and has libraries for all sorts of things like graph theory, linear algebra or interprocess communication. So what does the Boost Regex library say about match and search? How about this: "the result is true only if the expression matches the whole of the input sequence. If you want to search for an expression somewhere within the sequence then use regex_search". Of note is the fact that the return type of match is boolean.

Did you get that? The Python regex match/search answer is a matter of the start position of the match and the Boost C++ answer is whether the input sequence is consumed.

Performance Evaluation


Another alternative for a regex library for the C++ developer is the Perl Compatible Regular Expression library (see the website) which offers a single execute function instead of a search or match function, with a flag parameter modifying behaviour. One suggestion I heard was that the PCRE library performs faster than the Boost library because "maybe the Boost library sucks?" but as Carl Sagan said "Extraordinary claims require extraordinary evidence" or, in other words, how do you figure?

I wrote a small app that generates multiple random strings by appending together words randomly picked from a small dictionary and then evaluates a number of regular expressions. Here's the results (slightly re-arranged):

Boost regex search, pattern: yellowgreensapphireruby.*bluered
12.50 s

Boost regex search with continuous flag, pattern: yellowgreensapphireruby.*bluered
0.07 s

Boost regex search with any flag, pattern: yellowgreensapphireruby.*bluered
12.51 s

Boost regex match, pattern: yellowgreensapphireruby.*bluered
0.05 s

PCRE regex, pattern: yellowgreensapphireruby.*bluered
1.47 s

Boost regex search, pattern: ^yellowgreensapphireruby.*bluered
6.59 s

Boost regex search with continuous flag, pattern: ^yellowgreensapphireruby.*bluered
0.07 s

Boost regex search with any flag, pattern: ^yellowgreensapphireruby.*bluered
6.57 s

Boost regex match, pattern: ^yellowgreensapphireruby.*bluered
0.05 s

PCRE regex, pattern: ^yellowgreensapphireruby.*bluered
0.02 s

Boost regex search, pattern: yellowgreensapphireruby.*bluered$
14.16 s

Boost regex search with continuous flag, pattern: yellowgreensapphireruby.*bluered$
0.07 s

Boost regex search with any flag, pattern: yellowgreensapphireruby.*bluered$
14.15 s

Boost regex match, pattern: yellowgreensapphireruby.*bluered$
0.05 s

PCRE regex, pattern: yellowgreensapphireruby.*bluered$
2.01 s

Boost regex search, pattern: ^yellowgreensapphireruby.*bluered$
6.59 s

Boost regex search with continuous flag, pattern: ^yellowgreensapphireruby.*bluered$
0.07 s

Boost regex search with any flag, pattern: ^yellowgreensapphireruby.*bluered$
6.59 s

Boost regex match, pattern: ^yellowgreensapphireruby.*bluered$
0.05 s

PCRE regex, pattern: ^yellowgreensapphireruby.*bluered$
0.02 s

Saturday, April 11, 2009

Eric's Law

If you think the compiler is wrong, you are 99.999% certain to be the problem.

*This number may vary according to the platform and the compiler.

Saturday, April 4, 2009

Web2py rocks more than the alternatives?

I need to find some spare time, perhaps just some of that discretionary sleeping time, to take a look at web2py. I have a small project in mind - updating my friends website to have a simple calendar and appointment system - but I have been getting slammed at the office porting an application to AIX (yes, it's still in use) and busy at home with the twins.

Sunday, January 25, 2009

Linus comes to his senses

According to a link on Slashdot, Linus Torvalds has finally come to his senses and now agrees with me that Gnome is a better choice for a desktop than KDE.