ThinkSQL ODBC Driver Source

The ThinkSQL ODBC driver source code is now available here.

The driver is written in pure Delphi, but the ODBC specification is C-based and very detailed. Memory handling and the complex state transitions between client and server made this a large undertaking.

The Java JDBC specification was closely based on ODBC, but garbage collection made the JDBC driver much easier to implement.

It’s worth mentioning that the Python DB API specification is tiny in comparison to the ODBC documents, but it led to a much simpler, smaller and more elegant driver codebase. The Python driver can do everything the ODBC driver can, with less than a tenth of the code. It is more readable, more maintainable, less fragile, and easier to use. And it actually works on multiple platforms. The surprising thing was how much simpler Python made the low-level network communications (mostly thanks to the struct module), especially considering Java was originally designed for low-level systems. Compare this Python method with the Java and Delphi ones:

The Python version

def getpUCHAR_SWORD(self):
    # refill the buffer if the 2-byte length prefix isn't fully available
    if self.bufferPtr+_sizeof_short>self.bufferLen:
        if self.bufferPtr==self.bufferLen:
            self.read()
        else:
            return _fail  # partial data left in the buffer

    # unpack the little-endian unsigned short giving the string length
    s=self.buffer.read(_sizeof_short)
    self.bufferPtr+=_sizeof_short
    si=struct.unpack('<H', s)[0]
    # then return that many bytes of string data
    self.bufferPtr+=si
    return self.buffer.read(si)

The Java version

public String getpUCHAR_SWORD() { 
  if (bufferPtr+Global.sizeof_short>bufferLen) {
    if (bufferPtr==bufferLen) {
      Read();
    }
    else { //the buffer is not quite empty
      return Global.failString; 
    }
  }
  short usi=0;
  for (int siz=Global.sizeof_short-1; siz>=0; siz--) {
    usi<<=Global.sizeof_byte;
    short b=(short)buffer[bufferPtr+siz];
    if (b<0) {b=(short)(b+256);}
    usi = (short)(usi | b); //i.e. reverse order
  } 
  bufferPtr=bufferPtr+Global.sizeof_short;
  if (bufferPtr+usi>bufferLen) {
    if (bufferPtr==bufferLen) {
      Read();
    }
    else { //the buffer is not quite empty
      return Global.failString; 
    }
  }
  bufferPtr=bufferPtr+usi;
  return new String(buffer,bufferPtr-usi,(int)usi-1);
}

The Delphi (C-like) version

function TMarshalBuffer.getpUCHAR_SWORD(
                                var puc:pUCHAR;
                                allocated:SWORD;
                                var sw:SWORD):integer;
{RETURNS: ok,
          fail = probably means data is left in buffer, 
                 but not enough 
                 else errors as from Read
 Assumes:
   puc has 'allocated' space allocated by caller,
   unless -1 => allocate here (but caller must free!)
 Note:
   returns buffer data up to the allocated length
   (including 1 character for a null terminator)
}
const routine=':getpUCHAR_SWORD';
var actual:SWORD;
begin
  if bufferPtr+sizeof(sw)>bufferLen then
  begin
    if bufferPtr=bufferLen then
    begin //the buffer is empty
      result:=Read;
      if result<>ok then exit; 
    end
    else
    begin //the buffer is not quite empty
      result:=fail; //buffer overflow
      exit;
    end;
  end;

  move(buffer[bufferPtr],sw,sizeof(sw));
  bufferPtr:=bufferPtr+sizeof(sw);

  if bufferPtr+sw>bufferLen then
  begin
    if bufferPtr=bufferLen then
    begin //the buffer is empty
      result:=Read;
      if result<>ok then exit; 
    end
    else
    begin //the buffer is not quite empty
      result:=fail; //buffer overflow
      exit;
    end;
  end;

  if allocated=DYNAMIC_ALLOCATION then
  begin
    {Now allocate the space for the buffer data}
    getMem(puc,sw+sizeof(nullterm)); 
    allocated:=sw+sizeof(nullterm);
  end;

  if (allocated-sizeof(nullterm))<sw then
    actual:=(allocated-sizeof(nullterm)) 
  else
    actual:=sw;
  move(buffer[bufferPtr],puc^,actual); 
  move(nullterm,(puc+actual)^,sizeof(nullterm));
  bufferPtr:=bufferPtr+sw;
  result:=ok;
end; {getpUCHAR_SWORD}

Run a pipe2py-generated pipe on Google App Engine

Once you’ve used pipe2py to convert your Yahoo! Pipe into a local Python module, you can host it as your own Google App Engine application. You just need a bit of boilerplate to handle the input parameters and output formatting. I’ve created a sample project containing this boilerplate here: http://github.com/ggaughan/testpipe. You’ll need to include a copy of pipe2py, so the compiled pipe can find the modules it needs, and replace the demo pipeline references with your own. See the README for details.
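For a rough idea of what that boilerplate looks like, here's a hedged sketch of an App Engine handler (using the old webapp framework) that runs a compiled pipe and returns its items as JSON. The handler and the assumed pipe-function signature are illustrative, not the actual testpipe code:

import json  # simplejson on the older Python 2.5 runtime

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

from pipe2py import Context
import pipe_UuvYtuMe3hGDsmRgPm7D0g as pipe  # your compiled pipe module

class PipeHandler(webapp.RequestHandler):
    def get(self):
        # forward any query-string parameters as the pipe's user inputs
        context = Context(inputs=dict(self.request.GET))
        # assumed signature: the compiled module exposes a generator
        # function named after the pipe
        items = list(pipe.pipe_UuvYtuMe3hGDsmRgPm7D0g(context, None))
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write(json.dumps({'items': items}))

application = webapp.WSGIApplication([('/', PipeHandler)])

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()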

Running Yahoo! Pipes on Google App Engine

Yahoo! Pipes is an excellent tool for processing data. It provides a visual way to aggregate, manipulate, and mashup content from around the web. It’s very much like plumbing with data and is a great metaphor. I’m convinced that this approach is just the beginning, and look forward to connecting systems using pipes in a three-dimensional virtual environment with tactile and audio feedback… soon.

[Image: pipe UuvYtuMe3hGDsmRgPm7D0g in the Yahoo! Pipes editor]

Tony Hirst, a prolific Yahoo! Pipes user, had the idea of translating pipe definitions into code so that they could be run on your own computer in case the Yahoo! Pipes server was ever unavailable. This sounded like an interesting challenge, so I developed pipe2py. The pipe2py package can compile a Yahoo! Pipe into pure Python source code, or it can interpret the pipe on the fly. It supports embedded pipes too. (Not all of the Yahoo! Pipes modules are available yet, but they’re gradually being added: if you need one that’s missing, please let me know, or better still send me the code for the module.)

The design for the compiled pipes was based on David Beazley’s work on building Python generators into pipelines, together with ideas from SQL query compilers and XProc pipelines. Each Yahoo! Pipes module is implemented as a Python generator which iterates over the items provided by its input module, processes them, and yields the results. Once these generators are connected together, iterating over the final one initiates a cascading call back through all the earlier generators, each iterating over its input and, in turn, yielding its output (there’s a minimal sketch after the list below). There are several benefits to this architecture:

  1. the compiled pipeline closely matches the original Yahoo! pipeline
  2. adding new modules is easy because they are loosely coupled
  3. each item is typically passed through the whole pipeline one at a time, so:
    1. memory usage is kept to a minimum
    2. no module is waiting on an earlier module to finish processing the whole data set
  4. by adding queues between the modules, they could easily be made to run in parallel, each on a different CPU, for excellent scalability
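
To make the architecture concrete, here's a minimal sketch (illustrative, not actual pipe2py code) of three generator "modules" wired into a pipeline. Nothing runs until the final generator is iterated, and then items flow through one at a time:

def pipe_fetch(urls):
    """Source module: yield one item per url (stubbed here)."""
    for url in urls:
        yield {'link': url, 'title': 'Item for %s' % url}

def pipe_filter(_INPUT, rule):
    """Filter module: only pass items whose title contains `rule`."""
    for item in _INPUT:
        if rule in item['title']:
            yield item

def pipe_truncate(_INPUT, count):
    """Truncate module: stop after `count` items."""
    for i, item in enumerate(_INPUT):
        if i >= count:
            break
        yield item

# wire the modules together; iterating the last generator pulls items
# through the whole pipeline, one at a time
source = pipe_fetch(['http://example.com/a', 'http://example.com/b'])
filtered = pipe_filter(source, rule='Item')
for item in pipe_truncate(filtered, count=1):
    print(item)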

Here’s an example pipe2py session which converts the pipe shown above into Python and then runs it locally:

$ python compile.py -p UuvYtuMe3hGDsmRgPm7D0g
$ python pipe_UuvYtuMe3hGDsmRgPm7D0g.py
Name (default=Lancaster) Neill
{u'title': u'Bob Neill',
...
u'TotalAllowancesClaimedIncTravel': u'157332'}

Since pipe2py can compile pipes into Python modules, it seemed a good idea to try to run them in Google’s cloud via App Engine. So now there’s pipes-engine, which uses pipe2py to run your Yahoo! Pipes on Google’s servers.

[Image: pipe2py running Yahoo! Pipes on Google App Engine]

You’ll need to log on with your Google account, and then you can take the Id of your Yahoo! Pipe (you can find it in the url when editing a pipe) and add it to the list. pipes-engine will then compile it and store the Python version of it. Clicking the pipe Id will run it on the App Engine. If you change the pipe in Yahoo, you can reload it in pipes-engine to re-compile the latest version (although I hope to automate this step in future).

There’s currently an App Engine request timeout of 30 seconds, but Google have said that they are working on increasing it soon.

There were some tricky bits to developing this, like storing the generated Python source in the datastore and then importing it dynamically back from the datastore, and doing so recursively for any embedded pipe imports. Some Python PEP 302 magic helped here.
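The gist of it is a PEP 302 finder/loader on sys.meta_path. This is a simplified sketch rather than the pipes-engine code, and fetch_source is a hypothetical datastore lookup:

import imp
import sys

class DatastoreImporter(object):
    """PEP 302 finder/loader for compiled pipe modules."""

    def find_module(self, fullname, path=None):
        if fullname.startswith('pipe_'):
            return self  # we can load this one
        return None  # anything else goes to the normal import machinery

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        source = fetch_source(fullname)  # hypothetical datastore lookup
        mod = imp.new_module(fullname)
        mod.__loader__ = self
        sys.modules[fullname] = mod
        # executing the source may import further embedded pipe_* modules,
        # which recurse back through this same loader
        exec(source, mod.__dict__)
        return mod

sys.meta_path.append(DatastoreImporter())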

The pipes-engine.appspot.com service is a proof of concept and needs some more work, not least to provide output in formats other than JSON, but I think it proves the approach is feasible. Let me know what you think.