Running Yahoo! Pipes on Google App Engine

Yahoo! Pipes is an excellent tool for processing data. It provides a visual way to aggregate, manipulate, and mash up content from around the web. Working with it is very much like plumbing with data, which is a great metaphor. I’m convinced that this approach is just the beginning, and I look forward to connecting systems using pipes in a three-dimensional virtual environment with tactile and audio feedback… soon.

[Embedded pipe: UuvYtuMe3hGDsmRgPm7D0g]

Tony Hirst, a prolific Yahoo! Pipes user, had the idea to translate the pipe definitions into code so that they could be run on your own computer, in case the Yahoo! Pipes server was unavailable. This sounded like an interesting challenge, so I developed pipe2py. The pipe2py package can compile a Yahoo! Pipe into pure Python source code, or it can interpret the pipe on the fly. It supports embedded pipes too. (Not all of the Yahoo! Pipes modules are available yet, but they’re gradually being added. If you find the need for one that’s missing, please let me know, or better still, provide me with the code for the module.)

The design for the compiled pipes was based on David Beazley’s work on building Python generators into pipelines, together with ideas from SQL query compilers and XProc pipelines. Each Yahoo! Pipes module is implemented as a Python generator which iterates over items provided by an input module and processes them to yield output results. Once these generators are connected together, iterating over the final one will initiate a cascading call to all earlier generators for them to iterate over their inputs and, in turn, yield their output. There are several benefits to this architecture:

  1. the compiled pipeline closely matches the original Yahoo! pipeline
  2. adding new modules is easy because they are loosely coupled
  3. each item is typically passed through the whole pipeline one at a time, so:
    1. memory usage is kept to a minimum
    2. no module is waiting on an earlier module to finish processing the whole data set
  4. by adding queues between the modules they could easily be made to run in parallel, each on a different CPU, to give great scalability
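The generator architecture described above can be sketched in a few lines. This is only an illustration under stated assumptions, not pipe2py's actual code: the stage names (`fetch`, `keep_matching`, `rename_field`), the `trace` list, and the sample items are all hypothetical. It shows how iterating over the final generator pulls each item through every earlier stage one at a time.

```python
# Hypothetical sketch of a generator pipeline; none of these names come from pipe2py.

def fetch(items, trace):
    """Source stage: yields items one at a time (stands in for a feed fetch)."""
    for item in items:
        trace.append("fetch " + item["title"])
        yield item


def keep_matching(source, field, text, trace):
    """Filter stage: passes on only items whose field contains the text."""
    for item in source:
        if text.lower() in item[field].lower():
            trace.append("keep " + item[field])
            yield item


def rename_field(source, old, new):
    """Transform stage: yields a copy of each item with one field renamed."""
    for item in source:
        renamed = dict(item)
        renamed[new] = renamed.pop(old)
        yield renamed


def build_pipeline(items, name, trace):
    """Connects the stages; nothing runs until the result is iterated."""
    stage = fetch(items, trace)
    stage = keep_matching(stage, "title", name, trace)
    stage = rename_field(stage, "title", "name")
    return stage


# Made-up sample data for illustration only.
SAMPLE_ITEMS = [
    {"title": "Bob Neill"},
    {"title": "Jane Doe"},
    {"title": "Neill Smith"},
]

trace = []
pipeline = build_pipeline(SAMPLE_ITEMS, "neill", trace)

# Asking for one result pulls exactly one matching item through every stage.
first = next(pipeline)
trace_after_first = list(trace)

# Draining the pipeline processes the remaining items, again one at a time.
rest = list(pipeline)
```

Because each stage only sees an iterator, a new stage can be added by wrapping the previous one, which is the loose coupling the list above refers to.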

Here’s an example pipe2py session which converts the pipe shown above into Python and then runs it locally:

$ python compile.py -p UuvYtuMe3hGDsmRgPm7D0g
$ python pipe_UuvYtuMe3hGDsmRgPm7D0g.py
Name (default=Lancaster) Neill
{u'title': u'Bob Neill',
...
u'TotalAllowancesClaimedIncTravel': u'157332'}

Since pipe2py can compile pipes into Python modules, it seemed a good idea to try to run them in Google’s cloud via App Engine. So now there’s pipes-engine, which uses pipe2py to run your Yahoo! Pipes on Google’s servers.

[Image: pipe2py running Yahoo! Pipes on Google App Engine]

You’ll need to log on with your Google account, and then you can take the Id of your Yahoo! Pipe (you can find it in the URL when editing a pipe) and add it to the list. pipes-engine will then compile it and store the Python version of it. Clicking the pipe Id will run it on App Engine. If you change the pipe in Yahoo! Pipes, you can reload it in pipes-engine to re-compile the latest version (although I hope to automate this step in the future).

There’s currently an App Engine timeout of 30 seconds, but Google have said that they are working on increasing that soon.

There were some tricky bits to developing this, like storing the generated Python source in the datastore and then importing it dynamically back from the datastore, and doing so recursively for any embedded pipe imports. Some Python PEP 302 magic helped here.
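As a rough illustration of the import-hook idea, here is a minimal sketch. It is not the pipes-engine code: it assumes Python 3 and uses the modern `importlib` form of the PEP 302 protocol (`find_spec` / `exec_module`), a plain dictionary (`FAKE_DATASTORE`) stands in for the App Engine datastore, and the class and module names are hypothetical. The recursive case falls out naturally, because an `import` inside the loaded source goes through the same hook.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for the datastore: module name -> generated source.
FAKE_DATASTORE = {
    "pipe_inner": (
        "def run():\n"
        "    return ['inner item']\n"
    ),
    # The outer pipe imports the embedded pipe, which triggers the hook again.
    "pipe_outer": (
        "import pipe_inner\n"
        "def run():\n"
        "    return pipe_inner.run() + ['outer item']\n"
    ),
}


class DatastoreImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules whose source lives in a dict-like store."""

    def __init__(self, store):
        self.store = store

    def find_spec(self, fullname, path=None, target=None):
        # Only claim modules we actually hold source for.
        if fullname in self.store:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        # Returning None asks the import system to create a default module.
        return None

    def exec_module(self, module):
        source = self.store[module.__name__]
        code = compile(source, "<datastore:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)


# Appended last, so normal filesystem imports are tried first.
sys.meta_path.append(DatastoreImporter(FAKE_DATASTORE))

import pipe_outer  # resolved by the hook; pipe_inner is loaded recursively

result = pipe_outer.run()
print(result)
```

A real deployment would also need to invalidate `sys.modules` entries when a pipe is recompiled; that is left out of this sketch.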

The pipes-engine.appspot.com service is a proof of concept and needs some more work, not least to provide the output in formats other than JSON, but I think it shows that the approach is feasible. Let me know what you think.

45 thoughts on “Running Yahoo! Pipes on Google App Engine”

  1. Absolutely amazing stuff:-)

    When running pipes workshops, one of the concerns I get from devs about pipes is the legacy issue. pipe2py along with pipes engine is a great solution:-)

    I’ll try to contribute some test cases too… and will have a think around a demo or two using the pipe engine…

  2. Amazing stuff indeed. Here is a related open-source (MIT) project for dataflow programming in Python:

    http://pyfproject.org/

    It shares a lot of similarities with Yahoo! Pipes, including an Ajax graphical designer and monitor UI, but it also works locally on your machine and is able to use a local SQL / FS / XML datastore as input too.

    Maybe pipe2py could be extended to run the pipes on a pyf runtime to get the UI for free.

    Note: there is also an alternative project called pypes that seems to implement the same pattern: http://pypes.org/

  3. Pingback: Backup and Run Yahoo Pipes Pipework on Google App Engine « OUseful.Info, the blog…

  4. Great work! I’ve been using Y! Pipes in my project to provide a map of Food Safety Recalls.
    http://food-prints.appspot.com/recalls

    The problem is that there is a limit (200 runs in 10 minutes), and when one exceeds this limit the Pipe will be 999'ed for an hour.

    I have been thinking of reimplementing this completely with GAE, but coding the Location Builder module in Python seems challenging. I just tried this (Pipe Id: 07799a045d4b31402c0fae6c2c0eb38c) with pipes-engine and got “Failed loading pipe definition from Yahoo”. I’ll try to run this from the command line to see if this is a problem of a missing module or something else. Thanks!

  5. Very nice post. Yahoo Pipes is such a powerful tool for mashing up data. Can’t wait to see the exciting possibilities with combining Pipes and Python. Look forward to reading more!

  6. Pretty impressive! Yahoo Pipes is a great service so thanks for pipe2py and sharing the source!

  7. @Aaron: many of the entries in your feed(s) were missing pubDate. I’ve updated pipe2py to handle that more quietly and updated pipes-engine. Your pipe runs now (although it’s close to the 30s App Engine quota – maybe worth adding a filter, for now at least).

  8. @Sargis: the ‘failed loading from Yahoo’ error I think is a timeout either at Yahoo (we get the pipe definition via YQL) or App Engine. It seems to happen quite quickly which is strange, but re-trying usually works.

    However, your pipe does need some modules that aren’t in pipe2py yet (e.g. split and location-builder). They’re on the todo list though.

  9. This is good stuff! Thanks.

    It doesn’t seem to accept (or recognize) text and number inputs though, even when defaults are set in the original Pipe?

  10. @Ian: pipe2py has limited support for inputs. Text and url input modules are working and pipes-engine will use the default values from them at the moment. The next task is to add more input modules (they’re relatively simple to write) and hook them into a web form/url-query so pipes-engine can use them.

  11. Pingback: Yahoo Pipe et Google App Engine : si Yahoo Pipes disparaît, que deviendrons-nous ? « Bibliothèques [reloaded]

  12. pipes-engine now prompts for any inputs before running the pipe (it also handles parameter passing via url query parameters). I still need to add support in pipe2py for input types other than text and url, but many of them are straightforward, so it shouldn’t take long.

  13. My feed 2 article pipe (740c44020747aa0a04c18fdb2814c294) doesn’t work. It would be great if you could add support for pipefeedautodiscovery :D

  14. Pingback: Linkdump: Hating on “NoSQL,” Alan Turing and Ritual Human Sacrifice « Joyeur

  15. Pingback: Peters Linkschleuder – Der Schockwellenreiter

  16. Pingback: The Web Column: Issue No.1 — L'Alpiniste

  17. My pipe takes various RSS feeds that are important to me (my blogs and Picasa albums), munges them into a more customized replacement for Feedburner BuzzBoost, and places them on the front of my website. I love this idea: one thing the cloud is missing is a great backup of itself. FYI, I’m getting this error when trying the proof of concept on my Pipe:

    Error: Error running: 14f7f41882dd5e4944796d7e1f7832ed : global name ‘pipesubstr’ is not defined]}, “count”:0}

  18. @Dave: the substr module has now been added to pipe2py and pipes-engine, so your pipe should work now.

  19. Sorry for my bad English.

    This project needs two things:

    1. RSS output
    2. Instructions on how to deploy your own pipe2py

    Will these be implemented? When?

  20. I’m not a programmer, so I’m not exactly sure how this all works, but I am a big user of Yahoo Pipes and I am concerned about Yahoo pulling the plug on it at some point, particularly since the v2 engine project appears stuck in neutral.

    So, I was eager to try this out with one of my Pipes, but it didn’t work.

    Here’s the Pipe ID:

    e04615063e5872eb2209c7d7d41b523f

    And here’s the error message I received:

    Traceback (most recent call last):
    File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 636, in __call__
    handler.post(*groups)
    File "/base/data/home/apps/pipes-engine/2.348298677040134336/main.py", line 269, in post
    pipe_def = json.loads(pipe_json)
    File "/base/python_runtime/python_lib/versions/third_party/django-0.96/django/utils/simplejson/__init__.py", line 232, in loads
    return cls(encoding=encoding, **kw).decode(s)
    File "/base/python_runtime/python_lib/versions/third_party/django-0.96/django/utils/simplejson/decoder.py", line 251, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    TypeError: expected string or buffer

    Let me know if you need more info on tracking where this went wrong.

  21. Pingback: Running Yahoo Pipes on Google App Engine | Panicked Zebra

  22. Yahoo Pipes officially sucks now. And in August they’re changing all V1 pipes to V2. Basically the Internet is in for a world of hurt. I was looking to transfer my pipes to appspot. Any help would be appreciated. I have no knowledge of Python or Google App Engine, but I’m willing to learn.

    I was wondering if anybody could write a tutorial on transferring a pipe to appspot. I’d really appreciate it. I just wish I thought of moving sooner. Yahoo Pipes V2 is NOT ready for their planned switchover. EVERYTHING will break.

    Anyway, I copied the generated .py file into Google App Engine but it didn’t like that. Nothing but error messages. I’m guessing there’s more to this than meets the eye. I’m sorry for being such a neophyte about this. Any help would be appreciated.

    Thank you.

  23. Hmm, getting :

    The website cannot display the page
    HTTP 500
    Most likely causes:
    •The website is under maintenance.
    •The website has a programming error.

    What you can try:
    Refresh the page.

    Go back to the previous page.

    Pipe ID:3a8945760e12d1ba65350c388062a1ec

    Something wrong with the pipe perhaps?

  24. This is probably your browser hiding the json/error result – perhaps because it’s too short. The error for that pipe is “NameError: global name ‘pipecreaterss’ is not defined” – so it relies on the ‘createrss’ module which isn’t implemented yet.

  25. Yahoo Pipes 2.0 seems to break my pipe, so I’m looking to move off that platform before the dreaded “early August” date. I had tried pipes-engine in January and it worked great on my Pipe, but now I’m getting this error (and I haven’t changed anything in the Pipe):
    Error running: 14f7f41882dd5e4944796d7e1f7832ed : expected string or buffer]}, “count”:0}

    Any thoughts on what might be causing this? Trying to beat the August deadline :-)

    Thanks!
    Dave

  26. @Dave: this was a bug caused by one of the regexes having an empty replacement. It’s fixed now and uploaded to pipes-engine.

  27. Hi,

    What is not clear in any of the blog posts on this service is what you actually do to run this. I’ve created a pipe, added to the list at http://pipes-engine.appspot.com/, run the pipe and out comes a load of code which I suppose is Python. I don’t know Python at all, I’m afraid, but what I don’t understand is what I then do with this output. I would like to display the output on a webpage and get it to automatically update, just like an RSS feed. Is there a step-by-step tutorial that shows me what to do?

    Your assistance would be appreciated. Thanks.

  28. @jamie: The output is in JSON format, so more JavaScript than Python. This is one of the output options given by Yahoo! Pipes and is useful for integrating with other systems. The App Engine site “is a proof of concept and needs some more work, not least to provide the output in formats other than json”. Adding more output modules to provide other formats should be relatively easy, but still needs doing. Until then, you’d need to parse the JSON, e.g. with JavaScript, to display it as html on a website.

  29. Pingback: Quelques éléments pour l’« autre » infrastructure de l’information sur internet: (flux rss &co) | tlog

  30. This is indeed great work!

    I thought I’d nudge you to include iCal as an output format, if it’s not too much trouble.

    Cheers

  31. I get NameError: global name ‘pipexpathfetchpage’ is not defined
    when trying to use pipe2py. Does this mean Xpath Fetch Page module is not available?

  32. Do you have a clue as to why I’m getting this error message? I’ve loaded 2 pipes and 1 has me download a file (pipe id: 9f34eccc2791dc91550d6091003372fd) which contains title, description, link, generator, items, count but nothing like the output I expect from my pipe.

    The second pipe I have loaded (pipe id: 2d6d5da8d444189d5254de209e0f2e35) gives me this error message:

    File “”, line 23, in pipe_2d6d5da8d444189d5254de209e0f2e35
    NameError: global name ‘pipexpathfetchpage’ is not defined

  33. Oops. Sorry. I just read your response again and realize I misunderstood. However, the fetch page module has been deprecated. Do you plan to update?

  34. @Donna: The first pipe, 9f34eccc2791dc91550d6091003372fd, uses an XPATH selector in the sub-element module, which isn’t supported yet. I plan to add a pipexpathfetchpage module (and so XPATH support elsewhere), but I’m not sure when I’ll get the time to add it.

  35. What are the limitations of pipes-engine, and does it currently support XPATH?
