Workflow software survey

I recently did some research at work on existing libraries, frameworks, and tools for managing semi-automated workflows and pipelines of tasks. Here’s a survey of many of them.

There are a number of loose, overlapping terms for these kinds of systems: BPM (business process management), workflow, pipeline, flowchart, state machine, visual programming, dataflow, flow-based programming, dependency management, task management, etc. The terms are useful for web searches, if not for much else.

In practice, they tend to fall into a few broad categories that don’t overlap too much:

  • BPM. These are primarily people-based, i.e. each stage usually corresponds to something a human needs to do. Examples include CRM, bug/issue tracking, ticketing, approval chains, and document management.
  • Data processing. These are primarily data-based, usually for large datasets or streaming data sources. Examples include web search indexing, ETL (extract/transform/load) for data warehouses, MapReduce/Hadoop, and other large-scale data pipelines.
  • make-style. These are individually driven batch automation pipelines. They’re kicked off manually and focus more on the workflow itself than the data. Unlike BPM, individual stages are often automated.
  • Visual programming. These let non-programmers draw boxes and arrows that become a computer program. People have been building these systems for decades! Examples include Squeak, Yahoo Pipes, HyperCard (old school!), maybe Visual Basic, etc.
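
To make the make-style category a bit more concrete, here’s a minimal sketch of the idea in Python: tasks with declared dependencies, run in dependency order. It isn’t taken from any of the projects surveyed below. The function name run_pipeline and the task names are made up for illustration, and it assumes Python 3.9+ for the standard library’s graphlib module.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps):
    """Run each task once, after all of the tasks it depends on.

    tasks: dict mapping a task name to a zero-argument callable.
    deps:  dict mapping a task name to the set of task names it depends on.
    Returns the list of task names in the order they ran.

    Hypothetical helper for illustration only; it assumes every
    dependency named in deps is also a key in tasks.
    """
    # Tasks with no entry in deps have no prerequisites.
    graph = {name: deps.get(name, set()) for name in tasks}
    # static_order() yields each node after all of its predecessors.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order


# Illustrative three-stage pipeline: fetch -> parse -> report.
log = []
tasks = {
    "fetch": lambda: log.append("fetch"),
    "parse": lambda: log.append("parse"),
    "report": lambda: log.append("report"),
}
deps = {"parse": {"fetch"}, "report": {"parse"}}
order = run_pipeline(tasks, deps)
```

A real make-style tool would also skip stages whose outputs are already up to date and report failures per stage. This sketch only shows the ordering part.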

Also, here are a few of the many standards:

I focused on open source Python projects here, but even so, this is nowhere near comprehensive. There are tons more out there. See Wikipedia’s Comparison of BPEL engines, among many others.

| name | description | engine | run log | authoring UI | definition format | parameters | community | last release/checkin as of 1/19/2011 | notes |
|---|---|---|---|---|---|---|---|---|---|
| PyF | dataflow framework | command line, web | web dashboard | web-based, screenshots | many! csv, xml, flat files | built-in serializers | active. list, IRC | 1 mo ago; days ago | pluggable architecture. APIs at multiple levels. polished. |
| Pypes (overview, user guide, developer guide) | data processing, flow-based programming, ETL | command line, REST API | text logging | web-based, screenshots | JSON | serialized strings | some. list | 1y ago; 3 mos ago | requires stackless python |
| PAPY (PDF guide) | flow-based programming | no | text logging | no | python code | manual marshalling into RPyC | not much | 6 mos ago | just a library. aimed at dataflow, large/streaming datasets. |
| Kamaelia (intro, code examples) | continuous dataflow; very generic | command line | not exactly | read-only visualization | python code | serializers | funded by BBC. fairly active. list | 1 mo ago; days ago | from BBC Research, used for misc things there. |
| Bonita Open Solution | full-featured BPM, workflow, automation | server-based | web-based dashboard | web-based WYSIWYG UI | supports data formats (XML, XPDL, BPEL) and image formats | unknown | mature, active community. forum, bug tracker, etc. | weeks ago | open source (GPL), commercial support contracts. |
| YAWL (Yet Another Workflow Language) | based on workflowpatterns.com. | local/server, web interface | web-based Monitor Service for current runs, no persistence though (?) | dedicated GUI | custom language: YAWL | string serialization | looks mature but small. forum | months ago, days ago | commercial-ish? somewhat heavy, academic, architect astronaut-ish. |
| ProcessMaker | BPM, process automation, human oriented | custom, server-based. may not be able to plug in code. | web-based, inbox style | full-fledged WYSIWYG UI | unknown | unknown | some. wiki, forum, bug tracker | 6 mos ago? not frequent. | commercial. can’t find the source, may be closed |
| OpenFlow (white paper) | people-based BPM, workflow, task management. based on Zope. | blocks are generally expected to be done by humans or external systems | none | none | none | N/A | dead? | 2003? | from Icube, which looks dead. |
| GoFlow (online demo, white paper) | people-based BPM, workflow, task management. a Django mixin. | blocks are generally expected to be done by humans or external systems | Django admin interface | no | Django models | custom, in code | dead. list | 2.5 ago, 2y ago | clone of OpenFlow |
| Finite State Machine Editor | finite state machine GUI, compiler, library | none (compiler generates c++/python code from definition) | text logging | dedicated UI, screenshots | XML | code | dead. forums | 4 yrs ago, 2 yrs ago | |
| State Machine Compiler | compile time code generator | none | none | unknown | custom .sm file format | none | not a lot. forums, contributors | 1 mo ago, 1 mo ago | from an ACM member. old school, unix graybeard feel. |
| Windows Workflow Foundation (intro) | workflow, automation, batch pipelining. blocks can be any CLR language. | windows based | dashboards | dedicated GUI. screenshots (scroll down) | XML, similar to Azure Fabric | code | big, mature. | recent | windows and CLR based. MSDN, shared source, etc. |
| MGLTools Vision (presentation) | visual programming, dataflow, very graphics oriented. | through the UI | unknown, maybe a dashboard | custom X application, comprehensive, screenshots | unknown but UI supports save/load | unknown | decent. academic focus. forum | 2.5y ago | focused on images, life sciences, data processing. non-commercial license only; commercial requires permission. |
| VisTrails | exploratory data processing workflows | command line, GUI, or server | text logging, pluggable | dedicated GUI, screenshots, and visual diff! | unknown | custom, pluggable | from U of Utah, NYU, IBM, et al. list, contributors, users | days ago, days ago | focused on exploratory, dynamically changing workflows while running, change history, image processing and life sciences |
| Makeflow/Weaver | distributed computing/data processing framework. implements abstractions like map, mapreduce, all pairs, etc. | command line. basically like make over a DAG (Makeflow) with inline python code (Weaver) | none | none | code | code | from Notre Dame’s CS dept. academic, decent. list | 2 mos ago, days ago | |
| itools.workflow | state machine library | none | none | none | code/text | code | decent. list, IRC | weeks ago, days ago | |
| Joblib | pipelining/distributed computing library | none | configurable text logging | none | code | code | not much. list | two months ago, two months ago | minimal |
| pomsets | distributed computing, workflow management | command line launcher, supports multiple environments (e.g. Hadoop, EC2) | none | dedicated GUI, screenshots | JSON, depends on execution environment | string serialized | dead. forum | 6 mos ago; repository isn’t public | problematic license, paid for commercial use. |
| ruote | workflow engine | command line | none | none | ruby, xml, json | code | active. list, IRC, users | days ago | ruby! |
| SpiffWorkflow | workflow/state machine/flowchart execution library. part of Spiff CMS. | command line? | unknown | none | unknown | code | not a lot. list | three months ago | barely any documentation |
| hurry.workflow (guide) | BPM, people-based workflow/task management. mixin for zope. | zope | nothing beyond whatever zope provides | none | code | code | not much | 9 mos ago | |

4 thoughts on “Workflow software survey”

  1. Just reached this page today. Very interesting.
    Was thinking… any major update since it was written?

    Best.
