I recently did some research at work on existing libraries, frameworks, and tools for managing semi-automated workflows and pipelines of tasks. Here’s a survey of many of them.
There are a number of loose, overlapping terms for these kinds of systems: BPM (business process management), workflow, pipeline, flowchart, state machine, visual programming, dataflow, flow-based programming, dependency management, task management, etc. They’re useful for web searches, even if not a lot more.
In practice, they tend to fall into a few broad categories that don’t overlap too much:
- BPM. These are primarily people-based, i.e. each stage usually corresponds to something a human needs to do. Examples include CRM, bug/issue tracking, ticketing, approval chains, and document management.
- Data processing. These are primarily data-based, usually for large datasets or streaming data sources. Examples include web search indexing, ETL (extract/transform/load) for data warehouses, MapReduce/Hadoop, and other large-scale data pipelines.
- `make` style. These are individually driven, batch automation pipelines. They’re kicked off manually and focus more on the workflow itself than the data. Unlike BPM, individual stages are often automated.
- Visual programming. Let non-programmers draw boxes and arrows that become a computer program. People have been building these systems for decades! Examples include Squeak, Yahoo Pipes, Hypercard (old school!), maybe Visual Basic, etc.
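To make the make-style category a bit more concrete, here’s a minimal sketch of what such a pipeline boils down to: a set of stages, a dependency graph between them, and a runner that executes each stage once its dependencies are done. It isn’t taken from any of the projects surveyed here. The `run_pipeline` function and the `fetch`/`sort`/`report` stage names are made up for illustration, and it assumes Python 3.9+ for the standard library’s `graphlib`.

```python
from graphlib import TopologicalSorter


def run_pipeline(stages, deps):
    """Run every stage after all of its dependencies, make-style.

    stages: dict mapping stage name -> callable that takes a dict of
            upstream results ({dependency name: its return value}).
    deps:   dict mapping stage name -> set of stage names it depends on.
    Returns the execution order and the result of each stage.
    """
    results = {}
    order = []
    # static_order() yields each stage only after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        upstream = {dep: results[dep] for dep in deps.get(name, ())}
        results[name] = stages[name](upstream)
        order.append(name)
    return order, results


# Hypothetical three-stage pipeline: fetch -> sort -> report.
stages = {
    "fetch": lambda up: [3, 1, 2],
    "sort": lambda up: sorted(up["fetch"]),
    "report": lambda up: "sorted: " + ", ".join(map(str, up["sort"])),
}
deps = {"fetch": set(), "sort": {"fetch"}, "report": {"sort"}}

order, results = run_pipeline(stages, deps)
print(order)
print(results["report"])
```

Real tools in this category layer a lot on top of this, e.g. skipping stages whose outputs are already up to date, running independent stages in parallel, and logging each run.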
Also, here are a few of the many standards:
- XPDL: XML Process Definition Language, a Workflow Management Coalition standard.
- Wf-XML, a Workflow Management Coalition standard.
- BPEL (Business Process Execution Language), an OASIS standard.
- YAWL (Yet Another Workflow Language). Supported by the YAWL foundation and the Workflow Patterns initiative.
I focused on open source Python projects here, but even so, this is nowhere near comprehensive. There are tons more out there. See Wikipedia’s Comparison of BPEL engines, among many others.
name | description | engine | run log | authoring UI | definition format | parameters | community | last release/checkin as of 1/19/2011 | notes |
---|---|---|---|---|---|---|---|---|---|
PyF | dataflow framework | command line, web | web dashboard | web-based, screenshots | many! csv, xml, flat files | built-in serializers | active. list, IRC | 1 mo ago; days ago | pluggable architecture. APIs at multiple levels. polished. |
Pypes (overview, user guide, developer guide) | data processing, flow-based programming, ETL | command line, REST API | text logging | web-based, screenshots | JSON | serialized strings | some. list | 1y ago; 3 mos ago | requires stackless python |
PAPY (PDF guide) | flow-based programming | no | text logging | no | python code | manual marshalling into RPyC | not much | 6 mos ago | just a library. aimed at dataflow, large/streaming datasets. |
Kamaelia (intro, code examples) | continuous dataflow; very generic | command line | not exactly | read-only visualization | python code | serializers | funded by BBC. fairly active. list | 1 mo ago; days ago | from BBC Research, used for misc things there. |
Bonita Open Solution | full-featured BPM, workflow, automation | server-based | web-based dashboard | web-based WYSIWYG UI | supports data formats (XML, XPDL, BPEL) and image formats | unknown | mature, active community. forum, bug tracker, etc. | weeks ago | open source (GPL), commercial support contracts. |
YAWL (Yet Another Workflow Language) | based on workflowpatterns.com. | local/server, web interface | web-based Monitor Service for current runs, no persistence though (?) | dedicated GUI | custom language: YAWL | string serialization | looks mature but small. forum | months ago, days ago | commercial-ish? somewhat heavy, academic, architect astronaut-ish. |
ProcessMaker | BPM, process automation, human oriented | custom, server-based. may not be able to plug in code. | web-based, inbox style | full-fledged WYSIWYG UI | unknown | unknown | some. wiki, forum, bug tracker | 6 mos ago? not frequent. | commercial. can’t find the source, may be closed |
OpenFlow (white paper) | people-based BPM, workflow, task management. based on Zope. | blocks are generally expected to be done by humans or external systems | none | none | none | N/A | dead? | 2003? | from Icube, which looks dead. |
GoFlow (online demo, white paper) | people-based BPM, workflow, task management. a Django mixin. | blocks are generally expected to be done by humans or external systems | Django admin interface | no | Django models | custom, in code | dead. list | 2.5y ago, 2y ago | clone of OpenFlow |
Finite State Machine Editor | finite state machine GUI, compiler, library | none (compiler generates c++/python code from definition) | text logging | dedicated UI, screenshots | XML | code | dead. forums | 4 yrs ago, 2 yrs ago | |
State Machine Compiler | compile time code generator | none | none | unknown | custom .sm file format | none | not a lot. forums, contributors | 1 mo ago, 1 mo ago | from an ACM member. old school, unix graybeard feel. |
Windows Workflow Foundation (intro) | workflow, automation, batch pipelining. blocks can be any CLR language. | windows based | dashboards | dedicated GUI. screenshots (scroll down) | XML, similar to Azure Fabric | code | big, mature. | recent | windows and CLR based. MSDN, shared source, etc. |
MGLTools Vision (presentation) | visual programming, dataflow, very graphics oriented. | through the UI | unknown, maybe a dashboard | custom X application, comprehensive, screenshots | unknown but UI supports save/load | unknown | decent. academic focus. forum | 2.5y ago | focused on images, life sciences, data processing. non-commercial license only; commercial requires permission. |
VisTrails | exploratory data processing workflows | command line, GUI, or server | text logging, pluggable | dedicated GUI, screenshots, and visual diff! | unknown | custom, pluggable | from U of Utah, NYU, IBM, et al. list, contributors, users | days ago, days ago | focused on exploratory, dynamically changing workflows while running, change history, image processing and life sciences |
Makeflow/Weaver | distributed computing/data processing framework. implements abstractions like map, mapreduce, all pairs, etc. | command line. basically like make over a DAG (Makeflow) with inline python code (Weaver) | none | none | code | code | from Notre Dame’s CS dept. academic, decent. list | 2 mos ago, days ago | |
itools.workflow | state machine library | none | none | none | code/text | code | decent. list, IRC | weeks ago, days ago | |
Joblib | pipelining/distributed computing library | none | configurable text logging | none | code | code | not much. list | two months ago, two months ago | minimal |
pomsets | distributed computing, workflow management | command line launcher, supports multiple environments (e.g. Hadoop, EC2) | none | dedicated GUI, screenshots | JSON, depends on execution environment | string serialized | dead. forum | 6 mos ago; repository isn’t public | problematic license, paid for commercial use. |
ruote | workflow engine | command line | none | none | ruby, xml, json | code | active. list, IRC, users | days ago | ruby! |
SpiffWorkflow | workflow/state machine/flowchart execution library. part of Spiff CMS. | command line? | unknown | none | unknown | code | not a lot. list | three months ago | barely any documentation |
hurry.workflow (guide) | BPM, people-based workflow/task management. mixin for zope. | zope | nothing beyond whatever zope provides | none | code | code | not much | 9 mos ago | |
Nice list, I was striking out on finding info on Python/Django Workflow Engines until I got here, thanks!
I recently did the investigation for active workflows projects specifically for django – and here is the result list http://stackoverflow.com/questions/6795328/workflow-frameworks-for-django/25717038#25717038
Just reached this page today. Very interesting.
Was thinking… any major update since it was written?
Best.