The diagram above shows the three main subsystems of the NEMA architecture:
- User interface
- Workflow processing
- Remote executors
The NEMA system is implemented primarily with Java-based APIs, libraries and frameworks, such as Java RMI, JINI, the Spring Framework, Hibernate, Apache Jackrabbit, and the Jetty application server.
The NEMA user interface (a.k.a. DIY) provides tools for users to prepare and execute Music Information Retrieval (MIR) compute jobs by configuring MIR workflows and components.
The first step in preparing a job is to select an MIR "workflow template" to run. We call these templates because they are not runnable in their default state: the user must first configure the properties of the workflow components in order to create a "workflow instance." The image below is a screen capture of the interface for configuring component properties.
Because the ultimate goal of the NEMA user is to execute their MIR codes against IMIRSEL-curated music collections, a major portion of the workflow configuration process involves uploading and configuring the MIR codes to run on NEMA systems. Using the NEMA DIY interface, users can fine-tune their executable configuration by specifying system properties, executable arguments, environment variables and input/output file formats. These settings are preserved as an "executable profile," which is stored in the Content Repository Service for later use during workflow processing.
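To make the idea of an executable profile concrete, the following is a minimal sketch of how such a profile might be modelled as a value type. The record name, its field names, and the `toProcessBuilder` helper are hypothetical and chosen for illustration; they are not taken from the NEMA code base, and the actual stored format is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical value type: field names are illustrative, not NEMA's actual schema.
record ExecutableProfile(
        String executableType,                 // e.g. "java", "matlab", "python"
        Map<String, String> systemProperties,  // user-specified system properties
        List<String> arguments,                // executable arguments, in order
        Map<String, String> environment,       // environment variables for the process
        String inputFormat,                    // input file format
        String outputFormat) {                 // output file format

    ExecutableProfile {
        // Defensive copies keep a stored profile immutable.
        systemProperties = Map.copyOf(systemProperties);
        arguments = List.copyOf(arguments);
        environment = Map.copyOf(environment);
    }

    /** Builds a ProcessBuilder from a launcher command plus the profile's arguments and environment. */
    ProcessBuilder toProcessBuilder(List<String> launcher) {
        List<String> command = new ArrayList<>(launcher);
        command.addAll(arguments);
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().putAll(environment);
        return builder;
    }
}
```

In this sketch, a profile created with the arguments `--frames 512` and passed the launcher `python extract.py` would yield the command line `python extract.py --frames 512`. How system properties are applied depends on the executable type, so the sketch only stores them.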
The NEMA system accepts user-submitted codes in a wide variety of languages and runtime environments, including Java, MATLAB, C, C++, Perl, Python, Ruby, shell scripts, and Wine.
After the user has finished configuring their MIR workflow for execution, the DIY application submits the workflow instance to the Flow Service, which persists the flow to the Content Repository Service and runs the job.
The workflow processing subsystem consists of:
- The Flow Service, which dispatches and monitors jobs
- Meandre servers, which process flow instances
- The Content Repository Service, where job results are stored
The Flow Service provides an abstraction layer between a cluster of Meandre servers and the UI web application. The UI web application calls the Flow Service to run user-configured workflows. Besides scheduling and dispatching jobs, the Flow Service implements functionality for load balancing, job monitoring, job status notification and Meandre server monitoring.
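The load-balancing policy used by the Flow Service is not specified here. As one possible illustration, the sketch below picks the reachable worker with the fewest running jobs. The `WorkerNode` record, its fields, and the least-loaded policy itself are assumptions made for the example, not a description of the actual Flow Service.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical view of a Meandre worker node as the Flow Service might track it.
record WorkerNode(String host, int runningJobs, int maxJobs, boolean reachable) {
    /** A worker can accept a job only if it is reachable and below its job limit. */
    boolean hasCapacity() {
        return reachable && runningJobs < maxJobs;
    }
}

// Assumed policy: least-loaded selection. The real policy may differ.
final class LeastLoadedBalancer {
    /** Returns the worker with the fewest running jobs, or empty if none has capacity. */
    static Optional<WorkerNode> select(List<WorkerNode> workers) {
        return workers.stream()
                .filter(WorkerNode::hasCapacity)
                .min(Comparator.comparingInt(WorkerNode::runningJobs));
    }
}
```

An empty result would indicate that the job has to wait in the schedule until a worker frees up, which is one reason scheduling and server monitoring sit alongside dispatching in the same service.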
Meandre provides a RESTful web service API for running and monitoring workflows. The NEMA Meandre cluster consists of a single head node and N worker nodes. The Flow Service queries the head node for information about installed flows and components; the worker nodes do nothing but execute flows.
The image above is a screen capture of a NEMA workflow opened within the Meandre workbench. The Meandre workbench is used by NEMA system administrators to create MIR workflow templates. It is not part of the NEMA user interface.
User-submitted MIR codes are processed within the Remote Executor subsystem. This subsystem consists of:
- Flow components running on the Meandre servers, which delegate processing to the Executor Services
- Executor Services, which actually run user-submitted codes
- The Content Repository Service, where user-submitted codes, user-configured executable profiles, and execution results are stored
The Remote Executor subsystem uses a client-service model. The client side is wrapped by a Meandre component. When the component fires, the client downloads the user-configured executable profile from the Content Repository Service, then delegates processing to a compatible Executor Service instance. The client side then monitors the remote process lifecycle, output/error streams, and process results.
Upon startup, an Executor Service instance advertises itself, along with the executable types it supports, to the lookup service. Clients use the lookup service to discover compatible executors. When it receives a process request, an Executor Service downloads the user-supplied executable archive from the Content Repository Service, then starts the process. When the process has completed, the service uploads the results to the Content Repository Service.
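The advertise-and-discover pattern described above can be sketched with a small in-memory registry. This is a stand-in for illustration only: it does not use the actual lookup service API, and the names `CodeExecutor`, `LookupService`, and `StubExecutor` are hypothetical. The stub skips the download, execution, and upload steps and simply returns a made-up result identifier.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical executor contract; the real NEMA interfaces are not shown in this document.
interface CodeExecutor {
    Set<String> supportedTypes();   // executable types this executor can run
    String run(String profileId);   // returns an identifier for the stored result
}

// In-memory stand-in for the lookup service (assumption, not the real API).
final class LookupService {
    private final List<CodeExecutor> registered = new CopyOnWriteArrayList<>();

    /** Called by an executor on startup to announce itself. */
    void advertise(CodeExecutor executor) {
        registered.add(executor);
    }

    /** Called by a client to find the first executor that supports the given type. */
    Optional<CodeExecutor> discover(String executableType) {
        return registered.stream()
                .filter(e -> e.supportedTypes().contains(executableType))
                .findFirst();
    }
}

// Stub executor: a real one would download the archive, run it, and upload results.
record StubExecutor(String name, Set<String> supportedTypes) implements CodeExecutor {
    @Override
    public String run(String profileId) {
        return name + ":" + profileId;
    }
}
```

A client holding an executable profile of type `matlab` would call `discover("matlab")` and delegate to whichever executor is returned; if the result is empty, no compatible executor is currently registered.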