In the Big Data Europe framework, the Big Data Integrator is an application that can be thought of as a “starter kit” for building big data pipelines into your process. It is a minimal standalone system that lets you create a project composed of multiple Docker containers, upload it, and run it through a convenient GUI.
Architecture
You can think of the Big Data Integrator as a placeholder. It acts as a “skeleton” application into which you can plug and play the different big data services from the Big Data Europe platform, as well as add and develop your own.
At its core it is a simple web application that renders each service’s frontend inside it, making it easy to navigate between the different systems and providing a sense of continuity in your workflow.
The basic application to start from consists of several components:
- Stack Builder: this application allows users to create a personalized docker-compose.yml file describing the services to be used in the working environment. It is equipped with hinting & search features to ease discovery and selection of components.
- Swarm UI: after the docker-compose.yml has been created in the Stack Builder, it can be uploaded to a GitHub repository. From the Swarm UI, users can clone that repository and launch the containers with Docker Swarm through a graphical user interface; from there they can start, stop, restart, and scale them.
- HTTP Logger: provides logging of all the HTTP traffic generated by the containers and pushes it into an Elasticsearch instance, to be visualized with Kibana. Note that containers to be observed must always run with the logging=true label set (see the sketch after this list).
- Workflow Builder: helps define a specific set of steps that have to be executed in sequence, as a “workflow”. This adds functionality similar to Docker healthchecks, but more fine-grained. To let the Workflow Builder enforce a workflow for a given stack (docker-compose.yml), the mu-init-daemon-service needs to be added as part of the stack. That service acts as the “referee” that imposes the steps defined in the Workflow Builder. For more information, check its repository.
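To make the last two points concrete, here is a minimal, hypothetical docker-compose excerpt showing a service observed by the HTTP Logger and the init daemon added to the stack. The service name and image names are assumptions for illustration only; check the respective repositories for the real ones.
my-service:                                     # hypothetical service to be observed
  image: bde2020/my-service:latest              # assumed image name
  labels:
    - "logging=true"                            # required so the HTTP Logger picks up its traffic
initdaemon:
  image: bde2020/mu-init-daemon-service:latest  # assumed image name for the workflow "referee"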
Systems are organized following a microservices architecture and run together using a docker-compose script, some of them sharing microservices common to all architectures, such as the identifier, dispatcher, and resource services.
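As a rough illustration of that shared core, the common microservices might be declared along these lines in a stack; the semtech/mu-* image names are assumptions based on typical mu.semte.ch stacks, not a definitive configuration:
identifier:
  image: semtech/mu-identifier:latest    # assumed image; single entry point that identifies user sessions
  links:
    - dispatcher:dispatcher
dispatcher:
  image: semtech/mu-dispatcher:latest    # assumed image; routes each request to the right microservice
resource:
  image: semtech/mu-cl-resources:latest  # assumed image; exposes domain resources as a JSON API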
Installation & Usage
- Clone the repository
- For each of the subsystems used (Stack Builder, HTTP Logger, etc.), check its repository’s README, as there may be some small quirks to take into account before running each piece.
- Run the edit-hosts.sh script. This assigns URLs to the different services in the integrator.
- Run docker-compose up to start the services together.
- Visit integrator-ui.big-data-europe.aksw.org to access the application’s entry point.
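Put together, the sequence looks roughly like this; the clone URL is a placeholder, since the repository location is not spelled out here:
git clone <integrator-repository-url>   # placeholder URL
cd <integrator-repository>
./edit-hosts.sh                         # assign URLs to the different services
docker-compose up                       # start all services together
# then browse to http://integrator-ui.big-data-europe.aksw.org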
How to add new services
- Add the new service(s) to docker-compose.yml. It is important to expose the VIRTUAL_HOST & VIRTUAL_PORT environment variables for the frontend application of those services so that it is accessible through the integrator, e.g.:
new-service-frontend:
  image: bde2020/new-service-frontend:latest
  links:
    - csswrapper
    - identifier:backend
  expose:
    - "80"
  environment:
    VIRTUAL_HOST: "new-service.big-data-europe.aksw.org"
    VIRTUAL_PORT: "80"
- Add an entry in /etc/hosts to point the URL to localhost (or wherever your service is running), e.g.:
127.0.0.1 workflow-builder.big-data-europe.aksw.org
127.0.0.1 swarm-ui.big-data-europe.aksw.org
127.0.0.1 kibana.big-data-europe.aksw.org
(..)
127.0.0.1 new-service.big-data-europe.aksw.org
- Modify the file integrator-ui/user-interfaces to add a link to the new service in the integrator UI, e.g.:
{
  "data": [
    ...etc .. ,
    {
      "id": 1,
      "type": "user-interfaces",
      "attributes": {
        "label": "My new Service",
        "base-url": "http://new-service.big-data-europe.aksw.org/",
        "append-path": ""
      }
    }
  ]
}
Have fun with it!