Hey,
this blog post is a continuation of the previous one (How to build and run Concourse CI locally). In this one, I try to better understand how one of Concourse's main components works internally to keep track of pipelines and trigger builds.
Let’s dig into what goes on when a pipeline is submitted to Concourse.
The API
As a consumer of the “Concourse service”, the first point we touch is the API server, which in this case is atc. In the words of the atc repository:
atc is the brain of Concourse. It’s responsible for scheduling builds across the cluster of workers, providing the API for the system, as well as serving the web interface.
In the last post, when we submitted a pipeline to Concourse, atc was the component that the CLI (fly) was sending it to.
As an example, let’s take that very same pipeline:
jobs:
- name: 'hello'
  plan:
  - task: 'say-hello'
    config:
      platform: 'linux'
      image_resource:
        type: 'docker-image'
        source: {repository: 'alpine'}
      run:
        path: 'echo'
        args: ['hello']
and then submit it using fly:
fly \
set-pipeline \
--pipeline=hello-world \
--config=./hello.yml \
--target=local
Fly is interesting in its own right; in summary, it does the following (see commands/set_pipeline and commands/internal/setpipelinehelpers):
- creates an internal representation of the newly submitted configuration (parses the yaml it got from --config);
- loads the existing configuration for that pipeline, if any (asks atc for it based on the pipeline name);
- performs a diff between the existing pipeline and the new one;
- shows a message highlighting these diffs and asks for confirmation to proceed;
- applies the configuration by making use of the Concourse Go client, go-concourse;
- the configuration save call made from go-concourse hits the following endpoint in atc: /api/v1/teams/:team_name/pipelines/:pipeline_name/config (see the sketch right after this list).
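To make that last step concrete, here's a minimal sketch of what the save call boils down to, written as a plain HTTP request rather than through go-concourse. The team name (main), the local address, and the omission of authentication and config versioning are all simplifications on my part - treat this as an illustration, not fly's actual code:

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// the raw pipeline yaml we wrote earlier
	config, err := os.ReadFile("./hello.yml")
	if err != nil {
		panic(err)
	}

	// the endpoint that go-concourse ultimately hits; team `main`
	// and the local address come from the dev setup of the last post.
	url := "http://localhost:8080/api/v1/teams/main/pipelines/hello-world/config"

	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(config))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/x-yaml")
	// note: the real client also sends authentication and the
	// previous config version (for optimistic locking) - omitted here.

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fmt.Println("atc replied:", resp.Status)
}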
Now that the configuration has landed in atc, api/configserver/save.go#SaveConfig gets invoked.
This piece does two things (sketched right after the list):
- performs a second validation (fly already validated it, but as this api might be public, it takes the defensive approach);
- saves the configuration.
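In handler form, the flow looks roughly like the following. This is a paraphrase of the decode / re-validate / persist shape, not atc's actual code: Config, validate, and persist are all stand-ins I made up for the sketch.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"gopkg.in/yaml.v2"
)

// Config is a drastically reduced stand-in for atc's pipeline
// configuration type; the real one lives in the atc package.
type Config struct {
	Jobs []struct {
		Name string `yaml:"name"`
	} `yaml:"jobs"`
}

// validate mimics the second, defensive validation pass.
func validate(c Config) error {
	if len(c.Jobs) == 0 {
		return fmt.Errorf("pipeline has no jobs configured")
	}
	return nil
}

// persist stands in for the database layer covered in the
// next section.
func persist(c Config) error { return nil }

// saveConfig sketches the shape of SaveConfig: decode the body,
// re-validate it, then persist it.
func saveConfig(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	var config Config
	if err := yaml.Unmarshal(body, &config); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// fly already validated the config, but the api can't trust
	// its callers - anyone can hit this endpoint directly.
	if err := validate(config); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	if err := persist(config); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
}

func main() {
	http.HandleFunc("/api/v1/teams/main/pipelines/hello-world/config", saveConfig)
	log.Fatal(http.ListenAndServe(":8080", nil))
}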
Fair enough, but what does “saving the configuration” actually mean?
Saving the pipeline configuration
Saving the configuration means making the pipeline persistent.
The process covers two possibilities:
- we’re creating a brand new pipeline;
- there’s already a pipeline with that name.
In the first case, it inserts a row into the pipelines table and creates an extra table to keep track of build events for that pipeline (pipeline_build_events_<pipeline_id>) - pretty neat. Once that’s done, it starts populating the other tables (jobs, resources, resource_types, …).
In case there’s already a pipeline with that name, an update is performed: all previous jobs, resources and resource_types are made inactive, and task caches are cleared. Once that’s done, it populates the jobs, resources, … tables with active entries (such that, in the end, only the new ones are marked as active).
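Here's a sketch of both paths in database/sql terms. The table and column names match what we'll inspect in psql below, but the code itself is my simplification of atc's db package - in particular, the events table inheriting from build_events is my reading of the schema, not something verified here:

package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq"
)

// savePipeline sketches the two paths described above; it is a
// simplification, not atc's real code.
func savePipeline(db *sql.DB, teamID int, name string) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()

	var id int
	err = tx.QueryRow(
		`SELECT id FROM pipelines WHERE name = $1 AND team_id = $2`,
		name, teamID,
	).Scan(&id)

	switch err {
	case sql.ErrNoRows:
		// brand new pipeline: insert the row, then create the
		// dedicated build events table for it.
		err = tx.QueryRow(
			`INSERT INTO pipelines (name, team_id)
			 VALUES ($1, $2) RETURNING id`,
			name, teamID,
		).Scan(&id)
		if err != nil {
			return err
		}

		// assumption: the per-pipeline table inherits from
		// build_events.
		_, err = tx.Exec(fmt.Sprintf(
			`CREATE TABLE pipeline_build_events_%d ()
			 INHERITS (build_events)`, id))
		if err != nil {
			return err
		}
	case nil:
		// existing pipeline: mark the previous rows inactive;
		// they get repopulated as active right after.
		for _, table := range []string{"jobs", "resources", "resource_types"} {
			_, err = tx.Exec(fmt.Sprintf(
				`UPDATE %s SET active = false WHERE pipeline_id = $1`, table), id)
			if err != nil {
				return err
			}
		}
	default:
		return err
	}

	// ... populate jobs, resources, resource_types as active rows ...

	return tx.Commit()
}

func main() {
	db, err := sql.Open("postgres", "user=postgres dbname=atc sslmode=disable")
	if err != nil {
		panic(err)
	}

	if err := savePipeline(db, 1, "hello-world"); err != nil {
		panic(err)
	}
}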
What’s in the DB?
After running set-pipeline, we should have some rows in our database.
# log in as `postgres` (we know we don't need password
# because of the `atc` initialization at `./dev/atc`)
psql \
--username=postgres \
--no-password
# list the databases that we have
\l
                List of databases
   Name    |
-----------+
 atc       |     <<<<
 postgres  |
 template0 |
 template1 |
(4 rows)
# connect to the ATC database
\c atc
You are now connected to database "atc" as user "postgres".
# list all the tables that are there
\dt
| Name |
+--------------------------+
| base_resource_types |
| build_events |
...
| jobs_serial_groups |
| next_build_inputs |
| pipeline_build_events_2 | <<< the generated build_events table
| pipelines | <<< our pipeline should be here
...
| worker_task_caches |
| workers |
(32 rows)
# Retrieve all the pipelines saved
SELECT * FROM pipelines;
version | id | name | paused | ordering | last_scheduled | team_id | public | groups
---------+----+-------------+--------+----------+-------------------------------+---------+--------+--------
3 | 2 | hello-world | f | 2 | 2018-02-05 18:58:00.205094-05 | 1 | f | null
(1 row)
While that’s interesting per se, we can understand a bit more about how the data is structured internally by looking at the table definitions - in particular, at what references pipelines:
Referenced by:
# `builds` keeps track of the build executions that
# are initiated. It has fields that mark the status
# of the build, what's the engine that ran the
# tasks, if it was manually triggered or not,
# which pipeline the build belongs to ...
TABLE "builds"
CONSTRAINT "builds_pipeline_id_fkey"
FOREIGN KEY (pipeline_id)
REFERENCES pipelines(id)
ON DELETE CASCADE
TABLE "jobs"
CONSTRAINT "jobs_pipeline_id_fkey"
FOREIGN KEY (pipeline_id)
REFERENCES pipelines(id)
ON DELETE CASCADE
TABLE "resource_types"
CONSTRAINT "resource_types_pipeline_id_fkey"
FOREIGN KEY (pipeline_id)
REFERENCES pipelines(id)
ON DELETE CASCADE
TABLE "resources"
CONSTRAINT "resources_pipeline_id_fkey"
FOREIGN KEY (pipeline_id)
REFERENCES pipelines(id)
ON DELETE CASCADE
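One practical consequence of all those ON DELETE CASCADE clauses: removing a pipeline row takes its builds, jobs, resources and resource_types with it. Presumably, then, a pipeline teardown boils down to a single statement - a hypothetical sketch:

package main

import (
	"database/sql"

	_ "github.com/lib/pq"
)

// destroyPipeline shows the cascade in action: one DELETE on the
// parent table, and postgres removes the dependent builds, jobs,
// resources and resource_types rows for us.
func destroyPipeline(db *sql.DB, teamID int, name string) error {
	_, err := db.Exec(
		`DELETE FROM pipelines WHERE name = $1 AND team_id = $2`,
		name, teamID,
	)
	return err
}

func main() {
	db, err := sql.Open("postgres", "user=postgres dbname=atc sslmode=disable")
	if err != nil {
		panic(err)
	}

	if err := destroyPipeline(db, 1, "hello-world"); err != nil {
		panic(err)
	}
}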
And then, to understand (or at least guess) what the most critical queries are, look at the indexes. Note the unique index on (name, team_id): a pipeline’s name is unique per team, which is what lets set-pipeline use the name as the key for the create-versus-update decision we saw earlier.
Indexes:
"pipelines_pkey"
PRIMARY KEY, btree (id)
"pipelines_name_team_id"
UNIQUE CONSTRAINT,
btree (name, team_id)
"pipelines_team_id"
btree (team_id)
What’s next?
The next step is getting to know how a worker registers itself against tsa, another component that makes up the architecture.
I’d like to dig deep into what that process looks like and which components are involved.
If you’re curious about it too, check out the next posts!
Let me know if you have any questions or comments. I probably got something wrong here, and I’d really like to know about it - I’m cirowrc on Twitter, feel free to reach out.
Have a good one!
finis