Hey,
this blog post is a continuation of the previous one (How to build and run Concourse CI locally). In this one, I try to better understand how one of Concourse's main components works internally to keep track of pipelines and trigger builds.
Let’s dig into what goes on when a pipeline is submitted to Concourse.
The API
As a consumer of the “Concourse service”, the first point we touch is the API server, which in this case is atc. In the words of the atc repository:
atc is the brain of Concourse. It’s responsible for scheduling builds across the cluster of workers, providing the API for the system, as well as serving the web interface.
In the last post, when we submitted a pipeline to Concourse, atc was the component that the CLI (fly) was sending it to.
As an example, let’s take that very same pipeline:
jobs:
- name: 'hello'
  plan:
  - task: 'say-hello'
    config:
      platform: 'linux'
      image_resource:
        type: 'docker-image'
        source: {repository: 'alpine'}
      run:
        path: 'echo'
        args: ['hello']
and then submit it using fly:
fly \
set-pipeline \
--pipeline=hello-world \
--config=./hello.yml \
--target=local
Fly is interesting in its own right; in summary, it does the following (see commands/set_pipeline and commands/internal/setpipelinehelpers):
- creates an internal representation of the newly submitted configuration (parses the yaml it got from --config);
- loads the existing configuration for that pipeline, if any (asks atc for it based on the pipeline name);
- performs a diff between the existing pipeline and the new one;
- shows a message highlighting these diffs and asks for confirmation to proceed;
- applies the configuration by making use of the Concourse Go client, go-concourse;
- the configuration save call made from go-concourse hits the following endpoint in atc: /api/v1/teams/:team_name/pipelines/:pipeline_name/config (see the sketch right after this list).
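To make that last step concrete, here's a minimal sketch of what the save call boils down to, written as a plain HTTP request rather than through go-concourse. The team name (main), the local address, and the omission of authentication and config versioning are all simplifications on my part - treat this as an illustration, not fly's actual code:

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// the raw pipeline yaml we wrote earlier
	config, err := os.ReadFile("./hello.yml")
	if err != nil {
		panic(err)
	}

	// the endpoint that go-concourse ultimately hits; team `main`
	// and the local address come from the dev setup of the last post.
	url := "http://localhost:8080/api/v1/teams/main/pipelines/hello-world/config"

	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(config))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/x-yaml")
	// note: the real client also sends authentication and the
	// previous config version (for optimistic locking) - omitted here.

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fmt.Println("atc replied:", resp.Status)
}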
Now that the configuration has landed in atc, api/configserver/save.go#SaveConfig gets invoked.
This piece does two things (sketched right after the list):
- performs a second validation (fly already validated it, but as this api might be public, it takes the defensive approach);
- saves the configuration.
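In handler form, the flow looks roughly like the following. This is a paraphrase of the decode / re-validate / persist shape, not atc's actual code: Config, validate, and persist are all stand-ins I made up for the sketch.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"gopkg.in/yaml.v2"
)

// Config is a drastically reduced stand-in for atc's pipeline
// configuration type; the real one lives in the atc package.
type Config struct {
	Jobs []struct {
		Name string `yaml:"name"`
	} `yaml:"jobs"`
}

// validate mimics the second, defensive validation pass.
func validate(c Config) error {
	if len(c.Jobs) == 0 {
		return fmt.Errorf("pipeline has no jobs configured")
	}
	return nil
}

// persist stands in for the database layer covered in the
// next section.
func persist(c Config) error { return nil }

// saveConfig sketches the shape of SaveConfig: decode the body,
// re-validate it, then persist it.
func saveConfig(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	var config Config
	if err := yaml.Unmarshal(body, &config); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// fly already validated the config, but the api can't trust
	// its callers - anyone can hit this endpoint directly.
	if err := validate(config); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	if err := persist(config); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
}

func main() {
	http.HandleFunc("/api/v1/teams/main/pipelines/hello-world/config", saveConfig)
	log.Fatal(http.ListenAndServe(":8080", nil))
}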
Fair enough, but what does “saving the configuration” actually mean?
Saving the pipeline configuration
Saving the configuration means making the pipeline persistent.
The process covers two possibilities:
- we’re creating a brand new pipeline;
- there’s already a pipeline with that name.
In the first case, it inserts a row into the pipelines table and creates an extra table to keep track of build events for that pipeline (pipeline_build_events_<pipeline_id>) - pretty neat. Once that’s done, it starts populating the other tables (jobs, resources, resource_types, …).
In case there’s already a pipeline with that name, an update is performed: all previous jobs, resources and resource_types are made inactive, and task caches are cleared. Once that’s done, it populates the jobs, resources, … tables with active entries (such that, in the end, only the new ones are marked as active).
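Here's a sketch of both paths in database/sql terms. The table and column names match what we'll inspect in psql below, but the code itself is my simplification of atc's db package - in particular, the events table inheriting from build_events is my reading of the schema, not something verified here:

package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq"
)

// savePipeline sketches the two paths described above; it is a
// simplification, not atc's real code.
func savePipeline(db *sql.DB, teamID int, name string) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()

	var id int
	err = tx.QueryRow(
		`SELECT id FROM pipelines WHERE name = $1 AND team_id = $2`,
		name, teamID,
	).Scan(&id)

	switch err {
	case sql.ErrNoRows:
		// brand new pipeline: insert the row, then create the
		// dedicated build events table for it.
		err = tx.QueryRow(
			`INSERT INTO pipelines (name, team_id)
			 VALUES ($1, $2) RETURNING id`,
			name, teamID,
		).Scan(&id)
		if err != nil {
			return err
		}

		// assumption: the per-pipeline table inherits from
		// build_events.
		_, err = tx.Exec(fmt.Sprintf(
			`CREATE TABLE pipeline_build_events_%d ()
			 INHERITS (build_events)`, id))
		if err != nil {
			return err
		}
	case nil:
		// existing pipeline: mark the previous rows inactive;
		// they get repopulated as active right after.
		for _, table := range []string{"jobs", "resources", "resource_types"} {
			_, err = tx.Exec(fmt.Sprintf(
				`UPDATE %s SET active = false WHERE pipeline_id = $1`, table), id)
			if err != nil {
				return err
			}
		}
	default:
		return err
	}

	// ... populate jobs, resources, resource_types as active rows ...

	return tx.Commit()
}

func main() {
	db, err := sql.Open("postgres", "user=postgres dbname=atc sslmode=disable")
	if err != nil {
		panic(err)
	}

	if err := savePipeline(db, 1, "hello-world"); err != nil {
		panic(err)
	}
}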
What’s in the DB?
After running set-pipeline, we should have some rows in our database.
# log in as `postgres` (we know we don't need password
# because of the `atc` initialization at `./dev/atc`)
psql \
--username=postgres \
--no-password
# list the databases that we have
\l
                List of databases
   Name    |
-----------+
 atc       |     <<<<
 postgres  |
 template0 |
 template1 |
(4 rows)
# connect to the ATC database
\c atc
You are now connected to database "atc" as user "postgres".
# list all the tables that are there
\dt
| Name |
+--------------------------+
| base_resource_types |
| build_events |
...
| jobs_serial_groups |
| next_build_inputs |
| pipeline_build_events_2 | <<< the generated build_events table
| pipelines | <<< our pipeline should be here
...
| worker_task_caches |
| workers |
(32 rows)
# Retrieve all the pipelines saved
SELECT * FROM pipelines;
version | id | name | paused | ordering | last_scheduled | team_id | public | groups
---------+----+-------------+--------+----------+-------------------------------+---------+--------+--------
3 | 2 | hello-world | f | 2 | 2018-02-05 18:58:00.205094-05 | 1 | f | null
(1 row)
While that’s interesting per se, we can understand a bit more about how the data is structured internally by looking at the table definitions - in particular, at what references pipelines:
Referenced by:
# `builds` keeps track of the build executions that
# are initiated. It has fields that mark the status
# of the build, what's the engine that ran the
# tasks, if it was manually triggered or not,
# which pipeline the build belongs to ...
TABLE "builds"
CONSTRAINT "builds_pipeline_id_fkey"
FOREIGN KEY (pipeline_id)
REFERENCES pipelines(id)
ON DELETE CASCADE
TABLE "jobs"
CONSTRAINT "jobs_pipeline_id_fkey"
FOREIGN KEY (pipeline_id)
REFERENCES pipelines(id)
ON DELETE CASCADE
TABLE "resource_types"
CONSTRAINT "resource_types_pipeline_id_fkey"
FOREIGN KEY (pipeline_id)
REFERENCES pipelines(id)
ON DELETE CASCADE
TABLE "resources"
CONSTRAINT "resources_pipeline_id_fkey"
FOREIGN KEY (pipeline_id)
REFERENCES pipelines(id)
ON DELETE CASCADE
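One practical consequence of all those ON DELETE CASCADE clauses: removing a pipeline row takes its builds, jobs, resources and resource_types with it. Presumably, then, a pipeline teardown boils down to a single statement - a hypothetical sketch:

package main

import (
	"database/sql"

	_ "github.com/lib/pq"
)

// destroyPipeline shows the cascade in action: one DELETE on the
// parent table, and postgres removes the dependent builds, jobs,
// resources and resource_types rows for us.
func destroyPipeline(db *sql.DB, teamID int, name string) error {
	_, err := db.Exec(
		`DELETE FROM pipelines WHERE name = $1 AND team_id = $2`,
		name, teamID,
	)
	return err
}

func main() {
	db, err := sql.Open("postgres", "user=postgres dbname=atc sslmode=disable")
	if err != nil {
		panic(err)
	}

	if err := destroyPipeline(db, 1, "hello-world"); err != nil {
		panic(err)
	}
}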
And then, to understand (or at least guess) what the most critical queries are, look at the indexes. Note the unique index on (name, team_id): a pipeline’s name is unique per team, which is what lets set-pipeline use the name as the key for the create-versus-update decision we saw earlier.
Indexes:
"pipelines_pkey"
PRIMARY KEY, btree (id)
"pipelines_name_team_id"
UNIQUE CONSTRAINT,
btree (name, team_id)
"pipelines_team_id"
btree (team_id)
What’s next?
The next step is getting to know how a worker registers itself against tsa, another component that makes up the architecture.
I’d like to dig deep into what that process looks like and which components are involved.
If you’re curious about it too, check out the next posts!
Let me know if you have any questions or comments. I probably got something wrong here, and I’d really like to know about it - I’m cirowrc on Twitter, feel free to reach out.
Have a good one!
finis