Hey,
today I happened to write a script to solve a specific problem that a good number of people seem to face: renaming an Elasticsearch index.
Naturally, there are documented solutions, but I couldn’t quickly find a script that would get me where I wanted: all the data from an index named a now being queryable in an index named b, with all the properties set.
Note: the following code is aimed at Elasticsearch 2.4.6.
Here it comes then.
Reindexing step by step
There are five steps towards our goal (numbered from 0, since the first one is just setup):
- Create an Elasticsearch index and populate it with some data;
- Get the configurations of the original index;
- Create the new index with the desired configuration;
- Run `_reindex` action;
- Drop the old index.
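The steps above can be sketched as a dry-run helper that only prints the HTTP requests involved, without sending anything. The `plan_rename` function and the `ES_URL` variable are names I made up for illustration; the endpoints are the ones used throughout this post.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the requests needed to "rename" an index.
# ES_URL and plan_rename are illustrative names, not part of Elasticsearch.
ES_URL="${ES_URL:-http://localhost:9200}"

plan_rename() {
  # $1 = source index, $2 = destination index
  # 1. read settings and mappings of the original index
  echo "GET $ES_URL/$1/_settings,_mappings"
  # 2. create the new index with that configuration
  echo "PUT $ES_URL/$2"
  # 3. copy the documents over
  echo "POST $ES_URL/_reindex"
  # 4. drop the old index
  echo "DELETE $ES_URL/$1"
}

plan_rename acme-production acme-staging
```

Step 0 is left out on purpose, since it only exists to give us something to rename.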
0. Create an Elasticsearch index and populate it with some data
To create an index using the default parameters (e.g., number of shards and replicas) we can issue a POST against the Elasticsearch HTTP endpoint specifying the desired index (in this case, acme-production):
curl \
-XPOST \
http://localhost:9200/acme-production
{
"acknowledged": true
}
Which, naturally, has no data indexed:
curl localhost:9200/_cat/indices\?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open acme-production 1 0 0 0 130b 130b
Now we populate it with some data:
curl \
-XPOST \
-d '{"title":"Hello world"}' \
http://localhost:9200/acme-production/test/hello
{
"_id": "hello",
"_index": "acme-production",
"_shards": {
"failed": 0,
"successful": 1,
"total": 1
},
"_type": "test",
"_version": 1,
"created": true
}
Which we can verify by looking again at the /_cat/indices endpoint:
curl localhost:9200/_cat/indices\?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open acme-production 1 0 1 0 2.9kb 2.9kb
1. Get the configurations of the original index
Because the renaming is nothing more than “create, copy and delete”, we need to create a new index with the properties of the old one. To do that properly, we first fetch the old configuration:
index_config=$(curl \
localhost:9200/acme-production/_settings,_mappings | \
jq '.[]')
echo "$index_config"
{
"mappings": {
"test": {
"properties": {
"title": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1511285883996",
"number_of_replicas": "0",
"number_of_shards": "1",
"uuid": "h3z8Hq_5SKuu7MwEJZYcVQ",
"version": {
"created": "2040699"
}
}
}
}
P.S.: here I’m making use of jq, the lightweight command-line JSON processor, to extract the mappings and settings objects from the bigger object returned by the call to /<index>/_settings,_mappings. This way we’re able to assign the result to a variable and use it directly later.
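To make the `jq '.[]'` step more concrete, here is a small sketch that runs the same filter against a hand-written sample instead of a live cluster. The `sample_response` value and the `unwrap_index_config` name are assumptions for illustration; `-c` only asks jq for compact output.

```shell
#!/usr/bin/env bash
# The settings/mappings response is keyed by index name, so '.[]'
# yields the inner object (the value of every top-level key).
unwrap_index_config() {
  jq -c '.[]'
}

# Hand-written sample shaped like the response shown above (abridged).
sample_response='{"acme-production":{"mappings":{},"settings":{"index":{"number_of_shards":"1"}}}}'

# Skip quietly when jq is not installed.
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$sample_response" | unwrap_index_config
fi
```

Note that `.[]` emits one object per top-level key, so this only gives a single usable configuration when the request names exactly one index.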
2. Create the new index with the desired configuration
Using the old configuration (stored in the index_config variable) we’re able to create the new index based on it:
curl \
-XPUT \
-d "$index_config" \
http://localhost:9200/acme-staging
{
"acknowledged": true
}
P.S.: even though there’s a uuid in the $index_config object, it doesn’t matter: it’ll get replaced by a new uuid in the new index.
3. Run _reindex action
Having both indices properly configured we’re ready to have the data from the old index in the new one:
source=acme-production
dest=acme-staging
payload="{
\"source\": {
\"index\": \"$source\"
},
\"dest\": {
\"index\": \"$dest\",
\"version_type\": \"internal\"
}
}"
curl \
-XPOST \
-d "$payload" \
http://localhost:9200/_reindex
{
"batches": 1,
"created": 1,
"failures": [],
"noops": 0,
"requests_per_second": "unlimited",
"retries": 0,
"throttled_millis": 0,
"throttled_until_millis": 0,
"timed_out": false,
"took": 48,
"total": 1,
"updated": 0,
"version_conflicts": 0
}
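The hand-escaped payload above can also be produced by a small helper, which keeps the quoting in one place. This is only a sketch: `reindex_payload` is a name I picked, and it assumes the index names contain no characters that would need JSON escaping.

```shell
#!/usr/bin/env bash
# Build the _reindex request body without hand-escaped quotes.
# reindex_payload is an illustrative helper name; it assumes index names
# contain no characters that need JSON escaping.
reindex_payload() {
  # $1 = source index, $2 = destination index
  printf '{"source":{"index":"%s"},"dest":{"index":"%s","version_type":"internal"}}\n' "$1" "$2"
}

reindex_payload acme-production acme-staging
```

The result can then be handed to curl with something like `-d "$(reindex_payload "$source" "$dest")"`.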
By this time you should already have your new index populated. All that’s left is deleting the old index.
4. Drop the old index
If you have no intention of making use of the old index, now it’s time to drop it:
curl \
-XDELETE \
http://localhost:9200/acme-production
{
"acknowledged": true
}
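Since the DELETE above can’t be undone, one option is to compare document counts before running it. The sketch below assumes the `_count` endpoint and jq are available; `doc_count` and `same_doc_count` are hypothetical helper names, and freshly reindexed documents may only show up in the count after the index has refreshed.

```shell
#!/usr/bin/env bash
# Sketch of a guard to run before the DELETE: only proceed when both
# indices report the same number of documents.
# doc_count and same_doc_count are illustrative names.
ES_URL="${ES_URL:-http://localhost:9200}"

# Ask the _count endpoint how many documents an index holds (needs curl and jq).
doc_count() {
  curl -s "$ES_URL/$1/_count" | jq '.count'
}

# Pure comparison, so it can be exercised without a cluster.
# Empty values (e.g. a failed request) never count as a match.
same_doc_count() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

if same_doc_count 1 1; then
  echo "counts match, safe to drop the old index"
fi
```

Against a real cluster this would be called as `same_doc_count "$(doc_count acme-production)" "$(doc_count acme-staging)"` before issuing the DELETE.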
Update - Using aliases and avoiding complete index copies
I received some pretty interesting feedback on Reddit that I’d like to share here.
It turns out that sometimes we can avoid reindexing by making use of aliases (see Elasticsearch Indices Aliases).
The idea is that when we need to refer to an index by another name, we can create a kind of “pointer” to the real index and perform all the normal operations against this pointer (the alias). The aliases API essentially lets us CRUD (create, read, update and delete) aliases, making it totally possible for us to do what we want: achieve a “renaming” of an index, even if only virtually.
Let’s do it then.
First, create an acme-production index just like before and add some data:
curl \
-XPOST \
http://localhost:9200/acme-production
curl \
-XPOST \
-d '{"title":"Hello world"}' \
http://localhost:9200/acme-production/test/hello
curl localhost:9200/_cat/indices\?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open acme-production 5 1 1 0 3.4kb 3.4kb
Then create an alias named acme-staging:
payload='{
"actions": [
{
"add": {
"alias": "acme-staging",
"index": "acme-production"
}
}
]
}'
curl \
-XPOST \
-d "$payload" \
http://localhost:9200/_aliases
If we check the indices, we’ll see that we don’t have any new index though:
curl localhost:9200/_cat/indices\?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open acme-production 5 1 1 0 3.4kb 3.4kb
But we do have an alias:
curl localhost:9200/_cat/aliases\?v
alias index filter routing.index routing.search
acme-staging acme-production - - -
This allows us to perform queries against acme-staging and retrieve data from acme-production:
# request against the index
curl http://localhost:9200/acme-production/test/hello
{"_index":"acme-production","_type":"test","_id":"hello","_version":1,"found":true,"_source":{"title":"Hello world"}}
# now against the alias
curl http://localhost:9200/acme-staging/test/hello
{"_index":"acme-production","_type":"test","_id":"hello","_version":1,"found":true,"_source":{"title":"Hello world"}}
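The same `_aliases` endpoint also accepts a `remove` action, which is how the alias would be taken away again. Here is a minimal sketch of a payload builder covering both actions; `alias_payload` is a name I made up, and it assumes alias and index names need no JSON escaping.

```shell
#!/usr/bin/env bash
# Build an _aliases request body for a single action ("add" or "remove").
# alias_payload is an illustrative helper name.
alias_payload() {
  # $1 = action, $2 = alias name, $3 = index name
  case "$1" in
    add|remove) ;;
    *) echo "unsupported action: $1" >&2; return 1 ;;
  esac
  printf '{"actions":[{"%s":{"alias":"%s","index":"%s"}}]}\n' "$1" "$2" "$3"
}

alias_payload remove acme-staging acme-production
```

As with the earlier alias request, the output would be sent with `curl -XPOST -d "$(alias_payload remove acme-staging acme-production)" http://localhost:9200/_aliases`.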
Now, what if we want to disallow requests to the old index, as if we had really renamed it and not duplicated it? Then we need to close the old index using the open/close index API:
curl -XPOST http://localhost:9200/acme-production/_close
Then we can try to get a document from acme-production:
curl http://localhost:9200/acme-production/test/hello
{"error":{"root_cause":[{"type":"index_closed_exception","reason":"closed","index":"acme-production"}],"type":"index_closed_exception","reason":"closed","index":"acme-production"},"status":403}
Cool, just what we wanted, huh? Now, if we try to get from acme-staging:
curl http://localhost:9200/acme-staging/test/hello
{"error":{"root_cause":[{"type":"index_closed_exception","reason":"closed","index":"acme-production"}],"type":"index_closed_exception","reason":"closed","index":"acme-production"},"status":403}
we can’t retrieve it either.
That sounds logical to me, as the alias is just a pointer to the other index (which was closed).
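Closing is reversible: the open/close API has an `_open` counterpart to the `_close` call used above. The sketch below only builds the URL for either operation; `index_state_url` and `ES_URL` are names assumed for illustration.

```shell
#!/usr/bin/env bash
# Build the URL for opening or closing an index.
# index_state_url and ES_URL are illustrative names.
ES_URL="${ES_URL:-http://localhost:9200}"

index_state_url() {
  # $1 = index name, $2 = "open" or "close"
  case "$2" in
    open|close) echo "$ES_URL/$1/_$2" ;;
    *) echo "unknown state: $2" >&2; return 1 ;;
  esac
}

index_state_url acme-production open
```

Reopening the index would then be `curl -XPOST "$(index_state_url acme-production open)"`, after which both the index and its alias should answer again.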
So, to sum up, if you want a new name that points to an existing index (as if you were renaming it), aliases will save you and you’ll need to copy no data at all.
If you need something like a “rename” that also disallows access to the old index, then an alias won’t help you (you’ll have to use the reindex + delete strategy).
I had never used aliases before and it’s pretty good to know that they exist! They can definitely be very useful sometimes.
Closing thoughts and resources
As someone who never really dug deep into how Elasticsearch works, I found the whole concept of reindexing very easy. The official documentation is pretty good and, with it, I was able to quickly solve the problem. Kudos, Elasticsearch team!
- Elasticsearch Reindex API
- Elasticsearch List all indices
- Elasticsearch Indices Aliases
- Elasticsearch Open/Close Index API
I plan to do some more posts regarding Elasticsearch in the future. If you want to come along and learn together, make sure you subscribe to the mailing list.
Thanks,
finis