Hey,

Today I wrote a script to solve a problem that a good number of people seem to face: renaming an Elasticsearch index.

Naturally, there are documented solutions, but I couldn’t quickly find a script that would get me where I wanted: all the data from an index named a queryable in an index named b, with all the properties set.

Note: the following code targets Elasticsearch 2.4.6.

Here it comes then.

Reindexing step by step

There are five steps to get towards our goal:

  0. Create an Elasticsearch index and populate it with some data;
  1. Get the configurations of the original index;
  2. Create the new index with the desired configuration;
  3. Run _reindex action;
  4. Drop the old index.

0. Create an Elasticsearch index and populate it with some data

To create an index using the default parameters (e.g., number of shards and replicas) we can issue a POST against the Elasticsearch HTTP endpoint specifying the desired index (in this case, acme-production):

curl \
        -XPOST \
        http://localhost:9200/acme-production
{
    "acknowledged": true
}

Which, naturally, has no data indexed:

curl localhost:9200/_cat/indices\?v

health status index           pri rep docs.count docs.deleted store.size pri.store.size 
green  open   acme-production   1   0          0            0       130b           130b 

Now we populate it with some data:

curl \
        -XPOST \
        -d '{"title":"Hello world"}' \
        http://localhost:9200/acme-production/test/hello

{
    "_id": "hello",
    "_index": "acme-production",
    "_shards": {
        "failed": 0,
        "successful": 1,
        "total": 1
    },
    "_type": "test",
    "_version": 1,
    "created": true
}

Which we can verify by looking again at the /_cat/indices endpoint:

curl localhost:9200/_cat/indices\?v

health status index           pri rep docs.count docs.deleted store.size pri.store.size 
green  open   acme-production   1   0          1            0      2.9kb          2.9kb 

1. Get the configurations of the original index

Because the renaming is nothing more than “create, copy and delete” we need to create a new index with the properties from the old one. To properly achieve that we must then copy the old configuration:

index_config=$(curl \
        localhost:9200/acme-production/_settings,_mappings | \
        jq '.[]')

echo "$index_config"

{
    "mappings": {
        "test": {
            "properties": {
                "title": {
                    "type": "string"
                }
            }
        }
    },
    "settings": {
        "index": {
            "creation_date": "1511285883996",
            "number_of_replicas": "0",
            "number_of_shards": "1",
            "uuid": "h3z8Hq_5SKuu7MwEJZYcVQ",
            "version": {
                "created": "2040699"
            }
        }
    }
}

ps.: here I’m using jq, the lightweight command-line JSON processor, to pull the object holding mappings and settings out of the bigger object returned by the call to /<index>/_settings,_mappings, which wraps them under the index name. This way we can assign it to a variable and use it directly later.
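To make the effect of the jq filter concrete, here is a small offline sketch. The response in it is a trimmed, hand-written stand-in for what /<index>/_settings,_mappings returns (an assumption for illustration, not captured output), so it runs without a cluster.

```shell
# Trimmed stand-in for the response of /acme-production/_settings,_mappings.
# The real response wraps everything in a key named after the index.
response='{"acme-production":{"settings":{"index":{"number_of_shards":"1"}},"mappings":{"test":{}}}}'

# Top-level keys of the response: just the index name.
printf '%s' "$response" | jq -c 'keys'

# .[] emits every top-level value, which drops the index-name wrapper.
index_config=$(printf '%s' "$response" | jq -c '.[]')
printf '%s\n' "$index_config"
```

Since .[] emits one value per top-level key, a request that matches several indices would yield several objects, so this trick fits the single-index case used here.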

2. Create the new index with the desired configuration

Using the old configuration (stored in the index_config variable) we’re able to create the new index based on it:

curl \
        -XPUT \
        -d "$index_config" \
        http://localhost:9200/acme-staging

{
    "acknowledged": true
}

ps.: even though there’s an uuid in the $index_config object there, it doesn’t matter - it’ll get replaced by a new uuid in the new index.
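If you would rather not send the index-specific values at all, jq can drop them before the PUT. This is only a sketch: the helper name strip_index_identity is hypothetical, and the three keys it removes are simply the ones that describe the old index in the output above. On 2.4.6 the unfiltered config works as shown; other versions may be stricter about these settings, so check the documentation for yours.

```shell
# Hypothetical helper: reads an index config (settings + mappings) on stdin
# and removes the settings that only describe the old index.
strip_index_identity() {
  jq 'del(.settings.index.creation_date,
          .settings.index.uuid,
          .settings.index.version)'
}

# Sample config, copied from the output shown earlier in this post.
sample='{"mappings":{"test":{"properties":{"title":{"type":"string"}}}},"settings":{"index":{"creation_date":"1511285883996","number_of_replicas":"0","number_of_shards":"1","uuid":"h3z8Hq_5SKuu7MwEJZYcVQ","version":{"created":"2040699"}}}}'

printf '%s' "$sample" | strip_index_identity
```

The filtered output keeps the mappings plus number_of_shards and number_of_replicas, and could be passed to the PUT in place of $index_config.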

3. Run _reindex action

Having both indices properly configured we’re ready to have the data from the old index in the new one:

source=acme-production
dest=acme-staging
payload="{
  \"source\": {
    \"index\": \"$source\"
  },
  \"dest\": {
    \"index\": \"$dest\",
    \"version_type\": \"internal\"
  }
}"

curl \
        -XPOST \
        -d "$payload" \
        http://localhost:9200/_reindex

{
    "batches": 1,
    "created": 1,
    "failures": [],
    "noops": 0,
    "requests_per_second": "unlimited",
    "retries": 0,
    "throttled_millis": 0,
    "throttled_until_millis": 0,
    "timed_out": false,
    "took": 48,
    "total": 1,
    "updated": 0,
    "version_conflicts": 0
}
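Before moving on to the delete, it can be worth checking that response programmatically. The sketch below is an assumption of mine rather than part of the original script: reindex_ok is a hypothetical helper that reads a _reindex response on stdin and succeeds only when the failures array is empty and timed_out is false, two fields visible in the output above.

```shell
# Hypothetical helper: exit status 0 only if the _reindex response on stdin
# reports no failures and did not time out.
reindex_ok() {
  jq -e '(.failures | length) == 0 and .timed_out == false' >/dev/null
}

# Trimmed copy of the response shown above.
response='{"batches":1,"created":1,"failures":[],"timed_out":false,"total":1}'

if printf '%s' "$response" | reindex_ok; then
  echo "reindex looks clean"
else
  echo "reindex reported problems, keep the old index" >&2
fi
```

jq -e sets its exit status from the last value it produces, which is what lets the filter act as a condition in the if.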

By this time you should already have your new index populated. Now it’s a matter of deleting the old index.

4. Drop the old index

If you have no intention of making use of the old index, now it’s time to drop it:

curl \
        -XDELETE \
        http://localhost:9200/acme-production

{
    "acknowledged": true
}
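Put together, steps 1 to 4 fit in one small function. Treat this as a sketch under assumptions rather than a hardened tool: the name rename_index, the ES_URL variable and the error handling are mine, and I added the -f, -s and -S curl flags so that HTTP errors make curl fail instead of passing silently. It refuses to delete the old index if the reindex response reports failures or a timeout.

```shell
# Base URL of the cluster; override by exporting ES_URL (assumed name).
ES_URL="${ES_URL:-http://localhost:9200}"

# rename_index <source> <dest>: create, copy, then delete.
rename_index() {
  local source="$1" dest="$2" config payload result

  # Step 1: grab settings and mappings, dropping the index-name wrapper.
  config=$(curl -fsS "$ES_URL/$source/_settings,_mappings" | jq '.[]') || return 1
  [ -n "$config" ] || { echo "could not read config of $source" >&2; return 1; }

  # Step 2: create the destination index with that configuration.
  curl -fsS -XPUT -d "$config" "$ES_URL/$dest" >/dev/null || return 1

  # Step 3: build the _reindex body with jq so the names are quoted safely.
  payload=$(jq -n --arg s "$source" --arg d "$dest" \
    '{source: {index: $s}, dest: {index: $d, version_type: "internal"}}')
  result=$(curl -fsS -XPOST -d "$payload" "$ES_URL/_reindex") || return 1

  # Stop here unless the copy finished without failures or a timeout.
  if ! printf '%s' "$result" | jq -e '(.failures | length) == 0 and .timed_out == false' >/dev/null; then
    echo "reindex reported problems, keeping $source" >&2
    return 1
  fi

  # Step 4: drop the old index.
  curl -fsS -XDELETE "$ES_URL/$source" >/dev/null
}
```

With a cluster running, the example from this post would then be: rename_index acme-production acme-staging.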

Update - Using aliases and avoiding complete index copies

I received some pretty interesting feedback on Reddit that I’d like to share here.

It turns out that sometimes we can avoid reindexing by making use of aliases (see Elasticsearch Indices Aliases).

The idea is that when we need to refer to the contents of an index by another name, we can create a kind of “pointer” to the real index and perform all the normal operations against this pointer (the alias). The aliases API lets us create, list, update and remove aliases, which makes it possible to get what we want: a “renaming” of an index, even if only virtually.

Let’s do it then.

First, create an acme-production index just like before and add some data:

curl \
        -XPOST \
        http://localhost:9200/acme-production
curl \
        -XPOST \
        -d '{"title":"Hello world"}' \
        http://localhost:9200/acme-production/test/hello

health status index           pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   acme-production   5   1          1            0      3.4kb          3.4kb 

then create an alias named acme-staging:

payload='{
    "actions": [
        {
            "add": {
                "alias": "acme-staging",
                "index": "acme-production"
            }
        }
    ]
}'

curl \
        -XPOST \
        -d "$payload" \
        http://localhost:9200/_aliases 

if we check the indices we’ll see that we don’t have any new index though:

curl localhost:9200/_cat/indices\?v
health status index           pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   acme-production   5   1          1            0      3.4kb          3.4kb 

But that we do have aliases:

curl localhost:9200/_cat/aliases\?v
alias        index           filter routing.index routing.search 
acme-staging acme-production -      -             -              

which allows us to perform queries against acme-staging and retrieve data from acme-production:

# request against the index
curl http://localhost:9200/acme-production/test/hello
{"_index":"acme-production","_type":"test","_id":"hello","_version":1,"found":true,"_source":{"title":"Hello world"}}

# now against the alias
curl http://localhost:9200/acme-staging/test/hello
{"_index":"acme-production","_type":"test","_id":"hello","_version":1,"found":true,"_source":{"title":"Hello world"}}
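The same _aliases endpoint also accepts a remove action, which is how the pointer can be dropped again later. As a small sketch (alias_payload is a hypothetical helper of mine, and it does no JSON escaping, so it assumes plain names like the ones used here), the request body for either action can be generated like this:

```shell
# Hypothetical helper: prints the body for POST /_aliases with one action.
# action is "add" or "remove"; names are assumed to need no JSON escaping.
alias_payload() {
  local action="$1" alias_name="$2" index="$3"
  printf '{"actions":[{"%s":{"alias":"%s","index":"%s"}}]}\n' \
    "$action" "$alias_name" "$index"
}

# Body equivalent to the "add" request used above.
alias_payload add acme-staging acme-production

# Body that would remove the alias again.
alias_payload remove acme-staging acme-production

# To send one of them (needs a running cluster):
#   curl -XPOST -d "$(alias_payload remove acme-staging acme-production)" \
#        http://localhost:9200/_aliases
```

Because actions is an array, several add and remove entries can go in a single request.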

Now, what if we want to disallow requests to the old index? As if we had really renamed it and not duplicated? Then we need to close the old index using the open/close index api:

curl -XPOST http://localhost:9200/acme-production/_close

then we can try to get from acme-production:

curl http://localhost:9200/acme-production/test/hello

{"error":{"root_cause":[{"type":"index_closed_exception","reason":"closed","index":"acme-production"}],"type":"index_closed_exception","reason":"closed","index":"acme-production"},"status":403}

Cool, what we wanted, huh? Now, if we try to get from acme-staging:

curl http://localhost:9200/acme-staging/test/hello
{"error":{"root_cause":[{"type":"index_closed_exception","reason":"closed","index":"acme-production"}],"type":"index_closed_exception","reason":"closed","index":"acme-production"},"status":403}

we can’t retrieve either.

It sounds logical to me that we can’t as the alias is just a pointer to the other index (which was closed).

So, to sum up, if you want to have new indices to point to an existing one (as if you were renaming), aliases will save you and you’ll need to perform 0 copying of data.

If you need something like a “rename” that also disallows access to the old index, then aliases won’t help you (you’ll have to use the reindex + delete strategy).

I had never used aliases before and it’s pretty good to know that they exist! They can definitely be very useful sometimes.

Closing thoughts and resources

As someone who never really dug deep into how Elasticsearch works, I found the whole concept of reindexing very easy. The official documentation is pretty good and with it I was able to quickly solve the problem. Kudos, Elasticsearch team!

I plan to do some more posts regarding Elasticsearch in the future. If you want to come along and learn together, make sure you subscribe to the mailing list.

Thanks,

finis