Support for multiple Index schema and retention

We route lots of messages into our Graylog system, e.g. access logs, debug logs from test environments, infrastructure logs, etc. At the moment all these messages go into a single index and share the same retention policy. We would like to route incoming messages, based on the input or some other criteria, into different indices with different retention policies. For example, we want to keep access logs around and searchable for about two weeks, but debugging messages from test or staging environments are pretty useless after one or two days.
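
To make the request concrete, here is a minimal sketch of the behaviour being asked for. It is plain Python, not Graylog code or configuration: the `IndexSet` class, the message field names (`type`, `environment`, `level`) and the 30-day default retention are all assumptions made up for illustration. Only the "two weeks" and "one or two days" figures come from the request itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IndexSet:
    """Hypothetical index set: an index name prefix plus its own retention."""
    prefix: str
    retention: timedelta


# Retention from the request: about two weeks for access logs, one or two
# days for test/staging debug messages. The 30-day default is an assumption.
ACCESS = IndexSet("access", timedelta(days=14))
DEBUG = IndexSet("debug", timedelta(days=2))
DEFAULT = IndexSet("graylog2", timedelta(days=30))


def route(message: dict) -> IndexSet:
    """Pick an index set from message fields (field names are illustrative)."""
    if message.get("type") == "access":
        return ACCESS
    if (message.get("environment") in {"test", "staging"}
            and message.get("level") == "debug"):
        return DEBUG
    return DEFAULT


def is_expired(index_set: IndexSet, created: datetime, now: datetime) -> bool:
    """An index may be deleted once it is older than its set's retention."""
    return now - created > index_set.retention
```

With rules like these, a three-day-old debug index would already be eligible for deletion while an access-log index of the same age would be kept.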

  • Guest
  • Apr 4 2017
  • Shipped
  • Matt Maloney commented
    April 04, 2017 20:49
    We are planning on addressing this requirement in a future release. Will post status update as soon as we have committed it to a release.
  • Jared O. commented
    April 04, 2017 20:49

    I'd suggest adding this functionality to the indices page: keep the graylog2_ index as the default (unchangeable) one, and add the option to create new indexes from within the page (which would then be incremented in the same fashion).

    If additional indexes have been created, you should be able to configure them individually with the retention options that currently exist (size/time/etc.) and route to the new index via a drop-down setting on the input or the stream.

  • anders larsson commented
    April 04, 2017 20:49

    Yes, and you can have different requirements from customers on how long you have to store logs.

    So being able to set it would be very useful.

  • Cho Injoong commented
    April 04, 2017 20:49

    @Jared, what a great idea! I think your suggestion would greatly enhance the usability of graylog2.

  • Jorge Ponte Martínez commented
    April 04, 2017 20:49

    Hello, is there any news about this feature?

    Best Regards

  • Guest commented
    April 04, 2017 20:49

    Many other tools can use Elasticsearch directly but some don't allow you to designate an index, plus you cannot create an index alias to the graylog2 index.  

  • Seb ROMMENS commented
    April 04, 2017 20:49

    Very, very useful feature! Pleaaaaaaaaaaaase

  • Jorge Ponte Martínez commented
    April 04, 2017 20:49

    any plans?

  • Matt Maloney commented
    April 04, 2017 20:49

    Yes, we are planning on implementing a basic version in the upcoming 2.0 release. Stay tuned for the alpha/beta announcements over the next several weeks. 

  • tok commented
    April 04, 2017 20:49

    One cannot overemphasize the importance of this feature!

    We would like to use Graylog in a telco environment, but for legal compliance we are required to delete certain logs after several days. Keeping _all the other logs_ for just as short a time is impractical.

    Unfortunately, I could not find any indication of this feature in the current alphas. Is it still planned for 2.0?

  • Antonio Recio commented
    April 04, 2017 20:49

    any news about this subject?

  • Antonio Recio commented
    April 04, 2017 20:49

    any plans about this subject?

  • Olivier Boukili commented
    April 04, 2017 20:49

    Same as tok: for legal reasons, we are required to delete certain logs after several days. We would really appreciate an update on this feature.

  • Michael Ward commented
    April 04, 2017 20:49

    Just wondering if this has been implemented yet? We'd love to be able to have different retention times for our various logs.

  • Anwar Mian commented
    April 04, 2017 20:49

    This feature should have been added from the beginning. This is so important. In Splunk, for example, you can specify an index, source, host, and sourcetype for each input via config files. Each index can contain multiple sources of related data so that you can join fields. It seems like we are limited to the default active index in Elasticsearch through the graylog-web inputs.

  • Guest commented
    April 04, 2017 20:49

    The new "Premium/Enterprise" archive feature doesn't cover this issue. Being able to make use of other indexes in Elasticsearch needs to be a core feature.

  • Jan Doberstein commented
    April 04, 2017 20:49

    The requested feature is to route messages to different indices based on any kind of rule. This is not yet a feature and is not yet planned.

  • Guest commented
    April 04, 2017 20:49

    @Jan It is true that this feature request is about routing incoming messages to a different index. It is also about having multiple indexes and managing retention for said indexes (see title). I am only pointing out that being able to use the search interface to search over multiple indexes is also a requirement, because routing logs to a different index and not being able to search that index (or any index other than "graylog2", for that matter) would be self-defeating.

  • Jan Doberstein commented
    April 04, 2017 20:49

    Hi,

    once writing/routing to different indices is done, for us this includes searching in them.

  • Cristiano Casado commented
    April 04, 2017 20:49

    A single index storing all types of messages has several pitfalls for scalability and performance when you work with high volumes of data and many types of applications sending logs. It does not allow distinct retention policies, and there is a possibility of conflicts between fields and their types. With multiple indices I could implement a strategy to reallocate indices with high message rates to a larger number of dedicated nodes in ES.

  • Linwood Ferguson commented
    April 04, 2017 20:49

    I had just started evaluating Graylog for a client and this is where I ended up also: some information is not worth keeping as long as other information. Sure, disk is cheap, but it is not free. Test systems, frequent but unimportant messages, systems with regulatory requirements vs. others, occasionally management insistence that certain things NOT be kept beyond a certain point... It seems like the alternatives are either to keep everything for the longest required time, or to build a different archiving solution that is independent of Graylog but mirrors many of the same rules you build in Graylog. I'm very new to the product; am I missing something?

  • Pieter Lange commented
    April 04, 2017 20:49

    Index routing is a must-have feature. With fluentd/logstash this is easily accomplished, but it is a serious point of friction with Graylog. There are multiple reasons for wanting to do this: easier retention management (look at elasticsearch-curator), better integration with external tools like Kibana... I could go on.

    Also, wouldn't it be more appropriate for Elasticsearch to live in an output plugin?

  • Jan Doberstein commented
    April 04, 2017 20:49

    This is going to be implemented soon: https://github.com/Graylog2/graylog2-server/issues/2880

  • David Vokac commented
    April 04, 2017 20:49

    I have been looking into the implementation of this feature in the 2.2.0 beta2 version of Graylog and, sadly, found some fundamental problems.

    You have implemented data handling features in the most frontend part of the Graylog server (Streams), which should only be used for viewing the data, not handling it, imho. As a result there are some pitfalls with this solution.

    You can set different retention for different log types if you want, but you are cutting yourself off from analysis tools in the process. I have already discussed this with joshi on IRC.

    Example: I have 2 applications, APP1 and APP2. Both have different people in S2/S3 support, and hence I need to differentiate user access. At the same time, both applications have two types of log files: application logs and HTTP logs. HTTP logs are only important for the first 3 days. Application logs have to be kept for 3 months. How can I allow S2 support for APP1 to search application AND HTTP logs at the same time? I cannot. I will either be keeping terabytes of useless data, or I won't be able to search the way I need to in order to find root causes (which is honestly one of the main drivers behind using Graylog).

    Ideally, data should be sorted into index sets right after they are received by the Graylog server (probably during processing by the message filter chain) and Streams should remain a viewport. Also, streams should be able to look into more than one index set.

    Just my 2 cents. I thoroughly enjoy Graylog though; I just wanted to point out some pitfalls of this solution.

  • Graylog Team commented
    April 04, 2017 20:49

    A first version of this has been shipped in Graylog 2.2. Please see the updated documentation for details:

    http://docs.graylog.org/en/2.2/pages/configuration/index_model.html

    http://docs.graylog.org/en/2.2/pages/streams.html#index-sets


    There is also some discussion about this in GitHub issue https://github.com/Graylog2/graylog2-server/issues/3473, so if you run into problems, please check that issue too.