High-Frequency Data Sources

Some data sources supply large numbers of events that are nearly identical to each other (for example, differing only in their timestamps). We do not want to store each of these events in the database separately, as that would take up a lot of space. What we really care about is the original data of the first event (because the rest is the same), the time of the first event, the time of the last event, and the number of events of this kind we received.

Aggregator module

Data from this kind of source should go through the aggregator module, which does exactly what is described above. It keeps the data of the first event, counts how many events have arrived so far, and keeps track of the times of the first and the last event. It also periodically takes all of its stored events and sends them to the database, so that they are not kept in the aggregator forever. (The process of sending the aggregated events to the database as one event is actually a bit more complicated than doing it once every several hours, but we can safely skip the details.)
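The bookkeeping described above can be illustrated with a minimal sketch. This is not the actual aggregator implementation: the names AggregatedEvent and SimpleAggregator are hypothetical, the grouping key is passed in explicitly, and flush() merely returns the stored entries instead of sending them to a database.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AggregatedEvent:
    """One stored entry per group of similar events (hypothetical structure)."""
    first_payload: dict   # original data of the first event in the group
    first_time: datetime  # time of the first event
    last_time: datetime   # time of the most recent event
    count: int = 1        # number of events seen so far


class SimpleAggregator:
    """Hypothetical in-memory aggregator, keyed by a grouping value."""

    def __init__(self):
        self._groups = {}

    def process(self, group_key, payload, event_time):
        entry = self._groups.get(group_key)
        if entry is None:
            # First event of this kind: keep its data and time.
            self._groups[group_key] = AggregatedEvent(payload, event_time, event_time)
        else:
            # Repeated event: only update the counter and the last-seen time.
            entry.count += 1
            entry.last_time = max(entry.last_time, event_time)

    def flush(self):
        """Return all stored entries and forget them (stand-in for a database write)."""
        flushed = list(self._groups.values())
        self._groups.clear()
        return flushed
```

In this sketch, two events with the same grouping key end up as a single entry with count equal to 2, carrying the payload of the first event only.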

Sending data to the Aggregator

To send data to the aggregator, the parser needs to add the hifreq tag at the beginning of the routing key of the message and add a special _group key to the payload. It would be possible to do this in the collector, but it is much better to do it in the parser, because the collector should not need to know whether the source is a high-frequency one.

How does the aggregator know that it should treat an event as the same as an earlier one? By the value under the _group key. If the _group values of two events are equal, the events are treated as occurrences of the same event, just with different timestamps.
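As a rough illustration of the two steps a parser has to perform, here is a small sketch. The helper names mark_as_hifreq and same_event are hypothetical, and so is the assumption that the tag is joined to the rest of the routing key with a dot; the example routing key is made up as well.

```python
HIFREQ_TAG = "hifreq"


def mark_as_hifreq(routing_key, payload, group_value):
    """Return a (routing key, payload) pair prepared for the aggregator.

    Assumption: routing key segments are dot-separated, so the tag is
    prepended as "hifreq.<original key>".
    """
    tagged_key = f"{HIFREQ_TAG}.{routing_key}"
    # Copy the payload so the caller's dict is left untouched.
    tagged_payload = dict(payload, _group=group_value)
    return tagged_key, tagged_payload


def same_event(payload_a, payload_b):
    """Two events count as the same event when their _group values are equal."""
    return payload_a["_group"] == payload_b["_group"]


key, first = mark_as_hifreq("parsed.example-source", {"time": "10:00"}, "1.2.3.4_scan")
_, second = mark_as_hifreq("parsed.example-source", {"time": "10:05"}, "1.2.3.4_scan")
```

Here first and second differ only in their time value, so same_event(first, second) is true and an aggregator would count them as one event seen twice.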

What is more, the event_type attribute of the parser class should be set to 'hifreq'.

AggregatedEventParser

N6Core provides a base class for parsers of high-frequency data sources. As the title of this section already gave away, it is called AggregatedEventParser.

It takes care of most of the work: it sets the event_type class attribute, generates the value for the _group key and adds it to the payload, and modifies the routing key appropriately.

The value for the _group key is created by taking the values of the keys listed in the group_id_components class attribute and joining them with underscores. The values are taken from the incoming data from the collector. If one of the listed keys is missing from the data, the string 'None' is used in its place. However, at least one of the listed keys must be present; otherwise ValueError is raised.

It is also important to note that the ip key is treated differently: it evaluates to data['address'][0]['ip']. This may look strange, but the pattern is so frequent in the collected data that it was special-cased to make parser implementations simpler.
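The rules for building the _group value can be sketched as follows. This is an illustration of the described behaviour, not the code of AggregatedEventParser: the function name build_group_value is hypothetical, and treating a key whose value is None as missing is an assumption.

```python
def build_group_value(data, group_id_components):
    """Join the values of the given keys with underscores (illustrative only)."""
    values = []
    found_any = False
    for key in group_id_components:
        if key == "ip":
            # Special case: 'ip' is looked up as data['address'][0]['ip'].
            addresses = data.get("address") or [{}]
            value = addresses[0].get("ip")
        else:
            value = data.get(key)
        if value is not None:
            found_any = True
        # A missing key contributes the string 'None'.
        values.append(str(value))
    if not found_any:
        raise ValueError("none of the group_id_components keys is present")
    return "_".join(values)


data = {"name": "scan", "address": [{"ip": "1.2.3.4"}]}
group = build_group_value(data, ("ip", "name", "dport"))
# group == "1.2.3.4_scan_None"
```

The dport key is absent from the example data, so it contributes 'None'; calling the function with data that contains none of the listed keys raises ValueError.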

Remember that you still need to implement the parse method yourself.
