Introduction to the Parsers part
Recall the diagram from the beginning of the Collectors part.
While collectors are n6's entry points for any external data to flow in, parsers actually analyze those data and translate them into the normalized format.
To state it more technically: a parser takes input data from its respective RabbitMQ queue, parses and normalizes those data (converting them to the n6-specific JSON-based format), and sends them further down the data processing pipeline (by pushing those data – already in their normalized form – into the appropriate RabbitMQ exchange).
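The consume, parse/normalize, publish cycle described above can be sketched with in-memory stand-ins for RabbitMQ. In this sketch a `deque` plays the role of the parser's input queue and a list-backed object plays the role of the output exchange. This is a minimal illustration under stated assumptions, not n6's implementation: `InMemoryExchange`, `parse_and_normalize`, `run_parser`, the `ip,category` input format, the output keys and the routing-key value are all hypothetical. A real parser talks to RabbitMQ through a client library instead.

```python
import json
from collections import deque


class InMemoryExchange:
    """Hypothetical stand-in for a RabbitMQ exchange: it only records what is published."""

    def __init__(self):
        self.published = []

    def publish(self, routing_key, body):
        self.published.append((routing_key, body))


def parse_and_normalize(raw_body):
    """Hypothetical parser: turns 'ip,category' lines into normalized dicts."""
    for line in raw_body.decode("utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        ip, category = line.split(",")
        yield {"ip": ip.strip(), "category": category.strip()}


def run_parser(input_queue, exchange, routing_key):
    """Consume every queued message, normalize it and publish the results as JSON."""
    while input_queue:
        raw_body = input_queue.popleft()
        for record in parse_and_normalize(raw_body):
            exchange.publish(routing_key, json.dumps(record, sort_keys=True))


queue = deque([b"192.0.2.1,bots\n198.51.100.7,scanning\n"])
exchange = InMemoryExchange()
run_parser(queue, exchange, routing_key="event.parsed.example")
for key, body in exchange.published:
    print(key, body)
```

Each input message may yield several normalized records. That is why `parse_and_normalize` is written as a generator and the publishing loop iterates over its results.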
Whereas collectors may be stateful, parsers shall always be stateless (i.e., they should neither store any persistent state nor make use of any external mutable context, such as the current time).
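To illustrate the statelessness requirement, the sketch below contrasts a stateless parse function, whose output depends only on its input, with a stateful variant that reads the current time. Both function names and the `ISO-time;ip` input format are hypothetical. In this sketch the timestamp is taken from the input line itself rather than from the clock.

```python
import datetime


def parse_stateless(raw_line):
    """Stateless: the result depends only on raw_line (hypothetical 'ISO-time;ip' format)."""
    time_str, ip = raw_line.split(";")
    return {"time": time_str.strip(), "ip": ip.strip()}


def parse_stateful(raw_line):
    """Anti-pattern: consults external mutable context (the current time)."""
    _, ip = raw_line.split(";")
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    return {"time": now, "ip": ip.strip()}


line = "2024-01-01T12:00:00;192.0.2.1"
# Re-parsing the same input always yields the same record...
assert parse_stateless(line) == parse_stateless(line)
print(parse_stateless(line))
# ...whereas parse_stateful(line) may differ between calls, so reprocessing
# the same input would not be reproducible.
```

Keeping the parser a pure function of its input means the same raw data can be re-parsed later and will produce identical normalized records.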
Types of parsers/events
There are three main types of parsers:
- event – parsing data from ordinary event sources;
- bl – parsing data from blacklist sources;
- hifreq – parsing data from high-frequency event sources.
For the most part, they work similarly.
The most visible difference is how they tag their output data, i.e.,
what they add to the routing key of each message sent to RabbitMQ.
An ordinary event parser just adds the event. prefix to the routing key,
a blacklist parser adds the bl. prefix, while a high-frequency event
parser adds the hifreq. prefix.
The routing key matters for further processing down the pipeline.
Normal events go through the enricher; blacklist events go through the
enricher and then to the comparator; hifreq events go to the aggregator
first and only then to the enricher (see also: the n6 architecture
and data flow diagram).
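The tagging and routing rules described above can be summarized as a small lookup table. This is only a model of that behavior, not how n6 implements it. The prefixes and the order of components follow the description above. The helper names `make_routing_key` and `downstream_stages`, the `ROUTING` table, and the rest of the routing-key string (`parsed.example-source`) are hypothetical.

```python
# Hypothetical model of how the parser type determines the routing-key
# prefix and the next pipeline components (not n6's actual code).
ROUTING = {
    "event": ("event.", ["enricher"]),
    "bl": ("bl.", ["enricher", "comparator"]),
    "hifreq": ("hifreq.", ["aggregator", "enricher"]),
}


def make_routing_key(parser_type, rest):
    """Prepend the type-specific prefix to the rest of the routing key."""
    prefix, _ = ROUTING[parser_type]
    return prefix + rest


def downstream_stages(routing_key):
    """Look up the next pipeline components based on the routing-key prefix."""
    for prefix, stages in ROUTING.values():
        if routing_key.startswith(prefix):
            return stages
    raise ValueError(f"unknown routing key prefix: {routing_key!r}")


key = make_routing_key("hifreq", "parsed.example-source")
print(key, downstream_stages(key))
```

Including the trailing dot in each prefix keeps the prefix match unambiguous, so a key such as `blah.x` would not be mistaken for a blacklist key.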
A RecordDict – normalized data record
All parsers produce sequences of events, a.k.a. normalized data
records, each being a dict-like mapping that contains specific items.
There is a specialized class, RecordDict (plus its subclass,
BLRecordDict, for blacklist data), whose instances represent
normalized data records. A RecordDict is a dict-like mapping that
provides automatic validation and adjustment of its items.
Some keys are required to be present in each normalized data record
(i.e., in each
RecordDict instance when it is serialized):
Furthermore, there are lots of optional keys that can appear in a
normalized data record. We do not list all valid keys here, but they
can be deduced from the presence of the corresponding per-key
attributes in the definition of the RecordDict class.
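The following deliberately tiny class illustrates two ideas: a dict-like mapping that validates and adjusts values on assignment, and a set of valid keys that can be deduced from per-key attributes of the class. `MiniRecordDict`, its `adjust_<key>` method naming convention, the two keys it accepts and the adjustment rules are all assumptions made for this sketch. They are not the real RecordDict API; the actual class definition is the authority on the real keys and rules.

```python
import ipaddress
from collections.abc import MutableMapping


class MiniRecordDict(MutableMapping):
    """Hypothetical, much-simplified illustration of a validating dict-like record.

    Each valid key has a corresponding ``adjust_<key>`` method (a naming
    convention assumed for this sketch) that validates and normalizes values.
    """

    def __init__(self, **items):
        self._data = {}
        for key, value in items.items():
            self[key] = value  # goes through __setitem__, i.e., is adjusted

    @classmethod
    def valid_keys(cls):
        # Deduce the set of valid keys from the adjuster attributes.
        return sorted(name[len("adjust_"):] for name in dir(cls)
                      if name.startswith("adjust_"))

    @staticmethod
    def adjust_category(value):
        value = str(value).strip().lower()
        if value not in {"bots", "scanning", "phish"}:  # illustrative subset only
            raise ValueError(f"illegal category: {value!r}")
        return value

    @staticmethod
    def adjust_ip(value):
        # Validate and normalize an IPv4 address; invalid ones raise ValueError.
        return str(ipaddress.IPv4Address(str(value).strip()))

    def __setitem__(self, key, value):
        adjuster = getattr(self, "adjust_" + key, None)
        if adjuster is None:
            raise KeyError(f"illegal key: {key!r}")
        self._data[key] = adjuster(value)

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


rd = MiniRecordDict(category=" Bots ", ip=" 192.0.2.1 ")
print(dict(rd))                     # {'category': 'bots', 'ip': '192.0.2.1'}
print(MiniRecordDict.valid_keys())  # ['category', 'ip']
```

Because every assignment passes through `__setitem__`, an invalid key or value is rejected at the moment it is set, rather than only later when the record is serialized.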