Dealing with event data is dirty work at times. Developers may transmit events with errors because of a change a developer made. Also, sometimes errors could be introduced if the data engineering team decides to change something on the data warehouse schema. Due to these changes to the schema, data type conflict may occur. How can someone deal with all the different event data issues that might arise in a production environment? This blog discusses how RudderStack handles event filtering and value aggregation without introducing manual errors.
RudderStack’s solution is a sophisticated mechanism. Here, you can implement custom logic using JavaScript to define transformations. You can apply these transformations to the incoming events.
Having an expressive environment like RudderStack offers endless possibilities of how a data engineering team can interact with the data. In this blog post, we will explore just two of the most common use cases we’ve encountered among the RudderStack community. Event filtering and value aggregation are universal, simple to implement, yet very powerful.
User Transformation for Event Filtering and Value Aggregation
You can define user transformations in the Configuration Plane of your RudderStack setup. Few sample user transformations are available on our GitHub. This blog provides an insight into one such sample transformation that you can use for:
- Event Filtering: This stops events from passing to a destination. You might need to filter events where an organization employs multiple tools/platforms for addressing different business requirements. Also, you may want to route only specific events to specific tool/platform destinations.
- Value aggregation: This allows aggregation of values on specific attributes of particular event types. You might need to aggregate values where an organization is not looking to employ a tool/platform to perform transaction-level record keeping and/or analysis. Instead, they want consolidated records/analytics. So, this kind of transformation helps in reducing the network traffic, and request/message volume. This is because the system can replace multiple events of a particular type by a single event of the same type with the aggregated value(s). This transformation also helps in cost reduction, where the destination platform charges by volume of events/messages.
You can view the sample transformation on our GitHub page.
Implementation
You need to contain all logic within the transform
function, which takes an array of events as input and returns an array of transformed events. The transform
function is the entry-point function for all user transformations.
function transform(events) {
const filterEventNames = [
// Add list of event names that you want to filter out
"game_load_time",
"lobby_fps"
];
//remove events whose name match those in above list
const filteredEvents = events.filter(event => {
const eventName = event.event;
return !(eventName && filterEventNames.includes(eventName));
});
The code snippet above shows how you can use the filter
function of JavaScript arrays to filter out events based on the event name.
A variation of this code is also possible. Here, the values in the array of event names are the ones you want to retain, and you remove the not (!
) condition from the return
statement in the penultimate line.
Below code shows event removal based on a simple check like event name match but more complex logic involving checking the presence of value for a related attribute.
//remove events of a certain type if related property value does not satisfy the pre-defined condition
//in this example, if 'total_payment' for a 'spin' event is null or 0, then it would be removed.
//Only non-null, non-zero 'spin' events would be considered
const nonSpinAndSpinPayerEvents = filteredEvents.filter( event => {
const eventName = event.event;
// spin events
if(eventName.toLowerCase().indexOf('spin') >= 0) {
if(event.userProperties && event.userProperties.total_payments
&& event.userProperties.total_payments > 0) {
return true;
} else {
return false;
}
} else {
return true;
}
});
As you can see from the above examples, you can use the filtered array available as output from one step as the input to the next. As a result, you can daisy-chain the transformation conditions.
Finally, the following code shows how you can prepare aggregates for specific attributes across events of a particular type present in a batch. After this, the code returns a single event of the concerned type. Also, the code returns the aggregated values for the corresponding attributes.
//remove events of a certain type if related property value does not satisfy the pre-defined condition
//in this example, if 'total_payment' for a 'spin' event is null or 0, then it would be removed.
//Only non-null, non-zero 'spin' events would be considered
const nonSpinAndSpinPayerEvents = filteredEvents.filter( event => {
const eventName = event.event;
// spin events
if(eventName.toLowerCase().indexOf('spin') >= 0) {
if(event.userProperties && event.userProperties.total_payments
&& event.userProperties.total_payments > 0) {
return true;
} else {
return false;
}
} else {
return true;
}
});
Conclusion
In the above snippet:
- First, the code collects the
spin_result
events into an array. - Then, the code aggregates the values for three attributes –
bet_amount
,win_amount
, andno_of_spin
by iterating over the elements of the above array. - After this, the system assigns the aggregated values to the respective attributes of the first
spin_result
event in the array. - Now, the code separates the events that are not of the target type (
spin_result
in this case) into another array. If there were no such events, an empty array is created. - Finally, the system adds the
single spin_result
event to the array created in the previous step, and the result is returned.
Sign up for Free and Start Sending Data
Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app. Get started.
Top comments (0)