Data science suspects that advertising campaigns are showing the same ad to users too many
times (a high frequency) as they browse their favorite websites. They’ve asked the data
engineers to investigate.
Given two input files (ad_data.1.log and ad_data.2.log) containing tab-delimited ad event data,
find all of the users that saw the same ad more than 5 times on a site.
Each line in the input files represents one user’s view of an ad on a site.
GUID is a unique identifier for a user.
Filter out any ad events that do not have a valid GUID (i.e. GUID is “unsupported”, “-”,
Output should be Ad ID, Site ID, Frequency and Total users that saw the ad at that
frequency. Frequency is defined as the total number of times the same ad was shown to
a user on the same site.
The output should be tab-separated and sorted by frequency in descending order.
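One way to approach the task is sketched below in Python. It is only a rough outline under stated assumptions: the posting does not give the column layout of the log files, so the column positions (GUID_COL, AD_COL, SITE_COL) are hypothetical and would need to match the real data. The set of invalid GUIDs contains only the two values named above and should be extended to cover the full list. All function names are illustrative.

```python
import sys
from collections import Counter

# ASSUMPTION: 0-based column positions; the real file layout is not specified.
GUID_COL, AD_COL, SITE_COL = 0, 1, 2
# Only the two invalid GUID values named in the posting; extend as required.
INVALID_GUIDS = {"unsupported", "-"}
# Keep only frequencies strictly greater than this ("more than 5").
MIN_FREQUENCY = 5


def count_views(lines):
    """Count views per (GUID, ad, site), skipping events with invalid GUIDs."""
    views = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(GUID_COL, AD_COL, SITE_COL):
            continue  # skip lines too short to hold the assumed columns
        guid = fields[GUID_COL]
        if guid in INVALID_GUIDS:
            continue
        views[(guid, fields[AD_COL], fields[SITE_COL])] += 1
    return views


def frequency_report(views):
    """Return (ad, site, frequency, total_users) rows, highest frequency first."""
    users = Counter()
    for (_guid, ad_id, site_id), freq in views.items():
        if freq > MIN_FREQUENCY:
            users[(ad_id, site_id, freq)] += 1
    rows = [(ad, site, freq, n) for (ad, site, freq), n in users.items()]
    # Descending by frequency; ad and site IDs break ties for stable output.
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows


def main(paths):
    views = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            views.update(count_views(handle))  # Counter.update adds counts
    for row in frequency_report(views):
        print("\t".join(str(value) for value in row))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Assuming the script is saved under a name such as ad_frequency.py (a hypothetical name), it could be run as `python ad_frequency.py ad_data.1.log ad_data.2.log`. Counting per (GUID, ad, site) first and then grouping by frequency keeps the work to a single pass over each file, at the cost of holding one counter entry per distinct user, ad and site combination in memory.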
August 17, 2018
I am willing to pay higher rates for the most experienced freelancers