Contextual Data
Within Data Collection, you can choose to collect contextual data with the On-Site Pixel.
Description
The Contextual Data feature can be enabled per pixel. When it is enabled, the pixel scans the text of the webpage and automatically classifies the page based on its content. One or more contextual data labels are then stored in the DMP, where they can be used in Custom Audiences and/or Report Central.
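This page does not document how the pixel reads the page text. As a rough illustration only, the sketch below assumes the classifier works on the visible text of the static HTML (as described under Methodology) and shows one way to turn a page into word tokens using Python's standard-library `html.parser`. The `TextExtractor` class and `extract_tokens` function are hypothetical names, not part of the pixel.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style> content."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_tokens(html: str) -> list[str]:
    """Return lowercase word tokens (letters only) from the visible text of a page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # [^\W\d_] matches any Unicode letter, so accented words are kept whole.
    return re.findall(r"[^\W\d_]+", text.lower())


page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Match report</h1>"
    "<p>The referee stopped play at the stadium.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_tokens(page))
# ['match', 'report', 'the', 'referee', 'stopped', 'play', 'at', 'the', 'stadium']
```

Skipping `<script>` and `<style>` keeps code and CSS out of the token stream, so only words a reader would see can influence the label.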
Labels
Our Contextual Data model includes 84 default labels: news, travel, gambling, weather, mobile_phones, television, portal, entertainment, search, utility_companies, football, death, sports, fashion, lifestyle, women, sex, automobile, house, knowledge, price_comparison, public_transport, technology, advertising, men, animals, movies, music, beauty, computer, tickets, retail, books, electronics, health, pets, home, toys, secondhand, weddings, perfume, kids, holiday, shoes, rental, dating, gaming, hotel, food, jobs, flights, business, leisure, education, social, finance, jewellery, hobby, blog, software, deals, charity, art, pregnancy, recipes, insurance, culture, parents, horoscope, garden, forum, nature, celebrities, politics, diet, pharma, shopping, socialmedia, law, science, disaster, injury, family, war. By default, one label is returned and stored in the DMP per pixel event. To store multiple labels per event, contact your Account Manager.
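To illustrate what "one label per pixel event by default" means, here is a minimal sketch that assumes the classifier produces a score for each label and that the stored labels are the highest-scoring ones. The `select_labels` function, its `max_labels` parameter and the example scores are hypothetical; the actual scoring and selection logic is not documented here.

```python
def select_labels(scores: dict[str, float], max_labels: int = 1) -> list[str]:
    """Hypothetical: return up to max_labels label names, highest score first.

    Ties are broken alphabetically so the result is deterministic.
    """
    if max_labels < 1:
        raise ValueError("max_labels must be at least 1")
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:max_labels]]


# Made-up scores for a single pixel event, for illustration only.
scores = {"sports": 0.62, "football": 0.25, "news": 0.10, "weather": 0.03}
print(select_labels(scores))                # ['sports']  (default: one label)
print(select_labels(scores, max_labels=3))  # ['sports', 'football', 'news']
```

Raising `max_labels` corresponds to the multi-label option you can request from your Account Manager: lower-ranked labels are kept alongside the top one instead of being discarded.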
Custom Labels
In addition to the standard Contextual Data model and its labels, you can upload custom labels and training input to train a custom model. This is done through our API; contact your Account Manager to receive the API documentation.
Methodology
Contextual Data uses a machine-learning classification approach to classify unknown web pages. This requires training a model on high-quality web pages whose labels are already known. The content of these pages is extracted from their static HTML and turned into valid "tokens" (e.g. common words, unique words). For the best results, add URLs of pages with plenty of text about the specific label you want to classify. For the label "sports", for example, add sports articles that consist mostly of sports terms (e.g. "football", "hockey", "referee", "pitch", "stadium"). The model is then trained on these tokens and learns that terms such as "pitch", "stadium" and "referee" belong to the label "sports". The more relevant articles the model is trained on, the more tokens it learns from, which leads to higher accuracy when classifying unknown web pages. We therefore advise training the model on a large number of web pages, preferably pages that use terms specific to the labels being trained.
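This page does not say which classification algorithm the Contextual Data model uses. The sketch below swaps in multinomial Naive Bayes, a common text-classification technique, purely to illustrate the training idea: token counts from labelled pages teach the model that words like "referee", "pitch" and "stadium" point to "sports". The `NaiveBayesLabeler` class and the training sentences are hypothetical and stand in for the extracted text of real training pages.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesLabeler:
    """Hypothetical multinomial Naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # label -> number of training pages
        self.token_counts = defaultdict(Counter)  # label -> token -> occurrences
        self.total_tokens = Counter()             # label -> total tokens seen
        self.vocabulary = set()

    def train(self, text: str, label: str) -> None:
        """Add one labelled training page."""
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.total_tokens[label] += len(tokens)
        self.vocabulary.update(tokens)

    def scores(self, text: str) -> dict[str, float]:
        """Return a log-probability score per label; higher means more likely."""
        if not self.doc_counts:
            raise ValueError("model has not been trained")
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        # Tokens never seen during training carry no evidence, so they are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        result = {}
        for label, n_label_docs in self.doc_counts.items():
            log_p = math.log(n_label_docs / n_docs)  # prior: share of training pages
            denom = self.total_tokens[label] + vocab_size
            for token in tokens:
                log_p += math.log((self.token_counts[label][token] + 1) / denom)
            result[label] = log_p
        return result

    def classify(self, text: str) -> str:
        """Return the single highest-scoring label."""
        label_scores = self.scores(text)
        return max(label_scores, key=label_scores.get)


model = NaiveBayesLabeler()
model.train("The referee sent off a player as the stadium cheered on the pitch", "sports")
model.train("Hockey and football fans filled the stadium for the match", "sports")
model.train("Rain and strong wind are expected, with temperatures dropping tonight", "weather")
model.train("Sunny skies and mild temperatures in the forecast for the weekend", "weather")

print(model.classify("The referee walked onto the pitch"))  # sports
print(model.classify("Wind and rain in the forecast"))       # weather
```

With only two pages per label, words such as "walked" and "onto" are unknown to the model and are simply ignored; adding more on-topic training pages grows the vocabulary and sharpens the per-label token counts, which is the effect the advice above describes.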