Online search advertisement platform & Realtime Campaign Monitoring
Used Jsoup to crawl product information from Amazon.
Search advertising places online advertisements on the result pages that a search engine shows in response to user queries. This search ads server takes thousands of product records as ad candidates and selects, filters, ranks, allocates, and prices the ads when a search query comes in. The selection and ranking of search ads are based on the quality of the ads and the bid prices offered by advertisers.
Ad candidates are first evaluated and filtered by relevance score, which measures how relevant the query is to the keywords of an ad. Here, relevance score = number of ad keywords matched by the query / total number of keywords in the ad. For quick retrieval of ad information, an inverted index of ad keywords is built and stored in a cache.
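The relevance formula and the inverted index lookup described above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the in-memory dict standing in for the cache, and the sample ads are all assumptions.

```python
from collections import defaultdict


def build_inverted_index(ads):
    """Map each lowercased keyword to the set of ad ids that list it.

    `ads` is a hypothetical {ad_id: [keyword, ...]} mapping.
    """
    index = defaultdict(set)
    for ad_id, keywords in ads.items():
        for word in keywords:
            index[word.lower()].add(ad_id)
    return index


def relevance_scores(query, ads, index):
    """Return {ad_id: matched keywords / total keywords} for matching ads."""
    query_words = {w.lower() for w in query.split()}
    matches = defaultdict(int)
    for word in query_words:
        # Only ads that share at least one word with the query are touched.
        for ad_id in index.get(word, ()):
            matches[ad_id] += 1
    scores = {}
    for ad_id, count in matches.items():
        total = len({w.lower() for w in ads[ad_id]})  # distinct keywords
        scores[ad_id] = count / total
    return scores


# Made-up sample data for illustration only.
sample_ads = {
    1: ["nike", "running", "shoes"],
    2: ["wireless", "headphones"],
    3: ["running", "watch"],
}
sample_index = build_inverted_index(sample_ads)
sample_scores = relevance_scores("running shoes", sample_ads, sample_index)
```

With the index in place, a query only visits the ads that share a word with it, instead of scanning every candidate.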
The data layer supporting the online system:
The probability of user click (p-click) plays an important role in ads ranking.
Used Spark ML to process simulated user click log data and generate the prediction model.
log: Device IP, Device id, Session id, Query, AdId, CampaignId, Ad_category_Query_category(0/1), clicked(0/1)
pClick features are extracted from the search log and stored in a key-value store
Logistic Regression
Gradient Boosting Tree
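As a rough illustration of the feature-extraction step above, the sketch below parses log lines with the listed fields and aggregates impression and click counts per key. It is plain Python rather than Spark ML, and several things are assumptions: the comma-separated layout, the choice of count features, the key format, the treatment of every log line as one impression, and the dict standing in for the key-value store.

```python
import csv
from collections import defaultdict

# Field order follows the log description; the snake_case names are hypothetical.
FIELDS = [
    "device_ip", "device_id", "session_id", "query",
    "ad_id", "campaign_id", "category_match", "clicked",
]


def extract_count_features(log_lines):
    """Aggregate impressions and clicks per device IP, ad, and campaign.

    Returns a dict such as {"ad_id:101": {"impressions": 2, "clicks": 1}},
    which stands in for the key-value store.
    """
    store = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in csv.reader(log_lines):
        record = dict(zip(FIELDS, (value.strip() for value in row)))
        clicked = int(record["clicked"])
        for prefix in ("device_ip", "ad_id", "campaign_id"):
            key = f"{prefix}:{record[prefix]}"
            store[key]["impressions"] += 1  # assumes one log line per impression
            store[key]["clicks"] += clicked
    return dict(store)
```

Counts like these can be turned into historical click-through rates and fed to a classifier such as logistic regression or gradient boosted trees.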
Quality Score = 0.25 * Relevance Score + 0.75 * pClick
Rank Score = Quality Score * Bid
Price(Cost Per Click) = next rank score / current quality score + 0.01
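The three formulas above can be worked through in a short sketch. The function names, the candidate dict layout, and the sample numbers are assumptions. The source does not say how the lowest-ranked ad is priced, so a hypothetical `reserve_price` parameter is used for it.

```python
def quality_score(relevance, p_click):
    """Quality Score = 0.25 * Relevance Score + 0.75 * pClick."""
    return 0.25 * relevance + 0.75 * p_click


def rank_and_price(candidates, reserve_price=0.01):
    """Rank ads by Quality Score * Bid and price each one.

    `candidates` is a list of dicts with the hypothetical keys
    ad_id, relevance, p_click, and bid.
    """
    scored = []
    for ad in candidates:
        quality = quality_score(ad["relevance"], ad["p_click"])
        if quality <= 0:
            continue  # a zero-quality ad cannot be priced by the formula
        scored.append({
            "ad_id": ad["ad_id"],
            "quality": quality,
            "rank_score": quality * ad["bid"],
        })
    scored.sort(key=lambda a: a["rank_score"], reverse=True)
    for position, ad in enumerate(scored):
        if position + 1 < len(scored):
            next_rank = scored[position + 1]["rank_score"]
            # Price = next rank score / current quality score + 0.01
            ad["cpc"] = next_rank / ad["quality"] + 0.01
        else:
            ad["cpc"] = reserve_price  # assumption: not specified in the source
    return scored


# Made-up candidates for illustration only.
ranked = rank_and_price([
    {"ad_id": "A", "relevance": 0.8, "p_click": 0.4, "bid": 2.0},
    {"ad_id": "B", "relevance": 0.4, "p_click": 0.2, "bid": 3.0},
])
```

In this example ad A has quality 0.5 and rank score 1.0, while ad B has quality 0.25 and rank score 0.75. A ranks first despite the lower bid and pays roughly 0.75 / 0.5 + 0.01 = 1.51 per click, which is below its bid of 2.0.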
When a search query arrives, the system matches the rewritten query against ad keywords using the inverted index to get the relevance score, and predicts the click probability with the regression model trained on 50GB of historical click data. The quality of an ad is determined by both its relevance score and its click probability. The ads engine calculates the quality score and combines it with the ad's bid price for final ranking and pricing.
The real-time campaign monitoring system is built to collect the ad-related events generated by the online ads server and to visualize campaign trends.
The real-time campaign monitoring system is a streaming pipeline that collects and processes the ad events generated by the online search ads engine. Chance events, impression events, and click events are published to a message queue, then processed and stored in a database in a streaming fashion. The front-end dashboard visualizes the budget status and the dynamic impression, click, and pricing trends of campaigns.
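The event flow described above might look like the sketch below. It is only an in-process stand-in: Python's `queue.Queue` replaces the message queue, a returned dict replaces the database, and the event schema (`campaign_id`, `type`, `price`) and the `budgets` mapping are hypothetical.

```python
import queue
from collections import defaultdict


def process_events(event_queue, budgets):
    """Drain the queue and aggregate per-campaign counters and spend.

    Each event is assumed to be a dict with campaign_id, type
    ("chance", "impression", or "click"), and a price on click events.
    """
    stats = defaultdict(
        lambda: {"chance": 0, "impression": 0, "click": 0, "spend": 0.0}
    )
    while True:
        try:
            event = event_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to consume in this batch
        campaign = stats[event["campaign_id"]]
        campaign[event["type"]] += 1
        if event["type"] == "click":
            campaign["spend"] += event["price"]
    # Attach the remaining budget so a dashboard could plot budget status.
    return {
        campaign_id: {
            **counters,
            "remaining_budget": budgets.get(campaign_id, 0.0) - counters["spend"],
        }
        for campaign_id, counters in stats.items()
    }
```

A real pipeline would run this consumer continuously and write each update to the database, and the dashboard would read the per-campaign rows from there.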