the challenge
-
Dragan Milosevic - Bringing Together Millions of Publishers and Thousands of Advertisers
the challenge 10mo ago
-
Huo Yuan Jia 2007 - Samurai Challenge
the challenge 36m ago
-
The Challenge On The Job
the challenge 1h ago
-
The Ron Jeremy Baby Food Challenge!
the challenge 2h ago
-
Beyond Good & Evil - No Health Run - Part 3
the challenge 3h ago
-
The Kegel Challenge June 2013 (Pelvic Floor Strengthening Challenge)
the challenge 4h ago
-
2013 GAME15 - Halifax RLFC Highlights
the challenge 4h ago
-
Three-hour Painting 2009
the challenge 4h ago
-
TRY TO WATCH WITHOUT CRYING 99% Fail The Challenge
the challenge 6h ago
-
Leinster Thrash Stade to Take Title
the challenge 8h ago
-
Cara Jadi Youtuber Famous
the challenge 8h ago
-
S.K.I.L.L. - Special Force 2 Announcement Trailer (HD)
the challenge 8h ago
-
The Challenge of Liberty Summer Seminar is Back—Will You Be There?
the challenge 9h ago
-
If He can use me He can use you ... Get Ready! by Evangelist Naeem Nasir
the challenge 10h ago
-
Ess Rawr/Skillzy -The Challenge Series Ep 1 - Tomato Ketchup
the challenge 10h ago
-
Sleeping With Sirens- With Ears to See and Eyes to Hear (Cover)
the challenge 11h ago
-
Take On the Challenge
the challenge 12h ago
-
Maria Albulet: Forest Core
the challenge 13h ago
-
Studger's Amazing Goal For Liverpool / Best Goals Ever Recreated
the challenge 14h ago
-
Savvas Stouppas: The Challenge Of Multiculturalism In Cyprus
the challenge 16h ago
-
Ellen's Mirror Moves
the challenge 16h ago
-
Fleet Maull at Imagining Peace
the challenge 16h ago
-
Going further together
the challenge 17h ago
-
Mongolia Bike Challenge
the challenge 17h ago
-
Save Time By Cooking In Bulk
the challenge 18h ago
-
Johnie Berntsson/Stena Sailing Team - The Challenge Ahead
the challenge 19h ago
-
Hysteria (Muse Bass Cover)
the challenge 19h ago
-
Challenge Gregg - Press ups with high fives
the challenge 19h ago
-
2012 San Antonio Sports Fit Family Challenge presented by Superior Health Plan
the challenge 19h ago
-
Viper Challenge 2013
the challenge 20h ago
Dragan Milosevic - Bringing Together Millions of Publishers and Thousands of Advertisers
10mo agoDescription
Bringing Together Millions of Publishers and Thousands of Advertisers by Zanox Reporting Systems built on top of Hadoop and Lucene Technologies Dr. Dragan Milosevic, Senior Hadoop Architect, dragan.milosevic@zanox.com Zanox is company that deals with the huge amount of data. Currently our tracking gets more than 2 million sales, 30 million clicks and almost 1 billion views every day. Huge amount of data are also coming from search-engines that are providing information about related costs. The challenge is to join and summarise those huge sets of data and provide valuable tracking and cost statistics to more than 1 million publishers and 10 thousands advertisers. They will use our analyses to successfully drive their business by knowing which launched campaigns are well performing and how others can be changed to bring even more money. This talk will have two parts. The first part shows how Hadoop helps in analysing available data and saving valuable results in Lucene indexes. The second part describes Lucene search infrastructure that is able to efficiently generate reports in real time for publishers and advertisers. Challenges to be solved during Hadoop processing can be summarised as follows: (1) Joining several huge sets of data, namely the data that are tracked by Zanox, costs data coming from search-engines and master-data about publishers and advertisers. We have developed unique approach that combines map-side and reduce-side joins, trying to combine the best of both techniques. It does sampling to identify records which have join-key that is frequent. Those records will be joined on a map-side and directly written to output without loading map-reduce pipeline. Only records whose join-key is infrequent will be propagated to reducers in order to be joined there. Provided experiments have shown significant gains compared to pure reduce-side join mainly due to the fact that more than 80% of records can be joined on a map-side, resulting that only one fifth of records have to be sorted and joined on a reduce-side. (2) Several aggregation jobs are found to be very important for us, mainly due to the huge number of records that have to be summarised. We have made many experiments aiming to speed-up map-reduce engine, being in our particular case slow mainly due to the fact that sorting on a reduce-side has to wait until all map-tasks are not completed. In the case where thousands of map-tasks have to be executed, reduce tasks are waiting a significant amount of time. Our solution is very simple, and it is based on replacing one huge job with several smaller ones that are responsible only for the selected part of input data. Because smaller jobs have much less map-tasks that have to be finished before reduce-tasks can start with expensive sorting and aggregation operations, we have managed to earlier activate resources that are responsible for reduce tasks and on average to speed-up complete aggregation for more than 30%. (3) Lucene indexes save every aggregated result produced by Hadoop jobs to make them real-time available to publishers and advertisers. Created indexes represent different views to aggregated data, being responsible to speed-up the processing of queries. We have optimised indexing performance by (a) building different aggregations of every record simultaneously, (b) using intelligent partitioners that are sending the semantically same aggregated-versions of records to same reducers, and (c) taking care of producing medium-size indexes that can be generated completely in memory on a reduce-side. Lucene search infrastructure is dealing with the real-time generation of reports on a request basis. Because the response time plays very important role while serving online requests, the architecture of our search-backend is optimised by following actions: (1) Indexing data with different aggregation granularities to optimise the execution of various types of queries. Ideally the requested info...
