Indexing Big Data on Amazon AWS
Vimeo 11mo ago
Description
Presented by Scott Stults, OpenSource Connections, at Lucene Revolution 2012.

Amazon Web Services offers a quick and easy way to build a scalable search platform, a flexibility that is especially useful when an initial data load requires hardware that is no longer needed afterward for day-to-day searching and adding new documents. This presentation will cover one such approach, which can enlist hundreds of worker nodes to ingest data, track their progress, and release them back to the cloud when the job is done. The data set discussed is the collection of published patent grants available through Google Patents. A single Solr instance can easily handle searching the roughly 1 million patents issued between 2005 and 2010, but up to 50 worker nodes were necessary to load that data in a reasonable amount of time. The same basic approach was also used to make three sizes of PNG thumbnails of the patent grant TIFF images; in that case, 150 worker nodes generated 1.6 TB of data over the course of three days. In this session, attendees will learn how to leverage EC2 as a scalable indexer, along with tricks for using XSLT on very large XML documents.
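
The enlist, track, and release pattern described above can be pictured with a small local sketch. This is not the presenter's implementation: a thread pool stands in for the EC2 worker nodes, and the names `index_batch`, `run_bulk_load`, and the sample batches are hypothetical, introduced only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def index_batch(batch_id, documents):
    """Hypothetical stand-in for one worker's job on a batch of documents."""
    # A real worker would transform the patent XML and post it to the
    # search index here; this sketch only counts the documents.
    return batch_id, len(documents)


def run_bulk_load(batches, max_workers=50):
    """Fan batches out to workers, track progress, then release the workers."""
    progress = {}  # batch_id -> number of documents processed
    # Leaving the `with` block shuts the pool down, which mirrors giving
    # the worker nodes back once the load is finished.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(index_batch, batch_id, docs)
            for batch_id, docs in batches.items()
        ]
        for future in as_completed(futures):
            batch_id, count = future.result()
            progress[batch_id] = count  # record each finished batch
    return progress


# Made-up sample data: three batches keyed by year.
batches = {"2005": ["a", "b"], "2006": ["c"], "2007": ["d", "e", "f"]}
progress = run_bulk_load(batches, max_workers=3)
total = sum(progress.values())
```

In a cloud setting the pool would be replaced by launching and terminating instances, and the progress dictionary by some shared store that every worker can update; both of those choices are assumptions here, since the description does not say how the talk handles them.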
