A Cost-Benefit Analysis of Indexing Big Data with Map-Reduce

- Siafarikas, Dimitrios and Samourkasidis, Argyrios and Arampatzis, Avi -

Abstract

We reflect on a challenge that a Big Data analyst faces: estimating the number of nodes needed for a computation to complete within a given time. We develop a formula that allows anyone tasked with designing clusters for massive data sets to make this estimate. We consider the problem of Inverted Index construction, which is widely used in Information Retrieval. We present the various aspects and challenges of this problem, along with details on how the system we developed works.
