Beyond the hype: Big data concepts, methods, and analytics

https://doi.org/10.1016/j.ijinfomgt.2014.10.007
Under a Creative Commons license
open access

Highlights

  • We define what is meant by big data.

  • We review analytics techniques for text, audio, video, and social media data.

  • We make the case for new statistical techniques for big data.

  • We highlight the expected future developments in big data analytics.

Abstract

Size is the first, and at times the only, dimension that leaps out at the mention of big data. This paper attempts to offer a broader definition of big data that captures its other unique and defining characteristics. The rapid evolution and adoption of big data by industry has leapfrogged the discourse to popular outlets, forcing the academic press to catch up. Academic journals in numerous disciplines, which would benefit from a relevant discussion of big data, have yet to cover the topic. This paper presents a consolidated description of big data by integrating definitions from practitioners and academics. The paper's primary focus is on the analytic methods used for big data. A distinguishing feature of this paper is its focus on analytics related to unstructured data, which constitute 95% of big data. This paper highlights the need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats. This paper also reinforces the need to devise new tools for predictive analytics for structured big data. The statistical methods in practice were devised to draw inferences from sample data. The heterogeneity, noise, and massive size of structured big data call for developing computationally efficient algorithms that may avoid big data pitfalls, such as spurious correlation.

Keywords

Big data analytics
Big data definition
Unstructured data analytics
Predictive analytics

Amir Gandomi is an assistant professor at the Ted Rogers School of Information Technology Management, Ryerson University. His research lies at the intersection of marketing, operations research and IT. He is specifically focused on big data analytics as it relates to marketing. His research has appeared in journals such as OMEGA - The International Journal of Management Science, The International Journal of Information Management, and Computers & Industrial Engineering.

Murtaza Haider is an associate professor at the Ted Rogers School of Management, Ryerson University, in Toronto. He is also the Director of Regionomics Inc., a consulting firm. He specializes in applying statistical methods to forecast demand and/or sales. His research interests include human development in Canada and South Asia, forecasting housing market dynamics, and transport and infrastructure planning and development. He is working on a book, Getting Started with Data Science: Making Sense of Data with Analytics (ISBN 9780133991024), which will be published by Pearson/IBM Press in Spring 2015. An avid blogger, he writes weekly about socio-economics in South Asia for the Dawn newspaper and for the Huffington Post.