How Big Data Collection Works: Process, Challenges, Techniques (2024)

Feature

Taming large amounts of data from multiple sources and deriving the greatest value to ensure trusted business decisions hinge on a foolproof system for collecting big data.

How Big Data Collection Works: Process, Challenges, Techniques (1)

By

  • Mary K. Pratt

Published: 07 Feb 2022

Big data has become one of the more valuable assets held by enterprises, and virtually every large organization is making investments in big data initiatives.

That's not an overstatement. A 2021 survey by NewVantage Partners found that 99% of senior C-level executives at Fortune 1000 companies said they're pursuing a big data program. Perhaps even more significant, 96% reported that their companies have had success with their big data and artificial intelligence programs, 92% said the pace of their investments in these areas is accelerating and 81% voiced optimism about the future of big data and AI in their organizations.

What is big data collection?

Big data collection is the methodical approach to gathering and measuring massive amounts of information from a variety of sources to capture a complete and accurate picture of an enterprise's operations, derive insights and make critical business decisions. Data collection is far from new, of course, since information gathering has been an ingrained practice for millennia. Moreover, researchers for centuries have been confounded in their attempts to manage and analyze overwhelming amounts of data.

Big data collection entails structured, semi-structured and unstructured data generated by people and computers. Big data's value doesn't lie in its quantity, but rather in its role in making decisions, generating insights and supporting automation -- all critical to business success in the 21st century.

This article is part of

The ultimate guide to big data for businesses

  • Which also includes:
  • 8 benefits of using big data for businesses
  • What a big data strategy includes and how to build one
  • 10 big data challenges and how to address them
Download1 Download this entire guide for FREE now!

"Companies need to invest in what the data can do for their business," said Christophe Antoine, vice president of global solutions engineering at data integration platform provider Talend. But organizations that want to reap the benefits of big data must first effectively collect it -- not so easy a feat given the volume, variety and velocity of data today.

What data is collected?

Today the volume, variety and velocity of data are so much greater that it warrants the title big data. The world now generates an estimated 2.5 quintillion bytes of data every day, according to general consensus statistics. This data comes in the following three forms:

  • Structured data is highly organized and exists in predefined formats like credit card numbers and GPS coordinates.
  • Unstructured data exists in the form it was generated, such as social media posts.
  • Semi-structured data is a mix of structured and unstructured data like email addresses and text, respectively.

Data generally can be classified as quantitative and qualitative.Quantitative data comes in numerical form such as statistics and percentages, while qualitative data carries descriptive characteristics like color, smell, appearance and quality. In addition to the primary data, organizations might use secondary data collected by another party for a different purpose.

Common methods of collecting big data

In big data collection, the range of a company's sources generating data needs to be identified. Typical sources include the following:

  • operational systems producing transactional data such as point-of-sale software;
  • endpoint devices within IoT ecosystems;
  • second- and third-party sources such as marketing firms;
  • social media posts from existing and prospective customers;
  • multiple additional sources like smartphone locational data; and
  • surveys that directly ask customers for information.
How Big Data Collection Works: Process, Challenges, Techniques (2)

No enterprise can collect and use all the data being created. So, business leaders need to build a big data collection program that identifies the data they need for their existing and future business use cases. Some experts believe enterprises should collect as much data as they can acquire to pilot innovative use cases, while others advise organizations to be more selective to avoid running up costs, complexity and compliance issues without getting any business value in return.

Steps in the data collection process

Identifying useful data sources is just the start of the big data collection process. From there, an organization must build a pipeline that moves data from generation to enterprise locations where the data will be stored for organizational use. Most commonly, this data ingestion process involves three overarching steps -- extract, transform and load (ETL):

  • extraction -- data is taken from its originating location;
  • transformation -- data is cleansed and normalized for business use; and
  • loading -- data is moved into a database, data warehouse or data lake to be accessed for use.

Data management teams face additional considerations and requirements at each of these steps, such as how to ensure the data they've identified for use is reliable and how to prepare it for use.

"Data determines the uses you can have, and desired applications determine the data you will need," said David Belanger, senior research fellow at the Stevens Institute of Technology School of Business and retired chief scientist at AT&T Labs. "Once you know the sources, there are a number of questions to be answered: Where can I get the data I need? Is the source reliable? What are its properties, for example, velocity, stream, transaction, purchased? What is its quality? Is it internally or externally sourced? etc."

Challenges in big data collection

Not surprisingly, many businesses struggle with these questions. "There are all kinds of challenges -- technical challenges, organizational and sometimes compliance challenges," said Max Martynov, CTO at digital transformation service provider Grid Dynamics. These challenges can include the following:

  • identifying and managing all the data held by an organization;
  • accessing all the required data sets and breaking down internal and external data silos;
  • achieving and maintaining good data quality;
  • selecting and properly using the right tools for the various ETL tasks;
  • having the right skills and enough skilled talent for the level of work required to meet organizational objectives; and
  • properly securing all the collected data and adhering to privacy and security regulations while enabling access to meet business needs.

Such challenges within the data collection process mirror the challenges that executives cite as barriers to developing their big data initiatives overall. The NewVantage study, for example, found that 92% of respondents identified culture -- people, business processes, change management -- as the biggest challenge to becoming a data-driven organization, while just 8% identified technology limitations as the leading barrier.

Big data security and privacy issues

Experts advise business leaders to develop a strong data governance program to help address those challenges, particularly security- and privacy-related challenges. "You don't want to hurt access, but you do need to put the right governance in place to protect your data," Talend's Antoine noted.

A good governance program should establish the processes needed to dictate how the data is collected, stored and used and ensure that the organization does the following:

  • identifies regulated and sensitive data;
  • establishes controls to prevent unauthorized access to it;
  • creates controls to audit those who access it; and
  • creates systems to enforce governance rules and protocols.

Such steps help secure and protect data to ensure regulatory compliance. Moreover, experts said these measures help the business to trust its data -- an important part of becoming a data-driven organization.

Best practices for collecting big data

To build a successful, secure process for big data collection, experts offered the following best practices:

  • Develop a framework for collection that includes security, compliance and governance from the start.
  • Build a data catalog early in the process to know what's in the organization's data platform.
  • Let business use cases determine the data that's collected.
  • Tune and tweak data collection and data governance as use cases emerge and the data program matures, identifying what data sets are missing from the organization's big data collection process and what collected data sets hold no value.
  • Automate the process as much as possible from data ingestion to cataloging to ensure efficiency and speed as well as adherence to the protocols established by the governance program.
  • Implement tools that uncover problems in the data collection process, such as data sets that don't show up as expected.

Related Resources

Dig Deeper on Data governance

  • 5V's of big dataBy: ScottRobinson
  • Assemble the 6 layers of big data stack architectureBy: JeffMcCormick
  • 3 V's (volume, velocity and variety)By: BenLutkevich
  • Top trends in big data for 2023 and beyondBy: RonaldSchmelzer

As someone deeply entrenched in the realm of big data, I can attest to the crucial role it plays in modern enterprises, shaping the landscape of decision-making and innovation. My expertise is rooted in hands-on experience and a comprehensive understanding of the concepts discussed in the provided article by Mary K. Pratt.

The article delves into the significance of big data collection, emphasizing its role in driving trusted business decisions. The evidence presented, including statistics from a 2021 survey by NewVantage Partners, underscores the widespread adoption and success of big data initiatives among Fortune 1000 companies. The statistics, such as 99% of senior executives pursuing big data programs and 96% reporting success, showcase the undeniable impact of big data on today's corporate landscape.

Now, let's break down the key concepts covered in the article:

  1. Big Data Collection Definition:

    • Big data collection is described as a systematic approach to gathering massive amounts of information from diverse sources. The goal is to create a comprehensive and accurate overview of an enterprise's operations, enabling the derivation of insights crucial for making informed business decisions.
  2. Types of Data:

    • The article highlights three forms of data:
      • Structured data: Highly organized and exists in predefined formats.
      • Unstructured data: Maintains the form it was generated in, such as social media posts.
      • Semi-structured data: A mix of structured and unstructured data, like email addresses and text.
  3. Quantitative and Qualitative Data:

    • Data can be classified as quantitative (numerical, e.g., statistics) or qualitative (descriptive characteristics like color, smell, etc.).
  4. Sources of Big Data:

    • Various sources contribute to big data, including operational systems, IoT devices, second- and third-party sources, social media posts, smartphone locational data, and surveys.
  5. Data Collection Process:

    • The article outlines the steps in the data collection process:
      • Identification of useful data sources.
      • Building a data pipeline involving extraction, transformation, and loading (ETL).
      • Additional considerations include ensuring data reliability and preparation for use.
  6. Challenges in Big Data Collection:

    • Challenges include technical, organizational, and compliance issues, such as identifying and managing data, ensuring data quality, and selecting the right tools and skills.
  7. Security and Privacy:

    • Security and privacy are critical concerns, and a strong data governance program is recommended to address them. This involves identifying regulated and sensitive data, establishing controls, auditing access, and enforcing governance rules.
  8. Best Practices for Big Data Collection:

    • The article suggests several best practices:
      • Develop a framework that includes security, compliance, and governance.
      • Build a data catalog early in the process.
      • Let business use cases guide data collection.
      • Tune and tweak data collection and governance based on emerging use cases.
      • Automate processes for efficiency.
      • Implement tools to uncover problems in the data collection process.

In conclusion, the article provides a comprehensive guide to big data for businesses, covering essential concepts, challenges, and best practices. As an expert in this field, I can affirm the importance of these insights for organizations seeking to harness the power of big data for informed decision-making and sustained success.

How Big Data Collection Works: Process, Challenges, Techniques (2024)
Top Articles
Latest Posts
Article information

Author: Arline Emard IV

Last Updated:

Views: 6022

Rating: 4.1 / 5 (72 voted)

Reviews: 87% of readers found this page helpful

Author information

Name: Arline Emard IV

Birthday: 1996-07-10

Address: 8912 Hintz Shore, West Louie, AZ 69363-0747

Phone: +13454700762376

Job: Administration Technician

Hobby: Paintball, Horseback riding, Cycling, Running, Macrame, Playing musical instruments, Soapmaking

Introduction: My name is Arline Emard IV, I am a cheerful, gorgeous, colorful, joyous, excited, super, inquisitive person who loves writing and wants to share my knowledge and understanding with you.