The history of big data | LightsOnData (2024)

Do a quick Google search and you’ll soon realize that no one can really agree on the true origins of the term ‘Big Data’. Some argue that it has been around since the early 1990s, crediting American computer scientist John R. Mashey, often called the ‘father of big data’, with making it popular.

Others believe the term was coined in 2005 by Roger Magoulas of O’Reilly Media. And some would even argue that the idea of ‘big data’ didn’t really take off until the 2010s. But wherever you stand on the origins of the term, one thing we can all agree on is that Big Data has been around for many, many years. It is not something completely new or confined to the last two decades. Arguably, though, in the last decade it did turn into a bit of a buzzword.

Over the course of centuries, people have been trying to use data analysis and analytics techniques to support their decision-making process.

The ancient history of Big Data

The earliest examples we have of humans storing and analyzing data are tally sticks, which date back to around 18,000 BCE! The Ishango Bone, discovered in 1950 in what is now the Democratic Republic of the Congo, near the border with Uganda, is thought to be one of the earliest pieces of evidence of prehistoric data storage.

De Heinzelin’s detailed drawing of the Ishango bone

Paleolithic tribespeople would mark notches into sticks or bones, to keep track of trading activity or supplies. They would compare sticks and notches to carry out rudimentary calculations, enabling them to make predictions such as how long their food supplies would last.

Then, around 2400 BCE, came the abacus, the first dedicated device constructed specifically for performing calculations. The first libraries also appeared around this time, representing our first attempts at mass data storage.

Around 300 BCE, the ancient Egyptians were already trying to capture all existing ‘data’ in the Library of Alexandria. Moreover, the Roman Empire carefully analyzed statistics on its military to determine the optimal distribution of its armies.

In more recent times, however, data analysis has revolutionized the modern business environment.

Big Data in the 20th century

The first major data project was created in 1937, ordered by the Franklin D. Roosevelt administration after the Social Security Act became law. The government had to keep track of contributions from 26 million Americans and more than 3 million employers. IBM got the contract to develop a punch card-reading machine for this massive bookkeeping project.

The first data-processing machine appeared in 1943 and was developed by the British to decipher Nazi codes during World War II. This device, named Colossus, searched for patterns in intercepted messages at a rate of 5,000 characters per second, reducing the length of time the task took from weeks to merely hours.

A Colossus Mark 2 codebreaking computer being operated by Dorothy Du Boisson (left) and Elsie Booker (right), 1943 | Source: Wikipedia

Then, in 1965, the United States Government decided to build the first ever data centre to store over 742 million tax returns and 175 million sets of fingerprints. The plan was to transfer those records onto magnetic computer tape that would be stored in a single location. The project was later dropped but is generally accepted as the beginning of the electronic data storage era.

The internet age and the dawn of Big Data

Between 1989 and 1990, Tim Berners-Lee and Robert Cailliau created the World Wide Web and developed HTML, URLs and HTTP, all while working at CERN. The internet age, with widespread and easy access to data, had begun, and by 1996 digital data storage had become more cost-effective than storing information on paper.

Tim Berners-Lee and Robert Cailliau

The domain google.com was registered a year later, in 1997, and the search engine launched in 1998. This fired the starting pistol on Google's climb to data dominance and on the development of numerous other technological innovations, including in the areas of machine learning, big data and analytics.

In 1998, Carlo Strozzi used the name NoSQL for his lightweight open-source relational database, which did not expose a standard SQL interface. The term was later adopted for non-relational databases, which store and retrieve data modelled differently from the tabular relations found in relational databases. Then, in 1999, the first edition of How Much Information by Hal R. Varian and Peter Lyman attempted to quantify the amount of digital information available in the world at that point.

The information age

Since the early 2000s, the Internet and the Web have offered unique data collection and data analysis opportunities. With the expansion of web traffic and online stores, companies such as Yahoo, Amazon and eBay started to analyze customer behavior by looking at click-rates, IP-specific location data and search logs. This opened a whole new world of possibilities.

In 2005, Roger Magoulas used the label Big Data to refer to a large set of data that, at the time, was almost impossible to manage and process using the traditional business intelligence tools available. In the same year, Hadoop, which could handle Big Data, was created. Hadoop grew out of an open-source web search project called Nutch and implemented the ideas behind Google’s MapReduce.
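
To give a sense of the MapReduce programming model mentioned above, here is a minimal single-machine sketch in Python. It is not Hadoop's API and makes no claim about how Hadoop or Google implemented it; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are made up purely for illustration. The idea it shows is that a job is split into a map step that emits key/value pairs and a reduce step that combines all values sharing a key.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (a framework would do this between steps)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Hypothetical driver: run map, shuffle and reduce in sequence."""
    # Each document is mapped independently; on a cluster this could run in parallel.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    # Each key is reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
```

Because every document can be mapped on its own and every key can be reduced on its own, the same job can in principle be spread across many machines, which is what made this model attractive for very large datasets.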

Big Data has revolutionized entire industries and changed human culture and behavior. It is a result of the information age and is changing how people exercise, create music, and work.

For example, Big Data is being used in healthcare to map disease outbreaks and test alternative treatments. NASA uses Big Data to explore the universe. The music industry is replacing intuition with Big Data studies. Utilities use Big Data to study customer behavior and avoid blackouts. Nike uses health monitoring wearables to track customers and provide feedback on their health, and cybersecurity teams use Big Data to stop crime.

The future of Big Data

Since Big Data first entered the scene, its definition, its use cases, and the technology and strategy for harnessing its value have evolved significantly across different industries. Innovations in cloud computing, quantum computing, the Internet of Things (IoT), artificial intelligence, and so on will allow Big Data to evolve further as we find new ways of harnessing its potential.

About the author

George Firican

George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges. He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.
