Chapter 5. First steps in big data · Introducing Data Science: Big data, machine learning, and more, using Python tools (2024)

This chapter covers

  • Taking your first steps with two big data applications: Hadoop and Spark
  • Using Python to write big data jobs
  • Building an interactive dashboard that connects to data stored in a big data database

Over the last two chapters, we’ve steadily increased the size of the data. In chapter 3 we worked with data sets that could fit into the main memory of a computer. Chapter 4 introduced techniques to deal with data sets that were too large to fit in memory but could still be processed on a single computer. In this chapter you’ll learn to work with technologies that can handle data so large that a single node (computer) no longer suffices. In fact, it may not even fit on a hundred computers. Now that’s a challenge, isn’t it?

We’ll stay as close as possible to the way of working from the previous chapters; the focus is on giving you the confidence to work on a big data platform. To that end, the main part of this chapter is a case study in which you’ll create a dashboard that allows you to explore data from lenders of a bank. By the end of this chapter you’ll have gone through the following steps:

  • Load data into Hadoop, the most common big data platform.
  • Transform and clean data with Spark.
  • Store it in Hive, a big data database.
  • Interactively visualize this data with Qlik Sense, a visualization tool.
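
The steps above run on a cluster, but the idea behind them can be sketched on a single machine. The following is a minimal, standard-library-only illustration, not Hadoop or Spark code: the data is split into partitions (standing in for blocks stored on different nodes), each partition is cleaned and summarized independently, and the partial results are merged. The records and the function names (`split_into_partitions`, `clean_and_sum`, `merge_totals`) are hypothetical and exist only for this example.

```python
from collections import Counter
from functools import reduce

# Made-up loan records, purely for illustration; not data from the case study.
records = [
    {"purpose": "car", "amount": "5000"},
    {"purpose": "house", "amount": "120000"},
    {"purpose": "car", "amount": None},  # missing amount, dropped during cleaning
    {"purpose": "education", "amount": "8000"},
    {"purpose": "house", "amount": "95000"},
    {"purpose": "car", "amount": "7000"},
]


def split_into_partitions(rows, n_partitions):
    """Deal rows round-robin into n_partitions lists (a stand-in for blocks on separate nodes)."""
    return [rows[i::n_partitions] for i in range(n_partitions)]


def clean_and_sum(partition):
    """Map step: skip rows with a missing amount and total the amounts per purpose."""
    totals = Counter()
    for row in partition:
        if row["amount"] is None:
            continue
        totals[row["purpose"]] += int(row["amount"])
    return totals


def merge_totals(left, right):
    """Reduce step: combine the partial totals of two partitions."""
    return left + right


partitions = split_into_partitions(records, 3)
# Each partition is processed on its own; on a cluster this would happen in parallel.
partial_totals = [clean_and_sum(p) for p in partitions]
total_by_purpose = reduce(merge_totals, partial_totals, Counter())
print(dict(total_by_purpose))
```

Because each partition is processed without looking at the others, the same logic can in principle be spread over many machines, which is the kind of work a distributed framework takes care of for you.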

5.1. Distributing data storage and processing with frameworks

5.2. Case study: Assessing risk when loaning money

5.3. Summary

Author: Sen. Emmett Berge
