Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions
by: Neylson Crepalde (Author)
Publisher: Packt Publishing
Publication Date: 2024/7/19
Language: English
Print Length: 296 pages
ISBN-10: 1835462146
ISBN-13: 9781835462140
Book Description
Gain hands-on experience in building efficient and scalable big data architecture on Kubernetes, utilizing leading technologies such as Spark, Airflow, Kafka, and Trino.
Key Features
Leverage Kubernetes in a cloud environment to integrate seamlessly with a variety of tools
Explore best practices for optimizing the performance of big data pipelines
Build end-to-end data pipelines and discover real-world use cases using popular tools like Spark, Airflow, and Kafka
Purchase of the print or Kindle book includes a free PDF eBook
Book Description
In today's data-driven world, organizations across different sectors need scalable and efficient solutions for processing large volumes of data. Kubernetes offers an open-source and cost-effective platform for deploying and managing big data tools and workloads, ensuring optimal resource utilization and minimizing operational overhead. If you want to master the art of building and deploying big data solutions using Kubernetes, then this book is for you.
Written by an experienced data specialist, Big Data on Kubernetes takes you through the entire process of developing scalable and resilient data pipelines, with a focus on practical implementation. Starting with the basics, you’ll progress toward learning how to install Docker and run your first containerized applications. You’ll then explore Kubernetes architecture and understand its core components. This knowledge will pave the way for exploring a variety of essential tools for big data processing, such as Apache Spark and Apache Airflow. You’ll also learn how to install and configure these tools on Kubernetes clusters. Throughout the book, you’ll gain hands-on experience building a complete big data stack on Kubernetes.
By the end of this Kubernetes book, you’ll be equipped with the skills and knowledge you need to tackle real-world big data challenges with confidence.
What you will learn
Install and use Docker to run containers and build concise images
Gain a deep understanding of Kubernetes architecture and its components
Deploy and manage Kubernetes clusters on different cloud platforms
Implement and manage data pipelines using Apache Spark and Apache Airflow
Deploy and configure Apache Kafka for real-time data ingestion and processing
Build and orchestrate a complete big data pipeline using open-source tools
Deploy Generative AI applications on a Kubernetes-based architecture
Who this book is for
If you’re a data engineer, BI analyst, data team leader, data architect, or tech manager with a basic understanding of big data technologies, then this big data book is for you. Familiarity with the basics of Python programming, SQL queries, and YAML is required to understand the topics discussed in this book.
Table of Contents
Getting Started with Containers
Kubernetes Architecture
Kubernetes - Hands On
The Modern Data Stack
Big Data Processing with Apache Spark
Apache Airflow for Building Pipelines
Apache Kafka for Real-Time Events and Data Ingestion
Deploying the Big Data Stack on Kubernetes
Data Consumption Layer
Building a Big Data Pipeline on Kubernetes
AI/ML Workloads on Kubernetes
Where to Go from Here