Lesson 1: Big Data & Hadoop Introduction (3 hrs)
You will learn about Big Data characteristics, the need for a framework such as Hadoop, and its ecosystem. You will also be introduced to the important daemons that support the functioning of a Hadoop cluster.
o Data & Existing Solutions
o Welcome to the world of Big Data—What, Why & Where
o Case studies
o Hadoop & its Ecosystem
o Hadoop Core components
o Hadoop & its capabilities
Lesson 2: HDFS (4 hrs)
You will learn about the Hadoop Distributed File System (HDFS), its architecture, operation & internals, and about the different Hadoop distributions and their similarities & differences.
o Gain knowledge of HDFS: its internals, operation & features
o Learn about possibilities without HDFS
o Compare the different distributions of Hadoop and identify their similarities & differences
o Identify the requirements to set up a Hadoop cluster
Lesson 3: Hadoop Cluster (3 hrs)
You will learn the steps to set up Apache Hadoop (core distribution) & Cloudera Distribution of Hadoop (vendor specific), cluster management solutions and their benefits, and the nuts & bolts of the Cloudera Distribution of Hadoop. You will also learn how to verify your cluster.
o Need for a Cluster Management Solution
o Choice of Installation methods—Automated/ Manual
o Linux machines setup—Virtualization & Cloud
o Hadoop Cluster Setup—Apache Hadoop V2 & Cloudera Distribution of Hadoop (CDH)
o Cloudera Manager features and capabilities
o Working with Hadoop cluster, HDFS & data
o Working with management console/UI (user interfaces) & Linux terminals
o Understand administration scenarios
Lesson 4: Hadoop Configurations & Daemon Logs (4 hrs)
You will learn about the configuration files, ports & properties that relate to the functioning of a Hadoop cluster. You will also learn about Hadoop daemon logs and how they help in diagnosing problems & gathering information.
o List and describe the files that control Hadoop configuration
o Explain how to manage Hadoop configuration with Cloudera Manager
o Locate configuration files and make changes
o Explain how to deal with stale configurations
o Explain the address and port properties of the RPC and HTTP servers run by Hadoop daemons
o Locate log files generated on hosts
o Filter information in log files
o Explain how to get diagnostic information from log files
Lesson 5: Hadoop Cluster Maintenance & Administration (3 hrs)
You will learn Hadoop cluster maintenance and administration activities. You will also learn the shortcomings of Hadoop v1 and how they are addressed by Hadoop v2 features.
o Explain how to add and remove nodes in an ad hoc way
o Explain how to add and remove nodes in a systematic way (commissioning and decommissioning of nodes)
o Explain how to balance a cluster
o List the steps for managing services including adding, deleting, starting, stopping and checking status of services
o Explain the procedure to enable rack awareness
o List the steps to add, remove and move role instances and hosts
o Set up Users and Quotas
o Diagnostics and Recovery
Lesson 6: Hadoop Computational Frameworks (3 hrs)
You will learn about different types of computational frameworks, MapReduce & YARN concepts & configurations and how YARN manages applications.
o Describe the role of computational frameworks
o Explain MapReduce concepts
o Explain YARN framework and concepts
o Describe MRv2 on YARN
o Explain how to configure YARN
o Describe YARN applications
o Describe YARN memory and CPU settings
Lesson 7: Scheduling—Managing resources via Schedulers (3 hrs)
You will learn cluster scheduling concepts and how to manage resources in your YARN cluster, using schedulers & queue management to manage jobs/applications.
o Describe the scheduling concepts
o Identify the Schedulers
o Explain the ways to manage resources using Schedulers
o Describe FIFO, Fair Scheduler, and Capacity Scheduler
o Explain how to configure Schedulers
o Explain queue management
Lesson 8: Hadoop Cluster Planning (3 hrs)
You will learn how to plan your Hadoop cluster, the considerations for cluster sizing & workload patterns in a Hadoop cluster, and how to make choices pertaining to variables such as hardware, software & the different cluster deployment options.
o Planning Hadoop Cluster
o General Planning considerations
o Workload and cluster sizing
o Hadoop Cluster Setup Options—Physical, Virtualization, Cloud or Hybrid
o Making Choices—Hardware, Software & Network
o Making Choices—Master/Slave considerations
o News from the world—Existing Setups
Lesson 9: Hadoop Clients, HUE, Ganglia, Puppet and Ambari interface (4 hrs)
You will learn about Hadoop clients, the nodes that support Hadoop clients, and web interfaces such as HUE, Ganglia, Puppet and Ambari, which can be used to work with a Hadoop cluster and its components.
o Explain how Hue, Ganglia, Puppet and Ambari work
o Install and configure Hue, Ganglia, Puppet and Ambari
o Describe how authentication and authorization are managed in Hue, Ganglia, Puppet and Ambari
Lesson 10: Data Ingestion and components/services in Hadoop Cluster (4 hrs)
You will learn about data ingestion types & tools, such as Flume and Sqoop, that can be used for data import/export. You will also learn about open-source components that work within the Hadoop ecosystem, such as Hive, HBase, Kafka & Spark.
o Understand Data Ingestion and various data ingestion tools & their capabilities
o Understand how Flume and Sqoop work
o List the advantages and key features of Hive, Pig, Kafka, ZooKeeper, HBase, Spark, Oozie etc.
o Components of Hive, Pig, Kafka, ZooKeeper, HBase, Spark, Oozie etc.
Lesson 11: Hadoop Security—Securing Hadoop Cluster (3 hrs)
You will learn about security aspects and security implementation in a Hadoop cluster to secure the data & the cluster.
o Describe the different ways to avoid risks and secure data
o Identify the different threat categories
o Describe the security aspects for different nodes
o Describe operating system security
o Describe Kerberos and how it works
o Describe Service Level Authorization
Lesson 12: Cluster Monitoring (3 hrs)
You will learn the basics of cluster monitoring, how to choose the right monitoring solution, Hadoop metrics categories & types, and the Cloudera Manager features and capabilities that can be used for monitoring your Hadoop cluster.
o Describe cluster monitoring
o Describe how to choose the right monitoring solution
o List the features and considerations of Cloudera Manager for monitoring
o Describe the different categories of Hadoop Metrics
o List the different types of Hadoop Metrics
o List the steps to monitor a cluster by using Cloudera Manager