MapR: Installation
The MapR quick installer automates the process of configuring a Hadoop cluster and installing MapR software based on node type. You can install the MapR distribution for Hadoop on a set of nodes from any machine that can connect to the nodes. Using the quick installer, you can configure each node in a MapR cluster as one of the following types:
- Control Node: Control nodes manage the operation of the cluster. Control nodes host the ZooKeeper, CLDB, JobTracker, ResourceManager, and Webserver services. One control node also hosts the HistoryServer.
- Data Nodes: Data nodes host the NodeManager, TaskTracker, and FileServer services. These nodes store data, run YARN applications and MapReduce jobs, and process table data.
- Control-as-Data Nodes: Control-as-data nodes combine control and data node functionality. This node type is appropriate for small clusters.
- Client Nodes: Client nodes provide controlled user access to the cluster.
Ecosystem Component Installation
In addition to installing the core components of the MapR Hadoop distribution, the MapR quick installer supports installation of Apache Spark, Hive, and HBase. To install the Spark and Hive ecosystem components, you must use the quick installer configuration file. You can also use the configuration file to install HBase; however, when you run the quick installer in interactive mode, the installer asks whether you want HBase or MapR-DB installed. Entering y at these prompts instructs the installer to install HBase and/or MapR-DB during the installation process.
Installation Steps
To successfully install MapR using the quick installer, complete the following steps:
- Make sure your installation machine and nodes meet all of the prerequisites.
- Prepare for the installation and set up the installation machine.
- Run the quick installer.
- Complete the post installation steps.
Prerequisites
Verify that your installation machine and the nodes that you plan to install MapR on meet the required prerequisites.
Installation Machine Prerequisites
The machine from which you run the quick installer must run one of the following operating systems:
- Ubuntu 12.04 or later
- RedHat 6.1 or later (with the EPEL repository installed)
- CentOS 6.1 or later (with the EPEL repository installed)
- SuSE 11 SP2
- To install from a machine running SuSE, you must create a symbolic link named libssl.so.10 that points to libssl.so.1.0.0 under /usr/lib64 before you install.
- Example:
cd /usr/lib64
ln -s libssl.so.1.0.0 libssl.so.10
MapR Node Prerequisites
The nodes that you install MapR on must meet the following prerequisites:
- Java 1.7 or 1.8
- Python 2.6 or later
- The operating system on each node must meet the listed package dependencies. The quick installer normally installs these dependencies automatically; if it does not, you can install them manually, as shown in the example after the dependency lists below. For RedHat and CentOS, the EPEL repository must be installed for the quick installer to install the dependencies automatically.
- Ubuntu/SUSE Package Dependencies:
- python-pycurl
- openssl
- sshpass
- RedHat/CentOS Package Dependencies:
- python-pycurl
- libselinux-python
- openssl
- sshpass
- openssh-clients
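If you do install the dependencies manually, commands along the following lines cover the packages listed above. This is a sketch: exact package names and availability can vary by distribution release.
# Ubuntu
sudo apt-get install python-pycurl openssl sshpass
# RedHat/CentOS (some of these packages come from the EPEL repository)
sudo yum install python-pycurl libselinux-python openssl sshpass openssh-clients
# SUSE
sudo zypper install python-pycurl openssl sshpass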
Installing the EPEL Repository
If you need to install the EPEL repository, complete the following steps:
- Download the EPEL repository package that corresponds to the version of your operating system.
- Issue the following command to install the EPEL repository, replacing <version> with the EPEL version:
- Syntax
- rpm -Uvh epel-release-<version>*.rpm
- Example
- rpm -Uvh epel-release-6*.rpm
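On CentOS, an alternative is to install the repository definition directly from the extras repository instead of downloading the RPM manually; on RedHat, download the epel-release RPM for your version as shown above. A minimal sketch:
# CentOS only: installs the EPEL repository definition from the extras repository
sudo yum install epel-release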
Before You Run the Quick Installer
Before you run the quick installer to install MapR on your cluster, verify that you have completed all of the preparation tasks and set up the installation machine.
Preparing for Installation
Verify that you have completed the following preparation tasks before you set up the installation machine:
- Determine the number of control nodes. The MapR installer supports one or three control nodes. Three control nodes are typically sufficient for clusters up to approximately 100 nodes.
- Determine the data and client nodes. The MapR installer supports an arbitrary number of data or client nodes.
- Ensure that all nodes have internet access (online installation only).
- Ensure access to a local repository of MapR packages and to Linux distribution repositories (offline installation only).
- Decide if you will install Spark or Hive. If you decide to install Apache ecosystem projects, like Spark or Hive, you must install using the configuration file.
- Verify that all nodes you plan to install on are configured with the same login credentials. If you use the quick installer in interactive mode, described later on this page, also verify that all nodes provide the same disks for use by the MapR Hadoop Platform.
- Identify disks to allocate to the MapR file system. For each node in the cluster, you must identify which disks you want to allocate to the MapR file system. If the same set of disks and partitions applies to all nodes in the cluster, you can use interactive mode for the installer. To specify a distinct set of disks and partitions for individual cluster nodes, you need to use the configuration file. The installer’s interactive mode and configuration files are discussed in depth later in this page.
- Use the lsblk or fdisk -l commands to determine the full path for the disks that you plan to use.
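For example, you might run commands like the following on each node to list candidate devices; device names such as /dev/xvdf shown elsewhere on this page are illustrative and vary by system:
# list block devices with their size, type, and mount point
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
# print the full device paths reported by fdisk
sudo fdisk -l | grep '^Disk /dev/'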
Setting Up the Installation Machine
Complete the following steps to set up the installation machine:
- Download the mapr-setup file for the MapR version that you plan to install. The following examples use the wget utility to download mapr-setup for MapR version 4.0.2; however, you can also download mapr-setup for MapR v4.0.1.
- Ubuntu
- RedHat/CentOS
- SuSE
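The following is a sketch of the download commands, assuming the release repository layout under http://package.mapr.com/releases (the repository URL shown later in this page); verify the exact path for your distribution and MapR version before downloading:
# Ubuntu
wget http://package.mapr.com/releases/v4.0.2/ubuntu/mapr-setup
# RedHat/CentOS
wget http://package.mapr.com/releases/v4.0.2/redhat/mapr-setup
# SuSE
wget http://package.mapr.com/releases/v4.0.2/suse/mapr-setup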
- Navigate to the directory where you downloaded mapr-setup, and enable execute permissions with the following command:
- chmod 755 mapr-setup
- Run mapr-setup to unpack the installer files to the /opt/mapr-installer directory. The user running mapr-setup must have write access to the /opt and /tmp directories. You can execute mapr-setup with sudo privileges:
- sudo ./mapr-setup
- The system extracts the installer and copies the setup files to /opt/mapr-installer. The system prompts you to run /opt/mapr-installer/bin/install to begin the installation process. Follow the guidelines in the Using the MapR Quick Installer section.
Note:
This installer enables password-authenticated ssh login, which remains enabled after installation. You can disable password authentication for ssh manually after installation by adding the following line to the sshd_config file and restarting ssh: PasswordAuthentication no
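For example, on a RedHat or CentOS node you could disable password authentication with commands like the following, run as root; on Ubuntu the service is named ssh rather than sshd:
# append the directive to the ssh server configuration
echo "PasswordAuthentication no" >> /etc/ssh/sshd_config
# restart the ssh service so the change takes effect
service sshd restart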
Using the MapR Quick Installer
You can run the MapR quick installer in interactive mode from the command line or drive it with a configuration file. Both options are described below.
Running the Quick Installer
To run the quick installer, log in as the root user or use sudo, and issue the following command:
/opt/mapr-installer/bin/install new
Interactive Mode Sample Session
The following output reflects a typical interactive-mode session with the MapR quick installer.
===============================================================================
= =
= __ __ ____ ___ _ _ _ =
= | \/ | __ _ _ __ | _ \ |_ _| _ __ ___ | |_ __ _ | || | ___ _ __ =
= | |\/| | / _ || _ \ | |_) | | | | _ \ / __|| __|/ _ || || | / _ \| __|=
= | | | || (_| || |_) || _ < | | | | | |\__ \| |_| (_| || || || __/| | =
= |_| |_| \__ _|| .__/ |_| \_\ |___||_| |_||___/ \__|\__,_||_||_| \___||_| =
= |_| =
= =
===============================================================================
Version: 4.0.2.123
An Installer config file is typically used by experienced MapR admins to skip through the interview process.
Do you have a config file (y/n) [n]: n
Enter the hostnames of all the control nodes separated by spaces or commas []: control-host-01
Note: Only 1 or 3 control nodes are supported.
Note: Hostname resolution of all nodes in the cluster must be consistent across cluster nodes and the multi-node installer's driver node (the node from which the installation is launched). For example, either all nodes must be specified with a fully qualified domain name (FQDN) or none of the nodes can be specified with their FQDN.
Enter the hostnames of all the data nodes separated by spaces or commas []: data-host-01,data-host-02
Set MapR User Name [mapr]:
Set MapR User Password [mapr]:
Is this cluster going to run YARN? (y/n) [y]:
Is this cluster going to run MapReduce1? (y/n) [n]:
Is this cluster going to run Apache HBase? (y/n) [n]:
Is this cluster going to run MapR-DB? (y/n) [y]:
Enter the full path of disks for hosts separated by spaces or commas []: /dev/xvdf,/dev/xvdg
Note: The MapR quick installer uses the same set of disks and partitions for each node in the cluster. To specify disks and partitions individually for each node, use a configuration file.
Once you have specified the cluster’s configuration information, the MapR quick installer displays the configuration and asks for confirmation:
Current Information (Please verify if correct)
==============================================
Accessibility settings:
Cluster Name: "my.cluster.com"
MapR User Name: "mapr"
MapR Group Name: "mapr"
MapR User UID: "2000"
MapR User GID: "2000"
MapR User Password (Default: mapr): "****"
Functional settings:
WireLevel Security: "n"
MapReduce Services: "n"
YARN: "y"
MapR-DB: "y"
HBase: "n"
Disks to use: "/dev/xvdf,/dev/xvdg"
Client Nodes: ""
Control Nodes: "control-host-01"
Data Nodes: "data-host-01,data-host-02"
Repository (will download core software from here): "http://package.mapr.com/releases"
Ecosystem Repository (will download packages like Pig, Hive etc from here): "http://package.mapr.com/releases/ecosystem"
MapR Version to Install: "4.0.2"
Java Version to Install: "OpenJDK7"
Allow Control Nodes to function as Data Nodes (Not recommended for large clusters): "n"
Local Repository: "n"
Metrics settings:
Metrics DB Host and Port: ""
Metrics DB User Name: ""
Metrics DB User Password: ""
Metrics DB Schema: ""
(c)ontinue with install, (m)odify options, or save current configuration and (a)bort? (c/m/a) [c]: m
Note: Before you proceed with the installation, you can modify various settings. For example, you can set the stripe width (the number of disks in each storage pool) by selecting the sw option from the modification menu. You should also force format your disks if they were previously used in a MapR installation.
As you continue with the installation, the installer prompts you for the login credentials:
(c)ontinue with install, (m)odify options, or save current configuration and (a)bort? (c/m/a) [c]: c
SSH Username: root
SSH password: ****
Now running on Added Control Nodes: [control-host-01]
The quick installer first sets up the control nodes in parallel, then sets up data nodes in groups of ten nodes at a time. Prerequisite packages are automatically downloaded and installed by the MapR quick installer.
Quick Installer Configuration File
Installation with a configuration file is appropriate when:
- You want to perform a non-interactive installation for speed or repetition.
- The target nodes have different disk configurations.
- You want to install an Apache ecosystem component like Spark or Hive.
To perform this type of installation, you must first create a configuration file. The example file, config.example, in the /opt/mapr-installer/bin directory shows the expected format of an installation configuration file.
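As a starting point, you can copy the sample file and adapt it for your cluster. The file name below is illustrative, and the installer's config-file prompt (shown in the sample session above) is where you point the installer at your file:
# start from the shipped sample configuration
cp /opt/mapr-installer/bin/config.example ~/mycluster.conf
# set control/data node hostnames, per-node disks, and ecosystem components such as Spark or Hive
vi ~/mycluster.conf
# run the installer and answer y at the "Do you have a config file" prompt
/opt/mapr-installer/bin/install new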