MapR: Installation
Latest revision as of 09:43, 27 March 2015
All the following info and more are available in:
- http://doc.mapr.com/display/MapR/Quick+Installation+Guide
- http://doc.mapr.com/display/MapR/Advanced+Installation+Topics
The MapR quick installer automates the process of configuring a Hadoop cluster and installing MapR software based on node type. You can install the MapR distribution for Hadoop on a set of nodes from any machine that can connect to the nodes. Using the quick installer, you can configure each node in a MapR cluster as one of the following types:
- Control Node: Control nodes manage the operation of the cluster. Control nodes host the ZooKeeper, CLDB, JobTracker, ResourceManager, and Webserver services. One control node also hosts the HistoryServer.
- Data Nodes: Data nodes host the NodeManager, TaskTracker, and FileServer services. These nodes store data, run YARN applications and MapReduce jobs, and process table data.
- Control-as-Data Nodes: Control-as-data nodes combine control and data node functionality. This node type is appropriate for small clusters.
- Client Nodes: Client nodes provide controlled user access to the cluster.
Ecosystem Component Installation
In addition to installing the core components of the MapR Hadoop distribution, the MapR quick installer supports installation of Apache Spark, Hive, and HBase. To install the Spark and Hive ecosystem components, you must use the quick installer configuration file. You can also use the configuration file to install HBase; when you run the quick installer in interactive mode, however, the installer prompts you to choose whether you want HBase or MapR-DB installed. Entering y at these prompts instructs the installer to install HBase and/or MapR-DB during the installation process.
Installation Steps
To successfully install MapR using the quick installer, complete the following steps:
- Make sure your installation machine and nodes meet all of the prerequisites.
- Prepare for the installation and set up the installation machine.
- Run the quick installer.
- Complete the post installation steps.
Prerequisites
Verify that your installation machine and the nodes that you plan to install MapR on meet the required prerequisites.
Installation Machine Prerequisites
The machine from which you run the quick installer must run one of the following operating systems:
- Ubuntu 12.04 or later
- RedHat (with the EPEL repository installed) 6.1 or later
- CentOS (with the EPEL repository installed) 6.1 or later
- SuSE 11SP2
- To install from a machine running SuSE, you must create a symbolic link named libssl.so.10, that points to libssl.so.1.0.0 under /usr/lib64 before you install.
- Example:
cd /usr/lib64
ln -s libssl.so.1.0.0 libssl.so.10

MapR Node Prerequisites
The nodes that you install MapR on must meet the following prerequisites:
- Java 1.7 or 1.8
- Python 2.6 or later
- The operating system on each node must meet the listed package dependencies. The quick installer should install these dependencies automatically. If not, you can install them manually. For RedHat and CentOS, you must have the EPEL repository installed for the quick installer to install the dependencies automatically.
- Ubuntu/SUSE package dependencies:
  - python-pycurl
  - openssl
  - sshpass
- RedHat/CentOS package dependencies:
  - python-pycurl
  - libselinux-python
  - openssl
  - sshpass
  - openssh-clients
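If you need to install the dependencies manually, the package lists above can be captured in a small shell helper and fed to the distribution's native package manager. This is a sketch only: the function name is ours, and the package names are simply the ones listed above.

```shell
# deps_for: print the prerequisite package list for a distro family,
# exactly as given in the dependency lists above.
deps_for() {
  case "$1" in
    ubuntu|suse)   echo "python-pycurl openssl sshpass" ;;
    redhat|centos) echo "python-pycurl libselinux-python openssl sshpass openssh-clients" ;;
    *)             echo "unknown distro family: $1" >&2; return 1 ;;
  esac
}

# Typical use (remember that on RedHat/CentOS the EPEL repository
# must already be installed for packages such as sshpass):
#   sudo yum install -y $(deps_for centos)
#   sudo apt-get install -y $(deps_for ubuntu)
```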
Installing the EPEL Repository
If you need to install the EPEL repository, complete the following steps:
- Download the version of the EPEL repository that corresponds to the version of your operating system:
  - EPEL-6: wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
  - EPEL-7: wget http://download.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
- Issue the following command to install the EPEL repository, replacing <version> with the EPEL version:
  - Syntax: rpm -Uvh epel-release-<version>*.rpm
  - Example: rpm -Uvh epel-release-6*.rpm
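The two steps above can be combined into a small helper that maps a RedHat/CentOS major version to the matching epel-release package. A sketch only: the function name is ours, and the URLs are the version-pinned ones listed above, so they may move as Fedora updates the packages.

```shell
# epel_url: print the epel-release package URL for a given
# RHEL/CentOS major version (6 or 7), using the URLs listed above.
epel_url() {
  case "$1" in
    6) echo "http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm" ;;
    7) echo "http://download.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm" ;;
    *) echo "unsupported OS major version: $1" >&2; return 1 ;;
  esac
}

# Typical use: detect the major version, then download and install:
#   major="$(grep -o '[0-9]*' /etc/redhat-release | head -1)"
#   wget "$(epel_url "$major")" && sudo rpm -Uvh epel-release-*.rpm
```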
Before You Run the Quick Installer
Before you run the quick installer to install MapR on your cluster, verify that you have completed all of the preparation tasks and set up the installation machine.
Preparing for Installation
Verify that you have completed the following preparation tasks before you set up the installation machine:
- Determine the number of control nodes. The MapR installer supports one or three control nodes. Three control nodes are typically sufficient for clusters up to approximately 100 nodes.
- Determine the data and client nodes. The MapR installer supports an arbitrary number of data or client nodes.
- Ensure all nodes have internet access. For online installation only.
- Ensure access to a local repository of MapR packages and Linux distribution repositories. For offline installation only.
- Decide if you will install Spark or Hive. If you decide to install Apache ecosystem projects, like Spark or Hive, you must install using the configuration file.
- Verify that all nodes you plan to install on are configured with the same login credentials. If you plan to use the quick installer in interactive mode, described later in this document, also verify that all nodes have the same disks available for use by the MapR Hadoop platform.
- Identify disks to allocate to the MapR file system. For each node in the cluster, you must identify which disks you want to allocate to the MapR file system. If the same set of disks and partitions applies to all nodes in the cluster, you can use interactive mode for the installer. To specify a distinct set of disks and partitions for individual cluster nodes, you need to use the configuration file. The installer’s interactive mode and configuration files are discussed in depth later in this page.
- Use the lsblk or fdisk -l commands to determine the full path for the disks that you plan to use.
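As a sketch of that last step, lsblk output can be filtered down to whole disks (skipping partitions) in the /dev/... form the installer expects. The function name is ours; it reads lsblk-style NAME/TYPE columns on stdin so the result can be reviewed before any disk is handed to the installer.

```shell
# list_data_disks: from "NAME TYPE" lines (as produced by
# "lsblk -d -n -o NAME,TYPE"), print the full /dev path of every
# whole disk, skipping partitions and other device types.
list_data_disks() {
  awk '$2 == "disk" { print "/dev/" $1 }'
}

# Typical use, run on each node before installation:
#   lsblk -n -o NAME,TYPE | list_data_disks
```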
Setting Up the Installation Machine
Complete the following steps to set up the installation machine:
- Download the mapr-setup file for the MapR version that you plan to install. The following examples use the wget utility to download mapr-setup for MapR version 4.0.2; you can also download mapr-setup for MapR v4.0.1.
  - Ubuntu: wget http://package.mapr.com/releases/v4.0.2/ubuntu/mapr-setup
  - RedHat/CentOS: wget http://package.mapr.com/releases/v4.0.2/redhat/mapr-setup
  - SuSE: wget http://package.mapr.com/releases/v4.0.2/suse/mapr-setup
- Navigate to the directory where you downloaded mapr-setup, and enable execute permissions with the following command:
- chmod 755 mapr-setup
- Run mapr-setup to unpack the installer files to the /opt/mapr-installer directory. The user running mapr-setup must have write access to the /opt and /tmp directories. You can execute mapr-setup with sudo privileges:
- sudo ./mapr-setup
- The system extracts the installer and copies the setup files to /opt/mapr-installer. The system prompts you to run /opt/mapr-installer/bin/install to begin the installation process. Follow the guidelines in the Using the MapR Quick Installer section.
Note:
This installer enables password-authenticated ssh login, which remains enabled after installation. You can disable password authentication for ssh manually after installation by adding the following line to the sshd_config file and restarting ssh: PasswordAuthentication no
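The note above can be scripted. Here is a sketch (the function name is ours) that sets PasswordAuthentication no in a given sshd_config, replacing an existing directive or appending one; on a real node, run it against /etc/ssh/sshd_config and then restart ssh.

```shell
# disable_ssh_password_auth: force "PasswordAuthentication no" in the
# given sshd_config file. If a PasswordAuthentication directive is
# already present it is rewritten; otherwise the line is appended.
disable_ssh_password_auth() {
  cfg="$1"
  if grep -q '^PasswordAuthentication' "$cfg"; then
    sed -i 's/^PasswordAuthentication.*/PasswordAuthentication no/' "$cfg"
  else
    echo 'PasswordAuthentication no' >> "$cfg"
  fi
}

# Typical use (then restart ssh, e.g. "sudo service sshd restart"):
#   sudo sh -c '. ./helpers.sh; disable_ssh_password_auth /etc/ssh/sshd_config'
```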
Using the MapR Quick Installer
Use the MapR quick installer in interactive mode from the command line or provide a configuration file. Options for both are provided below.
Running the Quick Installer
To run the quick installer, login as the root user or use sudo, and issue the following command:
/opt/mapr-installer/bin/install new

Interactive Mode Sample Session
The following output reflects a typical interactive-mode session with the MapR quick installer.
===============================================================================
= =
= __ __ ____ ___ _ _ _ =
= | \/ | __ _ _ __ | _ \ |_ _| _ __ ___ | |_ __ _ | || | ___ _ __ =
= | |\/| | / _ || _ \ | |_) | | | | _ \ / __|| __|/ _ || || | / _ \| __|=
= | | | || (_| || |_) || _ < | | | | | |\__ \| |_| (_| || || || __/| | =
= |_| |_| \__ _|| .__/ |_| \_\ |___||_| |_||___/ \__|\__,_||_||_| \___||_| =
= |_| =
= =
===============================================================================
Version: 4.0.2.123
An Installer config file is typically used by experienced MapR admins to skip through the interview process.
Do you have a config file (y/n) [n]: n
Enter the hostnames of all the control nodes separated by spaces or commas []: control-host-01

Note: Only 1 or 3 control nodes are supported.
Note: Hostname resolution of all nodes in the cluster must be consistent across cluster nodes and the multi-node installer's driver node (the node from which the installation is launched). For example, either all nodes must be specified with a fully qualified domain name (FQDN) or none of the nodes can be specified with their FQDN.
Enter the hostnames of all the data nodes separated by spaces or commas []: data-host-01,data-host-02
Set MapR User Name [mapr]:
Set MapR User Password [mapr]:
Is this cluster going to run YARN? (y/n) [y]:
Is this cluster going to run MapReduce1? (y/n) [n]:
Is this cluster going to run Apache HBase? (y/n) [n]:
Is this cluster going to run MapR-DB? (y/n) [y]:
Enter the full path of disks for hosts separated by spaces or commas []: /dev/xvdf,/dev/xvdg

Note: The MapR quick installer uses the same set of disks and partitions for each node in the cluster. To specify disks and partitions individually for each node, use a configuration file.
Once you have specified the cluster’s configuration information, the MapR quick installer displays the configuration and asks for confirmation:
Current Information (Please verify if correct)
==============================================
Accessibility settings:
Cluster Name: "my.cluster.com"
MapR User Name: "mapr"
MapR Group Name: "mapr"
MapR User UID: "2000"
MapR User GID: "2000"
MapR User Password (Default: mapr): "****"
Functional settings:
WireLevel Security: "n"
MapReduce Services: "n"
YARN: "y"
MapR-DB: "y"
HBase: "n"
Disks to use: "/dev/xvdf,/dev/xvdg"
Client Nodes: ""
Control Nodes: "control-host-01"
Data Nodes: "data-host-01,data-host-02"
Repository (will download core software from here): "http://package.mapr.com/releases"
Ecosystem Repository (will download packages like Pig, Hive etc from here): "http://package.mapr.com/releases/ecosystem"
MapR Version to Install: "4.0.2"
Java Version to Install: "OpenJDK7"
Allow Control Nodes to function as Data Nodes (Not recommended for large clusters): "n"
Local Repository: "n"
Metrics settings:
Metrics DB Host and Port: ""
Metrics DB User Name: ""
Metrics DB User Password: ""
Metrics DB Schema: ""
(c)ontinue with install, (m)odify options, or save current configuration and (a)bort? (c/m/a) [c]: m

Note: Before you proceed with the installation, you can modify various settings. For example, you can set the stripe width (the number of disks in each storage pool) by selecting the sw option from the modification menu. You should also force format your disks if they were previously used in a MapR installation.
As you continue with the installation, the installer prompts you for the login credentials:
(c)ontinue with install, (m)odify options, or save current configuration and (a)bort? (c/m/a) [c]: c
SSH Username: root
SSH password: ****
Now running on Added Control Nodes: [control-host-01]

The quick installer first sets up the control nodes in parallel, then sets up the data nodes in groups of ten at a time. Prerequisite packages are downloaded and installed automatically by the MapR quick installer.
Quick Installer Configuration File
Installation with a configuration file is appropriate when:
- You want to perform a non-interactive installation for speed or repetition.
- The target nodes have different disk configurations.
- You want to install an Apache ecosystem component like Spark or Hive.
To perform this type of installation, you must first create a configuration file. The example file, config.example, in the /opt/mapr-installer/bin directory shows the expected format of an installation configuration file.
Here is a working configuration file for one Control Node that also acts as a Data Node: File:MapR config.txt
Notes:
- For a new installation, all sections must be present in the configuration file, although the [Data_Nodes] and [Client_Nodes] sections can be left empty. For additions to an existing installation, the [Control_Nodes], [Data_Nodes], and [Client_Nodes] sections must be present, although they can be left empty. Other sections in the configuration file are silently ignored for additions.
- The value of the Disks element of the [Defaults] section provides a fallback in the case where a node is specified in a previous [Control_Nodes], [Data_Nodes], or [Client_Nodes] section without any disk information.
- If the disks were used for a previous MapR installation, you must set ForceFormat to true.
- ForceFormat reformats the disk, and any existing data will be permanently lost.
- You do not have to specify values for the keys in the [Defaults] section, but each of the keys must be present.
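For illustration only, a configuration file for a single control node that also hosts data might look like the following. The section names come from the notes above, but the individual keys and the per-node disk syntax are assumptions on our part; verify everything against /opt/mapr-installer/bin/config.example before use.

```ini
[Control_Nodes]
; hostname followed by the disks to give to the MapR file system
; (per-node disk syntax is an assumption -- check config.example)
control-host-01: /dev/xvdf, /dev/xvdg

; [Data_Nodes] and [Client_Nodes] can be left empty for a new install
[Data_Nodes]

[Client_Nodes]

[Defaults]
; Every key must be present, even if its value is left empty.
; Key names here are hypothetical -- check config.example.
ClusterName: my.cluster.com
User: mapr
Password: mapr
; Fallback for nodes listed above without disk information
Disks: /dev/xvdf, /dev/xvdg
; Set to true only if the disks held a previous MapR installation
; (reformats the disks; any existing data is permanently lost)
ForceFormat: false
```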
Once the configuration file is created, you can initiate installation with one of the following commands.
- Ubuntu
- sudo /opt/mapr-installer/bin/install -K -s --cfg /opt/mapr/bin/myconfigfile new
- RedHat/CentOS
- sudo /opt/mapr-installer/bin/install --cfg /opt/mapr/bin/myconfigfile new
- SuSE
- sudo /opt/mapr-installer/bin/install --cfg /opt/mapr/bin/myconfigfile new
Installing Spark Using the Configuration File
To install Spark, uncomment and complete the configuration in the [Spark] section of the configuration file. You must specify one or more hostnames to be SparkMaster nodes, which coordinate execution of Spark jobs, and one or more hostnames to serve as the SparkWorker nodes, which execute Spark jobs. You can also modify the Spark memory configuration settings based on your environment.
Note: Scala 2.10.3 or later is a prerequisite for Spark installation. Verify that Scala is installed on the nodes where you plan to install Spark.
Installing Hive Components using the Configuration File
To install Hive client and server components, uncomment and complete the configuration under the [Hive] section in the configuration file. You may configure one or more hostnames on which to install the Hive clients, typically the same hostnames as specified in the [Clients] section. One or more hostnames can also be provided for installation of HiveServer2, and a single hostname on which to install the Hive metastore. The Hive metastore will be configured to use the default Derby database.
Post Installation
To complete the post installation process, follow these steps:
- Access the MCS by entering the following URL in your browser, substituting the IP address with the IP address or hostname of a control node in your cluster:
- https://<ip_address>:8443
- Compatible browsers include Chrome, Firefox 3.0 and above, Safari (see Browser Compatibility for more information), and Internet Explorer 10 and above.
- If a message about the security certificate appears, click Proceed anyway.
- Log in with the MapR user name and password that you set during the installation.
- To register and apply a license, click Manage Licenses in the upper right corner, and follow the instructions to add a license via the web.
- Create separate volumes so you can specify different policies for different subsets of data. See Managing Data with Volumes for more information.
- Set up topology so the cluster is rack-aware for optimum replication. See Setting up Node Topology for more information.