
Creating a data source connection to a Hadoop Distributed File System

You can use the MicroStrategy Big Data Engine to browse Hadoop files stored in a Hadoop Distributed File System (HDFS). Follow the steps below to install the MicroStrategy Big Data Engine on a node of your Hadoop cluster, and then create a data source connection to your HDFS.

Prerequisite

You must have the Web Import Data privilege.

To install Big Data Engine and create a data source connection to a Hadoop Distributed File System

1 On the machine that hosts your HDFS, use a web browser to navigate to the MicroStrategy download site, https://download.microstrategy.com.
2 Locate the MicroStrategy Big Data Engine installation and download the package to your machine.
3 Unzip the installation package, which includes installations for various versions of Cloudera and Hortonworks.
4 The Big Data Engine service can be installed on the NameNode or on an edge node. After you select the appropriate node, create a directory on it for the Big Data Engine installation. In this example, the directory is /BDE.
5 Select the installation appropriate for your version of Cloudera or Hortonworks, and copy it to the directory you created for the Big Data Engine.
6 From the directory you created, extract the files from the installation using the following command:

tar -xzf FileName.tar.gz

Where FileName is the name of the installation file.
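Steps 4 through 6 can be scripted on the node you chose. The following is a minimal sketch, not a script supplied by MicroStrategy: the function name install_bde_package is hypothetical, FileName.tar.gz stands for whichever Cloudera or Hortonworks package matches your cluster, and /BDE is only the example directory from step 4.

```shell
#!/usr/bin/env bash

# Hypothetical helper: create the Big Data Engine directory (step 4),
# copy the version-specific package into it (step 5), and extract it there (step 6).
install_bde_package() {
  local package="$1"           # e.g. FileName.tar.gz for your Cloudera or Hortonworks version
  local bde_dir="${2:-/BDE}"   # installation directory; /BDE is the example from step 4
  mkdir -p "$bde_dir"
  cp "$package" "$bde_dir/"
  # -C makes tar extract into the installation directory regardless of the current directory
  tar -xzf "$bde_dir/$(basename "$package")" -C "$bde_dir"
}
```

For example, `install_bde_package FileName.tar.gz /BDE` performs all three steps; run it with sufficient privileges to create the directory (for example, as root) if /BDE sits at the file system root.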

7 Using MicroStrategy Web, log in to the project in which you want to import data.
8 On any page, click Create on the icon bar on the left, and then click Access External Data. The Connect to Your Data page opens.
9 Hover your cursor over Hadoop, and then click Browse Hadoop Files. The Data Source dialog box opens.
10 Provide the following configuration information:
Hadoop Name Node: The host name or IP address for the Hadoop NameNode. The NameNode provides access to your Hadoop file system.
HDFS Port: The port number for the HDFS NameNode service. The default port number is 8020.
WebHDFS Port: The port number for the Hadoop WebHDFS service, also referred to as the Hadoop web user interface. The default port number is 50070.
Kerberos Principal: The Kerberos principal for your HDFS, which is required if you have secured your HDFS using Kerberos.
Big Data Engine: The host name or IP address for the MicroStrategy Big Data Engine service installed and configured on your Hadoop system. The Big Data Engine service can be installed on the NameNode or an edge node.
Port: The port number for the service supporting Big Data Engine. The default port number is 30938.
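Before filling in the dialog, it can help to confirm these values from a shell on a cluster node. The sketch below assumes bash and the default ports listed above; the helper names parse_default_fs, webhdfs_url, and port_open are hypothetical. It reads the NameNode host and port from the standard Hadoop fs.defaultFS property (in a high-availability cluster that property names a logical nameservice rather than a host, so use `hdfs getconf -namenodes` instead), builds a WebHDFS LISTSTATUS URL, and tests whether a TCP port accepts connections.

```shell
#!/usr/bin/env bash

# Split an fs.defaultFS value such as "hdfs://namenode.example.com:8020"
# into "host port". Falls back to port 8020 when no port is given.
parse_default_fs() {
  local uri="${1#hdfs://}"   # strip the scheme
  uri="${uri%%/*}"           # drop any trailing path
  local host="${uri%%:*}" port="8020"
  if [[ "$uri" == *:* ]]; then
    port="${uri##*:}"
  fi
  printf '%s %s\n' "$host" "$port"
}

# Build a WebHDFS LISTSTATUS URL for the HDFS root (default WebHDFS port 50070).
webhdfs_url() {
  local host="$1" port="${2:-50070}"
  printf 'http://%s:%s/webhdfs/v1/?op=LISTSTATUS\n' "$host" "$port"
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

For example, `parse_default_fs "$(hdfs getconf -confKey fs.defaultFS)"` prints candidate values for Hadoop Name Node and HDFS Port, and `port_open bde.example.com 30938` checks that the Big Data Engine port is reachable. On a Kerberos-secured cluster, obtain a ticket with `kinit` and query WebHDFS with `curl --negotiate -u : "$(webhdfs_url namenode.example.com)"`.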
11 Type a user name and password with access to the HDFS in the User and Password fields.
12 Type a name for the data source connection in the Data Source Name field.
13 Determine whether to allow other users to import data using the data source connection by doing one of the following:
To allow other users to import data using the data source connection, select the Share this connection with everybody check box.
To deny other users the ability to import data using the data source connection, clear the Share this connection with everybody check box.
14 Click OK to apply your changes. Your data source connection is created and added to the list.