Saturday, November 16, 2013

Install Liferay over a MySQL Database

With this post I will share how to get started with Liferay Portal, including initial configuration and login.

Environment: Linux

Pre-requisites:

Let's download the Liferay pack from here.

I got the Community Edition bundled with Tomcat.

Extract it to a folder of your choice; let's call the extracted folder LR_HOME. Now the resources are ready, let's go.


Log in to MySQL. If you have just installed it, the following command will do.

mysql -uroot -p

Enter the password (default root).

Create database to be used for Liferay.

create database lportal;

Create user for Liferay.
create user 'lr_user'@'localhost' identified by 'user123';

Give access to the created database for this user.
grant all privileges on lportal.* to 'lr_user'@'localhost' with grant option;
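To double-check that the database and grants are in place, we can optionally run the following, still inside the MySQL console (these are just sanity checks; nothing here is required for the set-up to work):

```sql
-- Optional sanity checks after creating the database and user
SHOW DATABASES;
SHOW GRANTS FOR 'lr_user'@'localhost';
```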

The MySQL work is done now. Let's move on to Liferay.


Go to LR_HOME.
Inside LR_HOME, create a file with the name ''. This is an option provided by the Liferay architecture to override default configurations.
Copy the following content to the file.









Fill in the values according to the created database.
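As an illustration only (the exact keys below assume Liferay's standard JDBC override properties; verify them against the documentation for your Liferay version), the override entries for the database created above would look along these lines:

```properties
jdbc.default.driverClassName=com.mysql.jdbc.Driver
jdbc.default.url=jdbc:mysql://localhost:3306/lportal?useUnicode=true&characterEncoding=UTF-8
jdbc.default.username=lr_user
jdbc.default.password=user123
```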

Go inside the tomcat folder in LR_HOME and issue the following command to start it.

This will trigger the start; to see how it goes, issue the following command to observe the logs.
tail -f logs/catalina.out

Start-up will take some time, since it is creating the database tables. It will finally print 'server started in ?? ms', and a new window will automatically open in Firefox at 'http://localhost:8080/'. This is the Liferay Portal welcome page.

Log in as the default admin '' with the password 'test'; this will take us through a wizard and allow us to change the default password.

Now we are in the portal.....

Saturday, September 21, 2013

How to Write a Custom User Store Manager - WSO2 Identity Server 4.5.0

With this post I will demonstrate writing a simple custom user store manager for WSO2 Carbon, specifically for WSO2 Identity Server 4.5.0, which was released recently. The content is as follows:
  1. Use case
  2. Writing the custom User Store Manager
  3. Configuration in Identity Server
You can download the sample here.

Use Case

By default WSO2 Carbon has four implementations of User Store Managers as follows.
  • org.wso2.carbon.user.core.jdbc.JDBCUserStoreManager
  • org.wso2.carbon.user.core.ldap.ReadOnlyLDAPUserStoreManager
  • org.wso2.carbon.user.core.ldap.ReadWriteLDAPUserStoreManager
  • org.wso2.carbon.user.core.ldap.ActiveDirectoryLDAPUserStoreManager
Let's look at a scenario where a company has a simple user store in which they keep customer_id, customer_name and the password (for the moment let's not worry about salting etc., as the purpose is to demonstrate getting a custom user store into action). The company may want to keep this store as it is, since other services may depend on it, while still having identities managed. Obviously it's not good practice to duplicate this sensitive data into another database for the Identity Server to use, as the cost of securing both databases is high and it can lead to conflicts. That is where a custom User Store Manager comes in handy, thanks to the high extensibility of the Carbon platform.

So this is the scenario I am going to demonstrate, with only basic authentication.

We have the following user store schema, currently in use at the company (the table and column names here follow the scenario above):

             CREATE TABLE CUSTOMERS (
                 CUSTOMER_ID VARCHAR(255) NOT NULL,
                 CUSTOMER_NAME VARCHAR(255) NOT NULL,
                 PASSWORD VARCHAR(255) NOT NULL,
                 PRIMARY KEY (CUSTOMER_ID)
             );

I have only two entries in the user store. :) Now what we want is to let these already available users be visible to the Identity Server, nothing less, nothing more. So it's only basic authentication that the User Store Manager should support, according to this scenario.

Writing the custom User Store Manager

There are just three things to adhere to when writing the User Store Manager, and the rest will be done for us.

  • Implement the 'org.wso2.carbon.user.api.UserStoreManager' interface
There are several other options for this: implementing the 'org.wso2.carbon.user.core.UserStoreManager' interface or extending the 'org.wso2.carbon.user.core.common.AbstractUserStoreManager' class, as appropriate. In this case, as we are dealing with a JDBC user store, the best option is to extend the existing JDBCUserStoreManager class and override the methods as required.
public class CustomUserStoreManager extends JDBCUserStoreManager {

    private static Log log = LogFactory.getLog(CustomUserStoreManager.class);

    public boolean doAuthenticate(String userName, Object credential) throws UserStoreException {

        if (CarbonConstants.REGISTRY_ANONNYMOUS_USERNAME.equals(userName)) {
            log.error("Anonymous user trying to login");
            return false;
        }

        Connection dbConnection = null;
        ResultSet rs = null;
        PreparedStatement prepStmt = null;
        String sqlstmt = null;
        String password = (String) credential;
        boolean isAuthed = false;

        try {
            dbConnection = getDBConnection();
            sqlstmt = realmConfig.getUserStoreProperty(JDBCRealmConstants.SELECT_USER);

            prepStmt = dbConnection.prepareStatement(sqlstmt);
            prepStmt.setString(1, userName);

            rs = prepStmt.executeQuery();

            if (rs.next()) {
                String storedPassword = rs.getString("PASSWORD");
                if ((storedPassword != null) && (storedPassword.trim().equals(password))) {
                    isAuthed = true;
                }
            }
        } catch (SQLException e) {
            throw new UserStoreException("Authentication Failure. Using sql :" + sqlstmt);
        } finally {
            DatabaseUtil.closeAllConnections(dbConnection, rs, prepStmt);
        }

        if (log.isDebugEnabled()) {
            log.debug("User " + userName + " login attempt. Login success :: " + isAuthed);
        }

        return isAuthed;
    }
}


  • Register Custom User Store Manager in OSGI framework
This is a simple step to make sure the new custom user store manager is available through the OSGi framework. With this step, the configuration of the new user store manager becomes easy with the UI in later steps. We just need to place the following class inside the project.

/**
 * @scr.component name="" immediate=true
 * @scr.reference name="user.realmservice.default"
 * interface="org.wso2.carbon.user.core.service.RealmService"
 * cardinality="1..1" policy="dynamic" bind="setRealmService"
 * unbind="unsetRealmService"
 */
public class CustomUserStoreMgtDSComponent {
    private static Log log = LogFactory.getLog(CustomUserStoreMgtDSComponent.class);
    private static RealmService realmService;

    protected void activate(ComponentContext ctxt) {
        CustomUserStoreManager customUserStoreManager = new CustomUserStoreManager();
        ctxt.getBundleContext().registerService(UserStoreManager.class.getName(),
                customUserStoreManager, null);
        log.info("CustomUserStoreManager bundle activated successfully.");
    }

    protected void deactivate(ComponentContext ctxt) {
        if (log.isDebugEnabled()) {
            log.debug("Custom User Store Manager is deactivated");
        }
    }

    protected void setRealmService(RealmService rlmService) {
        realmService = rlmService;
    }

    protected void unsetRealmService(RealmService realmService) {
        realmService = null;
    }
}
  • Define the Properties Required for the User Store Manager
We need the method 'getDefaultUserStoreProperties()' as follows. The required properties are listed in the class 'CustomUserStoreConstants'. In the downloaded sample it can be clearly seen how this is used.
    public org.wso2.carbon.user.api.Properties getDefaultUserStoreProperties() {
        Properties properties = new Properties();
        properties.setMandatoryProperties(CustomUserStoreConstants.CUSTOM_UM_MANDATORY_PROPERTIES.toArray
                (new Property[CustomUserStoreConstants.CUSTOM_UM_MANDATORY_PROPERTIES.size()]));
        properties.setOptionalProperties(CustomUserStoreConstants.CUSTOM_UM_OPTIONAL_PROPERTIES.toArray
                (new Property[CustomUserStoreConstants.CUSTOM_UM_OPTIONAL_PROPERTIES.size()]));
        properties.setAdvancedProperties(CustomUserStoreConstants.CUSTOM_UM_ADVANCED_PROPERTIES.toArray
                (new Property[CustomUserStoreConstants.CUSTOM_UM_ADVANCED_PROPERTIES.size()]));
        return properties;
    }
The advanced properties carry the required SQL statements for the user store, written according to the custom schema of our user store.
Now we are all set to go. You can build the project with your customization to the sample project, or just use the jar in the target folder. Drop the jar inside CARBON_HOME/repository/components/dropins and drop mysql-connector-java-<>.jar inside CARBON_HOME/repository/components/lib. Start the server with ./ from CARBON_HOME/bin. In the start-up logs you will see the following log printed.

INFO {} -  CustomUserStoreManager bundle activated successfully.

Configuration in Identity Server

In the management console, try to add a new user store as follows.
In the shown space we will see our custom user store manager given as an option for the implementation class, since we registered it in the OSGi framework. Select it and fill in the properties according to the user store.

Also in the property space we will now see the properties we defined in the constants class, as below.
If our schema changes at any time, we can edit it here dynamically. Once finished, we will have to wait a moment, and after refreshing we will see the newly added user store domain; here I have named it ''.
So let's verify whether the users are there. Go to 'Users and Roles', and in the Users table we will now see the details of the users who were in the custom user store, as below.

If we check the roles, these users are assigned the Internal/everyone role. Modify the role permissions to allow 'login'. Now if either of the above two users tries to log in with correct credentials, they are allowed.
So we have successfully configured the Identity Server to use our custom User Store without much hassle.



Note: For the updated sample for Identity Server - 5.0.0, please use the link,


Sunday, September 08, 2013

WSO2 Identity Server 4.5.0 - User Store Management

In this post we will go through a high-level view of user management in WSO2 Carbon products from Kernel 4.2.0 onwards, specifically in WSO2 IS 4.5.0, which is based on this Kernel. These versions are armed with the capability to configure user stores at run time.

org.wso2.carbon.user.core is the OSGi component responsible for handling users in Carbon products. There we have the concept of a 'User Realm', which is a collection of users with attributes. It consists of the following four aspects,
  • User store management
  • Authorization Management
  • Claim management
  • Profile configuration management
You can get a clear picture of these four aspects from this blog. Here we will look into the improvements done in the User Store Management aspect with the newly released version. It provides the capability to configure user stores at run time, even in clustered mode, as described in this previous post of mine, using a convenient UI. The following diagram shows how it happens.

In the implementation, once we enter the user name and password, they are sent to the User Store Manager (taken from the User Realm of the tenant, according to the user name) to authenticate.
  • If the user is 'user1', the user realm of the super tenant is used. If the user is '', the user realm of tenant '' is used. In either case the flow is the same: first, the primary user store manager checks for a matching user with the same credentials.
  • If the user name is correct but the password is wrong, it will not yet issue a decision on authentication, but continue to check. If a matching user is found, it will return and go to the next step in the flow, which is authorization.
  • If a matching user is not found in the primary user store, the secondary user store manager is used, and it will look in the secondary user store for a matching user. Likewise this will go on until the end of the user store manager chain. If no match is found at that point, the user is not authenticated, and the server will report that authentication failed for the provided credentials.
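The chain traversal described above can be sketched in plain Java (the class, field and method names below are illustrative only, not the actual Carbon API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the user store manager chain: the primary store
// checks first, then each secondary store in order, until a matching user
// with matching credentials is found or the end of the chain is reached.
// Hypothetical names -- NOT the actual Carbon API.
class SketchUserStore {
    final String domain;
    final Map<String, String> users = new HashMap<>(); // username -> password
    SketchUserStore next; // next (secondary) store in the chain, or null

    SketchUserStore(String domain) {
        this.domain = domain;
    }

    boolean authenticate(String userName, String password) {
        for (SketchUserStore store = this; store != null; store = store.next) {
            String stored = store.users.get(userName);
            if (stored != null && stored.equals(password)) {
                return true; // matching user with matching credentials
            }
        }
        return false; // end of chain reached: authentication fails
    }
}
```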
This procedure is not new in the latest version. Plugging in user store managers at run time is what is new. For this we have a UI which can be used to create the configuration file. Once this user store management configuration file is dropped into the relevant folder, the 'Deployment Manager' is triggered and it updates the chain accordingly.
For the super tenant, the files go to,
For a general tenant, the files go to,

How Configurations are Populated in Cluster

The user store configurations are populated in a cluster using the 'SVN-based deployment synchronizer' component of WSO2 Carbon. Once this is correctly enabled in the cluster, the modifications we make on the primary node are committed to the SVN repo. Once the commit is done, this node sends a cluster message so that the other nodes can check it out from the SVN repo. The modifications are then checked out to the relevant folders. With this, the 'Deployment Manager' of each node is triggered, and the single-node flow starts on each of them.

You can try this out following this post.


Cluster mode - User Store Management Configuration at Run Time

We can simply try this out with the following steps in WSO2 Identity Server.

In the extracted pack go to, 
  • CARBON_HOME/repository/conf/axis2/axis2.xml and enable clustering
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent"

  • CARBON_HOME/repository/conf/carbon.xml and set up the deployment synchronizer,


This is our primary node in the cluster. Now take two copies of this extracted folder and change the following in carbon.xml,

I changed the port offset as I will be running all the server instances on my local machine. So the port offset is set to 1 in one copy and to 2 in the other. The other change is that we only let the primary node commit automatically to the SVN repo, not the other nodes; hence auto-commit is set to false.

Now let's start all 3 servers. Once they are started, follow this post on the primary node. In a moment we will see the configurations replicated to the other two nodes as well.


Thursday, September 05, 2013

Deploying Identity Server over a JDBC Based User Store

With this post I am going to demonstrate how to configure WSO2 Identity Server with a JDBC user store. For the demonstration I am using a MySQL user store, but the same procedure applies to any other JDBC user store as well.
My environment is,
OS - Ubuntu 12.10
Java - 1.6
WSO2 IS 4.5.0
  1. Setting up MySQL database
  2. User Store Configuration in IS - Primary
  3. User Store Configuration in IS - Secondary
(I am referring to the extracted wso2is folder as CARBON_HOME in this post)

Setting up MySQL database

We need MySQL running first. This post will be helpful in setting up the MySQL database, if it's not already done. Once MySQL is running, we have to set up the database as required by the Identity Server. The server packs the necessary SQL scripts within itself; they can be located at CARBON_HOME/dbscripts.

Let's login to MySQL server and execute the following,
Create a database,
mysql> create database JDBC_demo_user_store;
Check out the creation,
mysql> show databases; 
Then use the sql script and set up the database,
mysql> use JDBC_demo_user_store;
mysql> source <path_to>/wso2is-4.5.0/dbscripts/mysql.sql; 
This will run the queries in the SQL scripts and set up the required tables.
Now if we enter the commands, the following output will be shown.
mysql> show tables;

Now we are done with setting up the database. We can go ahead and ask the Identity Server to use it.

Note: Before going into the following steps, we also need to add the MySQL JDBC connector to the Identity Server. You can download it from here and drop it into CARBON_HOME/repository/components/lib.

User Store Configuration in IS - Primary 

Identity Server uses an embedded H2 database to keep permission details etc., and its data source details reside in CARBON_HOME/repository/conf/datasources/master-datasources.xml. We can add the data source details of our new JDBC user store here as well. Here is the master-datasources.xml file according to my set-up.

            <description>The datasource used for JDBC_demo_user_store</description>
            <definition type="RDBMS">
                    <validationQuery>SELECT 1</validationQuery>
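For reference, a complete datasource entry for a MySQL store typically looks roughly like the following (the JNDI name, credentials and URL here are illustrative placeholders; they must match your own database and the name referenced from user-mgt.xml):

```xml
<datasource>
    <name>JDBC_demo_user_store</name>
    <description>The datasource used for JDBC_demo_user_store</description>
    <jndiConfig>
        <name>jdbc/JDBC_demo_user_store</name>
    </jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:mysql://localhost:3306/JDBC_demo_user_store</url>
            <username>root</username>
            <password>root</password>
            <driverClassName>com.mysql.jdbc.Driver</driverClassName>
            <maxActive>50</maxActive>
            <maxWait>60000</maxWait>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT 1</validationQuery>
            <validationInterval>30000</validationInterval>
        </configuration>
    </definition>
</datasource>
```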
The primary configuration for the user store resides in the CARBON_HOME/repository/conf/user-mgt.xml file. By default this points to an embedded ReadLDAPUserStoreManager. Now we are going to change it to a JDBCUserStoreManager. So let's comment out the default one and uncomment JDBCUserStoreManager. We will then have a user-mgt.xml file similar to this, with the <Property name="dataSource"></Property> property set to what is given in the datasource. If we want, we can modify these properties according to the context.

Now the configurations are over. Let's start the server with bin/. Once it has started, we can go ahead and add a user to the 'Primary' domain.

Now if we check the UM_USER table created in our database, it will list the user as well.

User Store Configuration in IS - Secondary

Now let's see how we can use the same MySQL user store as a secondary user store in IS. This is pretty easy, as we can do the whole thing via the UI, without any modification to the default configurations above in master-datasources.xml or user-mgt.xml. We have to add the driver name, URL, user name and password here as mandatory properties, which we previously gave in master-datasources.xml.

Once added, it will be shown in the available user stores list. It is intuitive to define a user store manager in the UI, but if you want more details, you can refer to this post. If we want, we can also edit the optional properties. The advanced section carries the SQL statements required for the JDBC user store manager.

Advanced option: if we are editing the database structure (SQL script), we need to update these SQL queries according to that schema, using this advanced option.

Now if we go and try to add a new user, we will see this secondary domain as well.

If we select this domain and add users, we can see the users getting added in the database, just as with the primary user store.


Wednesday, September 04, 2013

Getting Started with MySQL

This is a simple beginner's guide to using MySQL on Linux, from installation to querying databases.
  1. Installation
  2. Login
  3. Databases and tables


Installation

First, let's make sure our package management tools are up to date. For that, run the following commands in the command line.

sudo apt-get update
sudo apt-get dist-upgrade
Once it finishes updating and upgrading, we can install MySQL with the following command.
sudo apt-get install mysql-server mysql-client
This will take a moment to install, and then we are ready to go.


Login

At first start-up, the MySQL server is not set up with a password for root, and we can log in with,
mysql -u root -p 
If we are setting the password for the first time, we can use the following,
mysqladmin -u root -p NEWPASSWORD   
If we want to change a previously set password, the following command can be used,
mysqladmin -u root -p'oldpassword' password newpassword

Databases and Tables

First we should login to MySQL server with,
mysql -uroot -p<password>
Then we will land in the mysql console, where we can run queries.
To see the available databases,
show databases;

To see the available tables inside a database,
use <database_name>;
show tables;
To see the field formats of the table,
describe <table_name>; 
To delete database/tables,
drop database if exists <database_name>;  (for a table: drop table if exists <table_name>;)
Just like that, we can also run SQL queries like "SELECT * FROM user;" which will print the result in console.

If we have an .sql script to set up the databases or tables, we can just run it with,

source <path_to_script_file>;

If we want to import a large database, the following is recommended,
mysql -u root -p database_name < database_dump.sql

Monday, September 02, 2013

Implementation of Support for Multiple User Store Configuration at Run Time (A Touch on the Beauty of WSO2 Carbon Architecture)

As I shared in the previous post, WSO2 IS 4.5.0 was released with added support for dynamic configuration of multiple user stores. While implementing this component, I got to touch some beautiful areas of the WSO2 Carbon architecture, which is known to be inherently dynamic and flexible. With this post I am going to list those characteristics of the Carbon platform that came in handy in this implementation. The content of this post is,

  1. How dynamic User Store Configuration happens
  2. Carbon characteristics that facilitated rapid development

How dynamic User Store Configuration happens

Following figure highlights the flow of a new user store configuration.

  • The super admin or a tenant admin can add user stores through the UI, to their own domain. We have allowed dynamic configuration only for secondary user stores; the 'Primary' user store is not configurable at run time. This is because it is available to all tenants, and allowing changes to its configuration at run time can lead to instability of the system. With this limitation we have been able to keep the design simple and avoid some crucial run-time complexities that might otherwise have occurred. (e.g. the Primary user store keeps the super admin data used to sign in, and if the super admin himself changes the configuration of the Primary user store, the state of the system in between is not stable.) So the Primary is treated as a static property in the implementation, a basic requirement to run the system properly.
  • These secondary user stores can be added in two ways: via the UI, or by directly dropping the file into the corresponding location, as shown in the figure. The recommended way of the two is the UI, as it will mostly keep us from putting in wrong configuration files and guide us to do it correctly. If we are writing the XML manually, the following factors need to be fulfilled.
    1. The domain name should match the file name ( --> wso2_com.xml)
    2. All mandatory fields required by the User Store Manager implementation should be provided as properties. (The UI itself has guidance for this, or we can refer to the WSO2 IS documentation.)
  • If the configuration files are added through the UI, they will be saved in the given locations, according to the tenant. From then on, the deployer (a custom Axis2 deployer) takes care of it and updates everything accordingly. The deployer polls the 'userstores' folder to detect changes, and as soon as it becomes aware of an event, the required update is called. The deployer detects changes within an upper limit of 15 seconds, which means there will be a small delay before updates are visible in the UI. Refreshing the page after this short wait will make the changes visible in the UI.
So what happens inside, after deployer detects the modification?

  • According to the modification made (add/delete/enable/disable/edit), the deployer identifies the event (using the details of the modified file and whether it's a deploy or undeploy). Then it lets the deployment manager update the User Store Manager chain according to the modification. If it's an addition, the new user store manager is added at the end of the chain. If it's a deletion, the chain is broken at that point and the tail part is re-attached to the head. Other modifications take effect in the chain as they are, without any effect on the order.
  • So when a user comes and submits his/her credentials, the authenticator goes through the chain and checks for a matching user in the tenant's own chain. If a user with matching credentials is found in the chain, the assigned roles are checked for authorization, which allows the user to perform permitted actions.
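The chain surgery on a delete, as described above, can be sketched with a tiny linked list (the names here are hypothetical, not the Carbon implementation):

```java
// Illustrative sketch of removing a user store manager from the chain:
// the chain is broken at the deleted node and the tail part is re-linked
// to the head part. Hypothetical names -- not the Carbon implementation.
class StoreNode {
    final String domain;
    StoreNode next;

    StoreNode(String domain) {
        this.domain = domain;
    }

    // Returns the (possibly new) head of the chain after removing 'domain'.
    static StoreNode remove(StoreNode head, String domain) {
        if (head == null) {
            return null;
        }
        if (head.domain.equals(domain)) {
            return head.next; // head removed; the tail becomes the chain
        }
        for (StoreNode n = head; n.next != null; n = n.next) {
            if (n.next.domain.equals(domain)) {
                n.next = n.next.next; // splice the tail back onto the head part
                break;
            }
        }
        return head;
    }
}
```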

Carbon characteristics that facilitated rapid development 

  • Clear separation of Front end/Back end (SOA interfaces)
With this clear separation it was easy to re-use what had already been implemented and was also useful in this implementation. Regarding this component too, there was a clear separation of UI and back end that simplified the design. The UI used stub classes to talk to the back-end component, which talked to other back-end components and delivered the output to the UI. So if we want to just consume the API at some point, it is already available.
  • Out the box support for multi-tenancy
After writing the component for the super tenant, no more significant effort was needed to make it work in a multi-tenanted environment. Once it runs fine for the super tenant, it runs fine in a multi-tenant environment too. So this can run anywhere, cloud or on-premise, without a single modification.
  • Clustering
How this will work in a cluster is our next question: all these changes need to be replicated on all the nodes. There is a WSO2 Carbon feature that can be used to synchronize the nodes, the 'Deployment Synchronizer', which has options to be based on SVN or the registry. In this implementation we used the SVN-based deployment synchronizer, and the target was achieved. Not a single line of code was needed for this.
  • Extendibility
WSO2 Carbon is also called an 'Eclipse for Servers' for its high extensibility with the OSGi run time. This came in very useful for this component: if at some point we wanted to add our own custom user store manager implementation, for example for Apache Cassandra, it would be possible without any hassle. We just have to implement the provided UserStoreManager interface or extend the AbstractUserStoreManager class, write the customized code, and pack it into a bundle (sample). Once this is dropped into CARBON_HOME/repository/components/dropins, the run time will detect it at start-up and even show it in the UI for configuration.

Also, when we needed to detect changes to the configuration files, the custom deployer support inherited from Axis2 made life easier. We just had to extend the provided implementation, point it at the folder path to poll, write what we wanted to do on new file addition (deploy) and file deletion (undeploy), and it did the job.

Finally, it was so nice to see that all these components inter-operate with each other to satisfy the requirement in the exact expected way.


Tuesday, August 27, 2013

WSO2 Identity Server 4.5.0 brings support for dynamic configuration of multiple user stores

WSO2 Identity Server 4.5.0 is going to be released by the end of this month, with a new set of features and a lot of improvements to existing features. Allowing dynamic configuration of multiple user stores is one such new addition in this release, improving the flexibility of the server to cater to changes in production environments.

This feature comes in handy when a requirement arises to add more user stores, change some attributes of an existing user store, or remove one of the existing user stores. With this new user store configuration UI, the above scenarios can be executed smoothly, and the burden of editing XML files is taken away. This operates dynamically: no server restart is required, and changes take effect in a few seconds.

Here are few screen-shots of the UI to be released.
The default view of the user store configuration UI (Configure > User Store Management) will be as follows, when there are no secondary user stores added. (We will refer to all user stores other than the 'Primary' as secondary.)

Now if we hit 'Add Secondary User Store', the following form will appear, allowing us to define the properties of the new user store. First we need to select the implementation class of the user store manager we are going to use. By default, WSO2 products come with 4 user store manager implementations,
  1. org.wso2.carbon.user.core.jdbc.JDBCUserStoreManager
  2. org.wso2.carbon.user.core.ldap.ReadOnlyLDAPUserStoreManager
  3. org.wso2.carbon.user.core.ldap.ReadWriteLDAPUserStoreManager
  4. org.wso2.carbon.user.core.ldap.ActiveDirectoryLDAPUserStoreManager

If we select the JDBCUserStoreManager, the above form will appear. We should give a domain name for the user store and fill in the mandatory property with the data source name, which needs to exist already (it can be added from the UI itself). We can have a look at the optional properties and fill them in only if we wish to. Then there are the advanced properties, which can be left as they are if we are using the default schema of WSO2 Identity Server. These advanced properties carry the SQL statements related to queries, so they need updates only if the schema is customized.

Once the mandatory fields are filled, we can go ahead and add the user store manager. Then it will be shown in the table of available user store managers with the following information.

This message just asks us to wait a moment, until the new configuration takes effect by getting engaged in the current chain of user store managers.

Note: It does not mean that the user store manager has been added successfully. We have to refresh the page in a few seconds to check the status, as if there is something wrong in the given properties the user store manager will not be engaged correctly in the back-end.

The sole purpose of user stores is to keep users, so we should be able to add users if the user store was added correctly. We can check this by trying to add a new user to the system. If we list the domains in the user addition form, the newly added user store domain should appear, if it was added correctly. If it does not, something has gone wrong with the created user store manager, and we have to re-check the given properties, especially the connectionURL.

The following is another available user store manager implementation, in the LDAP category. It does not have any advanced properties relevant to it.

One more extension point for multiple user stores is that we can plug our own custom user store managers into the implementation. For example, if we decide to move to a Cassandra user store later when our business grows, it is also possible to write a CassandraUserStoreManager and plug it into the server. It is simply a matter of extending a provided interface, packing the implementation into a jar as an OSGi bundle, and dropping it inside the server. This is such a sample custom user store manager.

So if we select the CustomUserStoreManager as the implementation, the UI will accordingly ask for the relevant properties to be defined.

Thursday, August 08, 2013

How to Install Windows in a Virtual Machine in Linux(Ubuntu)

Recently I wanted to run Windows inside my primary OS, Ubuntu 12.04. I didn't want to go for the dual boot option, so I decided to use a virtual machine and install Windows inside Ubuntu. I faced a couple of problems, and here I share my experience of how I overcame them.

In choosing a virtual machine I tried out two options: Oracle VM VirtualBox and VMware Player, both available in the Ubuntu Software Centre. I could create a virtual machine successfully by hitting the 'New' button and following the wizard. Then I started the created machine, pointing it to the CD image. It tried to load Windows files and then printed the following error on screen.

Attempting to load a 64-bit application, This CPU is not compatible with 64-bit mode.

But my machine has a 64-bit processor, so something had gone wrong. The issue was that the BIOS settings of the host machine had not enabled hardware virtualization, so I did it manually. My machine is a Lenovo ThinkPad T530, and I could find this setting under the Security tab of the BIOS, as 'Virtualization'. I enabled it there and tried again; this error was fixed.

So I moved forward with VirtualBox. Oh here came another one.

Error status 0xc0000225

This was simpler to fix. I just went to VirtualBox > Settings > System > Motherboard and, under Extended Features, ticked 'Enable IO APIC'. This enables the Advanced Programmable Interrupt Controllers, which are a requirement for Windows installation.

Cheers, now it is installed!

Wednesday, July 03, 2013

Hadoop Multi Node Set Up

   With this post I am hoping to share the procedure to set up Apache Hadoop in multi-node mode; it is a continuation of the post Hadoop Single Node Set-up. The given steps are for a two-node cluster, which can then be expanded to more nodes according to the volume of data. The unique capabilities of Hadoop can be well observed when operating on a BIG volume of data in a multi-node cluster of commodity hardware.
   Before proceeding to the set-up, it is useful to have a general idea of the HDFS (Hadoop Distributed File System) architecture, the default data storage for Hadoop, so that we can understand the steps we are following and what happens at execution. In brief, it is a master-slave architecture where the master acts as the NameNode, which manages the file system namespace, and the slaves act as DataNodes, which manage the storage of each node. For MapReduce there is likewise a JobTracker on the master node and a TaskTracker on each slave node.
   This Hadoop document includes the details for setting up Hadoop in a cluster, in brief. Here I am sharing detailed guidance for the set-up process with the following line-up.
  • Pre-requisites
  • Hadoop Configurations
  • Running the Multi Node Cluster
  • Running a Map Reduce Job


   The following steps set up a Hadoop cluster with two Linux machines. Before proceeding to the cluster, it is convenient if both machines have already been set up as single nodes, so that we can quickly move to the cluster with minimal modifications and less hassle.
   It is recommended to follow the same paths and installation locations on each machine when setting up the single-node cluster. This makes our lives easier during installation, as well as when catching any problems later at execution. For example, if we follow the same paths and installation on each machine (i.e. hpuser/hadoop/), we can just follow all the steps of the single-node set-up procedure on one machine, and if the folder is copied to the other machines, no modification to the paths is needed.

Single Node Set-up in Each Machine


  Obviously the two nodes need to be networked so that they can communicate with each other. We can connect them via a network cable or any other option. For what follows we just need the IP addresses of the two machines on the established connection. I have selected 192.168.0.1 as the master machine and 192.168.0.2 as a slave. Then we need to add these to the '/etc/hosts' file of each machine as follows.
192.168.0.1    master
192.168.0.2    slave

Note: The addition of more slaves should be updated here in each machine using unique names for slaves (eg: slave01, slave02).
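As a sketch, the host entries for a cluster with two slaves could be generated as below. The IP addresses are examples; in the real set-up the target is /etc/hosts (which needs sudo and appending rather than overwriting), so a scratch copy is used here for illustration.

```shell
# Write example master/slave entries to a scratch copy of the hosts file.
# For the real set-up, append these lines to /etc/hosts on every machine (with sudo).
HOSTS_FILE=/tmp/hosts.example
cat > "$HOSTS_FILE" <<'EOF'
192.168.0.1    master
192.168.0.2    slave01
192.168.0.3    slave02
EOF
cat "$HOSTS_FILE"
```

Every machine in the cluster should carry the same entries, so each node can resolve every other node by name.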

Enable SSH Access

   We did this step in the single-node set-up for each machine, to create a secured channel between localhost and hpuser. Now we need to make hpuser on the master capable of connecting to the hpuser account on the slave via a password-less SSH login. We can do this by adding the public SSH key of hpuser on the master to the authorized_keys of hpuser on the slave. The following command from hpuser at the master will do the work.
hpuser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hpuser@slave
Note: If more slaves are present, this needs to be repeated for each of them. The command will prompt for the password of hpuser on the slave and, once it is given, we are done. To test, we can try to connect from master to master and from master to slave as follows.

hpuser@master:~$ ssh slave
The authenticity of host 'slave (192.168.0.2)' can't be established.
RSA key fingerprint is ............................................................
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave (192.168.0.2)' (RSA) to the list of known hosts.
hpuser@slave's password:
Welcome to Ubuntu 11.10 (GNU/Linux 3.0.0-12-generic i686)
If a similar kind of output is given for 'ssh master', we can proceed to the next steps.
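With several slaves, distributing the key can be scripted. A minimal sketch is below; the slave names are examples, and the loop is printed as a dry run here (drop the echo to actually execute, in which case each run prompts for that slave's password).

```shell
# Dry run: print the ssh-copy-id command for each slave in the list.
# Remove 'echo' to actually distribute the public key to the slaves.
for host in slave01 slave02; do
  echo ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hpuser@$host"
done
```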

Hadoop Configurations

We have to do the following modifications in the configuration files.

In master machine

1. conf/masters
This file defines on which nodes the secondary NameNodes start when bin/start-dfs.sh is run. The duty of the secondary NameNode is to merge the edit logs periodically and keep the edit log size within a limit.

2. conf/slaves

   This file lists the hosts that act as slaves, processing and storing data. As we have just two nodes, we use the storage of the master too.
Note: If more slaves are present, those should be listed in this file on all the machines.
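To make the file contents concrete, here is a sketch of what the two files hold for this two-node cluster. It is written to a scratch directory for illustration; in the real set-up you edit the files under HADOOP_HOME/conf on the master.

```shell
# Illustrative contents of conf/masters and conf/slaves for this two-node cluster.
# Written to a scratch directory; in the real set-up edit HADOOP_HOME/conf directly.
CONF=/tmp/hadoop-conf-example
mkdir -p "$CONF"

# conf/masters: the node where the secondary NameNode runs
echo master > "$CONF/masters"

# conf/slaves: hosts that run DataNode/TaskTracker; the master stores data too
printf 'master\nslave\n' > "$CONF/slaves"

cat "$CONF/masters" "$CONF/slaves"
```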

In all machines

1. conf/core-site.xml

   <property>
     <name>fs.default.name</name>
     <value>hdfs://master:54310</value>
     <description>.....</description>
   </property>
We change 'localhost' to master, so that the master is explicitly used as the NameNode.

2. conf/mapred-site.xml
   <property>
     <name>mapred.job.tracker</name>
     <value>master:54311</value>
     <description>The host and port that the MapReduce job tracker runs at.</description>
   </property>
We change 'localhost' to master, so that the master is explicitly used as the JobTracker.

3. conf/hdfs-site.xml
   <property>
     <name>dfs.replication</name>
     <value>2</value>
     <description>Default number of block replications.</description>
   </property>
It is recommended to keep the replication factor no higher than the number of nodes. Here we set it to 2.
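Putting a snippet like the above into a complete file, a minimal conf/hdfs-site.xml could look like the sketch below (the other two files follow the same configuration/property structure). It is written to a scratch path here for illustration.

```shell
# Sketch of a complete conf/hdfs-site.xml, written to a scratch path for illustration.
# In the real set-up this file lives in HADOOP_HOME/conf on every machine.
cat > /tmp/hdfs-site.example.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default number of block replications.</description>
  </property>
</configuration>
EOF
cat /tmp/hdfs-site.example.xml
```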

Format HDFS from the NameNode

Initially we need to format HDFS, as we did in the single-node set-up.
hpuser@master:~/hadoop-1.0.3$ bin/hadoop namenode -format

12/11/02 23:25:54 INFO common.Storage: Storage directory /home/hpuser/temp/dfs/name has been successfully formatted.
12/11/02 23:25:54 INFO namenode.NameNode: SHUTDOWN_MSG:
If the output of the command ends as above, we are done with formatting the file system and ready to run the cluster.

Running the Multi Node Cluster

   Starting the cluster is done in an order that first starts the HDFS daemons (NameNode, DataNode) and then the MapReduce daemons (JobTracker, TaskTracker).
It is also worth noting that we can observe what is going on in the slaves, when we run commands on the master, from the logs directory inside the HADOOP_HOME of each slave.

1. Start HDFS daemons - bin/start-dfs.sh in master

hpuser@master:~/hadoop-1.0.3$ bin/start-dfs.sh
starting namenode, logging to ../bin/../logs/hadoop-hpuser-namenode-master.out
slave: Ubuntu 11.10
slave: starting datanode, logging to .../bin/../logs/hadoop-hpuser-datanode-slave.out
master: starting datanode, logging to .../bin/../logs/hadoop-hpuser-datanode-master.out
master: starting secondarynamenode, logging to .../bin/../logs/hadoop-hpuser-secondarynamenode-master.out
This will get the HDFS up with NameNode and DataNodes listed in conf/slaves.
At this moment, java processes running on master and slaves will be as follows.

hpuser@master:~/hadoop-1.0.3$ jps
5799 NameNode
6614 Jps
5980 DataNode
6177 SecondaryNameNode

hpuser@slave:~/hadoop-1.0.3$ jps
6183 DataNode
5916 Jps
2. Start Map-reduce daemons - bin/start-mapred.sh in master

hpuser@master:~/hadoop-1.0.3$ bin/start-mapred.sh
starting jobtracker, logging to .../bin/../logs/hadoop-hpuser-jobtracker-master.out
slave: Ubuntu 11.10
slave: starting tasktracker, logging to ../bin/../logs/hadoop-hpuser-tasktracker-slave.out
master: starting tasktracker, logging to .../bin/../logs/hadoop-hpuser-tasktracker-master.out
Now jps at the master will show TaskTracker and JobTracker as running Java processes, in addition to the previously observed ones. On the slaves, jps will additionally show TaskTracker.
   Stopping the cluster is done in the reverse order of the start: first the MapReduce daemons are stopped with bin/stop-mapred.sh on the master, and then bin/stop-dfs.sh should be executed from the master.
Now we know how to start and stop a Hadoop multi node cluster. Now it's time to get some work done.

Running a Map-reduce Job

   This is identical to the steps followed in the single-node set-up, but we can use a much larger volume of data as input, since we are running on a cluster.

hpuser@master:~/hadoop-1.0.3$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hpuser/testHadoop /user/hpuser/testHadoop-output
   The above command gives a similar output to that of the single-node set-up. In addition, we can observe from the logs how each slave completed its map tasks and how the final results were reduced.
Now we have executed a map-reduce job in a Hadoop multi-node cluster. Cheers!
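As a side note, what the wordcount example computes can be sketched with plain shell on a local file (no Hadoop involved), which is handy for sanity-checking small outputs against the cluster's results. The input file and its contents are made up for illustration.

```shell
# Local sanity check of what wordcount computes: word -> count pairs.
printf 'hadoop cluster hadoop node\n' > /tmp/wc-input.txt

# Split into one word per line, then count occurrences, most frequent first.
tr ' ' '\n' < /tmp/wc-input.txt | sort | uniq -c | sort -rn
```

On the cluster, the same counts appear in the part files under /user/hpuser/testHadoop-output.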
  The above steps are just the basics. When setting up a cluster for a real scenario, there is fine tuning to be done considering the cluster hardware, data size, etc., and best practices to be followed that were not stressed in this post.