Wednesday, 13 May 2015

Matrix Parsing - Spiral Way

Problem Statement: Print the elements of a matrix in clockwise spiral order, starting from the top-left element.

Input:

1    2    3    4
5    6    7    8
9    10   11   12
13   14   15   16

Output:


1, 2, 3, 4, 8, 12, 16, 15, 14, 13, 9, 5, 6, 7, 11, 10

Solution in Java:

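import java.util.StringJoiner;

public class MatrixSpiralParser {

    public static void main(String[] args) {
        int[][] matrix = new int[][]{ {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16} };
        parseSpiralWay(matrix);
    }

    // Prints the elements of a rectangular matrix in clockwise spiral order.
    private static void parseSpiralWay(int[][] matrix) {
        if (matrix.length == 0 || matrix[0].length == 0) {
            return; // nothing to print
        }
        int startRow = 0;
        int endRow = matrix.length - 1;
        int startColumn = 0;
        int endColumn = matrix[0].length - 1;
        StringJoiner output = new StringJoiner(", "); // no trailing separator

        while (startRow <= endRow && startColumn <= endColumn) {
            // Top row, left to right.
            for (int i = startColumn; i <= endColumn; i++) {
                output.add(String.valueOf(matrix[startRow][i]));
            }
            startRow++;

            // Right column, top to bottom.
            for (int i = startRow; i <= endRow; i++) {
                output.add(String.valueOf(matrix[i][endColumn]));
            }
            endColumn--;

            // Bottom row, right to left (skipped if the top row was the last row).
            if (startRow <= endRow) {
                for (int i = endColumn; i >= startColumn; i--) {
                    output.add(String.valueOf(matrix[endRow][i]));
                }
                endRow--;
            }

            // Left column, bottom to top (skipped if the right column was the last column).
            if (startColumn <= endColumn) {
                for (int i = endRow; i >= startRow; i--) {
                    output.add(String.valueOf(matrix[i][startColumn]));
                }
                startColumn++;
            }
        }
        System.out.println(output);
    }

}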
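
If you want to check the traversal against an expected result instead of reading console output, a variant that returns the values as a list may be handy. This is only a sketch: the class name SpiralOrder and the method name spiral are made up for illustration, and it assumes a rectangular matrix (every row has the same length). The two extra boundary checks keep a single remaining row or column from being visited twice when the matrix is not square.

```java
import java.util.ArrayList;
import java.util.List;

final class SpiralOrder {

    // Returns the elements of a rectangular matrix in clockwise spiral order.
    static List<Integer> spiral(int[][] matrix) {
        List<Integer> out = new ArrayList<>();
        if (matrix.length == 0 || matrix[0].length == 0) {
            return out;
        }
        int top = 0, bottom = matrix.length - 1;
        int left = 0, right = matrix[0].length - 1;
        while (top <= bottom && left <= right) {
            for (int c = left; c <= right; c++) out.add(matrix[top][c]);      // top row
            top++;
            for (int r = top; r <= bottom; r++) out.add(matrix[r][right]);    // right column
            right--;
            if (top <= bottom) {                                              // a bottom row is left
                for (int c = right; c >= left; c--) out.add(matrix[bottom][c]);
                bottom--;
            }
            if (left <= right) {                                              // a left column is left
                for (int r = bottom; r >= top; r--) out.add(matrix[r][left]);
                left++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] m = { {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16} };
        System.out.println(spiral(m));
    }
}
```
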

Monday, 7 January 2013

oraoop installation issue

You may face the following issue when oraoop is installed with the command

sudo ./install.sh


./install.sh: 154: ./install.sh: Syntax error: "(" unexpected


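The error message format is that of dash, Ubuntu's default /bin/sh, which suggests the script relies on bash-only syntax. To make it work, run the script with bash explicitly:

sudo bash ./install.sh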

Sqoop integration with hadoop for oracle data import with oraoop

I have been trying to import data from Oracle to Hadoop using Sqoop with oraoop.

It took me a couple of days to install the free version of Oracle, Oracle Express Edition 11g R2.
Then I installed CDH Sqoop and tried to integrate it with the already running Apache Hadoop.

I found that oraoop was picked up correctly, but the import failed with the exception below. I also tried Apache Sqoop with Apache Hadoop and still hit the same issue.
A web search suggested using CDH Hadoop as well, instead of Apache Hadoop.


Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.JobContext, but interface was expected
        at com.quest.oraoop.OraOopDataDrivenDBInputFormat.getDesiredNumberOfMappers(OraOopDataDrivenDBInputFormat.java:201)
        at com.quest.oraoop.OraOopDataDrivenDBInputFormat.getSplits(OraOopDataDrivenDBInputFormat.java:51)
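
The message says that the code was compiled against a Hadoop version in which JobContext is an interface, while the Hadoop jar found at runtime defines it as a class. As far as I know, JobContext is a class in the Hadoop 1.x line and an interface in the newer MapReduce API, which would fit the suggestion to switch to CDH Hadoop. A small reflection check like the sketch below can tell you which flavour is on a given classpath. The class name JobContextCheck and the method name describe are made up for illustration.

```java
final class JobContextCheck {

    // Reports whether the named type, as seen on the current classpath,
    // is an interface, a class, or missing altogether.
    static String describe(String typeName) {
        try {
            // initialize=false: we only want to inspect the type, not run its static code.
            Class<?> type = Class.forName(typeName, false, JobContextCheck.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.hadoop.mapreduce.JobContext";
        System.out.println(name + " -> " + describe(name));
    }
}
```

Run it with the Hadoop jars on the classpath, for example java -cp "$(hadoop classpath):." JobContextCheck, assuming the hadoop classpath command is available in your installation.
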

To summarize:

CDH Sqoop + Apache Hadoop - Data import failed with the above exception
Apache Sqoop + Apache Hadoop - Data import failed with the above exception
CDH Sqoop + CDH Hadoop - Hope it works!!

I am still working on the last combination. Will keep you posted.

Bye for now.

Tuesday, 18 December 2012

Kettle Integration for BigData - My first ETL

I got introduced to Pentaho Kettle very recently and was immediately excited to get my hands dirty. I have been playing around with the Hadoop ecosystem for quite a while now, and Pentaho for BigData drew my attention.

It took me quite a while to get the hang of the concepts (well, not that long, it was 2 days!!!).

I already had a Hadoop-1.1.1 cluster running, and my job now was to integrate Kettle with it. The Spoon UI that is bundled with Kettle helps to design jobs and transformations. I had a really hard time getting the Spoon UI to open on my AWS EC2 Ubuntu instance. I do not want to talk about those issues here, and it is still not solved :-(. It may be worth a separate post once I have the solution.
In short, I suspect that it could be a video graphics driver issue.

Spoon would have taken care of my needs end to end, from designing the jobs and transformations to running them. Unfortunately, the Ubuntu issue forced me to use Spoon on Windows and copy the generated kjb and ktr files to Ubuntu. Kettle comes with very useful scripts to run jobs and transformations (kitchen and pan respectively). Cool, at least I could integrate Kettle with HDFS and Hive successfully.

Using Spoon from Windows has its caveats. Certain design steps try to connect to the running instance of Hadoop, HBase, etc., which in my case is not possible as my Windows PC resides in a private network.

After 2-3 days of struggle, I am at least happy that a few things worked.


HBase Master startup issue

I had a tough time trying to solve an issue that occurred when HBase started. The HBase master failed to start with an error "host name cant be null".

HBase was earlier started as a 2-node cluster (2 region servers) pointing to a specific HDFS folder. When data was inserted back then, it kept references to the hostnames in the cluster.

When I now start HBase with just one node, it looks at the data that was there earlier and fails to find one of the nodes, which is no longer used.

The solution is to delete the HDFS folder that was used earlier (if you do not mind the loss of data).
I restarted HBase after deleting this folder and it worked fine.

Another option would be to make HBase point to a different HDFS folder (in hbase-site.xml).

Hope this helps!!

Friday, 14 December 2012

Sqoop-Hadoop integration


When Sqoop is integrated with an already running Hadoop cluster, you might face several issues, including the following. I faced these issues when I tried to import data from MySQL into my Hadoop instance.

Keep these points in mind:
1. Install the JDK; the JRE alone is not enough.

2. You need the JDBC driver for the database from which the data is to be imported. Copy the driver jar to $SQOOP_HOME/lib.

3. The Hadoop jar that is bundled with Sqoop may not be compatible with your Hadoop cluster. Replace the Hadoop jar bundled with Sqoop with the Hadoop core jar from your Hadoop installation.

4. Make sure that Sqoop is using the right Hadoop installation. If not, you may have to tweak the configure-sqoop script ($SQOOP_HOME/bin/configure-sqoop, for example /usr/lib/sqoop/bin/configure-sqoop).

Exception caused by points 3 and 4:
ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 63, server = 61)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:403)
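
To see whether points 3 and 4 apply to you, it helps to compare the Hadoop core jar that Sqoop ships with against the one in your Hadoop installation. The sketch below simply lists matching jar names in whatever directories you pass in. The class name HadoopJarFinder, the method name findHadoopCoreJars and the file name pattern hadoop*core*.jar are my assumptions; adjust the pattern if your jars are named differently.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class HadoopJarFinder {

    // Lists file names matching hadoop*core*.jar directly inside the given directory.
    static List<String> findHadoopCoreJars(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "hadoop*core*.jar")) {
            for (Path jar : jars) {
                names.add(jar.getFileName().toString());
            }
        }
        Collections.sort(names); // directory order is unspecified, so sort for stable output
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass the directories to compare, e.g. Sqoop's lib folder and the Hadoop home folder.
        for (String dir : args) {
            System.out.println(dir + " -> " + findHadoopCoreJars(Paths.get(dir)));
        }
    }
}
```

If the two directories report different versions, a client/server version mismatch like the one above is a likely outcome.
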

Thursday, 13 December 2012

Ubuntu mysql installation



MySQL Server Installation

sudo apt-get install mysql-server

This prompts for a root user password.

To verify the installation:

sudo netstat -tap | grep mysql

tcp        0      0 ip6-localhost:mysql     *:*                     LISTEN      14731/mysqld

Then type:
sudo mysql -u root -p

It prompts for the password, after which it should take you to the mysql shell.

mysql> CREATE DATABASE sqoopDB;

mysql> USE sqoopDB;

mysql> CREATE TABLE sample (name VARCHAR(10), age VARCHAR(10));

mysql> DESCRIBE sample;

mysql> SHOW TABLES;

mysql> INSERT INTO sample VALUES ('COGN1', '25');
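
Since Sqoop talks to the database over JDBC, it can be worth confirming that the new table is reachable that way too. Below is a minimal sketch, assuming MySQL listens on the default port 3306 on localhost and that the MySQL JDBC driver jar is on the classpath. The class name SampleTableCheck and the helper jdbcUrl are made up for illustration; the root password is passed as the first argument.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class SampleTableCheck {

    // Builds a MySQL JDBC URL, e.g. jdbc:mysql://localhost:3306/sqoopDB
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws SQLException {
        // args[0] is the root password chosen during installation.
        String url = jdbcUrl("localhost", 3306, "sqoopDB");
        try (Connection conn = DriverManager.getConnection(url, "root", args[0]);
             Statement stmt = conn.createStatement();
             ResultSet rows = stmt.executeQuery("SELECT name, age FROM sample")) {
            // Print every row of the sample table created above.
            while (rows.next()) {
                System.out.println(rows.getString("name") + " " + rows.getString("age"));
            }
        }
    }
}
```
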

That's it.