Monday, August 12, 2013

Notes from NoSQL meetup in ATLSWC

Some notes from "Overview of NoSQL Database systems" hosted by Thoughtworks, Atlanta.

Presenter: Noah Kriegel

Starters: Awesome pizza and drinks.

Presentation gist:

  • Beautiful Prezi layout (see the image below; courtesy: James Brechtel).

  • Most of the content is drawn from the NoSQL Distilled book, but Noah did a good job of extracting the essentials and supplementing them with his own experience.

Why we love RDBMS

  • ACID compliant
  • Standard, Expressive and Powerful SQL.

Why the hate?

  • Challenges with data replication and multi-node setups, resulting in expensive infrastructure.
  • Mismatch between application objects and database data types. Tools like ActiveRecord and Hibernate play a major part in overcoming this but have their own pitfalls.

NoSQL movement

  • Inspired by two papers: Google's Bigtable and Amazon's Dynamo
  • Gained traction with developers, unlike past failures such as object databases.

Some important facts to remember

  • CAP theorem
  • Quorum-based voting techniques
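
On the quorum point, the rule of thumb I know from Dynamo-style systems is that with N replicas, a write acknowledged by W nodes and a read served by R nodes are guaranteed to overlap on at least one node when R + W > N. Here's a minimal Python sketch of that check. The function names are my own and were not part of the talk.

```python
def quorum_overlaps(n, w, r):
    """Return True when every read quorum shares a node with every write quorum.

    n: number of replicas, w: write quorum size, r: read quorum size.
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    # Two subsets of sizes w and r drawn from n nodes must intersect
    # whenever their sizes add up to more than n.
    return r + w > n


def majority(n):
    """Smallest quorum size that is a strict majority of n replicas."""
    return n // 2 + 1
```

For example, with 3 replicas, W=2 and R=2 overlap, while W=1 and R=1 do not, so a read could miss the latest write.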

NoSQL types

  • Key-value store
  • Document-oriented database
  • Column-oriented DBMS
  • Graph databases

Each type was discussed in the following terms:

  • Introduction and key feature
  • When to use
  • When not to use
  • Sample code

I'll post more details on each of these database types in upcoming posts.

Some key TIL:

  • CQL closely mirrors SQL in its syntax.
  • Redis stands for REmote DIctionary Server.
  • Mongo is short for "humongous".

Tuesday, April 9, 2013

Hadoop installation in AWS EC2

I was using the CDH VM image on my local machine for hands-on experience with Hadoop. I thought I'd try it out on AWS and see how smooth the process is.

So, I started following Cloudera's blog post on the same. But the post had a lot of issues and things didn't work as outlined. I got a little more help from this blog.

So, here's the brief setup after creating the instance. As suggested in both posts, I used Whirr to install the cluster and avoid manual setup.

Step 1: Get the latest Whirr binary.
Step 2: Set up the Whirr config file. You can copy the below contents and update the AWS Access Key ID and Secret Access Key accordingly.
Step 3: Install Java.
Step 4: Generate a public key. I just pressed Enter at the first prompt.
Step 5: Launch the cluster. Wait till you get the instructions to ssh to the nodes.
Step 6: ssh to the nodes.
Step 7: Verify that the Hadoop installation works. We'll look more into this later.

Sunday, February 10, 2013

Currently reading: NoSQL Distilled

When the rough cut version of "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence" was released on Safari Books, I dived in and posted some comments. Those were totally useless to the authors and the publishers, as they pointed out either incomplete sentences or grammatical mistakes that had already been caught by the proofreaders. However, Martin Fowler was kind enough to respond to each and every comment.

Later on, when the book was released, I joked to a friend who was looking to buy it that it's better to get the Table of Contents from the Amazon website, read up on each topic in Wikipedia, and save a few bucks. Last week, I thought I should seriously give it a read. So far, I've completed 3 chapters and have learnt a lot. Not only did I realize that I made a fool of myself with that joke, but I'm also learning many things a little late.

My upcoming posts will definitely reflect a few things that I'm learning from this book.

Thursday, June 21, 2012

An open letter to Thoughtworks Tech Radar Team

Dear Thoughtworks Radar Team,

I attended your Tech Radar webinar this Tuesday. It's a great initiative on your part to come up with something like this and share it with the rest of the world. I would like to share some of my thoughts on it.

  • One hour was too short to cover all the quadrants. Because of this, you had to rush through some points and there was very little time for Q&A. I know this is a free webinar and you are putting a lot of effort and money into sharing your research and learning with the outside world. However, it would be great if you ran it as a four-part or two-part webinar, covering one or two quadrants per session respectively.
  • In the techniques section, I feel we need to include 'Application Security as a first class citizen' on par with performance testing. A security flaw in an application is the most embarrassing moment for any organization. Most of the time, security requirements are implicit in the user stories. It is the duty of the development team to identify, code, and test them as early as possible. I would like to know if you are using any specific tool for this, apart from including it in your functional testing.

On the good side, there were a lot of takeaways for a developer like me. A lot of new ideas were presented; some items in the Hold category made a lot of sense after listening to the webinar and reading through them afterwards. I'm eagerly looking forward to the next edition.

Yours truly,
A Developer still learning.

PS: For others, here's the link to their radar.

Monday, May 28, 2012

Upgrade to Play 2.0? - Not now

What is the first thing that makes a seasoned Java developer leave Spring/J2EE and move to something cool like Play? Its easy setup and the boost in productivity, of course. When I tried the Play 1.x series, I was very happy. After a few months, I recommended and used it for one of our internal projects and also successfully deployed it to Heroku. It had matured a lot, with new and useful modules.

Then I saw the announcement of Play 2.0, which was being completely rewritten in Scala. Like everyone else, I was excited about it, and my Twitter feed was full of praise once it was released. On top of that, its inclusion in the Typesafe stack created even more buzz. So, with a lot of excitement, I wanted to evaluate it so that I could upgrade my application to the latest and greatest version.

First, I tried the Play-Scala module with their todoList example. As promised, the setup time was negligible. But the real problem started right after that: after any change to the source code, the next page load takes a long time. It's definitely a lot better than the old J2EE-RAD setup, but very bad compared with its predecessor. I tried the Java module and faced the same issue.

There are some promises on their groups that they are working on improving the performance, which is good. My guess is that the two main causes are the compile time of the Scala code (a pure guess, based on similar issues in other projects) and the new feature that compiles and watches more components. Hopefully, they figure out a solution soon.

Until then, I'm going to stick with Play 1.2.4 for the application that we developed. As of now, we don't have any specific need to upgrade it to the next version. But it would definitely be good to upgrade the application if such a need comes up in the future.

Thursday, July 7, 2011

Anti-If Campaign for Cincy Clean Coders

Below is the presentation I used for my talk on Anti-If Campaign for Cincy Clean Coders today.

The source code used in the presentation is available on GitHub.

Thursday, June 23, 2011

Dumping the thread - Websphere Performance Analysis

There are numerous occasions when we end up working on a performance problem, be it on a development box or a defect opened from the Production or Performance testing environment. While there are good profiling tools such as JProbe or YourKit to help us get to the root of the issue, they come with a price tag. If there's a constraint on the team budget, you won't be able to use them. You could work around that with a trial version, but if you read the fine print in the license file, you and your project stakeholders could land in trouble. So, stay away unless you are officially evaluating these tools.

Though that could be a handicap, there are other options that can be of use. In this post, I'm going to demonstrate how to get a thread dump from your WebSphere (WAS) server for performance analysis.

What is a thread dump?
A thread dump is a list of all the Java threads that are currently active in a Java Virtual Machine (JVM). When the JVM receives the signal to produce one, it collects the statistics for all threads and writes them to a .txt file.

How do I generate it?
The most reliable way to generate a thread dump in WAS is to use the wsadmin utility. The steps are as follows:
1. Navigate to the bin directory
cd <was_root>/profiles/<PROFILE_NAME>/bin/

2. Connect to the deployment manager using the wsadmin script
wsadmin.bat -conntype SOAP -username <username> -password <password>

3. The above command opens a wsadmin prompt. Now set the object variable to be used for generating the dumps
wsadmin> set jvm [$AdminControl completeObjectName type=JVM,process=<server_name>,node=<node_name>,*]

4. Run this command:
wsadmin> $AdminControl invoke $jvm dumpThreads

5. If you want to force a heap dump, run the following command:
wsadmin> $AdminControl invoke $jvm generateHeapDump

Besides, if you are on a Unix-based system like Linux or Mac, you can generate a thread dump by just running the command:
kill -3 <pid>

Use ps -ef | grep java or ps -ef | grep <server_name> to get the process ID (pid).
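
If you'd rather script the ps and kill steps, here is a rough Python sketch. It assumes the usual 8-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD) and a Unix-like system; the helper names `java_pids` and `request_thread_dump` are my own, and the layout of `ps` output can differ between platforms.

```python
import os
import signal


def java_pids(ps_output):
    """Return the PIDs of processes whose executable is named `java`.

    Assumes `ps -ef` style rows: UID PID PPID C STIME TTY TIME CMD.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        # Skip the header row and anything that is not a full process row.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        executable = fields[7].split()[0]
        # Match on the executable name so that `grep java` itself is ignored.
        if os.path.basename(executable) == "java":
            pids.append(int(fields[1]))
    return pids


def request_thread_dump(pid):
    """Send SIGQUIT (signal 3) to the process, the same as `kill -3 <pid>`."""
    os.kill(pid, signal.SIGQUIT)
```

You would feed `java_pids` the captured output of `ps -ef` and then call `request_thread_dump` on the PID of the server you care about. Only send the signal to a JVM: for most other processes, SIGQUIT terminates them.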

If you run WAS in console mode in Windows, Ctrl+Break helps to generate the dump but I have never tried it before.

A sample for generating a thread dump on a machine with node 5184Node01, hosted locally on port 9443 with user/pass (ADMIN/password), is as follows:
wsadmin.bat -conntype SOAP -host localhost -port 9443 -username ADMIN -password password
The following commands will be run in wsadmin prompt.
set jvm [$AdminControl completeObjectName type=JVM,process=server1,node=5184Node01,*]
$AdminControl invoke $jvm dumpThreads

How to read the logs?
There are several ways to do it. Doing it manually is one of the most painful. IBM alphaWorks has a cool tool for this: IBM Thread and Monitor Dump Analyzer for Java, called JCA for short. The ReadMe HTML inside the jar and the FAQ section say a lot about usage and interpretation of the data. I have used it a lot in the past, and it helped us fix a lot of problems.
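
For a quick first look before opening the analyzer, a few lines of Python can summarize the thread states. This is only a sketch: it assumes the dump is an IBM javacore file in which each thread is introduced by a `3XMTHREADINFO` line containing the quoted thread name and a `state:` code such as R, CW, or B. The function names are mine, and the exact line format may vary between JVM versions.

```python
import re
from collections import Counter

# Assumed line shape:
# 3XMTHREADINFO      "WebContainer : 0" ... state:CW, prio=5
THREAD_LINE = re.compile(
    r'^3XMTHREADINFO\s+"(?P<name>[^"]*)".*?state:(?P<state>\w+)'
)


def thread_states(javacore_text):
    """Return (thread name, state code) pairs found in the dump text."""
    pairs = []
    for line in javacore_text.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            pairs.append((match.group("name"), match.group("state")))
    return pairs


def state_summary(javacore_text):
    """Count how many threads report each state code."""
    return Counter(state for _, state in thread_states(javacore_text))
```

A large number of threads sharing the same blocked or waiting state is usually a good hint about where to start digging in the analyzer.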

All said and done, try these out at your leisure or when you are dealing with performance problems.