Monday, October 31, 2011

Understanding Hibernate - Part I


A simple POJO class representing a customer:

package org.training.hibernate;

import java.io.Serializable;
import java.util.List;
import java.util.Set;

public class Customer implements Serializable {

    private int customerID;
    private String customerName;
    private List<String> customerPhoneNumbers;
    private Set<String> customerAddress;

    public Customer() {
    }

    public Customer(int customerID, String customerName,
            List<String> customerPhoneNumbers, Set<String> customerAddress) {
        this.customerID = customerID;
        this.customerName = customerName;
        this.customerPhoneNumbers = customerPhoneNumbers;
        this.customerAddress = customerAddress;
    }

    public int getCustomerID() {
        return customerID;
    }

    public void setCustomerID(int customerID) {
        this.customerID = customerID;
    }

    public String getCustomerName() {
        return customerName;
    }

    public void setCustomerName(String customerName) {
        this.customerName = customerName;
    }

    public List<String> getCustomerPhoneNumbers() {
        return customerPhoneNumbers;
    }

    public void setCustomerPhoneNumbers(List<String> customerPhoneNumbers) {
        this.customerPhoneNumbers = customerPhoneNumbers;
    }

    public Set<String> getCustomerAddress() {
        return customerAddress;
    }

    public void setCustomerAddress(Set<String> customerAddress) {
        this.customerAddress = customerAddress;
    }
}

Tuesday, October 4, 2011

Common Maven Error


Most of the time when I work with Maven I encounter this error:

Reason: Error getting POM for 'org.apache.maven.plugins:maven-eclipse-plugin' from the repository: Failed to resolve artifact, possibly due to a repository list that is not appropriately equipped for this artifact's metadata.
org.apache.maven.plugins:maven-eclipse-plugin:pom:2.9-SNAPSHOT

As happens with other Maven dependencies, the plugin's pom.xml in the local repository somehow gets corrupted, and the plugin no longer works as expected.

The solution is to navigate to your local repository, delete the dependency, and try to download it again. If it still fails (clearly a pom.xml that has not been corrected), you need to edit the metadata file. In my case:

/Users/sridhar/.m2/repository/org/apache/maven/plugins/maven-eclipse-plugin/maven-metadata-central.xml

You will notice the "latest" node, for example:
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-eclipse-plugin</artifactId>
  <versioning>
    <latest>2.9-SNAPSHOT</latest>
    <release>2.8</release>
    <versions>
      <version>2.0-beta-1</version>
      <version>2.0-beta-2</version>
      <version>2.0</version>
      <version>2.1</version>
      <version>2.2</version>
      <version>2.3</version>
      <version>2.4</version>
      <version>2.5</version>
      <version>2.5.1</version>
      <version>2.5.2-SNAPSHOT</version>
      <version>2.6-SNAPSHOT</version>
      <version>2.6</version>
      <version>2.6.1-SNAPSHOT</version>
      <version>2.7-SNAPSHOT</version>
      <version>2.7</version>
      <version>2.8-SNAPSHOT</version>
      <version>2.8</version>
      <version>2.9-SNAPSHOT</version>
    </versions>
    <lastUpdated>20101028062425</lastUpdated>
  </versioning>
</metadata>
Just remove the "latest" node, or edit it to reflect the version you want to use, and the problem is solved.

SQL or NoSQL ?


NoSQL is about scalability - the classic non-functional architectural concern. In a classical OLTP architecture, when load increases and your JVM is under pressure, you need to scale. You have two choices:
· vertical scaling - adding more CPU power to your JVM
· horizontal scaling - adding more JVMs (usually on more boxes)

It's generally not a problem to scale the business tier horizontally. Follow the J2EE / JEE specs and, unless you've done something crazy, your business tier will scale: just add more JVMs and load balance between them. The persistence tier, however, is not so easy. Say you are using a classical relational database (such as MySQL, SQL Server, DB2 or Oracle) for your persistence; you can't just add database machines the way you can add JVMs. Why not? Imagine trying to do SQL joins when the tables are on different machines rather than on the same one. Imagine trying to maintain ACID characteristics for your transactions when your database is split across multiple machines. Now imagine doing all that on 5 machines, then 50, 500, 5000. The more machines, the harder it gets.

The leading relational databases will scale horizontally, but only by so much. To get around this, an architect will usually consider:
· Scaling vertically - putting the database on the best hardware that can be afforded
· Partitioning out legacy data, which reduces things like the size of index tables. This boosts performance and eases the pressure to scale
· Reducing the pressure on the database by caching more in the business tier
· Paying a DBA a lot of money!

But what if you run out of all possible database optimization options and you have to scale horizontally? Not just to a few machines but to a few hundred, if not a few thousand. This is where NoSQL architectures become relevant.

With a NoSQL database there is no strict schema. Everything is effectively collapsed into one very fat table - a bit like an old-school flat file, but where each row stores a huge amount of data. So, instead of having a table for Users and a table for Activities (representing users' activities), you put all the user information together in one fat row. This means there are no joins across tables. It also means there is a lot of data redundancy, which means more storage space is required, and more computational power is needed for writes. But because related data is located in the very same place - within the same row - there are no complex joins, and hence it is easier to scale. The computational requirement for reads is also lower, so reads can go faster.
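To make the contrast concrete, here is a hypothetical sketch using plain Java collections (not any real NoSQL API): the same user data modeled as two normalized "tables" joined by a key, and as a single denormalized fat row.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain maps standing in for tables and for a fat row.
public class FatRowSketch {

    // Denormalized "fat row": everything about the user lives in one record,
    // so a single read returns it all - no join needed.
    public static Map<String, Object> fatRow() {
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("userId", 42);
        row.put("name", "Alice");
        row.put("activities", Arrays.asList("login", "purchase"));
        return row;
    }

    public static void main(String[] args) {
        // Normalized (relational) shape: two "tables" keyed by userId.
        Map<Integer, String> users = new HashMap<Integer, String>();
        users.put(42, "Alice");
        Map<Integer, List<String>> activities = new HashMap<Integer, List<String>>();
        activities.put(42, Arrays.asList("login", "purchase"));
        // Reading a user's activities needs a join-style lookup across both maps.
        System.out.println(users.get(42) + ": " + activities.get(42));

        // Fat row: one lookup returns all the data, at the cost of redundancy
        // when the same values also appear in other rows.
        Map<String, Object> row = fatRow();
        System.out.println(row.get("name") + ": " + row.get("activities"));
    }
}
```

The trade-off described above is visible here: the fat row duplicates nothing in this tiny example, but if Alice's name were needed alongside every activity record, the normalized form stores it once while the fat-row form repeats it.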

Another advantage of NoSQL databases derives from the freedom of not being tied to a strict schema. You know that headache where a change to the data model can cause big problems? Since there is no strict schema with NoSQL, this problem does not exist. This makes the architecture more flexible and more extensible.

Right now, it's fair to say NoSQL is only relevant in the minority of architectures. But could this be another case of technical innovation driving business innovation as we have seen with smart phones? There wasn't a need for smart phones but the technical innovation provided business opportunities. I think the same could happen with NoSQL Architectures.

Take a step back from Computer Science and just think Science. Science used to be hypothesis centric; now it is becoming more and more data centric. CERN, genome sequencing, climate change analysis - all involve tonnes and tonnes of data. Surely NoSQL architectures allied with data-processing technologies such as MapReduce / Hadoop will open up new ways to do Science?

So, any disadvantages with NoSQL architectures? Well, it's still an immature technology. Indexing and security models are just not as sophisticated as they are in classical relational databases.

Monday, October 3, 2011

Getting the size of an Object


Overview

Java was designed on the principle that you shouldn't need to know the size of an object. But there are times when you really would like to know, and want to avoid the guesswork.

Measuring how much memory an object uses

There are three factors which make measuring how much memory an object uses difficult.
  • The TLAB allocates blocks of memory to a thread. This means small amounts of memory don't appear to reduce the free memory. If you allocate repeatedly, you will see a whole block of free memory get used at once. The way around this is to turn off the TLAB with -XX:-UseTLAB.
  • A GC can occur while you are creating your object. This will result in more free memory at the end than when you started. I ignore any negative sizes in this test ;)
  • Other threads in the system could use memory at the same time. I perform multiple tests and take the median, which removes any outliers.

Size of objects in a 32-bit JVM

Running this SizeofTest on 32-bit Sun/Oracle Java 6 update 26 with -XX:-UseTLAB, I get:
The average size of an int is 4.0 bytes
The average size of an Object is 8.0 bytes
The average size of an Integer is 16.0 bytes
The average size of a Long is 16.0 bytes
The average size of an AtomicReference is 16.0 bytes
The average size of a SimpleEntry (Map.Entry) is 16.0 bytes
The average size of a DateTime is 24.0 bytes
The average size of a Calendar is 424.0 bytes
The average size of an Exception is 400.0 bytes
The average size of a bit in a BitSet is 0.125 bytes

Looking at the size of a Long confirms the size of the header/Object being 8 bytes.

Size of objects with 32-bit references

Running this SizeofTest with 32-bit references on Sun/Oracle Java 6 update 26 (64-bit JVM) with -XX:+UseCompressedOops -XX:-UseTLAB, I get:
The average size of an int is 4.0 bytes
The average size of an Object is 16.0 bytes
The average size of an Integer is 16.0 bytes
The average size of a Long is 24.0 bytes
The average size of an AtomicReference is 16.0 bytes
The average size of a SimpleEntry (Map.Entry) is 24.0 bytes
The average size of a DateTime is 24.0 bytes
The average size of a Calendar is 448.0 bytes
The average size of an Exception is 440.0 bytes
The average size of a bit in a BitSet is 0.125 bytes

Objects are 8-byte aligned on this JVM, and you could conclude from the size of an Integer that the header is 12 bytes in size.

Size of objects with 64-bit references

Running the same test with 64-bit references, i.e. -XX:-UseCompressedOops -XX:-UseTLAB:
The average size of an int is 4.0 bytes
The average size of an Object is 16.0 bytes
The average size of an Integer is 24.0 bytes
The average size of a Long is 24.0 bytes
The average size of an AtomicReference is 24.0 bytes
The average size of an SimpleEntry(Map.Entry) is 32.0 bytes
The average size of a DateTime is 32.0 bytes
The average size of a Calendar is 544.0 bytes
The average size of an Exception is 648.0 bytes
The average size of a bit in a BitSet is 0.125 bytes
Looking at the size of a Long confirms the size of the header/Object is 16 bytes.


// SizeofUtil.java

package com.google.code.java.core.sizeof;

import java.util.Arrays;

public abstract class SizeofUtil {
  public double averageBytes() {
    int runs = runs();
    double[] sizes = new double[runs];
    int retries = runs / 2;
    final Runtime runtime = Runtime.getRuntime();
    for (int i = 0; i < runs; i++) {
      Thread.yield();
      long used1 = memoryUsed(runtime);
      int number = create();
      long used2 = memoryUsed(runtime);
      double avgSize = (double) (used2 - used1) / number;
      if (avgSize < 0) {
        // GC was performed.
        i--;
        if (retries-- < 0)
          throw new RuntimeException("The eden space is not large enough to hold all the objects.");
      } else if (avgSize == 0) {
        throw new RuntimeException("Object is not large enough to register, try turning off the TLAB with -XX:-UseTLAB");
      } else {
        sizes[i] = avgSize;
      }
    }
    Arrays.sort(sizes);
    return sizes[runs / 2]; // the median removes outliers caused by other threads
  }

  protected long memoryUsed(Runtime runtime) {
    return runtime.totalMemory() - runtime.freeMemory();
  }

  protected int runs() {
    return 11;
  }

  protected abstract int create();
}