Monday, January 13, 2014

How HashMap Works

HashMap stores values as key-value pairs and works on the concept of hashing.

Hashing

Hashing is a function which maps an input value to a (typically fixed-size) numeric value using some algorithm. The first and foremost rule of a hash function is that it must always generate the same hash for the same value.
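For example, String.hashCode() in the JDK always returns the same hash for the same string. A minimal sketch (the class name is just for illustration):
    public class HashingDemo {
        public static void main(String[] args) {
            String key = "apple";
            // hashCode must return the same value every time for the same input
            System.out.println(key.hashCode());       // prints 93029210
            System.out.println("apple".hashCode());   // prints 93029210 again
        }
    }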

HashMap

HashMap, as mentioned earlier, works on hashing.
  • It uses a hash table for storing the values as key-value pairs. 
  • The table is an array of Entry (an inner class); each Entry is the head of a linked list that holds the entries mapping to the same bucket.
  • It can store a null key (only one).
  • It has the two methods put and get for storing and retrieving values (see the usage sketch below).
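A quick usage sketch (the class and variable names are just for illustration):
    import java.util.HashMap;
    import java.util.Map;

    public class HashMapDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<String, Integer>();
            map.put("one", 1);                    // stores the value against the key "one"
            map.put(null, 0);                     // a single null key is allowed
            System.out.println(map.get("one"));   // 1
            System.out.println(map.get(null));    // 0
            System.out.println(map.get("two"));   // null - no entry for this key
        }
    }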

Hashing in HashMap

  • To generate the hash value, we have to implement the hashCode method for the key class.
  • HashMap performs one more level of hashing on top of it, because our hashCode implementation may not always distribute the hash values well [ideally the hashes should be spread evenly across the buckets].
  • It takes the hashCode value we defined and performs additional hashing on top of it, as below.
    final int hash(Object k) {
        int h = hashSeed;
        // If alternative hashing is enabled (hashSeed != 0), Strings use a separate hash function
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        // Mix the key's own hashCode into the seed
        h ^= k.hashCode();
        // Spread the higher-order bits downwards so buckets are better distributed
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

What does equals() do?

  • The uniqueness of a key is decided by both the equals and hashCode methods.
  • The hash is used for deciding the index of the bucket in which to store the value. 
  • The equals method is used to decide the equality of the keys. 
  • Two keys having the same hashCode may not be equal; such entries are stored in the same bucket as a linked list. 
  • But two keys which are equal (equals returns true) must have the same hashCode (see the sketch below).
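For example, a key class that honours this equals/hashCode contract could look like the following minimal sketch (the Employee class and its id field are purely illustrative):
    public class Employee {
        private final long id;

        public Employee(long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Employee)) return false;
            return id == ((Employee) o).id;   // equal keys...
        }

        @Override
        public int hashCode() {
            return (int) (id ^ (id >>> 32));  // ...must produce the same hash
        }
    }
Two Employee objects with the same id are equal and map to the same bucket; unequal objects may still land in the same bucket, and equals then tells them apart.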
The following diagram gives an overview of how objects are stored in the HashMap.

  • HashMap internally stores the values in an array of buckets (named table) of type Entry.
  • Each bucket (Entry) is a linked list of the key-value pairs that map to the same bucket.
  • Each key-value pair holds a link to the next entry in its bucket. 
  • In the above HashMap, 
    • There are two entries with hashCode H1 and three entries with hashCode H9, each chained as a linked list.
    • Only one key-value pair is stored for H14 and for H16.
    • The other buckets are empty. 
  • While storing or retrieving a value using a key, HashMap does the following:
    • Calculates the hashCode for the key. 
    • Finds the bucket for the key.
    • Creates the bucket (if it does not exist) and stores the value in the linked list for that bucket [in case of the put method].
    • If the bucket is not found, returns null; otherwise walks through the linked list of that bucket and finds the entry for the key [in case of the get method].

How get method works

Look at the get method of HashMap
    public V get(Object key) {
        if (key == null)
            return getForNullKey();               // the null key is handled separately (bucket 0)
        Entry<K,V> entry = getEntry(key);         // hash the key and walk the bucket's linked list
        return null == entry ? null : entry.getValue();
    }

Only three steps

  1. It checks for null. If the key is null, it returns the value stored for the null key.
    1. The value for the null key is stored at bucket 0. Returns null if the map is empty, otherwise the value stored for the null key.
  2. Gets the bucket entry for the key.
    1. Performs the hash of the key [as in the hash code shown above]. 
    2. Finds the bucket for that hash and walks through the linked list of the bucket to find the entry. 
  3. If no entry is found, returns null; otherwise returns the value at the entry.

How put method works

The put method code is as follows
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);                 // lazily create the bucket array
        }
        if (key == null)
            return putForNullKey(value);             // the null key goes to bucket 0
        int hash = hash(key);                        // supplemental hash of key.hashCode()
        int i = indexFor(hash, table.length);        // bucket index for this hash
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // same hash and equal key -> replace the existing value and return the old one
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(hash, key, value, i);               // no existing entry: add a new one
        return null;
    }
The put method works much the same way as get, with a few differences:
  • If the key is null, it stores the value for the null key (index 0).
  • Gets the bucket entry for the key.
    • Performs the hash of the key [as in the hash code shown above]. 
    • Finds the entry with that hash. If it is not there, creates it.
  • Stores the value in that bucket and links it to the existing entries of the bucket (linked list).
One note to remember: the put method always returns the old value present for the key; if the key is newly stored, it returns null.
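A minimal sketch of that behaviour (the class name is illustrative):
    import java.util.HashMap;
    import java.util.Map;

    public class PutReturnDemo {
        public static void main(String[] args) {
            Map<String, String> map = new HashMap<String, String>();
            System.out.println(map.put("key", "first"));   // null - the key was not present before
            System.out.println(map.put("key", "second"));  // first - the old value is returned
            System.out.println(map.get("key"));            // second
        }
    }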

Size of HashMap

  • HashMap is initially created with a default capacity of 16. 
  • We can explicitly specify the initial capacity of the HashMap with a constructor argument (an int value), as in the sketch below.
  • HashMap can have a maximum capacity of 1073741824 (2^30).
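A minimal sketch of those constructors (the capacity and load factor values are arbitrary):
    import java.util.HashMap;
    import java.util.Map;

    public class CapacityDemo {
        public static void main(String[] args) {
            // Default constructor: capacity 16, default load factor 0.75
            Map<String, String> defaults = new HashMap<String, String>();

            // Explicit initial capacity, and optionally a load factor
            Map<String, String> sized = new HashMap<String, String>(64);
            Map<String, String> tuned = new HashMap<String, String>(64, 0.9f);
        }
    }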
Happy Learning!!!!!

Saturday, January 4, 2014

Comparable v/s Comparator

Comparable and Comparator are the interfaces used for sorting the elements in a collection. To sort the data in a collection, either the data object in the collection has to implement the Comparable interface, or an extra object that implements Comparator has to be passed as a parameter to the sort method.
Collections.sort method
public static <T extends Comparable<? super T>> void sort(List<T> list) //Using comparable 
public static <T> void sort(List<T> list, Comparator<? super T> c) //Using comparator

Example using Comparable interface

Object implemented Comparable

package com.test.collection;

public class Person implements Comparable<Person> {
   private long id;
   public Person(long id) {
      this.id = id;
   }
   //Getters and Setters
   public int compareTo(Person o) {
      // Long.compare avoids the overflow that subtracting longs and casting to int can cause
      return Long.compare(getId(), o.getId());
   }
}

How to sort

Create a list of Persons and use Collections.sort method to sort the elements. 
package com.test.collection;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CustomSort {
    public static void main(String[] args) {
        List<Person> list = new ArrayList<Person>();
        list.add(new Person(10));
        list.add(new Person(2));
        list.add(new Person(15));
  
        System.out.println("Elements before sorting : ");
        for(Person i : list)
           System.out.print(" "+i.getId());
        Collections.sort(list); 
        System.out.println("\nElements after sorting : ");
        for(Person i : list)
           System.out.print(" "+i.getId());
    }
}

Output will be

Elements before sorting : 
 10 2 15
Elements after sorting : 
 2 10 15

Example Using Comparator interface

Student Object - Which is used in Collection 

package com.test.collection;

public class Student {
     private long id;
     public Student(long id) {
       this.id = id;
     }
     //Getters and setters
}

StudentComparator class - Implements the Comparator interface of type Student 

package com.test.collection;

import java.util.Comparator;

public class StudentComparator implements Comparator<Student> {
   public int compare(Student o1, Student o2) {
        // Long.compare avoids the overflow that subtracting longs and casting to int can cause
        return Long.compare(o1.getId(), o2.getId());
   }
}

How to sort

package com.test.collection;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CustomSort {
    public static void main(String[] args) {
      List<Student> list = new ArrayList<Student>();
      list.add(new Student(10));
      list.add(new Student(2));
      list.add(new Student(15));
  
      System.out.println("Elements before sorting : ");
      for(Student i : list)
         System.out.print(" "+i.getId());
      Collections.sort(list,new StudentComparator()); 
      System.out.println("\nElements after sorting : ");
      for(Student i : list)
         System.out.print(" "+i.getId());
   }
}

Output will be

Elements before sorting : 
 10 2 15
Elements after sorting : 
 2 10 15

Comparison

The implementations look very similar; let's see the similarities and differences between them.
  • Package
    • Comparable: java.lang.Comparable
    • Comparator: java.util.Comparator
  • Implementation
    • Comparable: implemented by the same class whose objects are to be compared.
    • Comparator: implemented in a separate class.
  • Arguments
    • Comparable: compareTo takes only one argument, of the same type as the implementing class; the object compares itself with the argument and returns an integer value.
    • Comparator: compare takes two arguments of the same type and returns an integer value after comparing them.
  • Collections.sort
    • Comparable: no extra argument is required during sort; Collections.sort uses the compareTo method implemented by the elements.
    • Comparator: Collections.sort requires the Comparator object as its second argument to sort the Collection passed as the first argument. Note: the Comparator must be of the same type as the elements of the Collection.
  • Sorting
    • Comparable: sorting is always based on the compareTo method.
    • Comparator: sorting is based on the Comparator object passed as the second argument, so several Comparators can be created, each with its own sorting logic; based on the criteria, we can pass whichever Comparator is required (see the sketch below).
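As a sketch of that last point, the same list can be sorted in different ways by passing different Comparators. The Book class and its fields below are illustrative only:
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a Book with two fields so the same list can be sorted two ways
class Book {
    final String title;
    final int pages;
    Book(String title, int pages) { this.title = title; this.pages = pages; }
}

public class MultipleComparators {
    public static void main(String[] args) {
        List<Book> books = new ArrayList<Book>();
        books.add(new Book("B", 100));
        books.add(new Book("A", 250));

        // One Comparator per sorting criterion
        Comparator<Book> byTitle = new Comparator<Book>() {
            public int compare(Book b1, Book b2) {
                return b1.title.compareTo(b2.title);
            }
        };
        Comparator<Book> byPages = new Comparator<Book>() {
            public int compare(Book b1, Book b2) {
                return b1.pages - b2.pages;
            }
        };

        Collections.sort(books, byTitle);
        System.out.println(books.get(0).title);   // A - sorted by title
        Collections.sort(books, byPages);
        System.out.println(books.get(0).title);   // B - sorted by page count
    }
}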

Happy Learning!!!

Basics of Collections in Java

Back to basics again. Let's see what Collections are about.
A collection is a group of objects, and the framework provides a defined set of functions to access and manipulate them. In short, the Collections Framework provides a set of functions to manipulate groups of objects.
Briefly, the Collections Framework consists of:
  • Interfaces: Abstract data types which represent the collections.
  • Implementations: Classes which implement the above interfaces. 
  • Algorithms: Methods which perform the operations on the collections. 
Note: Map is not a Collection but has the above features. A Map is an object which stores data in key/value pairs.
The figure below gives a brief overview of the Collections. [Green - Interfaces, Violet - Classes]

Explanation:

List: 

  • List allows data to be stored in a re-sizable container.
  • Elements can be accessed randomly by position, i.e. index.
  • Elements are kept in order of insertion. 
  • List is an interface; it is implemented by (a usage sketch follows this list):
    • ArrayList - In one word, it's a re-sizable array. It has one constructor parameter, initialCapacity - the number of elements the backing array can hold before it has to grow.
    • LinkedList - It's implemented as a doubly linked list. The performance of insert/remove operations is better than ArrayList.
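A brief sketch of both implementations (names are illustrative; the initial capacity value is arbitrary):
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListDemo {
    public static void main(String[] args) {
        // ArrayList with an explicit initial capacity for the backing array
        List<String> arrayList = new ArrayList<String>(50);
        arrayList.add("a");
        arrayList.add("b");
        System.out.println(arrayList.get(1));   // random access by index: b

        // LinkedList: doubly linked, cheaper inserts/removals in the middle
        List<String> linkedList = new LinkedList<String>();
        linkedList.add("a");
        linkedList.add(0, "b");                 // insert at the head
        System.out.println(linkedList);         // [b, a]
    }
}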

Queue:

  • Queue stores a series of elements for processing, typically in FIFO (first-in, first-out) order.
  • Queue has special methods for insertion and removal of elements. 
  • Queue is implemented by
    • PriorityQueue - It orders elements by their natural ordering or by a Comparator supplied at construction. It has special access methods - poll, remove, peek, element.
    • LinkedList - LinkedList is also an implementation of Queue (as well as of List).

Deque:

  • Deque is a double-ended queue (it extends Queue). 
  • Operations can be performed at both ends. 
  • It's implemented by ArrayDeque (a usage sketch follows this list).
  • It has methods such as 
    • addFirst, removeFirst, pollFirst, etc. for the elements at the beginning
    • addLast, removeLast, pollLast, etc. for the elements at the end. 
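A quick ArrayDeque sketch (names and values are illustrative):
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDemo {
    public static void main(String[] args) {
        Deque<Integer> deque = new ArrayDeque<Integer>();
        deque.addFirst(1);                      // deque: [1]
        deque.addLast(2);                       // deque: [1, 2]
        deque.addFirst(0);                      // deque: [0, 1, 2]
        System.out.println(deque.pollFirst());  // 0
        System.out.println(deque.pollLast());   // 2
    }
}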

Set:


  • The Set interface works exactly like a mathematical set. 
  • It cannot contain duplicate elements. 
  • Set is implemented by
    • HashSet - It's a plain vanilla implementation of Set. It doesn't guarantee the order of the stored elements and cannot contain duplicates. It uses a hash table for storing the data.
    • LinkedHashSet - It also uses a hash table but preserves the insertion order, like a List.

SortedSet

  • SortedSet, as the name says, is a sorted version of Set (it extends Set).
  • SortedSet is implemented by
    • TreeSet - It stores the elements in sorted order, but at a performance cost: its basic operations are O(log n), so HashSet is generally faster than TreeSet (see the sketch after the note below).
Note: Primitive types (int, char, double, etc.) cannot be used as type parameters in Collections. Use the wrapper classes (Integer, Double, etc.) instead.
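The difference in ordering between HashSet, LinkedHashSet and TreeSet can be seen in a small sketch like this (names and values are illustrative):
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetDemo {
    public static void main(String[] args) {
        Integer[] values = {5, 1, 3, 1};   // note the duplicate 1

        Set<Integer> hashSet = new HashSet<Integer>();
        Set<Integer> linkedHashSet = new LinkedHashSet<Integer>();
        Set<Integer> treeSet = new TreeSet<Integer>();
        for (Integer v : values) {
            hashSet.add(v);
            linkedHashSet.add(v);
            treeSet.add(v);
        }
        System.out.println(hashSet);        // no guaranteed order, duplicate dropped
        System.out.println(linkedHashSet);  // [5, 1, 3] - insertion order
        System.out.println(treeSet);        // [1, 3, 5] - sorted order
    }
}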

Which Collection to Use

Based on the use case we are implementing, the collection to use will change.
  • Iterating: Each of the collections implements Iterable, so all of them can be iterated over.
  • Random Access: All List implementations allow index-based access (ArrayList in constant time, LinkedList in linear time).
  • Duplicates: Sets don't allow duplicates, whereas the others do.
  • Ordering: Each collection has its own way of ordering, but explicit ordering can be imposed by using a Comparator.
    • Sorted order - SortedSet and PriorityQueue can impose ordering at creation time.
    • Insertion order - List implementations keep elements in insertion order.
    • No ordering - Set and Queue (other than PriorityQueue) don't impose any ordering.
  • Key-value pairs: HashMap (a Map) stores key-value pairs.
  • Operation performance: Sorted collections are generally a bit slower; in addition, the choices above also affect performance.
We will see more about features of Collections in future posts. Happy Learning!!!!

Thursday, January 2, 2014

Spring Batch - Chunk Oriented Processing

This is a continuation of the previous post. In general, batch processing needs to process tons of data rather than run simple tasks (as we saw in the last post). Spring Batch allows us to do this with Readers, Processors and Writers. As the names suggest, the reader reads the data from the source, the processor processes the source data and converts it to the output, and finally the writer writes it to the destination.
Looks simple, but how do we achieve chunk-oriented processing with this? Spring has simplified the process for developers: we just need to take care of where to read from (Reader), how to process the data (Processor), and where and how to write the processed data (Writer).

Now we will see the following with an example:
  • Set up a job with steps (this we already know)
  • Configure readers, writers and processors for each step
  • Condition-based routing to the next step
  • Read and process items one by one, but commit them once per chunk
  • Exception cases
Spring config XML
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:batch="http://www.springframework.org/schema/batch"
 xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd http://www.springframework.org/schema/batch  http://www.springframework.org/schema/batch/spring-batch-2.1.xsd">

 <!-- Transaction Manager -->
 <bean id="transactionManager"
    class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
  
 <!-- jobRepository Declaration -->
 <bean id="jobRepository"
  class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
  <property name="transactionManager" ref="transactionManager" />
 </bean>

 <!-- jobLauncher Declaration -->
 <bean id="jobLauncher"
  class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
  <property name="jobRepository" ref="jobRepository" />
 </bean>

 <!-- Readers -->
 <bean id="sucessTestReader" class="com.test.batch.TestReader">
  <property name="count" value="10" />
 </bean>
 
 <bean id="failedTestReader" class="com.test.batch.TestReader">
  <property name="raiseError" value="true" />
 </bean>

 <!-- Processor Bean Declaration -->
 <bean id="testProcessor" class="com.test.batch.TestProcessor" />

 <!-- Writer Bean Declaration -->
 <bean id="testWriter" class="com.test.batch.TestWriter" />
 
 <!-- Tasklet -->
 <bean id="testTasklet" class="com.test.batch.TestTasklet">
  <property name="fail" value="false" />
 </bean>
 <bean id="failTasklet" class="com.test.batch.TestTasklet" >
  <property name="fail" value="true" />
 </bean>

 <!-- Batch Job Declarations -->
 <batch:job id="sucessJob">
  <batch:step id="firstStep" next="secondStep">
   <batch:tasklet>
    <batch:chunk reader="sucessTestReader" processor="testProcessor"
     writer="testWriter" commit-interval="5" />
   </batch:tasklet>
  </batch:step>
  <batch:step id="secondStep">
   <batch:tasklet>
    <batch:chunk reader="sucessTestReader" processor="testProcessor"
     writer="testWriter" commit-interval="2" />
   </batch:tasklet>
  </batch:step>
 </batch:job>

 <batch:job id="anotherJob">
  <batch:step id="failStep">
   <batch:tasklet>
    <batch:chunk reader="failedTestReader" processor="testProcessor"
     writer="testWriter" commit-interval="2" />
   </batch:tasklet>
   <batch:next on="*" to="sucessStep" />
   <batch:next on="FAILED" to="failedStep" />
  </batch:step>
  <batch:step id="sucessStep">
   <batch:tasklet ref="testTasklet" />
  </batch:step>
  <batch:step id="failedStep">
   <batch:tasklet ref="failTasklet" />
  </batch:step>
 </batch:job>
</beans>

Explanation:
  • There are two jobs
    • sucessJob - Consists of two steps
      • firstStep - A combination of sucessTestReader, testProcessor and testWriter with a commit-interval of 5. Once the step is completed, the job proceeds to secondStep (because the next attribute is set to secondStep). The commit-interval means that five items are read and processed one by one, but all five items are written at once.
      • secondStep - Same as above, but the commit-interval is 2.
    • anotherJob - Consists of three steps
      • failStep - A combination of failedTestReader, testProcessor and testWriter with a commit-interval of 2.
        • Goes to sucessStep when it completes properly.
        • Goes to failedStep when there is an exception.
      • sucessStep - The job executes this step in all cases except failure.
      • failedStep - The job executes this step if the first step fails or throws an exception.
  • A step can be defined as a Tasklet (simple task) or as a combination of Reader, Processor and Writer (chunk-oriented processing). 
  • We can define the next step either 
    • As the next attribute of the step tag - the next step is executed once the current step completes successfully.
    • As separate next tags inside the step - the next step is chosen based on the exit status of the current step.
  • Reader, Writer and Processor should implement ItemReader, ItemWriter and ItemProcessor respectively (as shown in the next code snippets).
    • ItemReader has the method read, which returns a generic type I (the item to be read).
    • ItemProcessor has the method process, which takes I as input (read by the Reader) and returns O (the item to be written).
    • ItemWriter has the method write, which takes the O items produced by the Processor.
  • ItemWriter's write method takes a list as its argument, because the commit is done in a batch (the batch size equals the commit-interval).
  • The Reader reads continuously (within a step) until it returns null. Once the reader returns null, the step completes after committing the current batch. 
  • The Writer commits the data once the reader has read n (commit-interval) inputs and the processor has processed them, or once the reader has finished reading all the inputs (returns null).
  • When an exception occurs, the writer does not write the pending data. This case needs to be handled explicitly (using the failed step in the above XML). 
The below is the code representing the above jobs and their reader, writer, processor and tasklets with input and output data objects.

Source

Source.java - used as the input, produced by the Reader
package com.test.entity;

public class Source {
 private String input;
        //getters and setters
}
Destination.java - Which is prepared by Processor and used by Writer
package com.test.entity;

public class Destination {
 private String output;
        //getters and setters
}
TestReader.java
package com.test.batch;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;
import com.test.entity.Source;

public class TestReader implements ItemReader<Source> {
 
 private int count = 10;
 private boolean raiseError = false;
 private static int internalCount = 0;

 public Source read() throws Exception, UnexpectedInputException,
   ParseException, NonTransientResourceException {
  
  if(raiseError)
  {
   System.out.println("Exception occured while reading");
   throw new Exception("New Exception");
  }
  internalCount++;
  if(internalCount >= count)
  {
   System.out.println("Reached max count "+internalCount);
   return null;
  }
  Source source = new Source();
  source.setInput("Input "+internalCount);
  System.out.println("Reading item "+internalCount);
  return source;
 }
        //Getters and Setters
}
TestProcessor.java
package com.test.batch;

import org.springframework.batch.item.ItemProcessor;
import com.test.entity.Destination;
import com.test.entity.Source;

public class TestProcessor implements ItemProcessor<Source, Destination> {

 public Destination process(Source source) throws Exception {
  Destination destination = new Destination();
  destination.setOutput(source.getInput().replace("Input", "Output"));
  System.out.println("Converted "+source.getInput()+" to "+destination.getOutput());
  return destination;
 }

}
TestWriter.java
package com.test.batch;

import java.util.List;
import org.springframework.batch.item.ItemWriter;
import com.test.entity.Destination;

public class TestWriter implements ItemWriter<Destination> {

 public void write(List<? extends Destination> items) throws Exception {
  for(Destination dest : items)
  {
   System.out.println("Writing : "+dest.getOutput());
  }
  System.out.println("Finished Writing");
 }

}
TestTasklet.java
package com.test.batch;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class TestTasklet implements Tasklet {

 private boolean fail = false;

 public RepeatStatus execute(StepContribution arg0, ChunkContext arg1)
   throws Exception {
  if (!fail) {
   System.out.println("Finished sucessfully");
   return RepeatStatus.FINISHED;
  } else {
   // Note: this branch only logs the message; it does not actually throw an exception
   System.out.println("Exception... So rollback should take place");
   return RepeatStatus.FINISHED;
  }
 }
        //Getters and Setters
}

Output
  • When we run the job sucessJob
INFO: Executing step: [firstStep]
Reading item 1
Reading item 2
Reading item 3
Reading item 4
Reading item 5
Converted Input 1 to Output 1
Converted Input 2 to Output 2
Converted Input 3 to Output 3
Converted Input 4 to Output 4
Converted Input 5 to Output 5
Writing : Output 1
Writing : Output 2
Writing : Output 3
Writing : Output 4
Writing : Output 5
Finished Writing
Reading item 6
Reading item 7
Reading item 8
Reading item 9
Reached max count 10
Converted Input 6 to Output 6
Converted Input 7 to Output 7
Converted Input 8 to Output 8
Converted Input 9 to Output 9
Writing : Output 6
Writing : Output 7
Writing : Output 8
Writing : Output 9
Finished Writing
INFO: Executing step: [secondStep]
Reached max count 11
INFO: Job: [FlowJob: [name=sucessJob]] completed with the following parameters: [{}] and the following status: [COMPLETED]
  • When we run anotherJob, the output looks like
INFO: Executing step: [failStep]
SEVERE: Encountered an error executing the step
java.lang.Exception: New Exception
 at com.test.batch.TestReader.read(TestReader.java:24)
 at com.test.batch.TestReader.read(TestReader.java:1)
 at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:90)
 ...
Exception occured while reading
Jan 02, 2014 11:34:30 PM org.springframework.batch.core.job.SimpleStepHandler handleStep
INFO: Executing step: [failedStep]
Exception... So rollback should take place
INFO: Job: [FlowJob: [name=anotherJob]] completed with the following parameters: [{}] and the following status: [COMPLETED]