HashFold

Wednesday, October 16, 2013

Spring Batch - simple Range Partitioner

This sample range partitioner can be used in a Spring Batch application to split a range of data into partitions based on the grid size and allocate each partition to an individual executor.

[java]

package com.hashfold.spring.batch;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class RangePartitioner implements Partitioner {

private long start;
private long end;

public final void setStart(long start) {
this.start = start;
}

public final void setEnd(long end) {
this.end = end;
}

public RangePartitioner() {
}

@Override
public Map<String, ExecutionContext> partition(int gridSize) {

long rangeSize = end - start;
if (rangeSize <= 0)
return Collections.<String, ExecutionContext> emptyMap();

int numberOfIntervals = gridSize;
long sizeOfSmallSublists = rangeSize / numberOfIntervals;
long sizeOfLargeSublists = sizeOfSmallSublists + 1;
long numberOfLargeSublists = rangeSize % numberOfIntervals;
long numberOfSmallSublists = numberOfIntervals - numberOfLargeSublists;

Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();

long numberOfElementsHandled = 0;

for (long i = 0; i < numberOfIntervals; i++) {

long size = i < numberOfSmallSublists ? sizeOfSmallSublists
: sizeOfLargeSublists;

long threadSeq = i + 1;
// offset by the configured start so the partitions cover [start, end)
long startId = start + numberOfElementsHandled;
long endId = startId + size;

/*
* happens when range is less than the grid size. e.g. start=0, end
* = 2 and gridSize=3
*/
if ((endId - startId) <= 0)
continue;

ExecutionContext value = new ExecutionContext();

value.putLong("start", startId);

value.putLong("end", endId);

value.putString("name", "Thread-" + threadSeq);
result.put("partition" + threadSeq, value);

System.out.println("Starting : Thread-" + threadSeq + " : "
+ startId + " - " + endId);

numberOfElementsHandled += size;
}

return result;
}

// used only to test some cases. we should use junit and remove this method.
public static void main(String args[]) {
RangePartitioner rp = new RangePartitioner();
rp.setStart(1);
rp.setEnd(11);
rp.partition(3);

}

}

[/java]
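The split arithmetic can be checked without starting a Spring context. The following is only a minimal sketch, not part of the original class: `RangeSplitSketch` and `split` are hypothetical names, it uses plain JDK types instead of `ExecutionContext`, and it assumes that `end` is exclusive and that every sub-range is offset by `start`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the partitioning arithmetic with plain JDK types.
final class RangeSplitSketch {

    // Splits [start, end) into at most gridSize contiguous sub-ranges whose
    // sizes differ by at most one. Each entry is {startInclusive, endExclusive}.
    static List<long[]> split(long start, long end, int gridSize) {
        List<long[]> ranges = new ArrayList<>();
        long rangeSize = end - start;
        if (rangeSize <= 0 || gridSize <= 0) {
            return ranges; // nothing to partition
        }
        long smallSize = rangeSize / gridSize;
        long largeCount = rangeSize % gridSize;
        long smallCount = gridSize - largeCount;

        long handled = 0;
        for (long i = 0; i < gridSize; i++) {
            // small sub-ranges come first, then the ones that are one element larger
            long size = i < smallCount ? smallSize : smallSize + 1;
            if (size == 0) {
                continue; // range smaller than the grid: skip empty slots
            }
            ranges.add(new long[] { start + handled, start + handled + size });
            handled += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(1, 11, 3)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

With start=1, end=11 and a grid size of 3, this sketch yields the ranges 1 - 4, 4 - 7 and 7 - 11.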

Batch configuration change

[xml]


<job id="partitionJob" xmlns="http://www.springframework.org/schema/batch">


<step id="masterStep">
<partition step="slave" partitioner="transactionPartitioner">
<handler grid-size="10" task-executor="taskExecutor" />
</partition>
</step>
</job>

<bean id="taskExecutor"
class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="10" />
<property name="maxPoolSize" value="10" />
</bean>

<bean id="transactionPartitioner" class="com.hashfold.spring.batch.RangePartitioner">

<property name="start" value="1" /> 
<property name="end" value="11" /> 

</bean>

[/xml]

Tuesday, October 15, 2013

SQL Operators order by the performance

Key operators used in the WHERE clause, ordered by their performance. Operators at the top generally produce results faster than those listed at the bottom.

=
>
>=
<
<=
LIKE
<>

via here

Wednesday, May 1, 2013

GIT Video Tutorials - learn Git from experts

GIT-SCM.com tutorials:

GIT Casts:

GIT 101:

GIT & GitHub:

text book @ http://git-scm.com/book

git flow: http://nvie.com/posts/a-successful-git-branching-model/

using git flow: http://yakiloo.com/getting-started-git-flow/

Sunday, March 31, 2013

Improving performance of Java String methods

Regular expression matching always carries the cost of pattern matching. If the pattern is not precompiled, the expression is parsed every time the code is invoked.

Some Java String methods use regular expressions, e.g. matches(...), replaceFirst(...), replaceAll(...) and split(...). Internally these methods delegate to the Pattern matching library, but the pattern is typically compiled again on every invocation. The performance impact can be significant when these methods are called frequently or sit on a high-traffic code path.

The Java library provides the java.util.regex.Pattern class, which can be used to precompile regular expressions.

Here is equivalent code that uses a precompiled pattern.

1) String.split(regex)

[java]

private static final String regex = "\\.";

String[] keys = str.split(regex);

[/java]

[java]
import java.util.regex.Pattern;
private static final Pattern myPattern = Pattern.compile(regex);
String[] keys = myPattern.split(str);

[/java]
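Put together as a runnable class, the split case might look like the sketch below. `PrecompiledSplitSketch`, `DOT` and `splitKeys` are hypothetical names chosen for illustration; only `Pattern.compile` and `Pattern.split` come from the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class PrecompiledSplitSketch {

    // Compiled once when the class is loaded and reused on every call.
    private static final Pattern DOT = Pattern.compile("\\.");

    static String[] splitKeys(String str) {
        return DOT.split(str);
    }

    public static void main(String[] args) {
        String input = "com.hashfold.spring.batch";
        // Both calls return the same tokens; only the second reuses a compiled Pattern.
        System.out.println(Arrays.toString(input.split("\\.")));
        System.out.println(Arrays.toString(splitKeys(input)));
    }
}
```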

2) String.replace*(regex...)

2.1) replaceFirst() - http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceFirst(java.lang.String, java.lang.String)

[java]

// str.replaceFirst(regex, repl) yields exactly the same result as the expression
// Pattern.compile(regex).matcher(str).replaceFirst(repl)

[/java]

[java]

private static final String regex = "\\.";
private static final Pattern myPattern = Pattern.compile(regex);
myPattern.matcher(str).replaceFirst(repl);

[/java]

2.2) replaceAll() - http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String)

[java]

// str.replaceAll(regex, repl) yields exactly the same result as the expression
// Pattern.compile(regex).matcher(str).replaceAll(repl)

[/java]

[java]

private static final String regex = "\\.";
private static final Pattern myPattern = Pattern.compile(regex);
myPattern.matcher(str).replaceAll(repl);

[/java]
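The two replace variants can be wrapped the same way. This is a sketch under assumptions: the class and method names are hypothetical, and it additionally passes the replacement through `Matcher.quoteReplacement`, which is not in the snippets above. That call makes `$` and `\` in the replacement literal, so drop it if group references such as `$1` are wanted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrecompiledReplaceSketch {

    // Compiled once and shared by both helpers.
    private static final Pattern DOT = Pattern.compile("\\.");

    // Replaces only the first dot; the replacement is treated as a literal string.
    static String replaceFirstDot(String str, String repl) {
        return DOT.matcher(str).replaceFirst(Matcher.quoteReplacement(repl));
    }

    // Replaces every dot; the replacement is treated as a literal string.
    static String replaceAllDots(String str, String repl) {
        return DOT.matcher(str).replaceAll(Matcher.quoteReplacement(repl));
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstDot("a.b.c", "/"));
        System.out.println(replaceAllDots("a.b.c", "/"));
    }
}
```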

3) matches(regex) - http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#matches(java.lang.String)

[java]

// str.matches(regex) yields exactly the same result as the expression
// Pattern.matches(regex, str)

[/java]

[java]
private static final String regex = "\\.";
private static final Pattern myPattern = Pattern.compile(regex);
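// use the compiled instance; the static Pattern.matches(regex, str) would compile the regex again
myPattern.matcher(str).matches();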

[/java]
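For matches, the point to watch is that the static `Pattern.matches(regex, input)` compiles the expression on each call, so only the instance-level `matcher(input).matches()` benefits from precompilation. A small sketch, where `PrecompiledMatchSketch`, `DIGITS` and `isNumber` are hypothetical names and the digit pattern is just an example:

```java
import java.util.regex.Pattern;

final class PrecompiledMatchSketch {

    // Example pattern (an assumption for this sketch): one or more digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static boolean isNumber(String str) {
        // matcher(...).matches() requires the whole input to match, like String.matches.
        return DIGITS.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumber("12345"));
        System.out.println(isNumber("12a45"));
    }
}
```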

Wednesday, March 13, 2013

The Zen of Python, by Tim Peters (python -m this)

$ python -m this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Thursday, February 21, 2013

Introducing Google Official Blog Reader

It's a simple Google blog news reader. Written in JavaScript, it uses Google's feed history system to pull feed entries and shows the latest news.

Try it here. http://www.hashfold.com/news/

Note: It uses the user's internet connection to pull the news updates.

Send me your feedback and ideas to improve the performance and look and feel of the page.

Monday, February 18, 2013

Writing Your First DSL using Groovy

In this example, we will be writing a DSL to configure the valid values for the columns of a given table.

Below is how we write the DSL:
Sample.groovy:

[java]
package com.hashfold.groovy.dsl

import com.hashfold.groovy.dsl.Schema

Schema.create("MyTable") {
column1 1,2,3,4,5,6,7,8,9,10
column2 50,80,{println 90+1 }
column3 "fixed", { param -> println "envData=${envData}-param="+param }
}

[/java]

Schema.groovy:

[java]
package com.hashfold.groovy.dsl

import groovy.xml.MarkupBuilder

/**
* Processes a simple DSL that declares the valid values for the columns of a table
*/
class Schema {

String name
//def columns = []
def cols = [:]

/**
* This method accepts a closure which is essentially the DSL. Delegate the closure methods to
* the DSL class so the calls can be processed
*/
def static Object create(String dataName, closure) {
Schema dataDsl = new Schema()
dataDsl.name = dataName

//println dataDsl.name

// any method called in the closure will be delegated to the dataDsl instance
closure.delegate = dataDsl
closure()

//test closure will be invoked from Java!
//dataDsl.cols["testClosure"] = [{ param -> println "print my closure - "+param }]

return dataDsl.cols
}

def storage = [:]
def propertyMissing(String name, value) {
println "Undefined Property setter: ${name} = ${value}"
storage[name] = value
}
def propertyMissing(String name) {
println "Undefined Property getter: ${name}"
storage[name]
}

/*
* This method will be called for each column names e.g. column1…
*/
def methodMissing(String methodName, args) {

List data = []

args.each {

if(it instanceof Closure) {
/*
* lets not evaluate the 'validate' closure.
* we leave it to evaluate on Java side against
* actual data. in case the evaluation returns 0/False,
* we throw validation error
*/
if(methodName.toLowerCase() == "validate") {
data << it
}
else {
//lets not evaluate the closure here. leave it for java
//data << it()
data << it
}
}
else {
data << it
}
}

//this is the last statement of the method which will be returned from the Groovy to Java
cols[methodName] = data
}

//This is the test method which executes the given closure
def static test(String name, closure) {
println name
Schema dataDsl = new Schema()
closure.delegate = dataDsl
closure()
}

def getDump() {
println ">>>Dumping..."
println name
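// cols is a map, so each element is an entry with a key (column name) and a value (list)
cols.each { entry ->
println "name:"+entry.key
println "obj:"+entry.value

println "\t"+entry.value
entry.value.each {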
println "\t\t"+it
if(it instanceof Closure) {
it()
}
}
}
}

}
[/java]

Here is how we call the above Groovy script from Java and execute it:

GroovyCaller.java:

[java]
import groovy.lang.Binding;
import groovy.lang.Closure;
import groovy.lang.GroovyShell;
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;

public class GroovyCaller {

public static void main(String[] args) throws Exception {

Binding binding = new Binding();
//Lets pass some data from Java to Groovy using Binding
binding.setVariable("envData", new Integer(2));
GroovyShell shell = new GroovyShell(binding);

File file = new File("Sample.groovy");

Object value = shell.evaluate(file);
//Let's print the data returned by the Groovy script (the column map returned by Schema.create)
System.out.println(value);

/* lets invoke the Closure if any! Note here that
* any column value written inside brackets ‘{}’ will be treated
* as Groovy Closure which are execution blocks
*/
LinkedHashMap<String, ArrayList<Object>> cols = (LinkedHashMap<String, ArrayList<Object>>) value;

for (String variable : cols.keySet()) {

ArrayList<Object> list = cols.get(variable);
for(Object data: list) {
if(data instanceof Closure) {
Closure c = (Closure) data;
c.call("MyTest");
}
}
}
}

}
[/java]

Output from GroovyCaller.java:

{column1=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
column2=[50, 80, com.hashfold.groovy.dsl.Sample$_run_closure1_closure2@37963796],
column3=[fixed, com.hashfold.groovy.dsl.Sample$_run_closure1_closure3@1d001d0]}
91
envData=2-param=MyTest

The last two lines above are the result of executing the closures from Java.
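The comments in Schema.groovy mention leaving closures unevaluated so that the Java side can check them against actual data. One way that check could look is sketched below in plain Java. Everything here is an assumption for illustration: `ColumnValidatorSketch` and `isValid` are hypothetical names, `java.util.function.Predicate` stands in for `groovy.lang.Closure` so the sketch runs without Groovy on the classpath, and the rule (a value is valid if it equals a literal entry or is accepted by a predicate entry) is one possible reading, not necessarily the intended one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

final class ColumnValidatorSketch {

    // Returns true when the value equals one of the literal entries or is
    // accepted by one of the predicate entries configured for the column.
    @SuppressWarnings("unchecked")
    static boolean isValid(Map<String, List<Object>> cols, String column, Object value) {
        List<Object> allowed = cols.get(column);
        if (allowed == null) {
            return false; // unknown column
        }
        for (Object entry : allowed) {
            if (entry instanceof Predicate) {
                // deferred check, evaluated only now against the actual value
                if (((Predicate<Object>) entry).test(value)) {
                    return true;
                }
            } else if (Objects.equals(entry, value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> cols = new LinkedHashMap<>();
        cols.put("column1", Arrays.<Object>asList(1, 2, 3));
        Predicate<Object> over90 = v -> v instanceof Integer && (Integer) v > 90;
        cols.put("column2", Arrays.<Object>asList(50, 80, over90));

        System.out.println(isValid(cols, "column1", 2));
        System.out.println(isValid(cols, "column2", 95));
        System.out.println(isValid(cols, "column2", 60));
    }
}
```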

To build the Java code, add the plugins below to your Maven pom.xml file:

[xml]
<plugin>
<groupId>org.codehaus.gmaven</groupId>
<artifactId>gmaven-plugin</artifactId>
<version>1.2</version>
<configuration>
<providerSelection>1.7</providerSelection>
</configuration>
<dependencies>
<dependency>
<groupId>org.codehaus.gmaven.runtime</groupId>
<artifactId>gmaven-runtime-1.7</artifactId>
<version>1.2</version>
<exclusions>
<exclusion>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy-all</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy-all</artifactId>
<version>1.7.0</version>
</dependency>
</dependencies>
<executions>
<execution>
<goals>
<goal>generateStubs</goal>
<goal>compile</goal>
<goal>generateTestStubs</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>1.6</source>
<target>1.6</target>

</configuration>
</plugin>
[/xml]

Also add the dependencies below:

[xml]
<dependencies>
<dependency>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy-all</artifactId>
<version>1.7.0</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.8.1</version>
</dependency>
</dependencies>
[/xml]

Groovy DSL Reference Book: Groovy for Domain-Specific Languages