String Handling
Questions often arise about how to manipulate strings in Talend.
Strings are represented by the Java String class, so questions about how to manipulate them in Talend often have a Java answer.
String Class
The first place to look is at the methods provided by the String class itself.
The String class provides many of the everyday functions that you will need, for example toUpperCase() and toLowerCase(). It is worth spending a few minutes looking at what is available, as you may be surprised at what can be achieved with very little effort.
The first thing to remember when working with a String object is that you must ensure that it is not null before calling any of its methods. Calling a method on a null reference throws a java.lang.NullPointerException.
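As a minimal sketch of that null test, the example below compares an unguarded call with a guarded one. The class name NullCheckExample, the helper names and the city variable are illustrative assumptions and are not part of Talend or of the String class.

```java
final class NullCheckExample {

    // Returns the upper-cased value, or null when the input is null.
    static String safeUpper(String value) {
        return value == null ? null : value.toUpperCase();
    }

    // Returns true if calling toUpperCase() directly throws a NullPointerException.
    static boolean unguardedCallFails(String value) {
        try {
            value.toUpperCase();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String city = null; // stands in for a nullable input column

        System.out.println(unguardedCallFails(city)); // true
        System.out.println(safeUpper(city));          // null
        System.out.println(safeUpper("london"));      // LONDON
    }
}
```

The same conditional expression can usually be written inline wherever a Java expression is accepted, for example value == null ? null : value.toUpperCase().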
String.substring(int startIndex, int endIndex)
If you're new to Java, you can easily be caught out when using String.substring(int startIndex, int endIndex).
The first thing to remember with the Java String class is that indexing starts at 0 (zero).
The most often misunderstood aspect of this method is that, while the startIndex is inclusive, the endIndex is exclusive. Not understanding this can yield some confusing results.
Let's look at some calls, using the 19-character String "The quick brown fox".
Call | Result | Comment |
---|---|---|
myString.substring(0, 3) | The | Index 0 is the first character of the String and is included. Index 3 is the fourth character of the String and is excluded. |
myString.substring(10, 15) | brown | See above. |
myString.substring(16, myString.length()) | fox | The length of the String is 19 characters, which is one past the last valid index (18). Because the endIndex is exclusive, this is allowed and yields a String from index 16 to the end of the String. |
myString.substring(0) | The quick brown fox | This yields the entire String. |
myString.substring(myString.length()) | "" | Note that a starting position one character beyond the last character of the String yields an empty String. |
myString.substring(myString.length() + 1) | java.lang.StringIndexOutOfBoundsException | Reading beyond the length of the String causes an Exception to be thrown. |
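
The calls in the table can be checked with a short program such as the sketch below. The class name SubstringDemo and the helper throwsOutOfBounds are hypothetical names introduced only for this illustration.

```java
final class SubstringDemo {

    static final String TEXT = "The quick brown fox";

    // Returns true if s.substring(startIndex) throws StringIndexOutOfBoundsException.
    static boolean throwsOutOfBounds(String s, int startIndex) {
        try {
            s.substring(startIndex);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(TEXT.length());                       // 19
        System.out.println(TEXT.substring(0, 3));                // The
        System.out.println(TEXT.substring(10, 15));              // brown
        System.out.println(TEXT.substring(16, TEXT.length()));   // fox
        System.out.println(TEXT.substring(0));                   // The quick brown fox
        System.out.println(TEXT.substring(TEXT.length()).isEmpty());     // true
        System.out.println(throwsOutOfBounds(TEXT, TEXT.length() + 1));  // true
    }
}
```

A handy rule of thumb is that the length of the result always equals endIndex minus startIndex, so substring(10, 15) returns five characters.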
TalendString & StringHandling
Talend provides the TalendString and StringHandling routines. These are wrappers around some of the more commonly used (from an ETL point of view) String manipulation functions. Often, they simply perform a null test before calling a basic String method.
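To illustrate the "null test, then delegate" pattern, here is a rough sketch of how such a wrapper could be written as your own routine. The class NullSafeText and its methods trim and left are hypothetical; they are not the actual signatures of TalendString or StringHandling, so check the routines shipped with your Talend version before relying on any particular name.

```java
final class NullSafeText {

    // Hypothetical wrapper: null in, null out; otherwise delegates to String.trim().
    static String trim(String value) {
        return value == null ? null : value.trim();
    }

    // Hypothetical wrapper: returns the first 'count' characters of the value.
    // Null input returns null; count is clamped to the range 0..length so that
    // substring() never throws StringIndexOutOfBoundsException.
    static String left(String value, int count) {
        if (value == null) {
            return null;
        }
        int end = Math.max(0, Math.min(count, value.length()));
        return value.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(left("The quick brown fox", 3)); // The
        System.out.println(left("fox", 10));                // fox
        System.out.println(trim("  fox  "));                // fox
    }
}
```

Whether a wrapper should return null, an empty String or a default value for null input is a design choice that depends on how the downstream components treat nulls.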
StringUtils
StringUtils is a useful, general-purpose library class provided by the Apache Software Foundation as part of Apache Commons Lang. StringUtils provides a huge array of null-safe string manipulation functions. If you find that you're looking for something beyond what is provided by Talend, this is a great place to look.