Archive for the 'java' Category

public static void main (String [] argv) {

Saturday, November 4th, 2006

    package java.lang;

    public final class String
        implements java.io.Serializable, Comparable, CharSequence
    {
        // This is a partial API listing

        public boolean matches (String regex)
        public String [] split (String regex)
        public String [] split (String regex, int limit)
        public String replaceFirst (String regex, String replacement)
        public String replaceAll (String regex, String replacement)
    }

All the new String methods are pass-through calls to methods of the Pattern or Matcher classes. Now that you know how Pattern and Matcher are used and interoperate, using these String convenience methods should be a no-brainer. Instead of describing each method individually, Table 5-6 summarizes them.

Table 5-6. Regular expression methods of the String class

    String method signature                                  java.util.regex equivalent
    -------------------------------------------------------  -------------------------------------------------
    input.matches (String regex)                             Pattern.matches (String regex, CharSequence input)
    input.split (String regex)                               pat.split (CharSequence input)
    input.split (String regex, int limit)                    pat.split (CharSequence input, int limit)
    input.replaceFirst (String regex, String replacement)    match.replaceFirst (String replacement)
    input.replaceAll (String regex, String replacement)      match.replaceAll (String replacement)

In Table 5-6, assume that there is a String named input, a Pattern object named pat, and a Matcher named match:

    String input = "Mary had a little lamb";
    String [] tokens = input.split ("\\s+");    // split on whitespace

As of JDK 1.4, none of these regular expression convenience methods cache any expressions or do any other optimizations. Some JVM implementations may choose to cache and reuse pattern objects, but you should not rely on them. If you expect to apply the same pattern-matching operations repeatedly, it will be more efficient to use the classes in java.util.regex.

5.4 Java Regular Expression Syntax

Following is a summary of the regular expression syntax supported by the java.util.regex package, as released in JDK 1.4. Things change quickly in the Java world, so you should always check the current documentation provided with the Java implementation you're using. The information provided here is a quick reference to get you started.
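Returning to the String convenience methods of Table 5-6, the following is a minimal sketch of the trade-off described above: the same split and replace operations written once with the String methods and once with a Pattern compiled a single time. The class name StringRegexConvenience and its method names are invented for this illustration and do not come from the book.

```java
import java.util.regex.Pattern;

class StringRegexConvenience {
    // Compiled once and reused for every call (hypothetical helper constant).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Convenience form: the regex is handed to String on every call.
    static String[] splitWithString(String input) {
        return input.split("\\s+");
    }

    // Equivalent form using the precompiled Pattern.
    static String[] splitWithPattern(String input) {
        return WHITESPACE.split(input);
    }

    // Convenience form of a replace-all.
    static String collapseWithString(String input) {
        return input.replaceAll("\\s+", " ");
    }

    // Equivalent form: Pattern -> Matcher -> replaceAll.
    static String collapseWithPattern(String input) {
        return WHITESPACE.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "Mary had  a little\tlamb";
        System.out.println(String.join("|", splitWithPattern(input)));
        System.out.println(collapseWithPattern(input));
    }
}
```

Both forms should produce the same results; the precompiled version simply avoids depending on whether the runtime caches patterns.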


public static void main (String [] argv) {

Saturday, November 4th, 2006

    public static void main (String [] argv)
    {
        String input = "Thanks, thanks very much";
        String regex = "([Tt])hanks";
        Pattern pattern = Pattern.compile (regex);
        Matcher matcher = pattern.matcher (input);
        StringBuffer sb = new StringBuffer();

        // Loop while matches are encountered
        while (matcher.find()) {
            if (matcher.group(1).equals ("T")) {
                matcher.appendReplacement (sb, "Thank you");
            } else {
                matcher.appendReplacement (sb, "thank you");
            }
        }

        // Complete the transfer to the StringBuffer
        matcher.appendTail (sb);

        // Print the result
        System.out.println (sb.toString());

        // Let's try that again using the $n escape in the replacement
        sb.setLength (0);
        matcher.reset();

        String replacement = "$1hank you";

        // Loop while matches are encountered
        while (matcher.find()) {
            matcher.appendReplacement (sb, replacement);
        }

        // Complete the transfer to the StringBuffer
        matcher.appendTail (sb);

        // Print the result
        System.out.println (sb.toString());

        // and once more, the easy way (because this example is simple)
        System.out.println (matcher.replaceAll (replacement));

        // one last time, using only the String
        System.out.println (input.replaceAll (regex, replacement));
    }
}

5.3 Regular Expression Methods of the String Class

It should be pretty obvious from the preceding sections that strings and regular expressions go hand in hand. It's only natural, then, that our old friend the String class has added some convenience methods to do common regular expression operations:


To generate a replacement sequence of ab, the

Saturday, November 4th, 2006

    Matcher matcher = pattern.matcher ("Thanks, thanks very much");
    StringBuffer sb = new StringBuffer();

    while (matcher.find()) {
        if (matcher.group(1).equals ("T")) {
            matcher.appendReplacement (sb, "Thank you");
        } else {
            matcher.appendReplacement (sb, "thank you");
        }
    }

    matcher.appendTail (sb);

Table 5-5 shows the sequence of changes applied to the StringBuffer by the above code.

Table 5-5. Using appendReplacement() and appendTail()

    Append position   Execute                                 Resulting StringBuffer
    ---------------   -------------------------------------   ------------------------------
    0                 appendReplacement (sb, "Thank you")     Thank you
    6                 appendReplacement (sb, "thank you")     Thank you, thank you
    14                appendTail (sb)                         Thank you, thank you very much

This sequence of append operations results in the StringBuffer object sb containing the string "Thank you, thank you very much". Example 5-8 is a complete code example showing this type of replacement, as well as alternate ways of performing the same substitution.

In this simple case, the value of a capture group can be used because the first letter of the matched pattern is the same as that of the replacement. In a more complex case, there may not be an overlap between the input and the replacement values. Using Matcher.find() and Matcher.appendReplacement() allows you to programmatically mediate each replacement, possibly injecting different replacement values at each point along the way.

Example 5-8. Regular expression append/replace

package com.ronsoft.books.nio.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

/**
 * Test the appendReplacement() and appendTail() methods of the
 * java.util.regex.Matcher class.
 *
 * @author Ron Hitchens (ron@ronsoft.com)
 */
public class RegexAppend
{

To generate a replacement sequence of ab, the

Saturday, November 4th, 2006

The two append methods listed in the Matcher API are useful when iterating through an input character sequence, repeatedly invoking find():

    package java.util.regex;

    public final class Matcher
    {
        // This is a partial API listing

        public StringBuffer appendTail (StringBuffer sb)
        public Matcher appendReplacement (StringBuffer sb, String replacement)
    }

Rather than returning a new String with the replacement already performed, the append methods append to a StringBuffer object you provide. This allows you to make decisions about the replacement at each point a match is found or to accumulate the result of matching against multiple input strings. Using appendReplacement() and appendTail() gives you total control of the search-and-replace process.

One of the bits of state information remembered by Matcher objects is an append position. The append position records how much of the input character sequence has already been copied out by previous invocations of appendReplacement(). When appendReplacement() is invoked, the following process takes place:

1. Characters are read from the input starting at the current append position and appended to the provided StringBuffer. The last character copied is the one just before the first character of the matched pattern. This is the character at the index returned by start() minus one.

2. The replacement string is appended to the StringBuffer, with any embedded capture group references substituted as described earlier.

3. The append position is updated to be the index of the character following the matched pattern, which is the value returned by end().

The appendReplacement() method works properly only if a previous match operation was successful (usually a call to find()). You will be rewarded with a delightful java.lang.IllegalStateException if the last match returned false or if the method is called immediately following a reset.

But don't forget that there may be remaining characters in the input beyond the last match of the pattern. You probably don't want to lose those, but appendReplacement() will not have copied them, and end() won't return a useful value after find() fails to find any more matches. The appendTail() method is there to copy the remainder of your input in this situation. It simply copies any characters from the current append position to the end of the input and appends them to the given StringBuffer. The following code is a typical usage scenario for appendReplacement() and appendTail():

    Pattern pattern = Pattern.compile ("([Tt])hanks");
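As a separate illustration of the three-step append process described above, here is a minimal sketch in which the replacement is computed per match rather than fixed in advance. The class AppendTrace and its method upperCaseMatches() are hypothetical names chosen for this example; Matcher.quoteReplacement() was added in Java 5, after the JDK 1.4 release this text describes, and is used here only so that a computed replacement containing $ or a backslash is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendTrace {
    /** Upper-cases every match of regex in input and leaves all other text untouched. */
    static String upperCaseMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();

        while (matcher.find()) {
            // Decide the replacement for this particular match.
            String computed = matcher.group().toUpperCase();
            // Copies input from the append position up to start(), then the
            // replacement, then moves the append position to end().
            matcher.appendReplacement(sb, Matcher.quoteReplacement(computed));
        }

        // Copy whatever follows the last match.
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("[Tt]hanks", "Thanks, thanks very much"));
    }
}
```

Omitting the appendTail() call in this sketch would drop the trailing " very much" from the result, which is the situation the preceding paragraph warns about.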

To generate a replacement sequence of ab, the

Saturday, November 4th, 2006

To generate a replacement sequence of a\b, the String literal argument to replaceAll() must be "a\\\\b" (see Example 5-7). Be careful when counting those backslashes!

Example 5-7. Backslashes in regular expressions

package com.ronsoft.books.nio.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

/**
 * Demonstrate behavior of backslashes in regex patterns.
 *
 * @author Ron Hitchens (ron@ronsoft.com)
 */
public class BackSlashes
{
    public static void main (String [] argv)
    {
        // Substitute "a\b" for XYZ or ABC in input
        String rep = "a\\\\b";
        String input = "> XYZ <=> ABC <";
        Pattern pattern = Pattern.compile ("ABC|XYZ");
        Matcher matcher = pattern.matcher (input);

        System.out.println (matcher.replaceFirst (rep));
        System.out.println (matcher.replaceAll (rep));

        // Change all newlines in input to escaped, DOS-like CR/LF
        rep = "\\\\r\\\\n";
        input = "line 1\nline 2\nline 3\n";
        pattern = Pattern.compile ("\\n");
        matcher = pattern.matcher (input);

        System.out.println ("");
        System.out.println ("Before:");
        System.out.println (input);

        System.out.println ("After (dos-ified, escaped):");
        System.out.println (matcher.replaceAll (rep));
    }
}

Here's the output from running BackSlashes:

    > a\b <=> ABC <
    > a\b <=> a\b <

    Before:
    line 1
    line 2
    line 3

    After (dos-ified, escaped):
    line 1\r\nline 2\r\nline 3\r\n
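Because both the dollar sign and the backslash are special inside a replacement string, it can help to see each escape in isolation. The sketch below is an illustration only: the class ReplacementEscapes and its methods are invented names, and Matcher.quoteReplacement() is a Java 5 addition rather than part of the JDK 1.4 API covered here.

```java
import java.util.regex.Matcher;

class ReplacementEscapes {
    // Source "\\$$1" is the replacement string \$$1: a literal '$' followed by group 1.
    static String toPrice(String input) {
        return input.replaceAll("(\\d+) dollars", "\\$$1");
    }

    // Source "\\\\" is the replacement string \\: one literal backslash in the result.
    static String toBackslashes(String path) {
        return path.replaceAll("/", "\\\\");
    }

    // Lets the library do the escaping when the replacement text is arbitrary.
    static String replaceLiterally(String input, String regex, String rawReplacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(rawReplacement));
    }

    public static void main(String[] args) {
        System.out.println(toPrice("10 dollars"));
        System.out.println(toBackslashes("usr/local/bin"));
        System.out.println(replaceLiterally("cost: X", "X", "$5"));
    }
}
```

When the replacement text comes from user input or another variable, quoting it this way is usually less error-prone than counting backslashes by hand.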

The number of capture groups in the regular

Saturday, November 4th, 2006

{
    public static void main (String [] argv)
    {
        // sanity check, need at least three args
        if (argv.length < 3) {
            System.out.println ("usage: regex replacement input ...");
            return;
        }

        // Save the regex and replacement strings with mnemonic names
        String regex = argv [0];
        String replace = argv [1];

        // Compile the expression; needs to be done only once
        Pattern pattern = Pattern.compile (regex);

        // Get a Matcher instance and use a dummy input string for now
        Matcher matcher = pattern.matcher ("");

        // print out for reference
        System.out.println (" regex: '" + regex + "'");
        System.out.println (" replacement: '" + replace + "'");

        // For each remaining arg string, apply the regex/replacement
        for (int i = 2; i < argv.length; i++) {
            System.out.println ("------------------------");

            matcher.reset (argv [i]);

            System.out.println (" input: '" + argv [i] + "'");
            System.out.println ("replaceFirst(): '"
                + matcher.replaceFirst (replace) + "'");
            System.out.println (" replaceAll(): '"
                + matcher.replaceAll (replace) + "'");
        }
    }
}

And here's the output from running RegexReplace:

     regex: '([bB])yte'
     replacement: '$1ite'
    ------------------------
     input: 'Bytes is bytes'
    replaceFirst(): 'Bites is bytes'
     replaceAll(): 'Bites is bites'

Remember that regular expressions interpret backslashes in the strings you provide. Also remember that the Java compiler expects two backslashes for each one in a literal String. This means that if you want to escape a backslash in the regex, you'll need two backslashes in the compiled String. To get two backslashes in a row in the compiled regex string, you'll need four backslashes in a row in the Java source code.
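The backslash arithmetic above can be checked with a small sketch. The class BackslashCounting and its members are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class BackslashCounting {
    // Source "\\\\" -> String \\ -> a regex that matches ONE literal backslash.
    static final Pattern ONE_BACKSLASH = Pattern.compile("\\\\");

    // Source "\\d+" -> String \d+ -> a regex that matches one or more digits.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Counts literal backslash characters by splitting on them and keeping empty tokens. */
    static int countBackslashes(String input) {
        return ONE_BACKSLASH.split(input, -1).length - 1;
    }

    public static void main(String[] args) {
        // The source literal below is the 9-character string C:\temp\x
        String path = "C:\\temp\\x";
        System.out.println(path + " contains " + countBackslashes(path) + " backslashes");
        System.out.println(DIGITS.matcher("2006").matches());
    }
}
```

The negative limit passed to split() keeps trailing empty strings, so a backslash at the very end of the input is still counted.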

The number of capture groups in the regular

Saturday, November 4th, 2006

The number of capture groups in the regular expression pattern is returned by the groupCount() method. This value derives from the original Pattern object and is immutable. Group zero (the entire expression) is not included in this count, so valid group numbers range from zero up to and including the value returned by groupCount(). Passing a group number out of range will result in a java.lang.IndexOutOfBoundsException.

A capture group number can be passed to start() and end() to determine the subsequence matching the given capture group subexpression. It's possible for the overall expression to match successfully while one or more capture groups do not match. The start() and end() methods will return a value of -1 if the requested capture group is not currently set. As mentioned earlier, the entire regular expression is considered to be group zero. Invoking start() or end() with no argument is equivalent to passing an argument of zero. Invoking start() or end() for group zero will never return -1.

You can extract a matching subsequence from the input CharSequence using the values returned by start() and end() (as shown previously), but the group() methods provide an easier way to do this. Invoking group() with a numeric argument returns a String that is the matching subsequence for that particular capture group. If you call the version of group() that takes no argument, the subsequence matched by the entire regular expression (group zero) is returned. This code:

    String match0 = input.subSequence (matcher.start(), matcher.end()).toString();
    String match2 = input.subSequence (matcher.start (2), matcher.end (2)).toString();

is equivalent to this:

    String match0 = matcher.group();
    String match2 = matcher.group(2);

Finally, let's look at the methods of the Matcher object that deal with modifying a character sequence. One of the most common applications of regular expressions is to do a search-and-replace. The replaceFirst() and replaceAll() methods make this very easy to do. They behave identically except that replaceFirst() stops after the first match it finds, while replaceAll() iterates until all matches have been replaced. Both take a String argument that is the replacement value to substitute for the matched pattern in the input character sequence.

    package java.util.regex;

    public final class Matcher
    {
        // This is a partial API listing

        public String replaceFirst (String replacement)
        public String replaceAll (String replacement)
    }
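To make the relationship between groupCount(), start(), end(), and group() concrete, here is a minimal sketch. GroupAccess and describe() are names invented for this example; the pattern used in main(), (a)|(b), was picked because only one of its two groups can participate in any given match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupAccess {
    /** Lists every group of the first match, marking groups that did not participate. */
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }

        StringBuilder out = new StringBuilder();
        // Group numbers run from 0 (whole match) through groupCount() inclusive.
        for (int g = 0; g <= m.groupCount(); g++) {
            out.append(g).append('=');
            if (m.start(g) == -1) {
                out.append("unset");            // group(g) would return null here
            } else {
                // Same text that m.group(g) returns
                out.append(input.substring(m.start(g), m.end(g)));
            }
            out.append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("(a)|(b)", "xb"));
    }
}
```

For the input shown in main(), the overall match succeeds through the second alternative, so group 1 is reported as unset while groups 0 and 2 both hold the matched text.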

The number of capture groups in the regular

Saturday, November 4th, 2006

As mentioned earlier, capture groups can be back-referenced within the regular expression. They can also be referenced from the replacement string you provide to replaceFirst() or replaceAll(). Capture group numbers can be embedded in the replacement string by preceding them with a dollar sign character. When the replacement string is substituted into the result string, each occurrence of $g is replaced by the value that would be returned by group(g). If you want to use a literal dollar sign in the replacement string, you must precede it with a backslash character (\$). To pass through a backslash, you must double it (\\). If you want to concatenate literal numeric digits following a capture group reference, separate them from the group number with a backslash, like this: 123$2\456. See Table 5-4 for some examples. See also Example 5-6 for sample code.

Table 5-4. Replacement of matched patterns

    Regex pattern              Input                     Replacement  replaceFirst()              replaceAll()
    -------------------------  ------------------------  -----------  --------------------------  ----------------------------
    a*b                        aabfooaabfooabfoob        -            -fooaabfooabfoob            -foo-foo-foo-
    \p{Blank}                  fee fie foe fum           _            fee_fie foe fum             fee_fie_foe_fum
    ([bB])yte                  Byte for byte             $1ite        Bite for byte               Bite for bite
    \d\d\d\d([- ])             card #1234-5678-1234      xxxx$1       card #xxxx-5678-1234        card #xxxx-xxxx-1234
    (up|left)( *)(right|down)  leftright, up down        $3$2$1       rightleft, up down          rightleft, down up
    ([CcPp][hl]e[ea]se)        "I want cheese. Please."  " $1 "       "I want  cheese . Please."  "I want  cheese .  Please ."

Example 5-6. Regular expression replacement

package com.ronsoft.books.nio.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

/**
 * Exercise the replacement capabilities of the java.util.regex.Matcher class.
 * Run this code from the command line with three or more arguments.
 * 1) First argument is a regular expression
 * 2) Second argument is a replacement string, optionally with capture group
 *    references ($1, $2, etc)
 * 3) Any remaining arguments are treated as input strings to which the
 *    regular expression and replacement strings will be applied.
 * The effect of calling replaceFirst() and replaceAll() for each input string
 * will be listed.
 *
 * Be careful to quote the command-line arguments if they contain spaces or
 * special characters.
 *
 * @author Ron Hitchens (ron@ronsoft.com)
 */
public class RegexReplace

// Compile the email address detector pattern Pattern

Friday, November 3rd, 2006

The lookingAt() method is similar to matches() but does not require that the entire sequence be matched by the pattern. If the regular expression pattern matches the beginning of the character sequence, then lookingAt() returns true. The lookingAt() method always begins scanning at the beginning of the sequence. The name of this method is intended to indicate whether the matcher is currently "looking at" a target that starts with the pattern. If it returns true, then the start(), end(), and group() methods can be called to determine the extent of the matched subsequence (more about those methods shortly).

The find() method performs the same sort of matching operation as lookingAt(), but remembers the position of the previous match and resumes scanning after it. This allows successive calls to find() to step through the input and find embedded matches. On the first call following a reset, scanning begins at the first character of the input sequence. On subsequent calls, it resumes scanning at the first character following the previously matched subsequence. For each invocation, true is returned if the pattern was found; otherwise, false is returned. Typically, you'll use find() to iterate over some text to find all the matching patterns within it.

The version of find() that takes a positional argument does an implicit reset and begins scanning the input at the provided index position. Afterwards, no-argument find() calls can be made to scan the remainder of the input sequence if needed.

Once a match has been detected, you can determine where in the character sequence the match is located by calling start() and end(). The start() method returns the index of the first character of the matched sequence; end() returns the index of the last character of the match plus one. These values are consistent with CharSequence.subSequence() and can be used directly to extract the matched subsequence.

    CharSequence subseq;

    if (matcher.find()) {
        subseq = input.subSequence (matcher.start(), matcher.end());
    }

Some regular expressions can match the empty string, in which case start() and end() will return the same value. The start() and end() methods return meaningful values only if a match has previously been detected by matches(), lookingAt(), or find(). If no match has been made, or the last matching attempt returned false, then invoking start() or end() will result in a java.lang.IllegalStateException.

To understand the forms of start() and end() that take a group argument, we first need to understand expression capture groups. (See Figure 5-2.)

Figure 5-2. start(), end(), and group() values
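The difference between matches(), lookingAt(), and find() is easiest to see when all three are applied to the same input. The following is a minimal sketch; MatchKinds and summary() are names made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKinds {
    /** Reports the result of each matching style, plus the [start,end) span of every find(). */
    static String summary(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);

        boolean whole = m.matches();      // must consume the entire input
        boolean prefix = m.lookingAt();   // must match at the beginning only

        // Put the matcher back in a known state before iterating with find().
        m.reset();
        StringBuilder spans = new StringBuilder();
        while (m.find()) {
            spans.append('[').append(m.start()).append(',').append(m.end()).append(')');
        }

        return "matches=" + whole + " lookingAt=" + prefix + " find=" + spans;
    }

    public static void main(String[] args) {
        System.out.println(summary("\\d+", "12ab345"));
    }
}
```

Each reported span can be passed directly to input.subSequence(start, end) to recover the matched text, since end() is exclusive.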

// Compile the email address detector pattern Pattern

Friday, November 3rd, 2006

Regular expressions may contain subexpressions, known as capture groups, enclosed in parentheses. During the evaluation of the regular expression, the subsequences of the input matching these capture group expressions are saved and can be referenced later in the expression. Once the full matching operation is complete, these saved snippets can be retrieved from the Matcher object by specifying a corresponding group number.

Capture groups can be nested and are numbered by counting their opening parens from left to right. The entire expression, whether or not it has any subgroups, is always counted as capture group zero. For example, the regular expression A((B)(C(D))) would have capture groups numbered as in Table 5-3.

Table 5-3. Regular expression capture groups of A((B)(C(D)))

    Group number   Expression group
    ------------   ----------------
    0              A((B)(C(D)))
    1              ((B)(C(D)))
    2              (B)
    3              (C(D))
    4              (D)

There are exceptions to this grouping syntax. A group beginning with (? is a pure, or noncapturing, group. Its value is not saved, and it's not counted for purposes of numbering capture groups. (See Table 5-7 for syntax details.)

Let's look in more detail at the methods for working with capture groups:

    package java.util.regex;

    public final class Matcher
    {
        // This is a partial API listing

        public int start()
        public int start (int group)
        public int end()
        public int end (int group)
        public int groupCount()
        public String group()
        public String group (int group)
    }
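The numbering rule in Table 5-3 can be confirmed by running the same expression against the input ABCD. The sketch below is illustrative only; GroupNumbering and groups() are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    /** Returns the text of groups 0..groupCount() for a whole-input match, or an empty array. */
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }

        String[] result = new String[m.groupCount() + 1];
        for (int g = 0; g < result.length; g++) {
            result[g] = m.group(g);
        }
        return result;
    }

    public static void main(String[] args) {
        // Numbered by opening parenthesis, left to right
        System.out.println(Arrays.toString(groups("A((B)(C(D)))", "ABCD")));
        // Making the outer group noncapturing removes it from the numbering
        System.out.println(Arrays.toString(groups("A(?:(B)(C(D)))", "ABCD")));
    }
}
```

Turning the outer parentheses into a noncapturing (?: ... ) group shifts every later group number down by one, which matters if replacement strings or group() calls elsewhere refer to groups by number.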

// Compile the email address detector pattern Pattern

Friday, November 3rd, 2006

        // Compile the email address detector pattern
        Pattern pattern = Pattern.compile (
            "([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]"
            + "{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))"
            + "([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)",
            Pattern.MULTILINE);

        // Make a Matcher object for the pattern
        Matcher matcher = pattern.matcher ("");

        // Loop through the args and find the addrs in each one
        for (int i = 0; i < argv.length; i++) {
            boolean matched = false;

            System.out.println ("");
            System.out.println ("Looking at " + argv [i] + " ...");

            // Reset the Matcher to look at the current arg string
            matcher.reset (argv [i]);

            // Loop while matches are encountered
            while (matcher.find())
            {
                // found one
                System.out.println ("\t" + matcher.group());

                matched = true;
            }

            if ( ! matched) {
                System.out.println ("\tNo email addresses found");
            }
        }
    }
}

Here's the output from EmailAddressFinder when run on some typical addresses:

    Looking at Ron Hitchens ,ron@ronsoft.com., fred@bedrock.com, barney@rubble.org, Wilma ...
        ron@ronsoft.com
        fred@bedrock.com
        barney@rubble.org
        wflintstone@rockvegas.com

The next group of methods return boolean indications of how the regular expression applies to the target character sequence. The first, matches(), returns true if the entire character sequence is matched by the regular expression pattern. If the pattern matches only a subsequence, false is returned. This can be useful to select lines in a file that fit a certain pattern exactly. This behavior is identical to the convenience method matches() on the Pattern class.
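As a minimal sketch of selecting lines that fit a pattern exactly, the following reuses a single Matcher across several inputs with reset() and keeps only the lines for which matches() returns true. ExactLineFilter, exactMatches(), and the deliberately simplified address pattern in main() are assumptions made for this illustration and are far less thorough than the expression in Example 5-5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactLineFilter {
    /** Returns the lines that the regex matches in their entirety. */
    static List<String> exactMatches(String regex, List<String> lines) {
        // One Matcher, created with dummy input and re-targeted for each line
        Matcher matcher = Pattern.compile(regex).matcher("");
        List<String> selected = new ArrayList<>();

        for (String line : lines) {
            matcher.reset(line);
            if (matcher.matches()) {      // whole line must match, not just part of it
                selected.add(line);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "ron@ronsoft.com",
            "mail ron@ronsoft.com today",
            "fred@bedrock.com");
        System.out.println(exactMatches("[a-z]+@[a-z]+\\.com", lines));
    }
}
```

Replacing matches() with find() in this sketch would also select the middle line, because find() accepts a match anywhere in the input.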

, " " 5.2.3 The Matcher Class The

Friday, November 3rd, 2006

            BufferedReader br = null;
            String line;

            try {
                br = new BufferedReader (new FileReader (file));
            } catch (IOException e) {
                System.err.println ("Cannot read '" + file
                    + "': " + e.getMessage());
                continue;
            }

            while ((line = br.readLine()) != null) {
                matcher.reset (line);

                if (matcher.find()) {
                    System.out.println (file + ": " + line);
                }
            }

            br.close();
        }
    }
}

Example 5-5 demonstrates a more sophisticated use of the reset() method to allow a Matcher to work on several different character sequences.

Example 5-5. Extracting matched expressions

package com.ronsoft.books.nio.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

/**
 * Validates email addresses.
 *
 * Regular expression found in the Regular Expression Library
 * at regxlib.com. Quoting from the site,
 * "Email validator that adheres directly to the specification
 * for email address naming. It allows for everything from
 * ipaddress and country-code domains, to very rare characters
 * in the username."
 *
 * @author Michael Daudel (mgd@ronsoft.com) (original)
 * @author Ron Hitchens (ron@ronsoft.com) (hacked)
 */
public class EmailAddressFinder
{
    public static void main (String[] argv)
    {
        if (argv.length < 1) {
            System.out.println ("usage: emailaddress ...");
        }

, " " 5.2.3 The Matcher Class The

Friday, November 3rd, 2006

5.2.3 The Matcher Class

The Matcher class provides a rich API for matching regular expression patterns against character sequences. A Matcher instance is always created by invoking the matcher() method of a Pattern object, and it always applies the regular expression pattern encapsulated by that Pattern:

    package java.util.regex;

    public final class Matcher
    {
        public Pattern pattern()
        public Matcher reset()
        public Matcher reset (CharSequence input)

        public boolean matches()
        public boolean lookingAt()
        public boolean find()
        public boolean find (int start)

        public int start()
        public int start (int group)
        public int end()
        public int end (int group)

        public int groupCount()
        public String group()
        public String group (int group)

        public String replaceFirst (String replacement)
        public String replaceAll (String replacement)

        public StringBuffer appendTail (StringBuffer sb)

, " " 5.2.3 The Matcher Class The

Friday, November 3rd, 2006

        public Matcher appendReplacement (StringBuffer sb, String replacement)
    }

Instances of the Matcher class are stateful objects that encapsulate the matching of a specific regular expression against a specific input character sequence. Matcher objects are not thread-safe because they hold internal state between method invocations.

Every Matcher instance is derived from a Pattern instance, and the pattern() method of Matcher returns a back reference to the Pattern object that created it.

Matcher objects can be used repeatedly, but because of their stateful nature, they must be placed in a known state to begin a new series of matching operations. This is done by calling the reset() method, which prepares the object for pattern matching at the beginning of the CharSequence associated with the matcher. The no-argument version of reset() will reuse the last CharSequence given to the Matcher. If you want to perform matching against a new sequence of characters, pass a new CharSequence to reset(), and subsequent matching will be done against that target. For example, as you read each line of a file, you could pass it to reset(). See Example 5-4.

Example 5-4. Simple file grep

package com.ronsoft.books.nio.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.IOException;

/**
 * Simple implementation of the ubiquitous grep command.
 * First argument is the regular expression to search for (remember to
 * quote and/or escape as appropriate). All following arguments are
 * filenames to read and search for the regular expression.
 *
 * @author Ron Hitchens (ron@ronsoft.com)
 */
public class SimpleGrep
{
    public static void main (String [] argv)
        throws Exception
    {
        if (argv.length < 2) {
            System.out.println ("Usage: regex file [ ... ]");
            return;
        }

        Pattern pattern = Pattern.compile (argv [0]);
        Matcher matcher = pattern.matcher ("");

        for (int i = 1; i < argv.length; i++) {
            String file = argv [i];

an initial input target when creating a Matcher,

Friday, November 3rd, 2006

     */
    private static void generateTable (String input,
        Pattern [] patterns, int [] limits)
    {
        System.out.println ("");
        System.out.println ("");

        System.out.println ("\t");
        System.out.println ("\t\tInput: " + input + "");

        for (int i = 0; i < patterns.length; i++) {
            Pattern pattern = patterns [i];

            System.out.println ("\t\tRegex: " + pattern.pattern() + "");
        }

        System.out.println ("\t");

        for (int i = 0; i < limits.length; i++) {
            int limit = limits [i];

            System.out.println ("\t");
            System.out.println ("\t\tLimit: " + limit + "");

            for (int j = 0; j < patterns.length; j++) {
                Pattern pattern = patterns [j];
                String [] tokens = pattern.split (input, limit);

                System.out.print ("\t\t");

                for (int k = 0; k < tokens.length; k++) {
                    System.out.print ("" + tokens [k] + "");
                }

                System.out.println ("");
            }

            System.out.println ("\t");
        }

        System.out.println ("");
    }

    /**
     * If command line args were given, compile all args after the
     * first as a Pattern. Return an array of Pattern objects.
     */
    private static Pattern [] collectPatterns (String [] argv)
    {
        List list = new LinkedList();

        for (int i = 1; i < argv.length; i++) {
            list.add (Pattern.compile (argv [i]));
        }

an initial input target when creating a Matcher,

Friday, November 3rd, 2006

        Pattern [] patterns = new Pattern [list.size()];

        list.toArray (patterns);

        return (patterns);
    }
}

Example 5-2 outputs an XML document describing the result matrix. The XSL stylesheet in Example 5-3 converts the XML to HTML for display in a web browser.

Example 5-3. Split matrix stylesheet
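Independently of the XML and XSL machinery, the effect of the limit argument to Pattern.split() can be checked with a few direct calls. This is a minimal sketch; SplitLimits and its split() helper are names invented for the illustration, and the input and patterns are the defaults used by the Poodle example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimits {
    /** Thin wrapper so each regex/limit combination reads as one call. */
    static String[] split(String regex, String input, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        String input = "poodle zoo";
        int[] limits = { 1, 2, 5, -2, 0 };

        for (int limit : limits) {
            // Arrays.toString() shows empty tokens as nothing between commas
            System.out.println("limit " + limit + ": "
                + Arrays.toString(split("o", input, limit)));
        }
    }
}
```

With the pattern "o", a limit of 0 should drop the two trailing empty strings that a limit of 5 or -2 keeps, and a limit of 1 should return the input unsplit.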

an initial input target when creating a Matcher,

Friday, November 3rd, 2006

an initial input target when creating a Matcher, but different input can be provided later (discussed in Section 5.2.3).

5.2.2.1 Splitting strings with the Pattern class

Example 5-2 generates a matrix of the result of splitting the same input string with several different regular expression patterns and limit values.

Example 5-2. Splitting strings with Pattern

package com.ronsoft.books.nio.regex;

import java.util.regex.Pattern;
import java.util.List;
import java.util.LinkedList;

/**
 * Demonstrate behavior of splitting strings. The XML output created
 * here can be styled into HTML or some other useful form.
 * See poodle.xsl.
 *
 * @author Ron Hitchens (ron@ronsoft.com)
 */
public class Poodle
{
    /**
     * Generate a matrix table of how Pattern.split() behaves with
     * various regex patterns and limit values.
     */
    public static void main (String [] argv)
        throws Exception
    {
        String input = "poodle zoo";
        Pattern space = Pattern.compile (" ");
        Pattern d = Pattern.compile ("d");
        Pattern o = Pattern.compile ("o");
        Pattern [] patterns = { space, d, o };
        int limits [] = { 1, 2, 5, -2, 0 };

        // Use supplied args, if any. Assume that args are good.
        // Usage: input pattern [pattern ...]
        // Don't forget to quote the args.
        if (argv.length != 0) {
            input = argv [0];
            patterns = collectPatterns (argv);
        }

        generateTable (input, patterns, limits);
    }

    /**
     * Output a simple XML document with the results of applying
     * the list of regex patterns to the input with each of the
     * limit values provided. I should probably use the JAX APIs
     * to do this, but I want to keep the code simple.

a regular expression String argument. The returned Pattern

Friday, November 3rd, 2006

public Matcher matcher (CharSequence input) }

The next two methods of the Pattern class API return information about the encapsulated expression. The pattern() method returns the String used to initially create the Pattern instance (the string passed to compile() when the object was created). The next, flags(), returns the flag bit mask provided when the pattern was compiled. If the Pattern object was created by the one-argument version of compile() (the form that takes no flags), flags() will return 0. The returned value reflects only the explicit flag values provided to compile(); it does not include the equivalent of any flags set by embedded expressions within the regular expression pattern, as listed in the second column of Table 5-1.

The instance method split() is a convenience that tokenizes a character sequence using the pattern as delimiter. This is reminiscent of the StringTokenizer class but is more powerful because the delimiter can be a multicharacter sequence matched by the regular expression. Also, the split() method is stateless, returning an array of string tokens rather than requiring multiple invocations to iterate through them:

    Pattern spacePat = Pattern.compile ("\\s+");
    String [] tokens = spacePat.split (input);

Invoking split() with only one argument is equivalent to invoking the two-argument version with zero as the second argument. The second argument for split() denotes a limit on the number of times the input sequence will be split by the regular expression. The meaning of the limit argument is overloaded, and nonpositive values have special meanings:

If the limit is negative (any negative number), the character sequence will be split indefinitely until the input is exhausted. The returned array could have any length.

If the limit is zero, the input will be split indefinitely, but trailing empty strings will not be included in the result array.

If the limit is positive, it sets the maximum size of the returned String array. For a limit value of n, the regular expression will be applied at most n-1 times.

These combinations are summarized in Table 5-2, and the code that generated the table is listed in Example 5-2.

Table 5-2. Matrix of split() behavior

Input: poodle zoo

              Regex = " "        Regex = "d"        Regex = "o"
Limit = 1     "poodle zoo"       "poodle zoo"       "poodle zoo"
Limit = 2     "poodle", "zoo"    "poo", "le zoo"    "p", "odle zoo"
Limit = 5     "poodle", "zoo"    "poo", "le zoo"    "p", "", "dle z", "", ""
Limit = -2    "poodle", "zoo"    "poo", "le zoo"    "p", "", "dle z", "", ""
Limit = 0     "poodle", "zoo"    "poo", "le zoo"    "p", "", "dle z"

Finally, matcher() is a factory method that creates a Matcher object for the compiled pattern. A matcher is a stateful matching engine that knows how to match a pattern (the Pattern object it was created from) against a target character sequence. You must provide
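The limit rules for split() described above can be checked with a short sketch. The class name SplitLimitDemo and its split() helper are hypothetical, introduced here only for illustration; Pattern.compile(), Pattern.split(), pattern(), and flags() are the standard java.util.regex calls.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the helper are illustrative only.
final class SplitLimitDemo {

    // Splits input around matches of regex, honoring the limit argument.
    static String[] split(String regex, String input, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        String input = "poodle zoo";
        int[] limits = {1, 2, 5, -2, 0};

        // Print one row per limit value for the regex "o".
        // Empty tokens appear as nothing between two commas.
        for (int limit : limits) {
            String[] tokens = split("o", input, limit);
            System.out.println("limit=" + limit + " -> " + Arrays.toString(tokens));
        }

        // pattern() and flags() report what was passed to compile().
        Pattern spacePat = Pattern.compile("\\s+");
        System.out.println(spacePat.pattern()); // prints \s+
        System.out.println(spacePat.flags());   // prints 0 (no flags were given)
    }
}
```

Compiling the pattern inside the helper keeps the sketch short; code that applies the same expression repeatedly would normally compile it once and reuse the Pattern object.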

a regular expression String argument. The returned Pattern

Friday, November 3rd, 2006

a regular expression String argument. The returned Pattern object contains that regular expression translated to a compiled internal form. The compile() factory methods may throw the java.util.regex.PatternSyntaxException if the regular expression you provide is malformed. This is an unchecked exception, so if you’re not confident that the expression you’re using will work (because it’s a variable passed to you, for example), wrap the call to compile() in a try/catch block.

The second form of compile() accepts a bit mask of flags that affect the default compilation of the regular expression. These flags enable optional behaviors of the compiled pattern, such as how line boundaries are handled or case insensitivity. Each of these flags (except CANON_EQ) can also be enabled by an embedded sub-expression within the expression itself. Flags can be combined with the bitwise OR operator, like this:

    Pattern pattern = Pattern.compile ("[A-Z][a-zA-Z]*",
        Pattern.CASE_INSENSITIVE | Pattern.UNIX_LINES);

All flags are off by default. The meaning of each compile-time option is summarized in Table 5-1.

Table 5-1. Flag values affecting regular expression compilation

Flag name: UNIX_LINES
Embedded expression: (?d)
Description: Enables Unix lines mode. In this mode, only the newline character (\n) is recognized as the line terminator. This affects the behavior of ., ^, and $. If this flag is not set (the default), then all of the following are considered to be line terminators: \n, \r, \r\n, \u0085 (next line), \u2028 (line separator), and \u2029 (paragraph separator). Unix lines mode can also be specified by the embedded expression (?d).

Flag name: CASE_INSENSITIVE
Embedded expression: (?i)
Description: Enables case-insensitive pattern matching and may incur a small performance penalty. Use of this flag presupposes that only characters from the US-ASCII character set are being matched. If you’re working with character sets of other languages, specify the UNICODE_CASE flag as well to enable Unicode-aware case folding.

Flag name: UNICODE_CASE
Embedded expression: (?iu)
Description: Unicode-aware, case-folding mode. When used in conjunction with the CASE_INSENSITIVE flag, case-insensitive character matching is done in accordance with the Unicode standard. This ensures that upper- and lowercase characters are treated equally in all the languages encoded by the Unicode charset. This option may incur a performance penalty.
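As a rough sketch of the advice above, the following wraps compile() in a try/catch block and combines flags with bitwise OR. The class name CompileDemo and the helper tryCompile() are hypothetical names used only for this illustration; returning null on failure is one possible design choice among several, not a recommendation from the text.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the name and the helper are illustrative only.
final class CompileDemo {

    // Returns the compiled pattern, or null if the expression is malformed.
    static Pattern tryCompile(String regex, int flags) {
        try {
            return Pattern.compile(regex, flags);
        } catch (PatternSyntaxException e) {
            // getDescription() and getIndex() say what went wrong and where.
            System.err.println("Bad regex: " + e.getDescription()
                + " near index " + e.getIndex());
            return null;
        }
    }

    public static void main(String[] args) {
        // Combine two flags with bitwise OR.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNIX_LINES;
        Pattern word = tryCompile("[A-Z][a-zA-Z]*", flags);
        System.out.println(word.matcher("mary").matches()); // prints true

        // The same case-insensitive behavior, enabled by an embedded flag.
        Pattern embedded = tryCompile("(?i)hello", 0);
        System.out.println(embedded.matcher("HELLO").matches()); // prints true

        // CASE_INSENSITIVE alone folds only US-ASCII letters;
        // adding UNICODE_CASE lets e-acute match its uppercase form.
        Pattern ascii = tryCompile("\u00e9", Pattern.CASE_INSENSITIVE);
        Pattern unicode = tryCompile("\u00e9",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        System.out.println(ascii.matcher("\u00c9").matches());   // prints false
        System.out.println(unicode.matcher("\u00c9").matches()); // prints true

        // A malformed expression (unclosed character class) yields null here.
        System.out.println(tryCompile("[a-z", 0) == null); // prints true
    }
}
```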