正则表达式

Java 中的正则表达式

正则表达式定义了字符串的模式，正则表达式可以用来搜索、编辑或处理文本；正则表达式并不仅限于某一种语言，但是在每种语言中有细微的差别。

一个字符串其实就是一个简单的正则表达式，例如 Hello World 正则表达式匹配 “Hello World” 字符串。.（点号）也是一个正则表达式，它匹配任何一个字符如：“a"或"1”。下表列出了一些正则表达式的实例及描述：

正则表达式	描述
this is text	匹配字符串 “this is text”
this\s+is\s+text	注意字符串中的 \s+。匹配单词 “this” 后面的 \s+ 可以匹配多个空格，之后匹配 is 字符串，再之后 \s+ 匹配多个空格然后再跟上 text 字符串。可以匹配这个实例：this is text
^\d+(.\d+)?	^ 定义了以什么开始\d+ 匹配一个或多个数字? 设置括号内的选项是可选的. 匹配 “.“可以匹配的实例：“5”, “1.5” 和 “2.21”

Java 正则表达式和 Perl 的是最为相似的，java.util.regex 包主要包括以下三个类：

Pattern 类：pattern 对象是一个正则表达式的编译表示。Pattern 类没有公共构造方法。要创建一个 Pattern 对象，你必须首先调用其公共静态编译方法，它返回一个 Pattern 对象。该方法接受一个正则表达式作为它的第一个参数。
Matcher 类：Matcher 对象是对输入字符串进行解释和匹配操作的引擎。与 Pattern 类一样，Matcher 也没有公共构造方法。你需要调用 Pattern 对象的 matcher 方法来获得一个 Matcher 对象。
PatternSyntaxException：PatternSyntaxException 是一个非强制异常类，它表示一个正则表达式模式中的语法错误。

基础使用

正则表达式的基础用法如下：

String content = "I am test " +
"from test.com.";

String pattern = ".*test.*";

boolean isMatch = Pattern.matches(pattern, content);
System.out.println("字符串中是否包含了 'test' 子字符串? " + isMatch);

捕获组

捕获组是把多个字符当一个单独单元进行处理的方法，它通过对括号内的字符分组来创建。例如，正则表达式 (dog) 创建了单一分组，组里包含"d”，“o”，和"g”。捕获组是通过从左至右计算其开括号来编号。例如，在表达式 ((A)(B(C)))，有四个这样的组：

((A)(B(C)))
(A)
(B(C))
(C)

可以通过调用 matcher 对象的 groupCount 方法来查看表达式有多少个分组。groupCount 方法返回一个 int 值，表示 matcher 对象当前有多个捕获组。还有一个特殊的组（group(0)），它总是代表整个表达式。该组不包括在 groupCount 的返回值中。

// 按指定模式在字符串查找
String line = "This order was placed for QT3000! OK?";
String pattern = "(\\D*)(\\d+)(.*)";

// 创建 Pattern 对象
Pattern r = Pattern.compile(pattern);

// 现在创建 matcher 对象
Matcher m = r.matcher(line);
if (m.find( )) {
    System.out.println("Found value: " + m.group(0) );
    System.out.println("Found value: " + m.group(1) );
    System.out.println("Found value: " + m.group(2) );
    System.out.println("Found value: " + m.group(3) );
} else {
    System.out.println("NO MATCH");
}

根据 Java Language Specification 的要求，Java 源代码的字符串中的反斜线被解释为 Unicode 转义或其他字符转义。因此必须在字符串字面值中使用两个反斜线，表示正则表达式受到保护，不被 Java 字节码编译器解释。例如，当解释为正则表达式时，字符串字面值 “\b” 与单个退格字符匹配，而 “\b” 与单词边界匹配。字符串字面值 “(hello)” 是非法的，将导致编译时错误；要与字符串 (hello) 匹配，必须使用字符串字面值 “\(hello\)"。

Matcher

索引方法

索引方法提供了有用的索引值，精确表明输入字符串中在哪能找到匹配：

序号	方法及说明
1	public int start() 返回以前匹配的初始索引
2	public int start(int group) 返回在以前的匹配操作期间，由给定组所捕获的子序列的初始索引
3	public int end() 返回最后匹配字符之后的偏移量
4	public int end(int group) 返回在以前的匹配操作期间，由给定组所捕获子序列的最后字符之后的偏移量

// 对单词 "cat" 出现在输入字符串中出现次数进行计数的例子
public class RegexMatches
{
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT =
                                    "cat cat cat cattie cat";

    public static void main( String args[] ){
       Pattern p = Pattern.compile(REGEX);
       Matcher m = p.matcher(INPUT); // 获取 matcher 对象
       int count = 0;

       while(m.find()) {
         count++;
         System.out.println("Match number "+count);
         System.out.println("start(): "+m.start());
         System.out.println("end(): "+m.end());
      }
   }
}

/**
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
**/

可以看到这个例子是使用单词边界，以确保字母 “c” “a” “t” 并非仅是一个较长的词的子串。它也提供了一些关于输入字符串中匹配发生位置的有用信息。start 方法返回在以前的匹配操作期间，由给定组所捕获的子序列的初始索引，end 方法最后一个匹配字符的索引加 1。

研究方法

研究方法用来检查输入字符串并返回一个布尔值，表示是否找到该模式：

序号	方法及说明
1	public boolean lookingAt() 尝试将从区域开头开始的输入序列与该模式匹配
2	public boolean find() 尝试查找与该模式匹配的输入序列的下一个子序列
3	public boolean find(int start）重置此匹配器，然后尝试查找匹配该模式、从指定索引开始的输入序列的下一个子序列
4	public boolean matches() 尝试将整个区域与模式匹配

matches 和 lookingAt 方法都用来尝试匹配一个输入序列模式。它们的不同是 matches 要求整个序列都匹配，而 lookingAt 不要求。lookingAt 方法虽然不需要整句都匹配，但是需要从第一个字符开始匹配。这两个方法经常在输入字符串的开始使用。

public class RegexMatches
{
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static final String INPUT2 = "ooooofoooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;
    private static Matcher matcher2;

    public static void main( String args[] ){
       pattern = Pattern.compile(REGEX);
       matcher = pattern.matcher(INPUT);
       matcher2 = pattern.matcher(INPUT2);

       System.out.println("Current REGEX is: "+REGEX);
       System.out.println("Current INPUT is: "+INPUT);
       System.out.println("Current INPUT2 is: "+INPUT2);


       System.out.println("lookingAt(): "+matcher.lookingAt());
       System.out.println("matches(): "+matcher.matches());
       System.out.println("lookingAt(): "+matcher2.lookingAt());
   }
}

/**
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
Current INPUT2 is: ooooofoooooooooooo
lookingAt(): true
matches(): false
lookingAt(): false
**/

替换方法

替换方法是替换输入字符串里文本的方法：

序号	方法及说明
1	public Matcher appendReplacement(StringBuffer sb, String replacement) 实现非终端添加和替换步骤
2	public StringBuffer appendTail(StringBuffer sb) 实现终端添加和替换步骤
3	public String replaceAll(String replacement) 替换模式与给定替换字符串相匹配的输入序列的每个子序列
4	public String replaceFirst(String replacement) 替换模式与给定替换字符串匹配的输入序列的第一个子序列
5	public static String quoteReplacement(String s) 返回指定字符串的字面替换字符串。这个方法返回一个字符串，就像传递给 Matcher 类的 appendReplacement 方法一个字面字符串一样工作

replaceFirst 和 replaceAll 方法用来替换匹配正则表达式的文本。不同的是，replaceFirst 替换首次匹配，replaceAll 替换所有匹配。下面的例子来解释这个功能：

public class RegexMatches
{
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " +
                                    "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
       Pattern p = Pattern.compile(REGEX);
       // get a matcher object
       Matcher m = p.matcher(INPUT);
       INPUT = m.replaceAll(REPLACE);
       System.out.println(INPUT);
   }
}

// The cat says meow. All cats say meow.

Matcher 类也提供了 appendReplacement 和 appendTail 方法用于文本替换：

public class RegexMatches
{
   private static String REGEX = "a*b";
   private static String INPUT = "aabfooaabfooabfoobkkk";
   private static String REPLACE = "-";
   public static void main(String[] args) {
      Pattern p = Pattern.compile(REGEX);
      // 获取 matcher 对象
      Matcher m = p.matcher(INPUT);
      StringBuffer sb = new StringBuffer();
      while(m.find()){
         m.appendReplacement(sb,REPLACE);
      }
      m.appendTail(sb);
      System.out.println(sb.toString());
   }
}
// -foo-foo-foo-kkk

最近更新于 0001-01-01