C/C++ Regular Expressions - Regex/Regexec

Aim
The aim of this example is to demonstrate the POSIX regex functions (declared in <regex.h>) that are available to C/C++ programs. This example does not use Boost. The code snippet shows how you can use regcomp and regexec to match a string and read back the grouped subexpressions. Please use this as an introduction to regular expressions for C/C++ (for a more sophisticated function that does replacement as well, please see C/C++ regexreplace).

#include <stdio.h>
#include <string.h>
#include <regex.h>

int main(void){

    regex_t pregx;
    regmatch_t pmatch[10];
    int i;
    /* Basic (non-extended) POSIX syntax: \( and \) delimit a group */
    const char* pattern = "\\([0-9][0-9]*\\)_\\([0-9][0-9]*\\)";
    char tomatch[256] = "This is hello_100_200_";
    char tmpvalue[256];

    if(regcomp(&pregx, pattern, 0)!=0){
        printf("Pattern failed to compile\n");
        return -1;
    }
    if(regexec(&pregx, tomatch, 10, pmatch,0)!=0){
        printf("Could not match pattern %s  %s\n",pattern, tomatch);
        regfree(&pregx);
        return -1;
    }

    /* pmatch[0] is the whole match, pmatch[1..] are the groups;
       unused entries have rm_so set to -1 */
    for(i = 0 ; i < 10 && pmatch[i].rm_so!=-1 ; i++){
        memset(tmpvalue, 0, sizeof tmpvalue);
        strncpy(tmpvalue, &tomatch[pmatch[i].rm_so], (size_t)(pmatch[i].rm_eo-pmatch[i].rm_so));
        /* rm_so and rm_eo have type regoff_t, so cast them for %d */
        printf("matched pmatch[%d].rm_so: %d    pmatch[%d].rm_eo: %d  -   \"%s\"\n",
                i, (int)pmatch[i].rm_so, i, (int)pmatch[i].rm_eo, tmpvalue);
    }
    regfree(&pregx); /* release the memory held by the compiled pattern */
    return 0;
}

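One of the drawbacks of regexec is that a single call reports only the first match. For example, if my string was 'This is hello_100_200_ and 700_600' it would only match the 100_200. To find later matches you have to call regexec again, starting just after the end of the previous match.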


..workspace\>gcc reg.c
..workspace\>./a.out
matched pmatch[0].rm_so: 14    pmatch[0].rm_eo: 21  -   "100_200"
matched pmatch[1].rm_so: 14    pmatch[1].rm_eo: 17  -   "100"
matched pmatch[2].rm_so: 18    pmatch[2].rm_eo: 21  -   "200"
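As you can see, pmatch[0] (the equivalent of \0) holds the whole match, 100_200, while pmatch[1] and pmatch[2] hold the two groups, 100 and 200.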
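
The single-match drawback mentioned earlier can be worked around by calling regexec in a loop and moving the start of the search past each match. The following is only a minimal sketch of that idea, not part of the original example: the helper name find_all_matches and its parameters are made up for illustration, and it records only the whole match (pmatch[0]) of each hit.

```c
#include <stdio.h>
#include <regex.h>

/* Hypothetical helper (an assumption, not from the original example):
 * stores the offsets of up to `max` whole matches of `pattern` in `text`
 * into `out` and returns how many were stored, or -1 if the pattern does
 * not compile. The pattern uses basic POSIX syntax, as in the example. */
int find_all_matches(const char *pattern, const char *text,
                     regmatch_t *out, int max)
{
    regex_t re;
    regmatch_t m[1];
    regoff_t offset = 0;
    int count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;

    /* Each call searches the rest of the string. The offsets regexec
     * reports are relative to the pointer it was given, so `offset` is
     * added back to make them relative to the start of `text`.
     * REG_NOTBOL tells regexec that a later search does not start at
     * the beginning of a line. */
    while (count < max &&
           regexec(&re, text + offset, 1, m, offset ? REG_NOTBOL : 0) == 0) {
        out[count].rm_so = m[0].rm_so + offset;
        out[count].rm_eo = m[0].rm_eo + offset;
        count++;

        if (m[0].rm_eo == m[0].rm_so) {
            /* Empty match: step one character forward so the loop ends */
            if (text[offset + m[0].rm_eo] == '\0')
                break;
            offset += m[0].rm_eo + 1;
        } else {
            offset += m[0].rm_eo;
        }
    }

    regfree(&re);
    return count;
}
```

Called with the pattern from the example and the string 'This is hello_100_200_ and 700_600', this sketch should store two matches: 100_200 at offsets 14 to 21 and 700_600 at offsets 27 to 34. Capturing the groups of each match as well would mean passing a larger pmatch array to regexec, as the original program does.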