For more examples, please refer to Regular Expression Examples In TCL.
What are Regular Expressions?
A regular expression, or RE,
describes strings of characters (words or phrases or any arbitrary text). It's
a pattern that matches certain strings and doesn't match others. For example,
you could write an RE to tell you if a string contains a URL (World Wide Web
Uniform Resource Locator, such as http://somehost/somefile.html).
Regular expressions can be either broad and general or focused and precise.
A regular expression uses metacharacters (characters
that assume special meaning for matching other characters) such
as *, [], $ and . (a period). For example, the
RE [Hh]ello!* would match Hello and hello and Hello! (and hello!!!!!).
The RE [Hh](ello|i)!* would match Hello and Hi and Hi! (and
so on). A backslash (\) disables the
special meaning of the following character, so you could match the string [Hello] with
the RE \[Hello\].
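The two patterns above can be tried directly in a Tcl shell. This is a minimal sketch: the list of test strings and the variable names are made up for illustration, and the ^ and $ anchors are added so that the whole string has to match.

```tcl
# Test strings (illustrative only).
set greetings {Hello hello Hello! hello!!!!! Hi Hi! Howdy}
set helloOnly {}
set helloOrHi {}
foreach s $greetings {
    # [Hh]ello!* : Hello or hello, followed by any number of !
    if {[regexp {^[Hh]ello!*$} $s]} { lappend helloOnly $s }
    # [Hh](ello|i)!* : additionally accepts Hi and hi
    if {[regexp {^[Hh](ello|i)!*$} $s]} { lappend helloOrHi $s }
}
puts "First RE:  $helloOnly"
puts "Second RE: $helloOrHi"

# Backslashes turn the brackets into ordinary characters.
set literal [regexp {\[Hello\]} {say [Hello] now}]
puts "Literal brackets found: $literal"
```

Howdy is rejected by both patterns, and Hi is accepted only by the second one.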
Regular Expressions:
Regular expressions can be expressed in just a few rules.
.  | Match any single character (e.g., m.d matches mad, mod, m3d, etc.)
[] | Bracket expression: Match any one of the enclosed characters (e.g., [a-z0-9_] matches a lowercase ASCII letter, a digit, or an underscore)
^  | Start-of-string anchor: Match only at the start of a string (e.g., ^hi matches hi and his but not this)
$  | End-of-string anchor: Match only at the end of a string (e.g., hi$ matches hi and chi but not this)
*  | Zero-or-more quantifier: makes the previous part of the RE match zero or more times (e.g., M.*D matches MD, MAD, MooD, M.D, etc.)
?  | Zero-or-one quantifier: makes the previous part of the RE match zero or one time (e.g., hi!? matches hi or hi!)
+  | One-or-more quantifier: makes the previous part of the RE match one or more times (e.g., hi!+ matches hi! or hi!! or hi!!! or ...)
|  | Alternation (vertical bar): Match just one alternative (e.g., this|that matches this or that)
() | Subpattern: Group part of the RE. Many uses, such as capturing a submatch or limiting the scope of alternation.
\  | Escape: Disables meaning of the following metacharacter (e.g., a\.* matches a or a. or a.. and so on). Note that \ also has special meaning to the Tcl interpreter (and to applications, such as C compilers).
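Eg:
set TestingDuts 1/2
regexp {\/} $TestingDuts
This checks whether the string 1/2 contains a /, and returns 1 because it does.
The / character is not a metacharacter in Tcl regular expressions, so the backslash
in front of it is harmless but not required; regexp {/} $TestingDuts gives the same
result. To match a literal backslash followed by n, write \\n in the RE, because \n
on its own matches a newline character.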
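NOTE:
regexp {([^\/]+)/(.*)} $port -- devNum port1
The first argument after the string is always the variable that receives the
complete match. Here it is named --, a throwaway variable used because only the
submatches are of interest. The first () stores the text before the first / in
devNum, and the second () stores the rest of the string in port1.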
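As a minimal sketch of the same split, the snippet below assumes a sample value of 1/2 for port (the value is not given in the text) and uses ->, a common throwaway name among Tcl programmers, in the position of the full-match variable.

```tcl
set port 1/2                       ;# assumed sample value
# "->" is only a variable name here; it receives the complete match.
set found [regexp {([^/]+)/(.*)} $port -> devNum port1]
puts "found=$found devNum=$devNum port1=$port1"

# When there is no slash, regexp returns 0 and the variables are not set.
set noSlash [regexp {([^/]+)/(.*)} 12 -> d p]
puts "noSlash=$noSlash"
```

Checking the return value before using devNum and port1 avoids reading variables that were never set.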
In regular expression parsing, the * symbol matches zero or
more occurrences of the character immediately preceding the *. For example, a*
would match a, aaaaa, or an empty string. If the item directly before the *
is a set of characters within square brackets, then the * will match any
number of characters from that set, in any combination. For example, [a-c]* would match aa, abc,
aabcabc, or again, an empty string.
The + symbol behaves roughly the same as the *, except that
it requires at least one character to match. For example, [a-c]+ would match a,
abc, or aabcabc, but not an empty string.
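The difference between * and + can be checked with the [a-c] set from the text. This is a small sketch; the ^ and $ anchors are added so that the whole string must match, and the variable names are illustrative.

```tcl
# * accepts the empty string, + does not.
set emptyStar [regexp {^[a-c]*$} ""]
set emptyPlus [regexp {^[a-c]+$} ""]

# Both accept one or more characters from the set.
set mixedStar [regexp {^[a-c]*$} "aabcabc"]
set mixedPlus [regexp {^[a-c]+$} "aabcabc"]

# Without anchors, a* can match "nothing" at the start of any string.
set anyString [regexp {a*} "xyz" matched]

puts "$emptyStar $emptyPlus $mixedStar $mixedPlus $anyString <$matched>"
```

The last case is a common surprise: an unanchored pattern made only of starred items reports a match on every string, with an empty match text.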
Regular expression parsing also includes a method of
selecting any character not in a set. If the first character after the [ is a
caret (^), then the regular expression parser will match any character not in
the set of characters between the square brackets. A caret can be included in
the set of characters to match (or not) by placing it in any position other than
the first.
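A short sketch of the caret rule, with made-up input strings: the caret negates the set only when it comes first inside the brackets.

```tcl
# Caret first: match characters that are NOT digits.
regexp {[^0-9]+} "abc123" nonDigits

# Caret not first: it is an ordinary member of the set.
regexp {[0-9^]+} "2^10" withCaret

# Both at once: any character except a caret.
regexp {[^^]+} "a^b" beforeCaret

puts "$nonDigits | $withCaret | $beforeCaret"
```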
The regexp command is similar to the string match command in
that it matches an exp against a string. It is different in that it can match a
portion of a string, instead of the entire string, and will place the
characters matched into the matchVar variable.
If a match is found for a portion of the regular expression
enclosed within parentheses, regexp will copy the matching characters
into the corresponding subMatch variable given after matchVar. This can be used to parse simple strings.
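The comparison with string match can be sketched as follows. The sample line and the variable names are assumptions chosen for illustration.

```tcl
set line "port=8080"

# string match compares a glob pattern against the whole string.
set globWhole   [string match "port" $line]
set globPartial [string match "port*" $line]

# regexp succeeds if the pattern matches anywhere in the string.
set reFound [regexp {port} $line]

# Parenthesized parts are copied into the variables after the match variable.
regexp {([a-z]+)=([0-9]+)} $line whole key value
puts "glob: $globWhole $globPartial  regexp: $reFound"
puts "whole=$whole key=$key value=$value"
```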
The regsub command copies the contents of the string to a
new variable, substituting the characters that match exp with the characters in
subSpec. If subSpec contains a & or \0, then those characters will be
replaced by the characters that matched exp. If a backslash in subSpec is
followed by a digit from 1 to 9, then that backslash sequence will be replaced by the
text matched by the corresponding parenthesized portion of exp.
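The substitution markers can be seen in a small sketch; the name string and the variable names are invented for the example.

```tcl
set name "John Smith"

# \1 and \2 stand for the text matched by the first and second ().
set count [regsub {([A-Za-z]+) ([A-Za-z]+)} $name {\2, \1} swapped]

# & and \0 both stand for the complete match.
regsub {Smith} $name {<&>} marked
regsub {Smith} $name {<\0>} marked0

puts "$count: $swapped"
puts "$marked / $marked0"
```

The subSpec is written in braces so that Tcl passes the backslashes through to regsub unchanged.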
Note that the exp argument to regexp or regsub is processed
by the Tcl substitution pass. Therefore quite often the expression is enclosed
in braces to prevent any special processing by Tcl.
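To illustrate why braces are usually easier, the sketch below writes one pattern both ways. The sample text and variable names are assumptions; the point is only how many extra backslashes the quoted form needs to survive Tcl's own substitution.

```tcl
set text {total: [42]}

# In braces, the RE reaches regexp exactly as typed.
set braced {\[([0-9]+)\]}

# In double quotes, Tcl processes backslashes and brackets first,
# so every [ ] and \ meant for the RE has to be protected.
set quoted "\\\[(\[0-9\]+)\\\]"

set same [expr {$braced eq $quoted}]
regexp $braced $text -> number
puts "same=$same number=$number"
```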
Simple Examples:
All Examples tested on TCL 8.4
========================================
EXAMPLE 1.
set sample "Where there is a will, There is a way."
set result [regexp {[a-z]+} $sample match]
puts $match ;# prints here
puts $result ;# prints 1
In the above regular expression, here is matched and stored in
the variable match. The W of Where is uppercase, so the first run of lowercase
letters starts at the h.
To match here there is a will in the above string, add a space to the
bracket expression:
set result [regexp {[a-z ]+} $sample match]
puts $match ;# prints here there is a will
To match here there is a will, (including the comma), add a comma to the
bracket expression as well:
set result [regexp {[a-z ,]+} $sample match]
The match also includes the space after the comma, because the space is part
of the set.
To match Where there is a will, There is a way.
the regular expression will be as below:
set result [regexp {[A-Za-z ,\.]+} $sample match]
To match “Where there”
and store “Where” and “there” as separate substrings:
set result [regexp {([A-Za-z]+) +([a-z]+)} $sample match sub1 sub2]
puts $match ;# prints Where there
puts $sub1 ;# prints Where
puts $sub2 ;# prints there
In the above regular expression, match holds the
complete match, i.e. Where there.
The first () matches Where, which is stored in sub1, and the second ()
matches there, which is stored in sub2.
NOTE:
regexp always stores the complete match in the first variable after the string,
so that argument cannot be left out. If the complete match is not needed, give it
a throwaway name such as -- and read only sub1 and sub2.
The command below stores the same submatches:
set result [regexp {([A-Za-z]+) +([a-z]+)} $sample -- sub1 sub2]
puts $sub1 ;# prints Where
puts $sub2 ;# prints there
To match “here there is a will, There is a way” and store
“here there is a will” and “There is a way” in sub1 and sub2 respectively:
set result [regexp {([a-z ]+), +([A-Za-z ]+)} $sample match sub1 sub2]
puts $match ;# prints here there is a will, There is a way
puts $sub1 ;# prints here there is a will
puts $sub2 ;# prints There is a way
EXAMPLE 2.
set out "Tcl Tutorial"
regexp {([A-Z,a-z]*).([A-Z,a-z]*)} $out a b c
puts "Full Match: $a"
puts "Sub Match1: $b"
puts "Sub Match2: $c"
Output:
Full Match: Tcl Tutorial
Sub Match1: Tcl
Sub Match2: Tutorial
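In the next version, the second pair of parentheses is nested inside the first,
so the first submatch covers the whole match:
set out "Tcl Tutorial"
regexp {([A-Z,a-z]*.([A-Z,a-z]*))} $out a b c
puts "Full Match: $a"
puts "Sub Match1: $b"
puts "Sub Match2: $c"
Output:
Full Match: Tcl Tutorial
Sub Match1: Tcl Tutorial
Sub Match2: Tutorial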
Switches for the regexp Command
The switches available in Tcl include:
-nocase: Ignore case when matching.
-indices: Store the start and end positions of the match and submatches instead of the matched
characters.
-line: Newline-sensitive matching. A . or a negated bracket expression does not match a
newline, and ^ and $ also match at the start and end of each line.
-start index: Start searching at the given character offset in the string.
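The switches can be tried one at a time. This is a minimal sketch reusing the "Tcl Tutorial" string from the examples; the other strings and the variable names are illustrative.

```tcl
set out "Tcl Tutorial"

# -indices stores {start end} pairs instead of the matched text.
regexp -indices {([A-Za-z]+) ([A-Za-z]+)} $out all first second
puts "all=$all first=$first second=$second"

# -nocase lets a lowercase pattern match mixed-case text.
regexp -nocase {tutorial} $out word

# -start 4 skips the first four characters ("Tcl ").
regexp -start 4 {[A-Za-z]+} $out tail

# -line lets ^ and $ match at line boundaries inside the string.
set lineHit  [regexp -line {^Tutorial$} "Tcl\nTutorial"]
set plainHit [regexp {^Tutorial$} "Tcl\nTutorial"]
puts "word=$word tail=$tail lineHit=$lineHit plainHit=$plainHit"
```

Indices count characters from 0, so the second word of "Tcl Tutorial" runs from 4 to 11.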
In the examples above, I deliberately used [A-Z,a-z] to cover all letters
(note that the comma inside the brackets is matched literally as well). You can
use -nocase instead, as shown below:
set out "Tcl Tutorial"
regexp -nocase {([A-Z]*.([A-Z]*))} $out a b c
puts "Full Match: $a"
puts "Sub Match1: $b"
puts "Sub Match2: $c"
Output:
Full Match: Tcl Tutorial
Sub Match1: Tcl Tutorial
Sub Match2: Tutorial
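regexp -nocase -line -- {([A-Z]*.([A-Z]*))} "Tcl \nTutorial" a b
puts "Full Match: $a"
puts "Sub Match1: $b"
regexp -nocase -start 4 -line -- {([A-Z]*.([A-Z]*))} "Tcl \nTutorial" a b
puts "Full Match: $a"
puts "Sub Match1: $b"
Output:
Full Match: Tcl
Sub Match1: Tcl
Full Match: Tutorial
Sub Match1: Tutorial
With -line, the . cannot match the newline, so the first command matches only
within the first line (the match is Tcl followed by the trailing space). In the
second command, -start 4 begins the search at index 4, which is the newline
itself, so the match comes from the second line.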
REGSUB:
Syntax: regsub exp string subSpec var
Searches string for a substring that matches the regular
expression exp and replaces it with subSpec. Only the first match is
replaced unless the -all switch is given.
The resulting string is copied into var.
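A short sketch of regsub with and without -all, using an invented comma-separated string. When a result variable is given, regsub returns the number of replacements it made.

```tcl
set csv "a,b,c"

# Without -all, only the first comma is replaced.
set n1 [regsub {,} $csv {;} once]

# With -all, every comma is replaced.
set n2 [regsub -all {,} $csv {;} every]

puts "$n1 replacement:  $once"
puts "$n2 replacements: $every"
puts "original is unchanged: $csv"
```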
Eg: 1
set sample "Where there is a will, There is a way."
regsub "way" $sample "lawsuit" sample2
puts $sample2 ;# prints Where there is a will, There is a lawsuit.
The above command replaces the
string “way” with “lawsuit” and stores the result in sample2. The original
string in sample is unchanged.
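Eg: 2
set sample "eer dfgdfgf trt dfsdf sfdsf ree"
regsub -all { +} $sample " " var
puts $var ;# prints eer dfgdfgf trt dfsdf sfdsf ree
This replaces every run of one or more spaces with a single space and stores
the result in var. The pattern { +} matches spaces only; to collapse tabs as
well, use {[ \t]+}.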
?: (Non-capturing Subpattern) Usage
Usage: ?: is used at the start of a subpattern in a regexp.
Whenever you want to group part of an RE without storing what it matched in a
submatch variable, put “?:” right after the opening parenthesis of that subpattern.
Example:
set string "Names: Manish Ajay Aman"
regexp "Names: (Manish|Ajay) (?:Aman|Raj|Ajay) (Aman|Raj)" $string match sub1 sub2 sub3
puts "$match\n$sub1\n$sub2\n$sub3\n"
For the above
example, the output will be:
Names: Manish Ajay Aman
Manish
Aman
The regular expression above does not capture the group that starts with ?:,
so match holds the full match, sub1 holds Manish, and sub2 holds Aman.
The second group, (?:Aman|Raj|Ajay), still has to match (it matches Ajay here),
but its text is not stored. Because only two groups capture, sub3 is set to an
empty string; it acts as a dummy variable.
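For comparison, here is a small sketch that runs the same pattern with and without ?: on the example string. The variable names are illustrative.

```tcl
set names "Names: Manish Ajay Aman"

# Three capturing groups: each one fills a variable.
regexp {Names: (Manish|Ajay) (Aman|Raj|Ajay) (Aman|Raj)} $names m c1 c2 c3
puts "capturing:     c1=$c1 c2=$c2 c3=$c3"

# Middle group marked with ?: so it matches but is not stored.
regexp {Names: (Manish|Ajay) (?:Aman|Raj|Ajay) (Aman|Raj)} $names m n1 n2 n3
puts "non-capturing: n1=$n1 n2=$n2 n3=<$n3>"
```

With ?: in place, the later submatch moves up one position, which is why the third variable ends up empty.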