I'll try to make a more interesting puzzle. This time, we test whether or not a string matches another string, called a pattern.
To make patterns useful, some characters are given a special meaning inside them. The following are the special characters.
[ ] range specification. (e.g., [a-z] means a letter in the range from a to z)
\w letter or digit. same as [0-9A-Za-z_]
\W neither letter nor digit
\s blank character. same as [ \t\n\r\f]
\S non-space character.
\d digit character. same as [0-9].
\D non digit character.
\b word boundary (outside of range specification).
\B non word boundary.
\b backspace (0x08) (inside of range specification)
* zero or more repetitions of the preceding expression
+ one or more repetitions of the preceding expression
{m,n} at least m, but not more than n, repetitions of the preceding expression
? zero or one repetition of the preceding expression
| either the preceding or the following expression
( ) grouping
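A few of these special characters can be tried directly in ruby. The following is only a minimal sketch: the sample strings and variable names are made up for illustration, and it assumes a current ruby, where `=~' gives the match position and `str[/pattern/]' gives the matched text.

```ruby
# Patterns are written between slashes.
digits   = "abc012" =~ /\d+/                 # position of the first run of digits
word     = "hello world"[/\w+/]              # first run of letters, digits or `_'
repeat   = "aaaa"[/a{2,3}/]                  # at least 2, but not more than 3, of `a'
optional = ["color", "colour"].all? { |s| s =~ /colou?r/ }   # the `u' may be absent
either   = "cat dog bird".scan(/cat|bird/)   # either alternative may match

p digits, word, repeat, optional, either
```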
For example, `^f[a-z]+' means "an `f' at the beginning of a line, followed by one or more letters in the range from `a' to `z'". Patterns built from special characters like these are called `regular expressions'. Regular expressions are useful for finding strings, so they are used very often in the UNIX environment. A typical example is `grep'.
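To check this reading of `^f[a-z]+', here is a small sketch that tries the pattern on a handful of made-up strings. `str[pattern]' gives the matched text, or nil when nothing matches.

```ruby
pat = /^f[a-z]+/
samples = ["foobar", "f", "xfoo", "f00"]

# "f" fails because `+' needs at least one letter after the `f';
# "xfoo" fails because the `f' is not at the beginning of the line;
# "f00" fails because `0' is not in the range a-z.
results = samples.map { |s| s[pat] }
p results
```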
To understand regular expressions, let's make a little
program. Store the following program in a file named
`regx.rb' and then execute it.
Note: This program works only on UNIX because it uses
reverse video escape sequences.
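st = "\033[7m"   # escape sequence: start reverse video
en = "\033[m"    # escape sequence: back to normal
while true
  print "str> "
  STDOUT.flush
  str = gets
  break if not str
  str.chomp!     # remove the trailing newline
  print "pat> "
  STDOUT.flush
  re = gets
  break if not re
  re.chomp!
  # Build a regular expression from the pattern text, then wrap
  # every match (\& is the matched text) in reverse video.
  str.gsub! Regexp.new(re), "#{st}\\&#{en}"
  print str, "\n"
end
print "\n"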
This program asks for two inputs, a string and then a regular expression, and shows in reverse video the parts of the string that match the regular expression. Don't mind the details now; they will be explained later.
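The heart of the program is a single substitution, which can also be tried without the prompts. The sketch below assumes a current ruby, where the pattern text has to be turned into a regular expression with `Regexp.new' before substituting; `highlight' is a hypothetical helper name used only here.

```ruby
# Hypothetical helper: wrap every match of the pattern text in the
# reverse video escape sequences used by the program above.
def highlight(str, pattern, st = "\033[7m", en = "\033[m")
  str.gsub(Regexp.new(pattern)) { |matched| "#{st}#{matched}#{en}" }
end

# On a UNIX terminal, `foo' appears in reverse video.
puts highlight("foobar", "^fo+")
```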
str> foobar
pat> ^fo+
foobar
~~~
# foo is shown in reverse video; ``~~~'' is just for text-based browsers.
Let's try several inputs.
str> abc012dbcd555
pat> \d
abc012dbcd555
   ~~~    ~~~
This program detects multiple matches.
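The individual matches can be listed with `scan'. This is a sketch, not part of regx.rb. It shows that `\d' on its own matches one digit at a time, while `\d+' joins neighbouring digits into one match.

```ruby
str = "abc012dbcd555"

singles = str.scan(/\d/)    # every one-digit match, from left to right
runs    = str.scan(/\d+/)   # with `+', adjacent digits form a single match

p singles
p runs
```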
str> foozboozer
pat> f.*z
foozboozer
~~~~~~~~
`fooz' alone isn't what is matched; `foozbooz' is, since `.*' takes as many characters as it can and so the regular expression matches the longest substring.
The next pattern is too difficult to recognize at a glance.
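The same result can be checked in a script. A minimal sketch, using `str[pattern]' to get the matched text; the variable names are illustrative only.

```ruby
str = "foozboozer"

longest = str[/f.*z/]      # `.*' extends the match up to the last `z'
start   = str =~ /f.*z/    # position where the match begins

p longest
p start
```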
str> Wed Feb 7 08:58:04 JST 1996
pat> [0-9]+:[0-9]+(:[0-9]+)?
Wed Feb 7 08:58:04 JST 1996
          ~~~~~~~~
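Taken piece by piece, the pattern reads: digits, a colon, digits, and then optionally a second colon with more digits. The sketch below uses `match' to look at the whole match and at the optional group separately; the second input string is made up for illustration.

```ruby
time_pat = /[0-9]+:[0-9]+(:[0-9]+)?/

full = "Wed Feb 7 08:58:04 JST 1996".match(time_pat)
p full[0]         # the whole match
p full[1]         # the part matched by the group ( )
p full.begin(0)   # position where the match starts

# The group is optional, so a time without seconds matches too.
short = "lunch at 12:30".match(time_pat)
p short[0]
p short[1]        # nil: the group took no part in the match
```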
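In ruby, a regular expression is quoted by `/'. Also, some methods, such as `match', convert a string into a regular expression automatically.
ruby> "abcdef" =~ /d/
3
ruby> "abcdef".match("d")
#<MatchData "d">
ruby> "aaaaaa" =~ /d/
nil
ruby> "aaaaaa".match("d")
nil
`=~' is a matching operator for regular expressions; it returns the position at which a match was found, or nil when there is no match. In current versions of ruby, the right-hand side of `=~' must be a regular expression, not a string.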
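Because a position is returned, the result of `=~' can be used directly as a condition. A small sketch, assuming a current ruby where a failed match gives nil; note that position 0 still counts as true in ruby.

```ruby
found    = "abcdef" =~ /d/   # position of the match
missing  = "aaaaaa" =~ /d/   # nil, there is no `d'
at_start = "abcdef" =~ /a/   # 0, which is still true in a condition

puts "matched at #{found}" if found
puts "no match" unless missing
puts "a match at position 0 counts as true" if at_start
```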