Strings are not limited to just displaying text.
They are used to load assets or store data. You can use strings to make decisions and run commands. Oftentimes you want to take text input from players or some other source during runtime to configure a game.
This opens up a world of possibilities for scripting, but working with strings also requires the proper tools.
Strings often contain irrelevant or extraneous data. Pattern matching is the process of finding and extracting the data that matters to the task at hand.
For instance: “The treasure chest contained 5 gold pieces.”
The rest of the sentence exists only for human readability. The relevant data is the fragment ‘5 gold pieces.’
string.find & string.match
The string.find function looks for a pattern in a string and returns the start and end indices of the first match, or nil if there is no match.
local text = "Hello world!"
local targetString = "world"
local indexStart, indexEnd = string.find(text, targetString)
print(indexStart, indexEnd) --7 11
print(string.sub(text, indexStart, indexEnd)) --world
This function is useful for quickly checking if a string contains a word. The script looks for the target string ‘world’ and then returns the position if there is a match.
local string1 = "apple, orange, watermelon"
local string2 = "cat, dog, horse"
local string3 = "red, green, blue"
if string.find(string1, 'green') then
    print(string1)
end
if string.find(string2, 'green') then
    print(string2)
end
if string.find(string3, 'green') then
    print(string3)
end
-- red, green, blue
There is one limitation to using pattern matching in this manner.
It requires that you already know the target string ahead of time, and that is often the very thing we are looking for.
Fortunately, pattern matching provides more capability than just simple word matching.
Notice that this is called ‘pattern matching’ and not ‘string matching.’
Rather than searching for hard-coded strings, pattern matching is the search for sequences of characters that contain a pattern signature.
Because data embedded in strings is usually formatted in some consistent manner, the ability to perform a generic pattern search makes the find, match, gsub, and gmatch functions very powerful.
For instance:
local stringsList = {
    'http://www.roblox.com/asset/?id=12222208',
    '3.14159',
    'rbxassetid://16647579',
    '12345678',
    'lorem ipsum'
}
Say we had this list of strings and wanted to find the ones containing an asset ID and then extract the ID.
We already know that asset references follow the signature ‘rbxassetid://{number}’ or ‘http://www.roblox.com/asset/?id={number}.’
Pattern matching can be used to look for the signature and then extract the asset ID.
Similar to the date format specifier we saw earlier, pattern matching uses one or more combinations of characters to describe the pattern signature to look for.
The most basic component of pattern matching is the character class.
Referring to the API, we see that a period represents any character. This means that when used in a search, the period will match any character. Sometimes, this is also referred to as a ‘wildcard’ character.
local text = "Hello world!"
local pattern = "."
local resultFound = string.find(text, pattern)
local resultMatched = string.match(text, pattern)
print(resultFound, resultMatched) --1 H
Because ‘H’ is the first character and the wildcard matches any character, it counts as a match. string.find returns the position of the match, while string.match returns the matched text itself.
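One consequence of the period being a wildcard is that a search string containing a period is itself treated as a pattern. The following is a minimal sketch of two ways around that: escaping the period with ‘%’, or passing true as the optional fourth argument of string.find to turn off pattern matching. The sample text is made up for illustration.

```lua
-- Sample text invented for this sketch.
local text = "version 3x14 and 3.14"

-- As a pattern, "." matches any character, so "3.14" also matches "3x14".
local loose1, loose2 = string.find(text, "3.14")
print(loose1, loose2) --9 12

-- Escaping the period with % makes it a literal period.
local esc1, esc2 = string.find(text, "3%.14")
print(esc1, esc2) --18 21

-- Passing true as the fourth argument performs a plain substring search.
local plain1, plain2 = string.find(text, "3.14", 1, true)
print(plain1, plain2) --18 21
```

The plain search is usually the safer choice when the target string comes from player input and is not meant to be a pattern.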
The character class by itself can only match a single character so stringing a series of them together will yield a longer match.
local text = "Hello world!"
local pattern = "........"
local result = string.match(text, pattern)
print(result) --Hello wo
Still, this is quite limiting given that we might not know the length of the string ahead of time.
What we can do is use a modifier to tell the search to automatically match several characters. The ‘+’ modifier makes the previous token match one or more times, taking as many characters as possible.
local text = "Hello world!"
local pattern = ".+"
local result = string.match(text, pattern)
print(result) --Hello world!
Because the wildcard matches all characters, the entire string gets returned which sort of defeats the purpose of pattern matching in the first place.
We need to use a class that will selectively match a character or pattern and omit the rest.
Since each word is separated by a space, let’s look for a character class that matches letters but does not match spaces.
The ‘%l’ and ‘%u’ classes match letters, but only lowercase and uppercase letters respectively. Since ‘Hello’ contains both uppercase and lowercase letters, neither class can match the whole word.
The ‘%a’ class will match any letter, upper or lower, making it the one we need. The ‘%w’ class matches uppercase and lowercase letters as well as digits.
Either will work for our target string. Replacing the wildcard with the ‘%w’ token now matches the entirety of the first word.
local text = "Hello world!"
local pattern = "%w+"
local result = string.match(text, pattern)
print(result) --Hello
Because space is not an alphanumeric character, the sequence matches up until that position.
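The difference between these classes can be checked directly. This is a small sketch using the same text plus one made-up string, ‘Player1 joined’, to show where ‘%a’ and ‘%w’ part ways.

```lua
local text = "Hello world!"

local upper = string.match(text, "%u+")   -- uppercase letters only
local lower = string.match(text, "%l+")   -- lowercase letters only
local letters = string.match(text, "%a+") -- any letters
print(upper, lower, letters) --H ello Hello

-- Hypothetical sample string containing a digit.
local sample = "Player1 joined"
local lettersOnly = string.match(sample, "%a+")  -- stops at the digit
local alphanumeric = string.match(sample, "%w+") -- includes the digit
print(lettersOnly, alphanumeric) --Player Player1
```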
But what if we wanted to get the second word instead?
As with software development in general, there isn’t one correct way to do it. One way you might go about it is to match the ‘w’ character and then proceed from there.
local text = "Hello world!"
local pattern = "w%w+"
local result = string.match(text, pattern)
print(result) --world
While this looks cryptic, the pattern just looks for a literal ‘w’ and then matches any ‘%w’ characters that follow it.
Okay, but what if we don’t know that the second word begins with a ‘w’?
A different approach might be to match the space right before the ‘w’ and then use the alphanumeric class from above.
local text = "Hello world!"
local pattern = "%s%w+"
local result = string.match(text, pattern)
print(result) -- world
This accomplishes almost the same thing but also returns the leading space, which we might not want. Sure, we could remove it later, but why not build that into the pattern itself?
We can do this by creating a “capture group.”
Capture groups return only a subpart of a larger match and are created by enclosing the desired part of the pattern in parentheses.
local text = "Hello world!"
local pattern = "%s(%w+)"
local result = string.match(text, pattern)
print(result) --world
The match function can also return several groups from a string, provided you know the general signature of the target match.
local text = "The game will launch on Jan 15, 2024"
local pattern = "(%w+)%s(%d+),%s(%d+)"
local month, day, year = string.match(text, pattern)
print(month, day, year) --Jan 15 2024
This looks for a word, a space in between, then a number of any length followed by a comma and space, then finally another number of any length. Three groups are defined and are returned in that order.
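With capture groups available, the asset ID list from earlier can be revisited. The following is a sketch rather than a definitive implementation: extractAssetId is a hypothetical helper name, and it assumes the two signatures described above are the only formats that need handling. Note that ‘?’ and ‘.’ have special meaning in patterns, so they are escaped with ‘%’.

```lua
local stringsList = {
    'http://www.roblox.com/asset/?id=12222208',
    '3.14159',
    'rbxassetid://16647579',
    '12345678',
    'lorem ipsum'
}

-- Hypothetical helper: returns the asset ID as a string, or nil if the
-- entry does not follow either known signature.
local function extractAssetId(entry)
    return string.match(entry, "rbxassetid://(%d+)")
        or string.match(entry, "http://www%.roblox%.com/asset/%?id=(%d+)")
end

local foundIds = {}
for _, entry in ipairs(stringsList) do
    local id = extractAssetId(entry)
    if id then
        table.insert(foundIds, id)
        print(id)
    end
end
--12222208
--16647579
```

The plain numbers ‘3.14159’ and ‘12345678’ are skipped because the signature, not just the digits, is part of the pattern.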
When there is an arbitrary number of possible matches, use the gmatch function. It returns an iterator function that you can step through with a for loop.
local text = "bead reed mead heard stead tread"
local pattern = "%w+ead"
local matches = string.gmatch(text, pattern)
for out in matches do
    print(out)
end
-- bead
-- mead
-- stead
-- tread
The pattern looks for a run of one or more alphanumeric characters followed by the string literal ‘ead.’
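gmatch also works with capture groups, in which case each iteration yields the captures instead of the whole match. As a sketch, assume a made-up settings string of ‘name=number’ pairs, the kind of text input that might be used to configure a game:

```lua
-- Hypothetical settings string; the names and values are illustrative only.
local settings = "speed=16, jump=50, health=100"

local parsed = {}
-- Each iteration returns the two captures: the key and the value.
for key, value in string.gmatch(settings, "(%a+)=(%d+)") do
    parsed[key] = tonumber(value)
end
print(parsed.speed, parsed.jump, parsed.health) --16 50 100
```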
Sometimes the default character classes don’t fit your requirements, or maybe you need only a subset of a character class. You can specify your own using a set, which is defined by enclosing the allowed characters in square brackets.
local numbersList = {
    '0x3159f82f',
    '0x46b3v90f',
    '0x713c5094',
    '0xtb686i64',
    '0x962o42b0',
    '0xca4f643e',
    '0x8e51rd06',
    '0xc9f65991'
}
local pattern = "0x[0-9a-f]+$"
for _, entry in ipairs(numbersList) do
    local match = string.match(entry, pattern)
    if match then
        print(match)
    end
end
-- 0x3159f82f
-- 0x713c5094
-- 0xca4f643e
-- 0xc9f65991
In the hexadecimal system, digits go from 0-9 and then a-f, meaning it uses the full set of digits but only a subset of letters.
Since there is no default character class that matches only a-f, we will create our own.
To find hex numbers, we create a set containing all the digits and the letters a-f.
The API tells us that the ‘$’ symbol anchors the pattern to the end of the string.
For the search, we look for the literal ‘0x’ and then grab the characters up to the anchor. If every character after ‘0x’ belongs to the custom set, then the value is a hexadecimal number.
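One caveat is that the pattern above is only anchored at the end, so it will also match a hex number that trails other text. A stricter check, sketched below, adds the ‘^’ anchor to pin the pattern to the start of the string as well. It also uses the built-in ‘%x’ class, which matches any hexadecimal digit (including uppercase A-F, unlike the custom set). isHex is a hypothetical helper name.

```lua
-- Hypothetical helper: true only if the whole string is "0x" followed
-- by one or more hexadecimal digits.
local function isHex(entry)
    return string.match(entry, "^0x%x+$") ~= nil
end

print(isHex("0x3159f82f"))    --true
print(isHex("0x46b3v90f"))    --false
print(isHex("id 0xca4f643e")) --false

-- The end-anchored pattern alone still finds the trailing hex number.
local trailing = string.match("id 0xca4f643e", "0x[0-9a-f]+$")
print(trailing) --0xca4f643e
```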
This brief introduction has only scratched the surface of pattern matching and is by no means exhaustive. In game development, you will rarely need to write elaborate patterns.
In the more general field of programming, ‘regular expressions’ are the system used to define and search for patterns in a string, and they could fill a course on their own.
Lua’s pattern matching system differs from regular expressions in its syntax (for example, it escapes with ‘%’ rather than a backslash) and supports only a subset of their features. However, many of the same rules still apply.
If you move on to other fields of software development that require a lot of pattern matching or just want to improve your understanding, RegexOne is one of my favorite beginner resources for learning regular expressions.