Wildcards - Computerphile

### Understanding Wildcards: A Deep Dive into How They Work

Wildcards are a fundamental concept in computing, particularly when dealing with command-line interfaces, file management, and pattern matching. While they are often associated with files and directories, their principles can be applied in various other contexts. This article delves into the intricacies of wildcards, focusing on how they operate, their different types, and practical examples of their usage.

---

#### What Are Wildcards?

A wildcard is a character that, instead of being compared literally, stands in for other characters in a string. The most common wildcards used in computing are:

1. **Star (*):** This matches any sequence of characters (including zero characters). For example, `*.docx` would match all files ending with the `.docx` extension.

2. **Question Mark (?):** This matches exactly one character. For instance, `m?m.docx` would match filenames like `mum.docx`, `mom.docx`, `mam.docx`, or `m3m.docx`, but not `m14m.docx`, which has two characters in that position.

These wildcards are not limited to file systems; they can also be used in programming, databases, and other areas where pattern matching is required.
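As a quick illustration of the two wildcards, here is a minimal sketch using Python's standard `fnmatch` module, which implements Unix shell-style patterns. The filenames are made up for the example, and `fnmatchcase` is chosen so the comparison is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase

# '*' matches any run of characters, including an empty one.
print(fnmatchcase("letter.docx", "*.docx"))  # True
print(fnmatchcase("letter.txt", "*.docx"))   # False

# '?' matches exactly one character, so only the last name is rejected.
for name in ["mum.docx", "mom.docx", "m3m.docx", "m14m.docx"]:
    print(name, fnmatchcase(name, "m?m.docx"))
```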

---

#### How Wildcards Work

When you use a wildcard in a command-line interface (CLI), the pattern is compared against the names stored in a directory. Where that comparison happens differs between DOS/Windows and Unix-based systems, but the basic process is the same:

1. **File Matching Process:**

- The operating system iterates through each file in the directory.

- It compares each filename against the pattern provided by the user, character by character.

- If every literal character matches and every wildcard is satisfied, the file is included in the results.

2. **Example of Filename Matching:**

- Suppose you have files named `lecture01`, `lecture02`, `lab01`, `lab02`, and `notes` in a directory.

- If you use the pattern `lecture*`, the system will match all filenames starting with `lecture`.

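- For each file, it compares characters one at a time and stops at the first mismatch. With a trailing `*`, the file is included as soon as the comparison reaches the `*` without a mismatch, so `lecture01` and `lecture02` match, while `lab01` and `notes` are rejected within the first two characters.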
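The character-by-character comparison described above can be sketched as follows. This is a deliberately simplified illustration, not how any particular operating system implements it: the function name `matches_trailing_star` is hypothetical, and it understands only literal characters plus a `*`, treating the `*` as "accept everything from here on".

```python
def matches_trailing_star(name: str, pattern: str) -> bool:
    """Compare name against pattern one character at a time.

    Only literal characters and '*' are understood; reaching a '*'
    accepts the rest of the name, whatever it is.
    """
    for i, p in enumerate(pattern):
        if p == "*":
            return True              # everything after this point is accepted
        if i >= len(name) or name[i] != p:
            return False             # first mismatch: stop comparing
    # No '*' was seen, so the name must end exactly where the pattern ends.
    return len(name) == len(pattern)


files = ["lecture01", "lecture02", "lab01", "lab02", "notes"]
matched = [f for f in files if matches_trailing_star(f, "lecture*")]
print(matched)  # ['lecture01', 'lecture02']
```

A pattern with no wildcard, such as `lab01`, falls through to the final length check and therefore matches only the file with exactly that name.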

---

#### Differences Between DOS/Windows and Unix Wildcards

While both systems use wildcards like `*` and `?`, there are subtle differences in their implementation:

1. **DOS/Windows:**

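- The `*` matches any sequence of characters, including none. In the traditional DOS-style matching described in the video, reaching a `*` means that everything after that point is accepted.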

- The `?` matches exactly one character.

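- The command processor (e.g., `cmd.exe`) does not expand wildcards before starting an external program. The pattern is passed through as typed, and the command that receives it (a built-in such as `dir`, or the program itself) does the matching, usually through the operating system's directory-search calls.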

2. **Unix/Linux:**

- Unix systems usually leave wildcard processing to the shell. For example, Bash expands `*.docx` into the list of matching filenames before running `ls`, so `ls` receives each matching file as a separate argument.

- More advanced patterns can be built with square brackets, which describe a set of characters for a single position (e.g., `[a-z]` matches any one lowercase letter from a to z).
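To make the Unix behaviour concrete, the sketch below imitates, very roughly, what a shell does to a command line before the program starts. The function `expand_args` and the filenames are hypothetical, and the sketch ignores quoting, hidden files, and paths. It does follow Bash's default of leaving a pattern unchanged when nothing matches.

```python
from fnmatch import fnmatchcase


def expand_args(args, directory_names):
    """Replace each wildcard argument with the sorted names it matches."""
    expanded = []
    for arg in args:
        if any(ch in arg for ch in "*?["):
            matches = sorted(n for n in directory_names if fnmatchcase(n, arg))
            # With no matches, the pattern is passed through unchanged.
            expanded.extend(matches if matches else [arg])
        else:
            expanded.append(arg)
    return expanded


names = ["b.docx", "a.docx", "notes.txt"]
print(expand_args(["ls", "*.docx"], names))  # ['ls', 'a.docx', 'b.docx']
print(expand_args(["ls", "*.pdf"], names))   # ['ls', '*.pdf']
```

The program at the start of the list only ever sees the expanded arguments, which is why a Unix executable does not need its own wildcard code for this case.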

---

#### Practical Examples of Wildcard Usage

1. **Finding Word Documents:**

- In DOS/Windows: Use `dir *.docx` to list all `.docx` files in the current directory.

- In Unix/Linux: Use `ls *.docx` for the current directory, or `find . -name "*.docx"` to search subdirectories as well.

2. **Matching Specific Patterns:**

- To find files starting with "pre" and ending with ".txt": Use `pre*.txt`.

- To find files with exactly three characters after "lab": Use `lab???.txt`.

3. **Advanced Matching in Unix:**

- To match files from "a.txt" to "z.txt": Use `[a-z].txt`, with the range inside square brackets.

- This would include files like `b.txt`, `c.txt`, etc., but not `1.txt` or, in the usual case-sensitive setting, `A.txt`.
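The patterns in this section can be tried out without touching a real directory. The following sketch again uses Python's `fnmatch` module; the list of names and the helper `select` are assumptions made for the example.

```python
from fnmatch import fnmatchcase

names = ["prefix.txt", "prepare.txt", "pre.txt",
         "lab001.txt", "lab01.txt", "b.txt", "A.txt", "1.txt"]


def select(names, pattern):
    """Return the names that match pattern, keeping their original order."""
    return [n for n in names if fnmatchcase(n, pattern)]


print(select(names, "pre*.txt"))    # ['prefix.txt', 'prepare.txt', 'pre.txt']
print(select(names, "lab???.txt"))  # ['lab001.txt']
print(select(names, "[a-z].txt"))   # ['b.txt']
```

Note that `pre*.txt` also matches `pre.txt`, because `*` may match nothing, while `lab???.txt` rejects `lab01.txt`, because each `?` must consume exactly one character.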

---

#### Under the Hood: How Wildcards Are Processed

When you execute a command with wildcards, the shell or the operating system checks each entry in the directory:

- **Step 1:** The system reads the list of files in the current directory.

- **Step 2:** For each filename, it compares it against the pattern provided by the user.

- **Step 3:** If the filename matches the pattern (considering wildcards), the file is added to the result set.

- **Step 4:** The results are then displayed or used as needed.

Implementations may add indexes or other optimizations to speed this up, but that is an implementation detail. Conceptually, every directory entry is checked against the pattern.
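The four steps can be sketched as a small string-comparison routine that knows about `?` and `*`. This is one possible way to write it, assuming Unix-style semantics where `*` may appear anywhere in the pattern. The names `wildcard_match` and `list_matching` are hypothetical, and the recursion favours clarity over speed.

```python
def wildcard_match(name: str, pattern: str) -> bool:
    """Return True if name matches a pattern containing '?' and '*'."""
    if not pattern:
        return not name                  # both exhausted together: a match
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # '*' matches nothing, or it absorbs one character and stays active.
        return wildcard_match(name, rest) or (
            bool(name) and wildcard_match(name[1:], pattern)
        )
    if not name:
        return False                     # pattern needs a character, none left
    if head == "?" or head == name[0]:
        return wildcard_match(name[1:], rest)
    return False                         # literal mismatch


def list_matching(names, pattern):
    """Steps 2 to 4: test every name and collect the ones that match."""
    return [n for n in names if wildcard_match(n, pattern)]


# Step 1 would normally read the directory (for example with os.listdir);
# a fixed list is used here so the example does not depend on real files.
catalog = ["lecture01", "lecture02", "lab01", "lab02", "notes", "lecture0"]
print(list_matching(catalog, "lecture0?"))  # ['lecture01', 'lecture02']
print(list_matching(catalog, "l*02"))       # ['lecture02', 'lab02']
```

The name `lecture0` is rejected by `lecture0?` because the name ends where the `?` still needs a character.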

---

#### Conclusion

Wildcards are a powerful tool for efficiently managing files and performing searches. Understanding how they work allows users to save time by matching patterns rather than individual filenames. Whether you're working in DOS/Windows or Unix/Linux, mastering wildcards can significantly enhance your productivity at the command line.

---

#### Video Transcript

So, wildcards. The classic example is that you want to list files, and you say, "I want to find all the Word documents on my system." If you're at a command-line prompt on Windows, you would say something like `dir *.doc`, or `*.docx`. The same sort of thing works in Unix: you can do `ls *.docx`, and the output you get is a list of all the files that end with the extension `.docx`. So what's going on here? How does this actually work? We generally talk about wildcards when we're thinking about files, but similar things are used in other places, so I'll talk about files, but the same approach can be used elsewhere.

What a wildcard is, is just the idea that certain characters, rather than being compared literally against the actual file name or directory name, are interpreted as having a special meaning. The obvious one is star, which we tend to think of as meaning "anything". So what we're saying here is: show me all the files (`dir` or `ls`, depending on whether you're using DOS or Linux) whose name is anything followed by `.docx`.

So what's actually going on here? Let's first think about the different wildcards we can get, because there are various ones. The two main ones you come across are star, which means "match anything", and question mark, which means "match any character". For example, if we wanted to find all the letters to mothers, we might search for `m?m.docx`. That would find `mum.docx`, and it would also find `mom.docx`, `mam.docx`, and `m3m.docx`. It will match any character in place of the question mark. It's basically saying: at this point I'm not going to specify what's here, so match any character that is there. You could also combine them, so you can search for `m??m.docx`, which of course won't match any of these, because it has to have two characters there. It would match a name with two characters in that position, for example `m14m.docx`, or, if you're a hacker, you might have [unclear] `.docx`, and so on.

So what about star? Star means "match anything", and this one actually varies slightly between Windows, or rather DOS, and Unix. For example, let's say we have `pre*.doc`. This would match `prefix.doc`, and it would also match `prepare.doc`, etc.: anything that begins with `pre` will match. You could do `*ix.doc`, which would match `suffix.doc` and of course `unix.doc`, because this means anything can match here, followed by `ix.doc`. The star is any run of zero or more characters, so this would also match `ix.doc`, and the earlier pattern would have matched `pre.doc`, because it's basically zero or more of any characters we like.

You can get slightly more advanced ones. Unix allows you to say `[A-Z]file.doc`, with A dash Z in square brackets. That would match `Afile.doc`, and it would match `Tfile.doc`, but it wouldn't match `bfile.doc`, because what we've said here is: match any character between uppercase A and uppercase Z. This is a lowercase b, so it wouldn't match, whereas the others would. So that's basically what wildcards are.

So how does it do that? (Off camera: is it literally going through every file?) Yeah. If you think back, if you watched the video on zero-size files, we talked about how the computer has a directory of all the files in it. So let's create a simple directory, and let's say we have `lecture01`, `lecture02`, `lab01` (you can tell what I've been doing this week), `lab02`, and a file called `notes`. These are all separate files, and they are in the catalog for the current directory. Let's say I wanted to match all the lectures. Then I could use a wildcard of `lecture*` to match all the lectures. So how would that actually be matched against each catalog entry?

First of all, let's see how it would work if we just specified an actual file name. What would happen if we wanted to search for `lab01`, because it's a bit further down? It's got to find whether that file is actually in the directory or not, and you have a series of API calls in the operating system that allow you to search through directories. So we come to the first file in the directory: we've got `lecture01`, and we're looking for the file `lab01`. The computer compares the two names character by character. It compares `l` and `l`, and they match. It then compares `e` and `a`; they don't match, so we know it doesn't match this one. We then do the same for `lecture02`, and because it begins the same way, we get to the same point.

We then come to compare `lab01` and `lab01`. We're looking for the file `lab01` and we've come to the catalog entry `lab01`. We compare the first two characters, and they match; the second two, they match; the third two, they match; we compare the last-but-one characters and the last two characters, and they match, so we've found the file. It would actually then keep going and compare with `lab02`, which would not match on the last character, and the same for `notes`, and then it would finish. So the first thing is that we check every file in the catalog each time we do it. You can probably build systems to make it faster; whether they do that or not is an implementation detail, so we won't go into that.

So how do we handle the wildcard? Let's think about the traditional Windows style, where the star just means anything that follows will match. Let's say we're going to look for the lectures again, and we do the same thing. We're searching the catalog, so we're at `lecture01` and we compare it with `lecture*`. We compare the first two characters, they match; the second two, they match; the third two, they match; and so on until we get to this point, where they've all matched. If any of these didn't match, we'd stop, just like we did before. Now, in the Windows/DOS world, star means anything after here matches. So as soon as we get to this point and there's a star, we say: okay, it doesn't matter what comes next, we've matched. That's a big tick; we can use this file. We do the same with `lecture02`, and of course with the others, when we get there, it doesn't match and it carries on. So it builds up a list of results, or you have calls that let you iterate through them: you get the first, then the next, and the next, until you come to the end.

An interesting point is where this is actually done in the system, and that depends on the system you're using. If you're using a Unix system, this is often done by the shell. So when the program gets called, it gets called with each of the different files passed as separate arguments to the Unix executable, to save it having to do the matching itself.

You can see how this would generalize to, say, the question mark. Let's say we're searching for files that match `lecture0` and then some number, like `lecture04`. We use the pattern `lecture0?`. We compare each of the characters just as we did before, they all match, and when we get to this point we see that this is a question mark, and we write the code that says: a question mark matches anything. Of course, this wouldn't match the file name `lecture0`, because we'd have the question mark here and this would be the end of the string, so it wouldn't match. So we just have to write the string comparison algorithm to know about question marks, stars, and the other character codes that we can use to represent other things.