The Power of Functions: A Guide to Writing Effective Code
Functions are central to writing efficient, readable code. A function is a piece of code that performs a specific task, such as selecting columns from a data frame or filtering rows. Most variables represent objects, so they behave like nouns; functions perform actions, so they behave like verbs. Thinking of functions as verbs makes it natural to treat them as tools for manipulating data.
Because functions act like verbs, it is good practice for every function name to contain a verb. A verb-based name makes code more readable and self-explanatory. For example, in dplyr the function `select` performs the action of selecting columns from a data frame, while `filter` performs the action of filtering rows. This convention helps other developers understand the purpose of each function and use it effectively in their own code.
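As a rough illustration of verb-based naming, here is a minimal base R sketch. The helpers `select_columns` and `filter_rows` are hypothetical names invented for this example; they are not the dplyr functions, only simplified stand-ins that show how a verb in the name signals the action performed. The built-in `mtcars` data set is used purely for demonstration.

```r
# Hypothetical verb-named helpers in base R (simplified stand-ins, not dplyr).
select_columns <- function(data, columns) {
  # Keep only the named columns; drop = FALSE keeps the result a data frame.
  data[, columns, drop = FALSE]
}

filter_rows <- function(data, keep) {
  # Keep only the rows where the logical vector `keep` is TRUE.
  data[keep, , drop = FALSE]
}

# The calls read like instructions: filter the rows, then select the columns.
small_cars <- filter_rows(mtcars, mtcars$cyl == 4)
mpg_and_weight <- select_columns(small_cars, c("mpg", "wt"))
```

Each name states what the function does, so a reader can follow the two calls without opening a help page.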
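Perhaps the most high-profile example of a badly named function is `lm`, which stands for linear model. The name has three problems. First, it is an acronym, so you have to read the documentation to learn what it means. Second, it doesn't contain a verb. Third, there are many types of linear model, and the name does not make it obvious that the function runs a linear regression. A more literal name like `run_linear_regression` would be more informative. The longer name takes more effort to type, but time spent reading and understanding code is almost always greater than time spent typing it, so the extra keystrokes rarely matter.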
In addition to naming, argument order affects how easy a function is to use. There are two types of function arguments: data arguments and detail arguments. Data arguments are the things you compute on, while detail arguments tell the function how to perform the computation. In `cor`, which calculates correlations, `x` and `y` are data arguments, while `use` and `method` are detail arguments. Data arguments should come before detail arguments. Separately, functions can be assigned just like any other variable, so a frequently used function such as `head` can be given a short alias like `h` to save keystrokes.
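The distinction between data and detail arguments, and the idea of assigning a function to a new name, can be sketched with base R alone. The variable names below (`weight`, `mileage`, `rank_correlation`, `first_rows`) are chosen for illustration, and `mtcars` is just a convenient built-in data set.

```r
# Data arguments (x, y) come first: they are the values being computed on.
weight <- mtcars$wt
mileage <- mtcars$mpg

# Detail arguments (use, method) follow: they control how the
# correlation is calculated, not what it is calculated on.
rank_correlation <- cor(weight, mileage,
                        use = "complete.obs", method = "spearman")

# Functions are ordinary values, so a frequently used function
# can be assigned to a shorter alias.
h <- head
first_rows <- h(mtcars, n = 3)
```

The alias saves a few keystrokes, though a short name like `h` gives up some of the readability that a verb-based name provides.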
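Argument order also determines how well a function works with the pipe operator. In `lm`, `formula` is a detail argument, so `data` should precede it. Because the data argument isn't first, `lm` doesn't play nicely with a pipe operator, and piping a data frame into it throws an error. `lm` has an excuse, since it was written several decades before the pipe operator existed. In such cases, you can write a wrapper function that has a clear name, takes the data argument first, and works with pipes.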
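A wrapper of the kind described above might look like the following minimal sketch. The name `run_linear_regression` and its argument names are assumptions made for this example, and the native pipe `|>` assumes R 4.1 or later; a package-provided pipe would be used the same way.

```r
# Hypothetical wrapper around lm(): verb-based name, data argument first.
run_linear_regression <- function(data, formula) {
  # Pass the arguments on to lm() by name, so the order lm() expects
  # no longer matters to the caller.
  lm(formula, data = data)
}

# The piped value fills the first argument, `data`, so this works.
# Piping straight into lm() would instead hand the data frame to `formula`.
model <- mtcars |> run_linear_regression(mpg ~ wt)
```

Putting the data argument first is the design choice that matters here: the pipe always supplies its left-hand side as the first argument of the function on the right.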
Lastly, modern code editors reduce the cost of longer, more descriptive names. Most editors autocomplete function names: start typing the name, press Tab, and select an option without having to type the whole function name.
By following these guidelines and best practices, developers can write more effective and readable code. It's also a good idea to take inspiration from existing libraries and frameworks, such as those used in data manipulation, to ensure that your code is efficient and easy to understand.
"WEBVTTKind: captionsLanguage: enmost variables represent objects that is they are nouns functions are a little different because they perform actions you can think of functions as verbs let's consider some functions from D plier for manipulating data frames select performs the action of selecting columns and filter performs the action of filtering rows notice that the words select and filter are verbs in fact it's good practice that all function names should contain a verb if you're stuck for ideas this list should get you started LM is perhaps the most high-profile badly named function first of all is an acronym so you have to read the documentation to determine that LM means linear model secondly it doesn't contain a verb thirdly there are lots of types of linear model and it isn't obvious from the name that this runs a linear regression I prefer a more literal name like run linear regression one possible counter-argument is that run linear regression is more effort type than LM that's true but there are seven good reasons why that doesn't really matter firstly the amount of time spent reading and understanding code is almost always longer than the time spent to type it I'm a slow typist but I can type run linear regression quicker than opening a help page for LM and reading it to figure out what the function does secondly every modern code editor will autocomplete function names in the data camp script pane start by typing the name of a function and press tab you can select an option without having to type the whole function name thirdly you can assign functions just like any other variable type I found myself calling head a lot so I define H to be vehicle to head and it saves me a few keystrokes as well as naming your function to make it easy to use you have to put the arguments in a sense order there are two types of function arguments data arguments are the things that you compute on and detail arguments tell the function how to perform the computation for 
example looking at the arguments of core for calculating correlations x and y are data arguments and use and method our detail arguments LM has another problem formula is a detail argument so data should precede it since data isn't first LM doesn't play nicely with a pipe operator and this code will throw an error of course LM has an excuse since it was written several decades before the pipe operator existed since we've established that LM has problems we can write a wrapper function run linear regression has a clear name the data argument comes first and it works with pipes time to name your own frommost variables represent objects that is they are nouns functions are a little different because they perform actions you can think of functions as verbs let's consider some functions from D plier for manipulating data frames select performs the action of selecting columns and filter performs the action of filtering rows notice that the words select and filter are verbs in fact it's good practice that all function names should contain a verb if you're stuck for ideas this list should get you started LM is perhaps the most high-profile badly named function first of all is an acronym so you have to read the documentation to determine that LM means linear model secondly it doesn't contain a verb thirdly there are lots of types of linear model and it isn't obvious from the name that this runs a linear regression I prefer a more literal name like run linear regression one possible counter-argument is that run linear regression is more effort type than LM that's true but there are seven good reasons why that doesn't really matter firstly the amount of time spent reading and understanding code is almost always longer than the time spent to type it I'm a slow typist but I can type run linear regression quicker than opening a help page for LM and reading it to figure out what the function does secondly every modern code editor will autocomplete function names in the data camp 
script pane start by typing the name of a function and press tab you can select an option without having to type the whole function name thirdly you can assign functions just like any other variable type I found myself calling head a lot so I define H to be vehicle to head and it saves me a few keystrokes as well as naming your function to make it easy to use you have to put the arguments in a sense order there are two types of function arguments data arguments are the things that you compute on and detail arguments tell the function how to perform the computation for example looking at the arguments of core for calculating correlations x and y are data arguments and use and method our detail arguments LM has another problem formula is a detail argument so data should precede it since data isn't first LM doesn't play nicely with a pipe operator and this code will throw an error of course LM has an excuse since it was written several decades before the pipe operator existed since we've established that LM has problems we can write a wrapper function run linear regression has a clear name the data argument comes first and it works with pipes time to name your own from\n"