20 Going further
20.1 Q&A: How do I read the scikit-learn documentation?
In order to become truly proficient with scikit-learn and go beyond what I’ve covered in this book, you need to be able to read the documentation. In this lesson, I’m going to walk through the five main pages and page types that you need to be familiar with.
The first page is the API reference, which you can reach by clicking on API in the top navigation bar. The API reference gives you a high-level view of everything available in scikit-learn: it lists all of the classes and functions, organized by module.
For example, this is the compose module. There’s a brief description of the module, a link to the relevant section of the User Guide, and descriptions of the two classes and two functions contained in the module. If we wanted to learn more about ColumnTransformer, we could click on it and be taken to the class documentation.
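If you want a quick check from Python, you can list a module's public names programmatically. The sketch below assumes that `sklearn.compose` defines `__all__` (recent releases do) and prints whether each name is a class or a function. Note that `make_column_selector` is implemented as a class even though its lowercase name makes it look like a function, so this output may not match the grouping on the API reference page exactly.

```python
import inspect

import sklearn.compose

# __all__ lists the public names that the module exports
for name in sorted(sklearn.compose.__all__):
    obj = getattr(sklearn.compose, name)
    kind = "class" if inspect.isclass(obj) else "function"
    print(f"{kind:8} sklearn.compose.{name}")
```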
The class documentation is the second type of page you need to be familiar with. It gives you a detailed view of a class, in this case the ColumnTransformer class. At the top is the class signature, which lists the parameters and their default values. Starting in version 0.23, any parameter after the asterisk should be passed as a keyword argument, also known as a named argument, rather than by position; since version 1.0, passing such a parameter by position raises an error. There's also a link to the source code for the class.
Next is the class description, a link to the relevant section of the User Guide, and sometimes the version in which the class was introduced. Next you’ll see the same parameters from the class signature, except here it provides a detailed description of each parameter and the expected data type. Below the parameters are the attributes. As a reminder, attributes will end with an underscore if they are learned or estimated from the data during the fit step.
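The trailing-underscore convention is easy to see in code. This sketch uses `StandardScaler` rather than ColumnTransformer, chosen only because its learned attribute `mean_` is easy to check by hand, on a tiny made-up array.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler()
# Before fitting, learned attributes do not exist yet
fitted_before = hasattr(scaler, "mean_")
print(fitted_before)  # False

scaler.fit(X)
# fit() estimates the column means from the data and stores them in mean_
print(scaler.mean_)  # [ 2. 20.]

# Parameters set in the constructor have no trailing underscore
print(scaler.with_mean)  # True
```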
Next are links to any related functions or classes, any important notes, and some simple usage examples. The next big section is a list of the methods. It starts with a description of each method, and below that you can see the parameters and return values for each method. Again, you can click on “source” to view the source code for any method. Finally, there are links to examples that use this class.
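Much of the same information is available from Python, because the class pages are largely generated from the docstrings in the source code. This sketch uses the standard-library `inspect` module to approximate the list of methods, the description of one method, and what the "source" link shows. Methods exposed through descriptors rather than plain functions may not appear in this list, so treat it as approximate.

```python
import inspect

from sklearn.compose import ColumnTransformer

# Public methods: roughly the "Methods" list on the class page
methods = sorted(
    name
    for name, _ in inspect.getmembers(ColumnTransformer, inspect.isfunction)
    if not name.startswith("_")
)
print(methods)

# The docstring holds the description, parameters, and return values
doc = inspect.getdoc(ColumnTransformer.fit_transform)
print(doc.splitlines()[0])

# Roughly what clicking "source" shows for this method
source = inspect.getsource(ColumnTransformer.fit_transform)
print(source[:300])
```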
Let’s scroll back to the top of the page, and then click over to the User Guide for ColumnTransformer. The User Guide is the third type of page you need to be familiar with, and you’ll know you’re in the User Guide when you see the numbered sections. The User Guide is more like a tutorial: it explains why you might want to use a particular class and offers advice on how to use that class properly. It often discusses related functions and classes, and it includes additional usage examples.
Let’s go back to the class documentation for ColumnTransformer, scroll to the bottom, and click on an example. Examples are the fourth type of page you need to be familiar with. Examples vary in structure and length, but their distinguishing feature is that they demonstrate how to solve a particular problem from start to finish. You’ll always see imports at the top, and buttons on the right to run the example online or download the example to your computer.
Let’s go back to the class documentation for ColumnTransformer one more time and scroll up to the parameters. Sometimes in the documentation, you’ll see highlighted terms that are not classes or functions, such as fit and transform in this case. If you click on one of them, you’ll be taken to the Glossary, which is the fifth page you need to be familiar with. The Glossary defines almost all of the important terms used in the scikit-learn documentation. If you just want to browse through it, you can access it through the More menu in the top navigation bar.
Let me now summarize the five pages and page types we walked through, and describe how I use them.
The first page is the API reference. I go here when I want to see what classes or functions are in a particular module. For example, if I wanted to review all of the classes and functions that are available for preprocessing, the API reference is the fastest way to do this.
The second page type is the class documentation. I go here any time I need to understand a particular class in-depth, especially all of its parameters and attributes.
The third page type is the User Guide. I go here when I need more context about a particular class or function, or advice about how to use it properly.
The fourth page type is Examples. I go here when I need to see a more complex usage example of a particular class, since the examples in the class documentation and User Guide are purposefully simple.
The fifth page is the Glossary. I go here when I need to understand a particular term and it’s not covered in the User Guide.
If you’re ever unsure where to start, you can click on User Guide in the top navigation bar and just start browsing the section that seems most relevant, or you can do a search in the search box.
20.2 Q&A: How do I stay up-to-date with new scikit-learn features?
After every major release of scikit-learn, I recommend reviewing the Release Highlights, which are linked from the top of the home page. This page summarizes a small subset of updates that the scikit-learn developers have judged to be especially important or exciting.
I also recommend that you review the detailed release notes, which are linked from the Release Highlights and are also linked from the More menu under “Release History”. This will give you the most comprehensive look at all of the new features, enhancements, and API changes. Even if you aren’t upgrading right away, it can help you to decide whether to upgrade and warn you about future API changes so that you can start preparing your code now.
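Before reading the release notes, it helps to know which version you are running. The sketch below prints the installed version and then runs a piece of code with `FutureWarning` promoted to an error, which is one way to surface deprecated usage before it breaks. It assumes, as is generally the case in recent releases, that scikit-learn signals deprecations with `FutureWarning`; the `StandardScaler` call is only a placeholder for your own code.

```python
import warnings

import numpy as np
import sklearn
from sklearn.preprocessing import StandardScaler

# Check which release you have before reading its release notes
print("scikit-learn", sklearn.__version__)
# sklearn.show_versions() prints a fuller report (Python, NumPy, SciPy, ...)

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Promote FutureWarning to an error so that deprecated usage fails loudly
with warnings.catch_warnings():
    warnings.simplefilter("error", category=FutureWarning)
    # Placeholder: replace this with the code you want to check
    X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.shape)  # (3, 2)
```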
Reading this page might seem like an overwhelming task given its length, but my recommendation is to skip past any modules that you never import and only read through the modules that you actually use.
While reading about a particular change, you can click through to the class documentation for more details. If you still have questions after reading the class documentation, you can click on the number on the right to read through the GitHub pull request that introduced this feature or change. Often, the pull request will include a long conversation about the reasons for the change and why it’s designed in a certain way. These kinds of details are not always captured in the documentation, and they can help you to build a more in-depth knowledge of scikit-learn.
20.3 Q&A: How do I improve my Machine Learning skills?
If you want to get better at Machine Learning, my top recommendation is to practice what you’ve learned in this book as much as possible. I recommend choosing different types of problems and datasets so that you can learn how to adapt these skills to many different situations. During that process, you are likely to learn about other topics and scikit-learn modules that we didn’t cover in this book, which will expand your Machine Learning and scikit-learn fluency.
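One way to build that habit is to keep a small practice loop that runs the same workflow on different types of problems. The sketch below uses two datasets bundled with scikit-learn, one classification problem and one regression problem; the datasets, models, and metrics are purely illustrative choices, so swap in your own.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each practice problem: (features, target), a model, and a scoring metric
practice_problems = {
    "breast_cancer": (
        load_breast_cancer(return_X_y=True),
        LogisticRegression(max_iter=1000),
        "accuracy",
    ),
    "diabetes": (load_diabetes(return_X_y=True), Ridge(), "r2"),
}

results = {}
for name, ((X, y), model, scoring) in practice_problems.items():
    # Scaling inside the pipeline keeps it within each cross-validation fold
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5, scoring=scoring)
    results[name] = scores.mean()
    print(f"{name}: mean {scoring} = {scores.mean():.3f}")
```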
One area that might particularly benefit you is studying Machine Learning models in more depth. This will help you in a number of ways:
- It will help you to choose which models are worth trying for a given problem.
- It will help you to efficiently and properly tune those models.
- And it will help you to better interpret the results of those models.
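As a small illustration of the tuning and interpretation points, the sketch below tunes the regularization strength `C` of a logistic regression with `GridSearchCV` and then lists the largest coefficients of the refit model. The dataset, the grid of `C` values, and the choice of model are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

# make_pipeline names each step after its lowercased class name
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tuning: search over C using 5-fold cross-validation
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10]}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))

# Interpreting: coefficients of the refit best model (on standardized features)
best_lr = grid.best_estimator_.named_steps["logisticregression"]
coefs = best_lr.coef_[0]
top_features = np.argsort(np.abs(coefs))[::-1][:5]
for i in top_features:
    print(f"{data.feature_names[i]}: {coefs[i]:+.2f}")
```

Coefficients like these are only a rough guide: their size depends on regularization and on correlations between features, and they describe associations in this dataset rather than causes. Knowing a model in depth is what tells you how far such an interpretation can be trusted.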
There are many resources on this topic, but my favorite resource is the book An Introduction to Statistical Learning. It will help you to gain a practical understanding of many of the most important and widely used Machine Learning models. It’s available in hardcover, or you can download it as a free PDF. The book was originally written with examples in R, but there is now also an edition with applications in Python.
20.4 Q&A: How do I learn Deep Learning?
As I mentioned at the beginning of this book, there are some specialized problems for which a Deep Learning library will provide results that are superior to what you could achieve with scikit-learn. As such, you may decide that you’d like to spend your time learning how to use Deep Learning.
Deep Learning is a fast-moving field, but one course that has stayed up-to-date and is taught in an approachable way is Practical Deep Learning for Coders. It’s available for free and is complemented by a book, a deep learning library called fastai, and a huge online community.
Although Deep Learning has a steeper learning curve than the Machine Learning techniques covered in this book, it is based on many of the same fundamental principles that you’ve learned here, so you’re well-positioned to begin learning Deep Learning.