When you work with large datasets, you run the risk of running out of memory.
An out of memory error is particularly frustrating because your programme suddenly crashes after you’ve patiently waited for it to load all of your data just to reach the point where it can’t load any more.
Fortunately, there are plenty of best practices when working with Python and Pandas to overcome this hurdle, not least within this excellent reference by Itamar Turner-Trauring.
This article focuses on the simple but effective technique of changing the data types of your pandas DataFrame to make it more memory…
When it comes to running a clustering/segmentation project, one of the most challenging tasks is determining how many clusters exist.
The bad news is that these techniques are rarely conclusive. The reason machine learning courses use examples such as the Iris flower data set is that the number of clusters is known in advance, and they are quite easy to find.
When you finish studying and start working as a data…
Hypothesis testing has been around for decades, with well-established methods of determining whether the results that have been observed are significant. Yet sometimes it can be easy to lose track of which testing approach to use or whether it can be reliably applied to your situation.
Companies often use surveys to track their NPS (Net Promoter Score), a measure that’s designed to reflect customer loyalty and potential for growth.
The calculation is simple, ask customers how likely they are to recommend your business on a scale of 0–10, then subtract the percentage of negative responses from the percentage of positive.
Do you ever wish that you could just manage a clone of yourself? Messages would never get lost in translation, you would know exactly what work your line report is capable of, and they would find your jokes hilarious.
But we can’t do that, and notwithstanding an inflated opinion of myself, it’s not particularly desirable. Teams instead work best when they have a diversity of background and thinking. Plus, it can be a useful exercise in humility to know that your manager is probably having the same thoughts.
So in this clone-less world, you need to embrace the challenges and…
George Orwell’s novella Animal Farm includes the memorable line…
all animals are equal, but some animals are more equal than others ¹
Orwell may have been referring to hypocrisy, power, and privilege in society, but if you replace the word animals with errors, it starts to become very relevant to machine learning.
Now that I’ve finished pretending to be well-read, let’s get more specific.
Explaining the concept of false positives and negatives is a popular interview question because they are so important when applying a classification algorithm in practice.
I still sometimes find myself hastily consulting Wikipedia just before a…
Data Scientist. All views are my own, but not all of them are interesting.