Focus on these five areas to break into data science

career data science Aug 22, 2022

This is my answer to a question I found online:

As an aspiring student who wants to break into AI/ML and Data Science, what should I focus on?

Data science is a complex and rapidly evolving field. It is a daunting task to break into it, especially if you are self-taught like I am. I had no structured curriculum, so identifying the areas of focus was a bit of a guessing game for me.

So, to make it easier for others, here are the five areas you should focus on to land your first data science job.


Data manipulation

First, you need to know how to get your data and manipulate it.

Chances are you will need to access a database to extract your dataset, so it is essential that you know your SQL basics. At the very least, you should know how to:

  • query data between two dates
  • join tables together
  • filter your table

Of course, mastering window functions and common table expressions (CTEs) is a plus.
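Here is a minimal sketch of the three basics above, plus a simple CTE, using Python's built-in `sqlite3` module and an in-memory database. The `customers` and `orders` tables, their columns, and their values are hypothetical, and date handling varies between SQL dialects, so treat this as an illustration rather than a recipe for your own database.

```python
import sqlite3

# Hypothetical in-memory schema for illustration: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date TEXT,  -- ISO 8601 strings compare correctly as text in SQLite
    amount REAL
);
INSERT INTO customers VALUES (1, 'CA'), (2, 'US'), (3, 'CA');
INSERT INTO orders VALUES
    (10, 1, '2022-01-15', 120.0),
    (11, 2, '2022-02-03', 80.0),
    (12, 3, '2022-03-20', 45.5),
    (13, 1, '2022-06-01', 300.0);
""")

# 1. Query data between two dates (BETWEEN is inclusive on both ends).
q_dates = """
SELECT order_id FROM orders
WHERE order_date BETWEEN '2022-01-01' AND '2022-03-31'
ORDER BY order_id;
"""

# 2 + 3. Join two tables, then filter the joined result.
q_join = """
SELECT o.order_id, c.country, o.amount
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
WHERE c.country = 'CA'
ORDER BY o.order_id;
"""

# Bonus: a common table expression (CTE) that aggregates before filtering.
q_cte = """
WITH totals AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id, total_spent FROM totals
WHERE total_spent > 100
ORDER BY customer_id;
"""

dates_result = conn.execute(q_dates).fetchall()
join_result = conn.execute(q_join).fetchall()
cte_result = conn.execute(q_cte).fetchall()
print(dates_result)  # [(10,), (11,), (12,)]
print(join_result)   # [(10, 'CA', 120.0), (12, 'CA', 45.5), (13, 'CA', 300.0)]
print(cte_result)    # [(1, 420.0)]
```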

Then, once you have your data, you must know how to manipulate it. Real-life datasets are messy: they have missing values, outliers, and obscure column labels. You must know how to tackle each of these issues and extract meaningful information.

For any data manipulation, Pandas and NumPy are essential.
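As an illustration, the sketch below tackles the three issues just mentioned with Pandas and NumPy on a small hypothetical extract. The column names, the median fill, and the 1.5 × IQR outlier rule are assumptions chosen for the example; the right choice depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw extract: obscure labels, a missing value, and an outlier.
raw = pd.DataFrame({
    "cst_ag": [34, 29, np.nan, 41, 38],
    "mth_inc_usd": [4200.0, 3900.0, 5100.0, 4700.0, 250000.0],
})

# 1. Give the columns meaningful names.
df = raw.rename(columns={"cst_ag": "customer_age", "mth_inc_usd": "monthly_income"})

# 2. Fill missing ages with the median, which is robust to extreme values.
df["customer_age"] = df["customer_age"].fillna(df["customer_age"].median())

# 3. Flag (rather than silently drop) outliers with the 1.5 * IQR rule.
q1, q3 = df["monthly_income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income_outlier"] = ~df["monthly_income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(df)
```

Flagging outliers in a separate column keeps the decision reversible: you can later compare a model trained with and without those rows.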

Know your algorithms

Second, you must know your algorithms.

Regression, classification, and clustering are common tasks for data scientists. You must know how to implement algorithms for each task and have some intuition about which algorithm tends to work best for a given problem.

Here is a list of the basic algorithms you must know:

Regression

  • Linear regression
  • Lasso and Ridge regression
  • Decision trees and tree ensembles (random forest, bagging, boosting)

Classification

  • Logistic regression
  • Decision trees
  • SVM

Clustering

  • K-means clustering
  • Hierarchical clustering
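A practical way to build intuition about which algorithm works well is to try several on the same data and compare them fairly. The sketch below assumes scikit-learn is installed and uses its bundled toy datasets (diabetes, breast cancer, synthetic blobs) purely for illustration; random forest stands in for the tree ensembles, and the hyperparameters are arbitrary, not recommendations.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_breast_cancer, load_diabetes, make_blobs
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression, Ridge
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Regression: compare models by 5-fold cross-validated R^2 on a bundled toy dataset.
X_reg, y_reg = load_diabetes(return_X_y=True)
regressors = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
reg_scores = {
    name: cross_val_score(model, X_reg, y_reg, cv=5, scoring="r2").mean()
    for name, model in regressors.items()
}

# Classification: scale features for the scale-sensitive models (logistic, SVM).
X_clf, y_clf = load_breast_cancer(return_X_y=True)
classifiers = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
}
clf_scores = {
    name: cross_val_score(model, X_clf, y_clf, cv=5, scoring="f1").mean()
    for name, model in classifiers.items()
}

# Clustering: fit without labels, then check how much the two partitions agree.
X_blobs, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_blobs)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_blobs)
agreement = adjusted_rand_score(kmeans_labels, hier_labels)

for name, score in {**reg_scores, **clf_scores}.items():
    print(f"{name:>14}: {score:.3f}")
print(f"k-means vs. hierarchical agreement (ARI): {agreement:.3f}")
```

Using the same cross-validation scheme and metric for every candidate is what makes the comparison meaningful; the scores themselves say little about other datasets.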

I did not include any deep learning in this list, because entry-level jobs rarely require it. My recommendation is to focus on the basics first before moving on to deep learning, if you ever need to.

Interpret your evaluation metrics

Third, you must have a deep understanding of your evaluation metrics.

Knowing how to interpret your evaluation metrics is key to landing your first job. As data scientists, we often communicate with non-technical people, and to them, recall, precision, F1-score, or MAE mean nothing. It is part of our job to translate our scientific results into layman’s terms.
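For example, each metric can be turned into a plain-English sentence a stakeholder can act on. The fraud-detection and house-price numbers below are hypothetical, and the wording is just one possible translation, computed with scikit-learn's metric functions.

```python
from sklearn.metrics import mean_absolute_error, precision_score, recall_score

# Hypothetical fraud-detection results: 1 = fraud, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
precision = precision_score(y_true, y_pred)  # 3 of the 5 flagged cases are fraud
recall = recall_score(y_true, y_pred)        # 3 of the 4 frauds are caught

# Hypothetical house-price predictions, in dollars.
prices_true = [300_000, 250_000, 410_000]
prices_pred = [310_000, 240_000, 400_000]
mae = mean_absolute_error(prices_true, prices_pred)

# Translate each metric into a sentence a non-technical audience understands.
print(f"When the model flags a transaction, it is actually fraud {precision:.0%} of the time.")
print(f"The model catches {recall:.0%} of all fraudulent transactions.")
print(f"On average, our price estimates are off by ${mae:,.0f}.")
```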

Also, you must know how to choose the appropriate metric for your problem. For example, I have interviewed candidates who used accuracy to evaluate a classification model on an imbalanced dataset. That is a huge mistake: a model that always predicts the majority class can reach high accuracy while never detecting the minority class.
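To see why accuracy misleads on imbalanced data, here is a sketch using scikit-learn's `DummyClassifier`, which always predicts the most frequent class, on a hypothetical dataset with 95 negatives and 5 positives.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives (e.g. rare defects).
y = [0] * 95 + [1] * 5
X = [[i] for i in range(len(y))]  # features are irrelevant to this baseline

# A "model" that always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

accuracy = accuracy_score(y, y_pred)           # looks excellent
recall = recall_score(y, y_pred)               # never finds a single positive
balanced = balanced_accuracy_score(y, y_pred)  # mean of per-class recall
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}, balanced_accuracy={balanced:.2f}")
# accuracy=0.95, recall=0.00, balanced_accuracy=0.50
```

Recall, precision, F1-score, or balanced accuracy expose the useless baseline immediately, while accuracy rewards it.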

Thus, you must practice choosing the right metric for each situation.

Hone the scientific method

The job title says it: we are scientists. It is essential to show that we follow the scientific method and that we conduct proper experiments.

There are many aspects to this area. Let’s start with building a robust test set. First, your test set should remain constant throughout your experiments. It should also be representative of the real-life application of your solution.
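One way to keep the test set constant is to split it off once, with a fixed random seed, and store it. The sketch below uses scikit-learn's `train_test_split` on a hypothetical synthetic dataset; the seed, the 20% test size, and the `test_set.csv` file name are illustrative assumptions. Code cannot make a test set representative, though: that still depends on collecting realistic data.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for your real dataset.
rng = np.random.default_rng(42)
data = pd.DataFrame({
    "feature": rng.normal(size=1000),
    "label": rng.integers(0, 2, size=1000),
})

# Split ONCE, with a fixed seed; stratify so the class balance is preserved.
train_df, test_df = train_test_split(
    data, test_size=0.2, random_state=42, stratify=data["label"]
)

# Re-running the split with the same seed yields exactly the same test rows...
_, test_again = train_test_split(
    data, test_size=0.2, random_state=42, stratify=data["label"]
)
print(test_again.index.equals(test_df.index))  # True

# ...but persisting the test set also protects it against upstream data changes.
test_path = os.path.join(tempfile.gettempdir(), "test_set.csv")  # hypothetical location
test_df.to_csv(test_path, index=False)
```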

For example, suppose that you are working on a chatbot. Then, your test set should not contain only perfectly written sentences, with no grammar or punctuation errors. In reality, people make mistakes, use contractions, and sometimes don’t use punctuation at all! Your test set must reflect the kind of data your algorithm will have to work with.

Another way you can show that you follow the scientific method is by modifying one variable at a time. For example, do not engineer a new feature and change your algorithm at the same time. If your evaluation metric improves or degrades, you won’t be able to tell if it’s due to the new feature or the new algorithm.

Always keep everything else constant and change one variable at a time. And, of course, track and evaluate every experiment.
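The sketch below applies this discipline with scikit-learn: each experiment differs from the baseline in exactly one respect (a new feature or a new algorithm), everything is scored on the same held-out test set, and a plain Python list stands in for a real experiment tracker. The engineered feature and the helper names (`add_ratio_feature`, `make_logreg`, `make_forest`) are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

def add_ratio_feature(df):
    """Hypothetical engineered feature: mean area relative to mean perimeter."""
    out = df.copy()
    out["area_per_perimeter"] = out["mean area"] / out["mean perimeter"]
    return out

def make_logreg():
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

def make_forest():
    return RandomForestClassifier(n_estimators=200, random_state=0)

# Each experiment changes exactly ONE thing relative to the baseline.
experiments = [
    {"name": "baseline", "model": make_logreg, "features": lambda df: df},
    {"name": "+ ratio feature", "model": make_logreg, "features": add_ratio_feature},
    {"name": "random forest", "model": make_forest, "features": lambda df: df},
]

log = []  # minimal experiment tracker: one record per run
for exp in experiments:
    model = exp["model"]().fit(exp["features"](X_train), y_train)
    score = f1_score(y_test, model.predict(exp["features"](X_test)))
    log.append({"experiment": exp["name"], "f1": round(score, 4)})

for record in log:
    print(record)
```

Because the second run changes only the features and the third only the algorithm, any change in F1-score can be attributed to a single cause.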

Be ready to explain your work

Finally, you must be able to explain each step of your work.

Again, I have interviewed candidates who could not justify the steps they took in a project! Making the right decision is not enough; you must be able to justify it.

At every step of a project, you should ask yourself: why am I doing this?

This will help you to deeply understand your work and justify your thinking.

You must explain why you dropped that column.

You must explain why you filled the missing values using the mean instead of the median.

You must know why you engineered this new variable.

You have to explain why you chose that evaluation metric.

You must explain how you chose your champion model.

If you find yourself carrying out a step on intuition alone, with no data to back it up, then it’s probably a bad idea.
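For instance, the choice between filling missing values with the mean or the median can be backed by a quick look at the data rather than by habit. The sketch below uses Pandas on a hypothetical, right-skewed income column; the |skew| > 1 cut-off is a common rule of thumb, not a standard.

```python
import numpy as np
import pandas as pd

# Hypothetical income column: right-skewed by one extreme value, with gaps.
income = pd.Series([32_000, 35_000, 38_000, 41_000, 45_000, 52_000, 480_000, np.nan, np.nan])

skewness = income.skew()  # pandas skips NaN by default
mean, median = income.mean(), income.median()

# Strong skew pulls the mean toward the extreme values, so the median is a
# more typical fill value. The 1.0 threshold is a rule of thumb (assumption).
fill_value = median if abs(skewness) > 1.0 else mean
imputed = income.fillna(fill_value)

print(f"skew={skewness:.2f}, mean={mean:,.0f}, median={median:,.0f}, fill={fill_value:,.0f}")
```

With these numbers, the mean is more than double the median, which is exactly the kind of evidence you can point to when asked why you chose the median.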

It is better to fail an experiment and know why than to have a successful experiment and not know why.

Plus, you can be sure that this type of question will come up in an interview! I certainly like to ask them, because they separate good data scientists from mediocre ones.


There you have it! Focus on these five areas, and you will be well prepared to break into the field of data science.

I hope you found this article helpful!
