Learning Experience

3 Beginner Mistakes I’ve Made in My Data Science Career

Writing this feels like career suicide

Arunn Thevapalan

Apr 29, 2021 • 7 min read

“You have been talking a lot about your learnings and successes; why not spill some of your mistakes and regrets too?”

That was my sister after I had lost a bet yesterday.

Now she’s forcing me into writing this article. I know what you’re thinking — she has a fair point, but maybe I’m not comfortable (yet) with the world seeing my mistakes.

Writing this feels like career suicide.

Will you follow my work even after this? I don’t know. Maybe that’s why I don’t see many experts talk about their early career mistakes. C’mon, surely they should have committed some too?

So guys, here I am, pressured to be honest about the mistakes I made when I started a career in data science. I’ll dive deep into revealing causes for every mistake, its importance for beginner data scientists, and how we can eventually avoid them.

Let’s see how this goes, shall we?

1. Believing Complex Algorithms Always Result in Better Solutions

“So what are the characteristics of these clustered residents?” my manager asked.

We had used the most advanced, recently released model to segment the residents of a smart city. The whole model was a black box: we had no idea how it did the segmentation, but it gave the most accurate clusters.

I thought for a minute; I couldn’t come up with an answer. Our model had no interpretability.

I hadn’t learned the lesson, though. Some time later, a client turned down our proof of concept for a potential project.

“This solution looks promising but let us get back to you. The investment of deploying this solution might be a bit too high.”

We had proposed a computer vision system to estimate the mass of fish, using state-of-the-art object detection and depth estimation models, but we hadn’t accounted for the expensive GPU-based computation that came along with it.

Whenever I was presented with a problem to solve, my brain jumped straight to neural networks and complex algorithms.

Computer vision? Convolutional Neural Networks. NLP? Transformers. Synthetic data generation? GANs. Tabular data? XGBoost. In a nutshell, I’d opt for the most complex algorithm out there because I believed that the more complex the algorithm, the better the solution.

To some extent, this idea is true, especially when you want to win Kaggle competitions. That’s how these algorithms got popular in the first place.

Here’s the twist: in the real world, a 2% improvement in accuracy is rarely as significant as it is in hackathons and competitions. The interpretability and the operational cost of a solution matter much more.

How to avoid this:

I have only one piece of advice to give you, which has worked for me ever since I’ve understood how the real world and businesses work. This trick has made our clients the happiest and my life the easiest.

Are you ready for it?

Start simple.

You heard me. Start with simple machine learning algorithms; there’s no point in complicating things upfront. Simpler solutions are more interpretable and cheaper to run, and there’s no harm in experimenting with linear or logistic regression first.

Christoph Molnar has written a gem of a book on interpretable machine learning techniques. It’s always a good idea to keep yourself educated on these topics.

If the performance is satisfactory, you’re good to go. If not, level up to a slightly more complex algorithm while accounting for the trade-off in interpretability and operational costs. This way, everyone’s expectations are far more likely to be met. Win-win.
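The "start simple, level up only if needed" routine can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Candidate` class, the `pick_simplest_adequate` function, and the hard-coded scores are all hypothetical placeholders, not results from any real project.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candidate:
    """One modelling option; list candidates from simplest to most complex."""
    name: str
    evaluate: Callable[[], float]  # returns a validation score, higher is better


def pick_simplest_adequate(
    candidates: Sequence[Candidate],
    target_score: float,
) -> Optional[Candidate]:
    """Return the first candidate whose score reaches target_score.

    Candidates are tried in the order given, so a more complex (and more
    expensive) option is never evaluated once a simpler one is good enough.
    Returns None if no candidate reaches the target.
    """
    for candidate in candidates:
        score = candidate.evaluate()
        print(f"{candidate.name}: {score:.3f}")
        if score >= target_score:
            return candidate
    return None


# Hypothetical, hard-coded scores stand in for real validation runs.
ladder = [
    Candidate("logistic regression", lambda: 0.86),
    Candidate("gradient boosting", lambda: 0.88),
    Candidate("deep neural network", lambda: 0.89),
]

chosen = pick_simplest_adequate(ladder, target_score=0.85)
```

In a real project, each `evaluate` would wrap an actual validation run, and the target score would come from what the business needs rather than from a leaderboard. The point of the ordering is that the interpretable, cheap option gets the first chance.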

You still with me? (Photo by JJ Jordan on Unsplash)

2. Not Building Strong Data Visualization Skills From the Beginning

It’s time I come clean: I was focused only on the cool stuff, building models and making predictions. Years into my career, I still hadn’t properly learned how to visualize data or tell a story through it.

I was dependent on our Business Intelligence team to build every dashboard and communicate the insights to the decision-makers. Knowing how to use libraries such as matplotlib doesn’t count; I’m talking about what lies beyond that.

Data visualizations make data more natural for the human mind to comprehend and make everybody’s life much easier. Here are a few instances where I’ve seen data visualization make all the difference:

It helps us, the data scientists, to build an intuitive understanding of the data and uncover patterns in it. That understanding lays the foundation for better feature engineering, feature selection, and model development.
It helps the end-users. Many projects published on GitHub and in journals never get used. Why? Because not all end-users know how to run code from a GitHub repository. In such scenarios, you’re better off building dashboards and apps that make it easy for them to use.
It helps the stakeholders of the project. AI projects need massive investments. More often than not, we need to convince stakeholders of the value we add to the business before they fund our projects. In such scenarios, dashboards with all the value-adding business metrics help them make faster, better-informed decisions.

With time, my crucial realization was that every phase of the data science lifecycle serves a different but integral purpose and needs to be treated with equal importance. Ignoring a phase for too long will come back to bite you later.

How to avoid this:

First things first, understand the importance of every phase in the data science lifecycle. You can refer to this end-to-end project guide to get a holistic picture of different phases in data science. Once you understand the different phases and skills relevant to them, start building each skill as soon as possible.

Talking specifically about data viz, my simple action plan was:

Learn the basics of data story-telling and building impactful dashboards.
Practice the skills on some sample data to get better at them.
Use these skills on a real-world project and share them for feedback.
Learn the advanced techniques slowly as I apply them to projects.

There’s much more to learn beyond plotting a graph using matplotlib or seaborn, yet most of us stop learning there.

This course from Coursera helped me understand the basics of data story-telling, essential design principles, building dashboards, and more. The most common tools in this space are Power BI and Tableau, and the course uses the latter. This is the data viz course I wish I had enrolled in much earlier.

Then pick any dataset and build a dashboard that tells a story by uncovering insights from the data. At this point, remember, you don’t need to be perfect; you’re only practicing your skills. Only you get to see what you’ve built, so feel free to experiment.

Finally, volunteer to help with data visualization elements of your next project and see how it goes. You can master the advanced techniques as you go; there’s no hurry.

3. Not Bothering to Improve My Software Engineering Skills

“What’s Software Engineering got to do with Data Science anyway?”

During my undergraduate studies, I did a Software Engineering internship. I used to help the team in software development, write technical documentation and perform tests to assess the quality of development.

Interns get to work on the entire stack to gain exposure and a taste of what it’s like to work as a software engineer. I disliked the experience so much that I knew software engineering wasn’t meant for me.

Luckily, I found my passion in data science and left behind software engineering for good. I learned all the basic machine learning, volunteered for projects, built a portfolio, and eventually broke into the industry.

Whenever something along the lines of software engineering popped up in the process, I ignored it. “What’s Software Engineering got to do with Data Science anyway? There’ll be software engineers in the team,” I thought.

The need to write unit tests and clean code, use version control, build web applications, work with Docker containers, and write object-oriented classes kept popping up.

In hindsight, I ignored a crucial component of the popular data science Venn diagram because of a single bad experience I had during my internship.

Eventually, I had to give in and embrace software engineering. When I did, it wasn’t bad at all. I realized we don’t need to master everything, but only some instrumental skills. They call it software engineering for data scientists!
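To give a flavor of what those instrumental skills look like in practice, here is a minimal sketch of a small, documented, type-hinted function with unit tests written using Python’s built-in `unittest` module. The `min_max_scale` function and its test cases are hypothetical examples I made up for illustration; they are not taken from any of my projects.

```python
import unittest
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1].

    A constant input has no spread, so every value maps to 0.0
    instead of triggering a division by zero.
    """
    if not values:
        return []
    lowest, highest = min(values), max(values)
    spread = highest - lowest
    if spread == 0:
        return [0.0 for _ in values]
    return [(value - lowest) / spread for value in values]


class MinMaxScaleTest(unittest.TestCase):
    """Each test pins down one behavior, including the awkward edge cases."""

    def test_scales_to_unit_range(self):
        self.assertEqual(min_max_scale([10, 20, 30]), [0.0, 0.5, 1.0])

    def test_constant_input_maps_to_zero(self):
        self.assertEqual(min_max_scale([5, 5, 5]), [0.0, 0.0, 0.0])

    def test_empty_input_returns_empty_list(self):
        self.assertEqual(min_max_scale([]), [])
```

Saved in a file, the tests can be run with `python -m unittest`. Nothing here is sophisticated, but a docstring, type hints, and three small tests already make a preprocessing step much easier for a teammate to trust and reuse.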

How to avoid this:

If you’re a beginner learning data science, please come with a blank slate and an open mind, hungry for knowledge. There’s a lot to learn, and you need some software engineering skills regardless of your background.

There’s no need to panic; here are some resources I used which will be helpful to you:

Software Engineering for Data Scientists will teach you the essentials of writing clean code, auto-documentation, and creating packages, just enough to get started.
How I build Machine Learning Apps in Hours
How to Dockerize Any Machine Learning Application
Nine Simple Steps for Better-looking Python Code

I have friends who came from Physics, Economics, and even Mechatronics backgrounds, learned these basics, and are doing fine. If they can do it, you surely can too.

Just don’t be ignorant like me. That’s all.

Sometimes You Need to Do the Right Thing

Yes, I was initially pressured into writing this, but I’m so glad I put it out.

I’m so glad I wrote this. (Photo by Dan on Unsplash)

If you’ve read this far, you might wonder why this guy, who didn’t want to write this in the first place, ended up sharing everything in depth. True true true! I never intended to. As thoughts started pouring in, I kept writing because it felt like the right thing to do.

You deserve to see the complete picture — we all deserve it.

People assume everything’s great and experts don’t make any mistakes. I know you think so because I thought so too. But that’s not true.

Most focus on sharing their successes and how they did it, hoping to inspire you to become better. There’s nothing wrong with that idea. I’ve been doing it too.

But isn’t giving a complete picture of reality the right thing to do?

I couldn’t have become a Senior Data Scientist overnight, nor can anybody. I learned from the experts and their courses, acquired the skills, applied them, and eventually got better.

In that process, I made mistakes; you will make some too. We all make mistakes, and that’s okay. If you don’t learn anything from your mistakes, that’s not okay.

Learn from your mistakes and progress towards your goals. Everyone can do this. You definitely can, trust me, trust you.

As a note of disclosure, this article may have some affiliate links to share the best resources I’ve used at no extra cost to you. Thanks for your support!
