Beginner's Guide

The First Book You Need to Succeed as an Aspiring Data Scientist

Spoiler: No prerequisites, but Joel will take you from zero to a hero.

Arunn Thevapalan

19 Jul 2021 • 4 min read

“In the beginner’s mind there are many possibilities, but in the expert’s there are few”― Shunryu Suzuki

More often than not, being a beginner is a strength. I love beginners; I was one of them, after all.

They have big dreams. They want to “make it” as a data scientist. They know many have made it in the past, and there’s nothing to stop them. They’re “blank slates,” hungry for knowledge and growth. They know it’s not going to be easy and are willing to put in the effort.

If you have a similar “can-do” mindset, half the battle is won. You only need a bit of guidance, and everything else is out there on the internet. Of course, there’s too much noise out there, so if you ask me where to look, I’d say start with these data science courses and supplement them with books.

Data science books, especially those from renowned publishers, pack a great deal of useful information into one place. Most beginners focus on courses and forget about books. Please don’t make that mistake.

I’m thankful I found this particular book in my early days — it shaped my knowledge and instilled confidence. In this article, I’ll break down the first book every aspiring data scientist should read.

Data Science From Scratch by Joel Grus

Since you’re reading this, I know you already have some sort of understanding of data science. You might have heard about it from YouTube, LinkedIn posts, or even through blog posts like mine.

But even if I assume for a moment that you know nothing, Data Science from Scratch by Joel Grus will hold your hand and take you from zero to hero. Best of all, you don’t need any prerequisites to get started with this book.

You can grab the book from Amazon and use this GitHub repository for all the code used in the book.

No, this isn’t a sponsored post. It’s just that I benefited tremendously from the book in my early days. If you’re not yet convinced, here’s why you, as an aspiring data scientist, absolutely need it.

You’re put into the most realistic data scientist’s role.

You’re hired as the data scientist (this will definitely happen, my friend).
You’re expected to lead the data science efforts at DataSciencester, the social network for data scientists (again, bound to happen sooner or later).

DataSciencester hasn’t invested much in building a data science practice in the past, so doing this will be your job. Sometimes you’ll have little help at a new organization if its data science transformation is recent and still in progress. But worry not: Joel and his book are your guide.

Throughout the book, you’ll learn data science concepts by solving real problems you encounter at DataSciencester. You’ll use data supplied by the platform’s users, data based on user interactions, and data from the experiments you design.

I personally loved this unconventional approach. Now that I know what problems we solve at a real organization, I can vouch that the scenario is realistic. You’ll be ready to apply the skills at any real-world organization when the time comes. Trust me on this.

I’m yet to find an important topic that is missing.

I have consistently advocated being language-agnostic, meaning that data science goes beyond the languages we use.

But Joel takes an opinionated approach and claims Python is the best language to get started with. So if you are more comfortable with R, this book isn’t for you. Barring that caveat, the table of contents took me by surprise. I couldn’t find any important topic that was missing from the list.

It starts with “what is data science” and goes on to basic Python, linear algebra, statistics, data visualization, probability, databases, machine learning, clustering, neural networks, network analysis, recommender systems, NLP, and finally, big data.

He covers a range of topics sufficient for a beginner without overwhelming you with unnecessary textbook theory or complicated code blocks. In Joel’s own words:

“It’s got math, but only as much as is totally necessary. It’s got scraping and cleaning and munging. It’s got machine learning. It’s got databases and MapReduce. Necessarily it doesn’t go deep into any of these, but I like to think it establishes a broad, solid foundation.”

As a beginner, you want everything in one place.

I’ve mentored several students and aspiring data enthusiasts. The growing issue they all face is “information overload.” There are plenty of resources, too many in fact, which overwhelm them and derail their journey altogether.

The solution is to take a simpler path and stick to fewer, more comprehensive resources. And that is the biggest benefit of this book: it has everything a beginner needs in ONE place.

So basically, you grab this book, learn the concepts, and apply them to projects. You don’t need to feel bad about bothering someone or pay for more and more online training courses; just use this book as your guide in the early days.

Now, with time you’ll feel comfortable with the book's contents and understand them quite easily.

This is expected, and it is a subtle sign that you’ve progressed toward an intermediate level. When that happens, pat yourself on the back and congratulate yourself; that was always the goal.

More exciting things will be waiting for you then, but to get there, shall we get started today?

As a note of disclosure, this article may have some affiliate links to share the best resources I’ve used at no extra cost to you. Thanks for your support!
