Your Data Science Journey Kickstarts Here
A beginner’s guide entirely based on first-hand experience
You’re here because you have been wanting to get started with Data Science and Machine Learning for a while now.
You are overwhelmed with all the resources available on the internet.
You’re unsure how to break into this field and secure a job. Yet, you believe in yourself and want to succeed.
I was there a couple of years ago, and I have now put together a beginner’s guide in the hope that it helps you. This guide is around 2000 words long, but I can summarize it in three words. Yes! Three. Words.
Learn. Create. Share. There, I said it. But before we go on, the right thing for me to do is introduce myself, to show you that I speak from personal experience. At the time of writing, I am a Machine Learning Engineer at Trabeya and a Mentor at OpenMined. After completing my bachelor’s, I started working in Data Science as a fresh graduate, despite many people advising me to get a Master’s first. Instead, I kept doing everything that’s in this guide.
To make the best use of this guide, I urge you to continue reading till the end, absorb the approach, and create a customized plan that works for you. Today is a good day to kickstart, I promise!
You should start learning before you do anything else. Don’t listen to critics who say online courses/certifications won’t get you the job. As a beginner, how else will you acquire the relevant knowledge and skills? We start with learning but we won’t stop there. So by all means, keep learning.
But wait, what should I learn? Where should I learn from? I understand the questions you have because I had them too. Many people will claim that you can’t become a data scientist until you master programming, statistics, databases, machine learning, deep learning, visualizations, and more. That’s simply not true: you only need a basic understanding to get started, and mastery comes gradually.
The first task at hand is to evaluate yourself: your strengths and your weak spots. I was fairly comfortable with math, programming, and databases; however, I had very little knowledge of statistics. It’s okay to be bad at something important. The key is to acknowledge it first and work towards getting better at it.
Don’t waste time on the R vs Python war. Pick R if you come from a statistics or mathematics background; in all other cases, pick Python. I picked Python but eventually, like many others, I ended up using both.
Here are some of the useful resources I recommend to get started.
Feel free to skip some of these depending on your current knowledge and skills, or use them as reference material.
- Python for Everybody by the University of Michigan on freeCodeCamp (a free 13-hour course) or as a 5-course specialization on Coursera: This course covers everything from data structures to databases to visualizations using Python, and it was really useful when I started out.
- Linear Algebra and Multivariable Calculus from Khan Academy: Though I was fairly familiar with the required math concepts, I often end up looking up a topic on Khan Academy for better intuition. Feel free to navigate through Khan Academy and learn more for free.
- Statistics Specialization by the University of Michigan on Coursera: Worth all the time I patiently invested in it. Slowly, I started understanding the statistical concepts. Supplement this with Khan Academy’s Statistics and Probability.
The idea is to move on quickly to the next phase, where you’ll apply everything you’ve learned. So you don’t have to complete all the courses below upfront, but they are must-dos along the journey. When you feel familiar with some concepts, go on to the next phase, and come back later to continue learning. It’ll be an ongoing cycle.
- Andrew Ng’s Machine Learning Lectures by Stanford on YouTube: I shouldn’t have to tell you this, but this is a game-changer! As you advance through this course, you’ll slowly feel yourself breaking into data science and machine learning. To this day, I go back to these lectures to look up concepts!
- Deep Learning Specialization by deeplearning.ai on Coursera: Andrew Ng started revolutionizing deep learning education with this. It covers every fundamental concept in deep learning, from scratch up to advanced Computer Vision and NLP techniques. The content of the 3rd course (Structuring Machine Learning Projects) could only have been created by someone with years of rich experience in both industry and academia. I can go on and on talking about how useful this was!
- The Missing Semester of your CS Education by MIT: This wasn’t available when I started out, and I had to learn these things from various websites, the hard way. Do yourself a favor and take this; it’s called the missing semester for a reason. It teaches the things you’ll need in industry that usually aren’t taught elsewhere.
Please note that I have recommended only the resources I used and benefited from first-hand when getting started. If you have come across other beginner resources that you have used and benefited from first-hand, do drop them in the comments section. It’ll be useful to me and many others. Soon I’ll put together the complete curriculum I followed over the past 2 years, in more detail and with reviews, so stay tuned!
Create basically anything valuable: anything that you can do with the knowledge and skills you have acquired so far. I created a simple day-night classifier, an exploration of remote-working culture, and a few more. All of them were pretty basic.
It is important to understand that you need to get to this phase as soon as possible. A common mistake beginners make is trying to master several skills first. Don’t do that. You will be surprised by how much you can learn while building projects, all while creating a demonstrable portfolio.
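To give a sense of how basic a first project can be, here is a minimal sketch of one common way to build a day-night classifier: threshold the average brightness of an image. This is not necessarily how my own classifier worked. The threshold value, the function names, and the two tiny images are all made up for illustration, and a real project would load actual photos and tune the threshold on labeled examples.

```python
from statistics import mean

# Hypothetical cutoff on a 0-255 brightness scale; tune it on labeled images.
BRIGHTNESS_THRESHOLD = 100


def average_brightness(image):
    """Mean brightness of an RGB image given as rows of (r, g, b) tuples.

    Brightness of a pixel is taken as its largest channel value,
    which is the V (value) component of the HSV color model.
    """
    return mean(max(pixel) for row in image for pixel in row)


def classify(image, threshold=BRIGHTNESS_THRESHOLD):
    """Label an image "day" if its average brightness exceeds the threshold."""
    return "day" if average_brightness(image) > threshold else "night"


# Two tiny synthetic 2x2 "images", invented purely for this example.
bright_image = [[(200, 220, 255), (180, 200, 240)],
                [(210, 210, 210), (190, 230, 250)]]
dark_image = [[(10, 10, 30), (5, 5, 20)],
              [(0, 0, 10), (20, 20, 40)]]

print(classify(bright_image))
print(classify(dark_image))
```

A single hand-picked feature like this will misclassify plenty of real photos (a brightly lit street at night, for example), and finding out why is exactly the kind of learning a small project gives you.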
Here are some recommendations I have on what you can work on, but this is really up to you. You need to choose based on your interests.
- Kaggle (Past/Current) Competitions: This is where I started. With the little knowledge I had, and with lots of help from the incredible Kaggle notebooks shared by the community, I did well. I wasn’t trying to win the competition; I was trying to apply the little I had learned so far. I was actually asked to present this work to the team at an AI startup. It turned out they liked my approach, and a couple of interviews later I was hired. So here’s the takeaway: head over to Kaggle, pick any competition that sparks your interest, start applying whatever you have learned so far, and don’t hesitate to look for help in the public Kaggle notebooks. The most basic one, which Kaggle recommends as a starting point, is the classic Titanic: Machine Learning from Disaster. It is a good place to begin and helps you understand the complete machine learning pipeline.
- Udacity Projects: These are originally part of some Nanodegrees but are open source on Udacity’s GitHub, so nobody is stopping you from navigating through the projects and completing them to add to your portfolio. Recruiters love to hear how you went out of your way to complete projects that sparked your interest! In my eyes, Udacity has mastered the art of project-based learning, and the best way to complete a Nanodegree is through scholarships. Always keep an eye on their scholarships page!
- Omdena Collaborative Projects: In the real world, data is not handed to you in competitions or repositories. You need to be able to define the problem, collect your data, design a viable approach, and deploy the solution. Omdena is a bottom-up collaborative platform where beginner-to-expert AI practitioners work on solving real-world problems end to end. I recently worked on using AI to analyze domestic violence and online harassment during COVID-19, and the experience was one of a kind. I will continue to work on more ethical projects with them. Joining is pretty straightforward, with an application and a short interview. Do reach out to me if you need any help with this.
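If "the complete machine learning pipeline" mentioned for the Titanic competition sounds abstract, the stages can be sketched in a few lines of plain Python: get the data, hold some of it out, train a model, and evaluate it. This is only an illustration under loud assumptions. The rows below are synthetic (not real Titanic records), the helper names are hypothetical, and the "model" is a one-feature majority rule rather than anything you would submit. On Kaggle you would typically reach for libraries such as pandas and scikit-learn instead.

```python
from collections import Counter, defaultdict

# Synthetic rows shaped loosely like the Titanic data; values are made up.
ROWS = [
    {"sex": "female", "survived": 1},
    {"sex": "male", "survived": 0},
    {"sex": "female", "survived": 1},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
    {"sex": "female", "survived": 0},
    {"sex": "female", "survived": 1},
    {"sex": "male", "survived": 0},
]


def split(rows, test_fraction=0.25):
    """Hold out the last test_fraction of the rows for evaluation."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def train(rows, feature, target):
    """Learn the most common target value for each value of one feature."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def predict(model, row, feature, default=0):
    """Look up the learned label; fall back to default for unseen values."""
    return model.get(row[feature], default)


def accuracy(model, rows, feature, target):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(model, row, feature) == row[target] for row in rows)
    return hits / len(rows)


train_rows, test_rows = split(ROWS)            # 1. split the data
model = train(train_rows, "sex", "survived")   # 2. fit on the training part
score = accuracy(model, test_rows, "sex", "survived")  # 3. evaluate on held-out rows
print(model)
print(score)
```

The point is the shape of the workflow, not the score: with only two held-out synthetic rows, the accuracy number here says nothing about real data.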
Please note that these are to get you started, and are intentionally kept simple. Once you’re there, I’m sure you’ll figure out how to complete more advanced and complicated projects on your own. Let’s head over to the last phase, shall we?
Sharing is the game-changer.
By now you have learned some concepts, acquired relevant knowledge and skills, and even created something that you can be proud of. But nobody knows what you’re capable of unless you share your work with the world.
Make a conscious effort to share your learning and the projects you create. Personally, I have learned better and created better work when I knew someone was going to look at it and possibly benefit from it later.
Here are some tips based on what worked for me.
- Share your accomplishments on LinkedIn and always have an updated LinkedIn profile. Recruiters love to see this.
- Every project you build and every notebook you create should be on your GitHub. For me, it’s the collection of everything I create. (Some prefer to share everything on Kaggle, mainly because of the engaging community, so feel free to explore that option too.)
- Share content that can add value to the community. If you’re a student, present on some interesting topics to your class. If you’re working full-time, present to your colleagues. Sharing your thoughts on LinkedIn or any platform you like works too. Also, try your best to be consistent.
- Start blogging. Share your learning, thoughts, and stories via a blog. For starters, publish an article that takes the reader through a project you worked on. I’m not the best person to advocate for this since I’m just starting out, but Rachel Thomas (co-founder of fast.ai) can. She convinced me, and I’ll let her tell you why you should start blogging. I’ll link my first blog post to give you an idea of how basic it was.
If I could change one thing about my journey, it would be to start sharing earlier. I thought I hadn’t done anything significant enough to share, but I was wrong. When I started, many people appreciated it and even benefited from it.
So here’s the magic spell. Learn. Create. Share. I did all this step by step, over and over again. And guess what?
You can do it too. Wait, you can do it even better.
When you do this right, you’ll start developing relevant skills and master them with time. You will start approaching any project fearlessly. As you share your work with your community, you’ll build a strong network of like-minded enthusiasts. Communicating your work won’t be something new to you. You will keep getting better at what you do. Everything will fall into place.
Trust me on this.
As a closing note, I need you to know something.
There is more than one way to get started with data science, and if something else works for you, I’m happy to hear about and learn from your journey. If you want me to write more on topics like this, please share your feedback in the comments. Feel free to reach out to me if I can be of any help. I can’t wait to see you succeed!
As a note of disclosure, some affiliate links have been used in this article to share the best resources I’ve used and at no extra cost to you.