Want to Break into Data Science? Start Here.
A complete data science roadmap for beginners (updated for 2023)
It's been 5 years since I embarked on my data science journey.
I remember how uncertain I felt when I started.
After deciding to pursue a career in data science, everyone I spoke to told me to go for a master's degree from a top US university. In 2017, that seemed to be the only way.
University pathways were too expensive and put international students under a lot of pressure to get a job to be able to repay their student loans amidst the visa regulations.
So, instead, I did what I'm about to present in this article.
5 years later, I'm a senior professional in the data science industry, so you can tell it worked.
The Fundamental Mindset Shift to Break Into Data Science
One common mistake beginners make is to overthink and compare options instead of just getting started. To be honest, it's not your fault. I was overwhelmed with all the resources when I started.
Every other week, there's a new blog post about becoming a data scientist, so how can anyone expect you not to overthink and end up taking no action at all?
It's my duty to simplify this guide (and I will do so) but first, repeat after me: There are multiple paths to becoming a data scientist or a machine learning engineer.
So instead of wasting time arguing about which course is the best or whether R is better than Python, I suggest you follow this guide and start your journey.
Here's the mindset shift I want you to take before we get into the curriculum:
- There exists more than one path to succeed in data science.
- Too much information, i.e., Information Overload, overwhelms you and derails you from all paths.
- To become a data scientist, you must consistently focus on at least one path.
- The path you choose must be a simple one that helps you take action.
My Self-Created Data Science Curriculum
I recommend these courses and books for your data science curriculum, assuming you're an absolute beginner. There's no one-size-fits-all path, so feel free to customize this and create your own curriculum.
Give or take, I expect dedicated learning to take about a year.
- Applied Data Science Specialization with Python by the University of Michigan: Python is widely used across the data industry, and this course is a great hands-on start to better understand the overall machine learning workflow.
- SQL for Data Analysis: I have done a few other SQL courses, but this one teaches everything you need to know in the context of data analysis and is 100% directly used at my work. Some companies even have entire interview rounds focused on SQL.
- The Missing Semester of your CS Education by MIT: You'll learn version control with Git, IDEs, and the command-line environment, which will help you get proficient with the tools you'll be using at work.
- Linear Algebra by Khan Academy: While these concepts are taught again in the machine learning course listed below, it's helpful to understand the maths behind machine learning because... machine learning is just mathematics!
- Multivariable Calculus by Khan Academy: An excellent website for when you need to brush up on any forgotten concepts in maths.
- Statistics Specialization by the University of Michigan: when applying data science to business problems, statistics is non-negotiable. This course teaches everything you'd need to know with hands-on exercises. Period.
Machine Learning & Deep Learning:
- Machine Learning Specialization by deeplearning.ai: The OG machine learning course was released in 2012, and it hooked many people on this field, including me. Andrew Ng updated the course in July 2022, and you'll feel yourself breaking into data science as you progress through it.
- Deep Learning Specialization by deeplearning.ai: This specialization has everything from the basics of deep learning to advanced computer vision and natural language processing. Shoutout to the "Structuring Machine Learning Projects" course within this specialization, which is a gem that only greats like Andrew Ng can teach.
- Data Science From Scratch by Joel Grus: Let this be the first book you buy in your early days because of its interesting approach: you're hired as a data scientist and work through each task handed to you. An ideal beginner-friendly book for sure!
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron: One of the OG books for learning data science. I read it during my commute, keep a copy at my work desk, and still turn to it to revisit forgotten concepts.
- Build a Career in Data Science by Emily Robinson and Jacqueline Nolis: Unlike the books above, this one offers industry-focused career advice you won't find anywhere else. If your goal is to get a job in the industry, the book has plenty of useful tips for your interview process and for your first 90 days at work.
I have done many other courses on platforms such as Coursera, Udacity and DataCamp. All 3 are amazing platforms to learn on and helped me upskill. I have also read a bunch of other useful machine learning books. You'll hear about all of them in my past or future articles on this website.
Listing everything here, in what is meant to be a beginner's roadmap, will only overwhelm you. When you reach an intermediate level, reach out, and I'll point you to these resources.
Sometimes less is more.
How to Gain Experience Before Your First Job
Most of us lose confidence when applying for data science roles because most job descriptions ask for 1-2+ years of experience.
This leads to a frustrating loop: you can't gain employment without experience, and you can't gain experience without employment.
Look, I understand - I was in your position a few years ago too. I've mentored and coached people like you, so I know what it feels like.
After speaking to colleagues who joined the industry through different non-traditional paths and reflecting on my journey, the solution to this loop was evident.
There was one thing in common for all of us on the other side: we all had demonstrable skills through something I call a "data portfolio."
Here's how to build one for yourself:
- Kaggle: I know people say that Kaggle datasets are generally clean and in no way represent real-world projects. This can be true, but as a beginner, you've got to start somewhere, right? Kaggle has an amazing beginner-friendly community and plenty of guided tutorials to get you started. Some companies even use this platform for hiring (mine did), so that's a bonus reason to get familiar with it.
- Omdena: You asked for real-world experience, so here it is. I've volunteered for a couple of projects at Omdena, where beginner-to-expert AI practitioners work together on end-to-end real-world problems. Volunteering was straightforward, with an application and a short interview with the Founder.
- Udacity: Nobody is paying me to say this, but in my opinion, Udacity has mastered the art of project-based learning. It's not just a learning platform but also one where you can build projects that resemble real-world problems and receive feedback on your work. If you find their programs expensive (try scholarships!), all of their projects are open source on their GitHub, waiting for you to complete them and add them to your data portfolio.
While there could be other ways of gaining experience, these three are the ones I've personally used and can recommend.
The key is to create your data portfolio while you're still learning data science. Not once you feel ready. Not just before applying for jobs. It needs to happen while learning data science.
Hiring managers love to hear how you went out of your way to complete projects that sparked your interest. Data-portfolio FTW, my friend!
Standing Out From the Crowd
In 2017, when I started to learn data science, nobody knew me.
I was in my hostel room, binge-watching the online courses I outlined above.
It was only when I started creating and sharing data science content online publicly that people noticed me. I kept going and got multiple job offers, freelance opportunities and a loyal audience to read what I wrote.
Here's how I did it (and you can too):
- Create all of your projects on GitHub. You'll be using Git at work, so get a head start here.
- Share your accomplishments (and how you achieved them) on LinkedIn. Your LinkedIn profile is often screened before calling you in for an interview, so ensure it's continuously updated.
- Start blogging on Medium. You can start by writing about the projects you've built in your data portfolio. If you need some convincing, here's the article that motivated me to start blogging as a data scientist.
If there's one thing I would change about my journey, it's that I would have started sharing my work much earlier. I kept asking myself: will people listen to what I say, and what credibility do I have to share my journey publicly? My imposter syndrome kicked in, and I constantly doubted myself in my early days.
Eventually, when I did share my work, many people appreciated it and thanked me for helping them break into data science. My personal brand was built in the process, and I like to believe I stood out from the crowd (which is why you're here 😉).
The Approach is to: Learn. Create. Share
Your first year of learning data science is going to be the hardest. Creating the data portfolio is fun and curiosity-driven. Sharing your journey is most rewarding.
Sadly, most quit their data science quest within 6 months. That's okay too; data science need not be for everyone.
Here's what my timeline looks like so far:
- Year 1: Learning data science online, working on projects
- Year 2: Getting my first data science job, continuing to upskill
- Year 3: Promoted to Machine Learning Engineer, still continuing to upskill
- Year 4: Leading teams as a Senior Data Scientist, starting to share my learnings, mistakes and experiences from my journey
- Year 5: Exploring niche topics in AI that piqued my interest and doubling down on writing as a form of sharing my knowledge
There's nothing fancy up there; it's similar to the progression of most data professionals. As I always say, if a confused undergrad from a tiny island called Sri Lanka 🇱🇰 can do this, you definitely can too.
I'm rooting for you to become the next DataGrad.✌️
P.S.: How much do you need to invest in the above curriculum? $2000? $1000? $500? It would still be worth it, but the answer is ZERO! Everything I've listed is free to watch and learn from. You literally have no excuses now!
You pay a small amount only if you want the certificates (like I wanted). If you use my affiliate links, I get a small portion of it at no extra cost to you. Thank you for supporting me in running this website and creating more valuable resources.
For continuous helpful insights on breaking into data science, honest experiences, and learnings from my data science journey, let's keep in touch?