9 Resources for Data Science Projects

The most novice engineer can start on a path towards data science mastery in this new age where data science skills will be needed at every level.

Dan Barker

Data science, machine learning, artificial intelligence, and deep neural nets are all hot topics these days (and key terms that might help this post with some SEO, unless the AI sees through my attempts). Below I've shared several of the resources I have used regularly while working on data science projects over the last few years. I don't read many books, so the fact that I've included even one shows how important it is.

There are enough resources here to get even the most novice engineer started on a path towards data science mastery in this new age where data science skills will be needed at every level. There is a tool for performing the work, a class taught by a renowned Stanford professor, websites with tutorials to give you real-life experience, and a site dedicated to making the latest research available to all for free so you can learn more if you want.

Enjoy the journey!

Book

Weapons of Math Destruction by Cathy O'Neil

If you want to be able to trust your AI outputs, then you need to read this book. It explains some of the different avenues by which bias can infiltrate your data and algorithms and what you can do about it.


Online course

Andrew Ng's free machine learning class on Coursera

This course makes it easy to get started in Machine Learning with very little prior knowledge. Andrew is an excellent instructor and provides helpful explanations for understanding complex concepts.


Tools

Data Set Search by Google (beta)

If you want to search across many public datasets, including those on Kaggle, then you need to check out this beta project from Google. You can use much of the advanced search syntax you already know from Google Search, such as restricting results to a specific site. This is where I go first when I need a dataset.
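As a rough illustration of the site-restricted search mentioned above, here is a minimal Python sketch that builds a search link you can paste into a browser. The base URL and the `query` parameter name are assumptions based on how the tool's address bar looks today and may change, and `dataset_search_url` is a hypothetical helper, not part of any Google library.

```python
from urllib.parse import urlencode

# Assumption: current address of Google Dataset Search. Verify in a browser.
DATASET_SEARCH_URL = "https://datasetsearch.research.google.com/search"


def dataset_search_url(terms, site=None):
    """Build a Dataset Search link, optionally restricted to one site.

    `site` is appended with the familiar `site:` operator from Google Search.
    """
    query = terms.strip()
    if site:
        query = f"{query} site:{site}"
    # urlencode escapes spaces and the colon so the link is safe to share.
    return f"{DATASET_SEARCH_URL}?{urlencode({'query': query})}"


print(dataset_search_url("housing prices", site="kaggle.com"))
```

Open the printed link in a browser to see the results; the sketch only builds the URL and does not call any API.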

Colaboratory, a free Jupyter Notebook

This tool provides a hosted Jupyter notebook that lets you collaborate with others much as you would in other Google apps. If you're short on cash or just want a tool that's available from any internet-connected computer, then this will help you a lot. I use it almost exclusively because it helps me avoid the issues of managing local dependencies.


Videos

Andrej Karpathy's Stanford class videos on YouTube

Recommended by Kartik Subbarao

These are great. Andrej gives you an intuitive understanding of neural networks in a way that fits how programmers think about things. He also has some great blog posts on the subject.


Websites

Arxiv.org

This is a site everyone should have bookmarked if they're interested in data science. Much of the latest research is posted here as preprints, which lets researchers claim "first" on their findings before the papers are officially published. The field of data science is moving so fast that it's important to stay current in order to have the most effective and efficient algorithm.
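If you'd rather keep up from a notebook than from the website, arXiv also offers a public query API that returns an Atom feed. The sketch below is one possible way to list the newest titles in a category. The category `cs.LG` and the helper names (`arxiv_query_url`, `parse_titles`, `fetch_latest_titles`) are my own choices for illustration, so treat this as a starting point rather than the recommended way to use the API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # namespace used by the feed


def arxiv_query_url(category="cs.LG", max_results=5):
    """Build a query URL for the most recently submitted papers in a category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_titles(atom_xml):
    """Extract paper titles from an Atom feed, collapsing wrapped whitespace."""
    root = ET.fromstring(atom_xml)
    titles = []
    for entry in root.findall("atom:entry", ATOM):
        raw = entry.findtext("atom:title", default="", namespaces=ATOM)
        titles.append(" ".join(raw.split()))
    return titles


def fetch_latest_titles(category="cs.LG", max_results=5):
    """Download the feed and return its titles (needs network access)."""
    with urlopen(arxiv_query_url(category, max_results)) as response:
        return parse_titles(response.read())
```

Calling `fetch_latest_titles()` in a notebook cell returns a list of title strings. Keep `max_results` small and avoid calling it in a tight loop, since this is a shared public service.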

Kdnuggets.com

Don't let this site's appearance fool you: it has a ton of high-quality content. It also republishes articles from other sites with the permission of the author, which often helps highlight articles that wouldn't otherwise get as much traffic. This is one of the best websites for data science content.

Kaggle.com

Anyone in data science will know this website. It has a lot of datasets available, most of them organized around data science competitions and projects. It's a great way to learn and begin interacting with some of the many public datasets. They have some project templates to help you get started and learn how all of this data science stuff works.

Towardsdatascience.com

This whole site has been an excellent resource for me. They constantly have great content covering both practical and theoretical topics in data science.

About the author

Dan Barker - Dan spent 12 years in the military as a fighter jet mechanic before transitioning to a career in technology as a software engineer and then a manager. He was the Chief Architect at the National Association of Insurance Commissioners leading their technical and cultural transformation. He's now leading RSA Archer as their Chief Architect in their cloud migration and conversion to SaaS. Dan is also an organizer of DevOps KC and the DevOpsDays KC conference.

"9 resources for data science projects" was authored by Dan Barker and published on Opensource.com. It is republished by Open Health News under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0). The original copy of the article can be found here.