TensorFlow for the under-fitted Data Scientist.
We all know that there is a huge demand for qualified data scientists in the industry. So how are we exposing the world of data science to students?
I think it’s important for students to be aware of industry standard tools from the beginning to help shape the way they learn and think about problems.
With this in mind, I want to look at how students can start learning about these tools without the hands-on machine learning experience that is so often assumed of anyone meeting them for the first time.
So, I wanted to write an example of how a student could potentially start to investigate certain tools without having prior experience.
Today, I’ve chosen to do a high-level, student-oriented overview on…
Let’s first start off with understanding the name: TensorFlow.
Why has TensorFlow included ‘Tensor’ in their name?
Put simply, a tensor is an n-dimensional object. Vague? Let me share a visualisation that I found helpful when learning about tensors:
I think most of us have a pretty intuitive understanding of what a scalar is (a number!) and even vectors and matrices, but what isn’t clear from the above image is that all of these things are tensors of different dimensions!
So, generally speaking, a tensor is like a matrix, but it doesn’t have to stop at rows and columns: it can have depth as well (aisles) and more dimensions beyond that (which we can’t visualise)!
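This means a single object can hold data along n dimensions at once. A colour image, for example, has a height, a width and a set of colour channels, so it fits naturally into a 3-dimensional tensor. This is something that comes up a lot in machine learning, and tensors, in particular, feature heavily in deep learning.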
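If you want to poke at this idea yourself, here is a minimal sketch in plain Python that uses nested lists as stand-ins for tensors. The helper name `shape` and the example values are made up for illustration. TensorFlow has its own tensor type (`tf.Tensor`), which this sketch deliberately doesn't use, so that it runs without installing anything.

```python
def shape(tensor):
    """Return the size of each dimension of a nested-list 'tensor' as a tuple.

    Assumes the nesting is regular (every row has the same length), so looking
    at the first element of each level is enough.
    """
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        if not tensor:  # an empty list has nothing further to inspect
            break
        tensor = tensor[0]
    return tuple(dims)


scalar = 7                                  # 0 dimensions: just a number
vector = [1, 2, 3]                          # 1 dimension
matrix = [[1, 2, 3],
          [4, 5, 6]]                        # 2 dimensions: rows and columns
cube = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]],
        [[9, 10], [11, 12]]]                # 3 dimensions: rows, columns and depth

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("cube", cube)]:
    dims = shape(value)
    print(f"{name}: {len(dims)} dimension(s), shape {dims}")
```

The only thing that changes from one object to the next is how many levels of nesting there are, which is exactly the sense in which scalars, vectors and matrices are all tensors.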
Funnily enough, TensorFlow allows you to work with machine learning and deep learning (neural net) models and algorithms in a synchronised way, so you can start to see where the ‘tensor’ part of the name comes in.
As I’ll touch on more later, TensorFlow is a workflow tool that ties together a bunch of industry standard tools in order to deploy seamless machine learning and deep learning projects.
In short, TensorFlow gives you your machine learning flow.
“TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.”
This is the official description as copied and pasted from the TensorFlow website.
Let’s break this description down and not assume any knowledge:
“TensorFlow is an end-to-end open source —
Wait — what does “open source” mean??
Well, while we’re at it, let’s hone another critical data science skill:
Thanks Google! So “open source” is a widely used phrase in the tech community, meaning that the code is made available for anyone to read, contribute to and distribute.
Speaking of Google, TensorFlow was created by the Google Brain team!
Why would Google want to make TensorFlow open source?
Well, as with most open source projects, staying closed source would put TensorFlow at a huge disadvantage, since other deep-learning libraries like Theano, Keras and PyTorch are open source.
It essentially allows TensorFlow to get free input from some of the best minds in the world, which keeps it competitive with state-of-the-art technology. (And don’t worry, TensorFlow’s maintainers have the last say on which contributions are adopted, so the worst minds in the world can’t add dog sh*t. Such is the nature of open source projects.)
As we were with our description…
… platform for machine learning —
Okay, okay I know I’m trying to become a data scientist but… what actually is machine learning?!
To save a lengthy discussion…
Notice the distinction here between machine learning and artificial intelligence. These two terms are often used interchangeably but, as you see here, machine learning is actually an application of artificial intelligence. We’ll leave that discussion for another blog post, though.
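To make “learning from data” a little more concrete, here is a toy sketch in plain Python. It is not how TensorFlow works under the hood and it isn't taken from any official material. The function name `fit_line`, the data and the settings are all invented for illustration. The point is simply that we never tell the program the rule behind the data; it finds the rule by repeatedly nudging two numbers to shrink its own error.

```python
# Toy data that secretly follows y = 2x + 1. The program is never told this rule.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Learn a slope w and an intercept b so that w * x + b is close to y."""
    w, b = 0.0, 0.0  # start with a deliberately bad guess
    n = len(xs)
    for _ in range(steps):
        # How wrong is the current guess on each example?
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Take a small step in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line(xs, ys)
print(f"learned rule: y = {w:.2f}x + {b:.2f}")
```

With these settings the learned slope and intercept should land very close to 2 and 1. Real machine learning models do the same kind of thing with far more numbers to adjust, which is where platforms like TensorFlow come in.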
It [TensorFlow] has a comprehensive, flexible ecosystem of tools, libraries and community resources —
Ecosystem of tools? Libraries? Community resources?
This one is a bit of a Pandora’s box and will often highlight more tools that need to be learnt. But for now, we can understand that a data scientist uses a lot of different software tools, from cleaning and processing data to developing and training models for machine learning.
Community resources go with the nature of the open source environment. TensorFlow also offers extensive tutorials and resources on how to best use their platform as well as community forums and help requests via GitHub and Stack Overflow. You can see more on this here.
Libraries. If you haven’t come across libraries before or have just never understood what they are —
So simply put, libraries are just a collection of functions that do things that we need to do all the time.
There’s a coding principle that if you’re writing out the same lines of code over and over, you probably need to put them into a function. With this in mind, think of libraries as all the functions people have written over and over until, finally, someone thought “If I need this function, someone else probably needs this function too”.
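Here is that journey as a small Python sketch: repeated code, then a function of our own, then a library function that someone else has already written. The data and the function name `average` are invented for illustration. `statistics` is a module that ships with Python itself.

```python
import statistics

heights = [150.0, 160.0, 170.0]
weights = [50.0, 60.0, 70.0]

# Step 1: the same calculation written out twice. This is the repetition to avoid.
mean_height = sum(heights) / len(heights)
mean_weight = sum(weights) / len(weights)


# Step 2: wrap the repeated calculation in a function of our own.
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Step 3: someone else needed this too, so it already lives in a library.
library_mean_height = statistics.mean(heights)

print(mean_height, average(heights), library_mean_height)
```

All three approaches give the same answer for the heights. The library version just saves you from writing and testing the function yourself, and the libraries around TensorFlow do the same for much bigger jobs.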
Data scientists obviously have very repetitive parts of their jobs, and so there are many libraries available that contain all the necessary functionality to perform the job. Here are some of the libraries and extensions that are easily accessible with TensorFlow.
…that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.”
While we might not have any experience with machine learning yet, we now know that when the time comes, we have platforms like TensorFlow ready to support us in our endeavours.
As you learn more about the industry and machine learning itself, TensorFlow’s use and applications will become more apparent. But to sum up for now, TensorFlow is a platform that ties a lot of other industry standard tools together to create a workflow that enables us to deploy end-to-end machine learning projects.
I encourage you to keep Googling and compare other platforms such as PyTorch, Keras and Theano. Eventually, you will be able to form your own opinion on which tools you like to work with most.