Data Science using AWS SageMaker: Introduction
Build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows
AWS SageMaker is a service that enables you to build, train, and deploy machine learning applications in the cloud.
It is a proprietary platform-as-a-service offering from AWS. It can be used by many types of users, such as business analysts, data scientists, and MLOps engineers.
Business analysts can implement a lot of use cases through the visual interfaces in AWS SageMaker; if you're a citizen data scientist who plays with data and models day in and day out, you can use SageMaker Studio; and for MLOps engineers, it offers extensive tooling to support the creation of deployments, the generation of APIs, and the observation of models and data.
It is an extremely accessible service, built natively on top of the AWS ecosystem, so it integrates very nicely with other AWS services such as S3 buckets and RDS databases. It also enables high-performance computing, because underneath AWS SageMaker you can utilize different compute engines. You can use GPU instances, spin machines up and down, and create on-demand infrastructure, which effectively leads to a lot of productivity gains and a lower total cost of ownership (TCO). There are also certain interesting services; for example, it enables automated data labeling.
So you can reduce data labeling time, and therefore cost, by up to 40%. It also supports a wide variety of frameworks and tooling: it is built on Jupyter, and you can use TensorFlow, PyTorch, MXNet, Scikit-Learn, or any of the Hugging Face libraries. Needless to say, Python is available as the standard programming language on AWS SageMaker.
You can collect and prepare data yourself. You can build your pipelines. You can run training and hyperparameter tuning jobs. And finally, you can deploy. There are different kinds of interfaces available: the overall interface is extremely easy to use, and you can get into more purpose-built interfaces like Canvas, which is more of a visual designer.
And we'll look at that in the next video. Alternatively, you can use AWS SageMaker Studio, an all-inclusive Jupyter-based environment that lets you work in a very similar kind of setup. You can connect to all kinds of data stores, for example S3 buckets and RDS databases, and they can become your primary data sources.
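The hyperparameter tuning jobs mentioned above are a managed service in SageMaker, but the underlying idea can be sketched in plain Python. The objective function and search space below are hypothetical, purely for illustration of the simplest strategy a tuning job might use, random search:

```python
import random

# Toy stand-in for a training job: returns a validation "score"
# for a given hyperparameter setting (hypothetical objective
# whose best setting is lr=0.1, batch_size=32).
def train_and_evaluate(learning_rate, batch_size):
    return -abs(learning_rate - 0.1) - abs(batch_size - 32) / 100

random.seed(0)
search_space = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "batch_size": [16, 32, 64, 128],
}

# Random search: sample settings, keep the best one seen so far.
best_params, best_score = None, float("-inf")
for _ in range(20):
    params = {name: random.choice(values)
              for name, values in search_space.items()}
    score = train_and_evaluate(**params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)
```

A managed tuning job does the same loop on your behalf, launching one training job per sampled setting, and also offers smarter strategies than random sampling.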
You can also use a purpose-built feature store for ML that serves features both in real time and in batch. For automated data labeling, there is a service called Amazon SageMaker Ground Truth, which can make your life relatively easy when you want to label data.
And this could come in very handy in vision or NLP use cases around model creation. There are a lot of built-in algorithms that you can use out of the box, one-click Jupyter notebooks, and a lot of AutoML features. For example, there is a service called AWS SageMaker Autopilot, which will automatically build, train, and tune models across the machine learning lifecycle.
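The feature store mentioned above is accessed through the SageMaker SDK in practice; its core idea, one store serving the same features for real-time lookup and for batch retrieval, can be illustrated with a minimal in-memory sketch (the class and feature names here are hypothetical, not SageMaker APIs):

```python
from datetime import datetime, timezone

# Minimal in-memory stand-in for a feature store: feature records
# are keyed by an entity identifier and carry an event timestamp.
class TinyFeatureStore:
    def __init__(self):
        self._records = {}

    def ingest(self, record_id, features):
        # Store the latest feature values for this entity.
        self._records[record_id] = {
            **features,
            "event_time": datetime.now(timezone.utc).isoformat(),
        }

    def get_record(self, record_id):
        # Real-time lookup, as an online store would serve it.
        return self._records.get(record_id)

    def batch_export(self):
        # Batch retrieval, as an offline store would serve it
        # for training-set construction.
        return list(self._records.values())

store = TinyFeatureStore()
store.ingest("customer-1", {"avg_order_value": 52.0, "orders_30d": 3})
store.ingest("customer-2", {"avg_order_value": 17.5, "orders_30d": 1})

print(store.get_record("customer-1")["orders_30d"])  # real-time lookup
print(len(store.batch_export()))                     # batch export
```

The point of the design is that inference-time lookups and training-set exports read the same feature definitions, so the features a model sees in production match the ones it was trained on.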
You can utilize AWS Spot Instances, which let you train on spare capacity sitting in AWS data centers. Their availability is not guaranteed, and they can be reclaimed when demand spikes, but for general-purpose computing, especially in development environments and given the on-demand nature of machine learning workloads, they can lead to significant cost reductions.
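In SageMaker's low-level API, managed Spot training comes down to a few fields on the training-job request. A hedged sketch of the relevant fragment of a `CreateTrainingJob` request follows; the bucket name in the S3 URI is a hypothetical placeholder:

```python
# Fragment of a CreateTrainingJob request enabling managed Spot
# training; the S3 bucket below is a hypothetical placeholder.
spot_training_config = {
    "EnableManagedSpotTraining": True,
    "StoppingCondition": {
        # Cap on actual training time.
        "MaxRuntimeInSeconds": 3600,
        # Training time plus time spent waiting for Spot capacity;
        # must be at least MaxRuntimeInSeconds.
        "MaxWaitTimeInSeconds": 7200,
    },
    "CheckpointConfig": {
        "S3Uri": "s3://example-bucket/checkpoints/",  # hypothetical
        "LocalPath": "/opt/ml/checkpoints",
    },
}

print(spot_training_config["EnableManagedSpotTraining"])
```

SageMaker syncs whatever the training container writes under the checkpoint local path to the given S3 URI, so a job restarted after an interruption finds its previous state there.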
You can employ strategies such as model saves and checkpoints to make your training process resilient and avoid losing progress if your compute is interrupted. Used this way, Spot Instances can be an extremely efficient tool to save money on your machine learning training.
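The checkpointing idea is framework-agnostic: periodically persist training state, and on every (re)start resume from the last saved state if one exists. A minimal sketch using a local JSON file, with a hypothetical path, fake training work, and a simulated interruption:

```python
import json
import os

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical local path

def load_checkpoint():
    # Resume from the last saved state if a checkpoint exists.
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)
    return {"epoch": 0, "loss": None}

def save_checkpoint(state):
    # Persist state so an interrupted run can pick up from here.
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump(state, f)

def train(total_epochs, stop_after=None):
    state = load_checkpoint()
    for epoch in range(state["epoch"], total_epochs):
        # Fake "training work" for one epoch.
        state = {"epoch": epoch + 1, "loss": 1.0 / (epoch + 1)}
        save_checkpoint(state)
        if stop_after is not None and epoch + 1 == stop_after:
            return state  # simulate a Spot interruption
    return state

# First run is "interrupted" after 3 epochs; the second run
# resumes from the checkpoint and finishes all 10.
interrupted = train(total_epochs=10, stop_after=3)
resumed = train(total_epochs=10)
print(interrupted["epoch"], resumed["epoch"])
```

In a real Spot training job you would write the checkpoint to the configured checkpoint directory instead of the working directory, and save model weights rather than a dictionary.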
Please continue with the video to get a general, live idea of the interface.