Amazon SageMaker Ground Truth Plus is a managed data labeling service that simplifies data labeling for machine learning (ML) applications. One common use case is semantic segmentation, a computer vision technique that assigns a class label to every pixel in an image. For example, in video frames captured by a moving vehicle, class labels may include vehicles, pedestrians, roads, traffic signals, buildings, or background. Semantic segmentation provides a high-precision representation of where objects are located in an image and is often used to build perception systems for autonomous vehicles or robotics. To build an ML model for semantic segmentation, you first need to label a large volume of data at the pixel level. This labeling process is complex. It requires skilled labelers and considerable time; some images can take two hours or more to label accurately.
In 2019, we released an interactive ML-powered labeling tool called Auto-segment for Ground Truth that allows you to quickly and easily create high-quality segmentation masks. For more information, see Auto-Segmentation Tool. This feature works by having you click on the topmost, leftmost, bottommost, and rightmost “extreme points” of an object. An ML model running in the background takes this user input and returns a high-quality segmentation mask that is immediately displayed in the Ground Truth labeling tool. However, this feature allows exactly four clicks. In some cases, the ML-generated mask may miss parts of the object, such as around a boundary where the edges are indistinct, or where color, saturation, or shadows blend into the surroundings.
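To make the notion of “extreme points” concrete, the following minimal sketch finds the four extreme pixels of an object in a binary mask. It is only an illustration of the concept, not the tool's implementation; the function name `extreme_points` and the list-of-lists mask format are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # rows of 0/1 pixels; 1 marks the object (assumed format)


def extreme_points(mask: Mask) -> Dict[str, Tuple[int, int]]:
    """Return the topmost, bottommost, leftmost, and rightmost object
    pixels as (row, col) tuples."""
    pixels = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not pixels:
        raise ValueError("mask has no foreground pixels")
    return {
        "top": min(pixels, key=lambda p: p[0]),
        "bottom": max(pixels, key=lambda p: p[0]),
        "left": min(pixels, key=lambda p: p[1]),
        "right": max(pixels, key=lambda p: p[1]),
    }


# A small diamond-shaped object.
mask = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
points = extreme_points(mask)
print(points)
```

These four points correspond to the four clicks a labeler would place on the object before the model returns a mask.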
Extreme point clicking with a flexible number of corrective clicks
We have now improved the tool to allow additional clicks on boundary points, which provide real-time feedback to the ML model. This allows you to create a more accurate segmentation mask. In the following example, the initial segmentation result is inaccurate due to weak boundaries near the shadow. Importantly, this tool operates in a mode that allows real-time feedback; it doesn’t require you to specify all the points at once. Instead, you can first make four mouse clicks, which causes the ML model to produce a segmentation mask. You can then inspect this mask, find any inaccuracies, and place additional clicks as needed to “nudge” the model toward the correct output.
Our previous labeling tool allowed you to place exactly four mouse clicks (red dots). The initial segmentation result (shaded red area) is inaccurate due to weak boundaries near the shadow (bottom left of the red mask).
With our enhanced labeling tool, you again start by making four mouse clicks (red dots in the image above). You then have the option to inspect the resulting segmentation mask (shaded red area in the image above). You can make additional mouse clicks (green dots in the image below) to have the model refine the mask (shaded red area in the image below).
Compared to the original version of the tool, the extended version provides improved results when objects are deformable, non-convex, and vary in shape and appearance.
We simulated the performance of this improved tool on sample data by first running the baseline tool (with only four extreme clicks) to generate a segmentation mask and evaluating its mean intersection over union (mIoU), a common measure of the accuracy of segmentation masks. We then applied simulated corrective clicks and evaluated the improvement in mIoU after each simulated click. The following table summarizes these results. The first row shows the mIoU and the second row shows the error (which is given by 100% minus the mIoU). With just five additional mouse clicks, we can reduce the error for this task by 9%.
Table: mIoU and error by number of corrective clicks.
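As a rough illustration of the metric used above, the following sketch computes IoU for binary masks, averages it into an mIoU, and derives the error as 100% minus the mIoU. The toy masks and the choice to average over images are assumptions for this example and are not the data or the exact protocol used in the evaluation.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # rows of 0/1 pixels (assumed format)


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly.
    return 1.0 if union == 0 else intersection / union


def mean_iou(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (prediction, ground truth) pairs."""
    scores = [iou(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores)


truth = [[1, 1], [1, 1]]
before = [[1, 1], [0, 0]]  # toy mask after the four extreme clicks
after = [[1, 1], [1, 0]]   # toy mask after one corrective click

miou_before = mean_iou([(before, truth)])
miou_after = mean_iou([(after, truth)])
error_before = 100.0 * (1.0 - miou_before)  # error = 100% - mIoU
error_after = 100.0 * (1.0 - miou_after)
print(miou_before, miou_after, error_before, error_after)
```

In this toy case, the corrective click raises the IoU from 0.5 to 0.75, so the error drops from 50% to 25%.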
Integration with Ground Truth and Performance Profiling
To integrate this model with Ground Truth, we follow a standard architecture pattern as shown in the following diagram. First, we build the ML model into a Docker image and push it to Amazon Elastic Container Registry (Amazon ECR), a fully managed Docker container registry that makes it easy to store, share, and deploy container images. Using the SageMaker Inference Toolkit when building the Docker image allows us to apply best practices for model serving and achieve low-latency inference. Next, we create an Amazon SageMaker real-time endpoint to host the model. We place an AWS Lambda function in front of the SageMaker endpoint as a proxy to perform various data transformations. Finally, we use Amazon API Gateway to integrate with our front end, the Ground Truth labeling application, and to provide secure authentication to our back end.
You can follow this general pattern for your own use cases to custom-build ML tools and integrate them into custom Ground Truth task interfaces. For more information, see Building a Custom Data Labeling Workflow with Amazon SageMaker Ground Truth.
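The Lambda proxy in this pattern could look roughly like the following sketch. It assumes an API Gateway proxy event with a JSON body; the request fields `image_uri` and `clicks`, the `ENDPOINT_NAME` environment variable, and the default endpoint name are hypothetical names chosen for illustration, not the actual contract used by the tool. The only AWS call is `invoke_endpoint` on the `sagemaker-runtime` client.

```python
import json
import os


def lambda_handler(event, context, runtime_client=None):
    """Validate the click payload and forward it to a SageMaker endpoint."""
    if runtime_client is None:
        # Imported lazily so the handler can be exercised with a stub client.
        import boto3
        runtime_client = boto3.client("sagemaker-runtime")

    request = json.loads(event["body"])
    clicks = request.get("clicks", [])  # hypothetical field: [[x, y], ...]
    if len(clicks) < 4:
        return {
            "statusCode": 400,
            "body": json.dumps(
                {"error": "at least four extreme-point clicks are required"}
            ),
        }

    # Hypothetical payload shape expected by the model container.
    payload = {"image_uri": request["image_uri"], "clicks": clicks}
    response = runtime_client.invoke_endpoint(
        EndpointName=os.environ.get("ENDPOINT_NAME", "auto-segment-endpoint"),
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    mask = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(mask)}
```

Keeping validation and payload shaping in the proxy means the front end and the model container can evolve independently, which is one reason to put a Lambda function between API Gateway and the endpoint.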
After provisioning this architecture and deploying our model using the AWS Cloud Development Kit (AWS CDK), we evaluated the latency characteristics of our model on different SageMaker instance types. This is straightforward because we serve the model from SageMaker real-time inference endpoints, which integrate with Amazon CloudWatch and emit metrics such as memory utilization and model latency with no additional configuration (see SageMaker Endpoint Invocation Metrics for details).
In the following figure, we show the ModelLatency metric emitted by SageMaker’s real-time inference endpoints. We can use CloudWatch percentile statistics to show latency percentiles such as p50 or p90.
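Percentile latencies like these can also be retrieved programmatically. The sketch below builds a `get_metric_statistics` request for the `ModelLatency` metric in the `AWS/SageMaker` namespace and converts the results to milliseconds (SageMaker reports `ModelLatency` in microseconds). The helper names, the five-minute period, and the `AllTraffic` variant name are assumptions for this example; substitute the values for your own endpoint.

```python
from datetime import datetime, timedelta, timezone


def model_latency_query(endpoint_name, variant_name="AllTraffic",
                        hours=1, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},  # assumed name
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # seconds per datapoint (assumed)
        "ExtendedStatistics": ["p50", "p90"],
    }


def latencies_ms(datapoints):
    """Convert percentile datapoints from microseconds to milliseconds,
    ordered by timestamp."""
    ordered = sorted(datapoints, key=lambda dp: dp["Timestamp"])
    return [
        {name: value / 1000.0
         for name, value in dp["ExtendedStatistics"].items()}
        for dp in ordered
    ]


def fetch_latencies(endpoint_name):
    """Query CloudWatch; requires boto3 and AWS credentials."""
    import boto3
    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **model_latency_query(endpoint_name)
    )
    return latencies_ms(response["Datapoints"])
```

Calling `fetch_latencies("my-endpoint")` (a hypothetical endpoint name) would return one dictionary of p50 and p90 values per five-minute window.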
The following table summarizes these results for our enhanced extreme clicking tool for semantic segmentation on three instance types: p2.xlarge, p3.2xlarge, and g4dn.xlarge. Although the p3.2xlarge instance provides the lowest latency, the g4dn.xlarge instance provides the best cost-to-performance ratio. The g4dn.xlarge instance is only 8% (35 milliseconds) slower than the p3.2xlarge instance, but on an hourly basis it is 81% cheaper (for SageMaker instance types and other details, see Amazon SageMaker Pricing).
| | SageMaker instance type | p90 latency (ms) |
| 2 | p3.2xlarge | 424 |
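The trade-off described above can be checked with a few lines of arithmetic. In this sketch, the p3.2xlarge latency comes from the table and the g4dn.xlarge latency is derived from the stated 35 ms difference. The prices are normalized placeholder values chosen only to reproduce the stated 81% saving; they are not actual SageMaker prices.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate relative to baseline."""
    return (candidate - baseline) / baseline


p3_latency_ms = 424.0                    # p90 latency from the table
g4dn_latency_ms = p3_latency_ms + 35.0   # "35 milliseconds slower" per the text
slowdown = relative_change(p3_latency_ms, g4dn_latency_ms)

# Hypothetical normalized hourly prices, NOT real pricing data.
p3_price = 1.00
g4dn_price = 0.19
savings = -relative_change(p3_price, g4dn_price)

print(f"g4dn.xlarge is {slowdown:.1%} slower and {savings:.0%} cheaper per hour")
```

The latency figures work out to a slowdown of roughly 8%, consistent with the comparison in the text.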
In this post, we presented an extension to the Ground Truth auto-segment feature for semantic segmentation annotation tasks. While the original version of the tool allows exactly four mouse clicks, which prompt the model to provide a high-quality segmentation mask, the extension allows you to make corrective clicks and thereby update and guide the ML model toward better predictions. We also presented a basic architectural pattern that you can use to deploy interactive tools and integrate them into Ground Truth labeling UIs. Finally, we summarized model latency and showed how using SageMaker’s real-time inference endpoints makes it easy to monitor model performance.
To learn more about how this tool can reduce labeling cost and increase accuracy, visit Amazon SageMaker Data Labeling to start a consultation today.
About the authors
Jonathan Buck is a software engineer at Amazon Web Services working at the intersection of machine learning and distributed systems. His work includes building machine learning models and developing new software applications powered by machine learning to put the latest capabilities in the hands of customers.
Li Erran Li is an applied science manager for human-in-the-loop services, AWS AI, at Amazon. His research interests are 3D deep learning and vision and language representation learning. Previously, he was Senior Scientist at Alexa AI, Head of Machine Learning at Scale AI, and Chief Scientist at Pony.ai. Prior to that, he was with Uber ATG’s perception team and Uber’s machine learning platform team, working on machine learning for autonomous driving, machine learning systems, and strategic AI initiatives. He began his career at Bell Labs and was an associate professor at Columbia University. He has co-taught tutorials at ICML’17 and ICCV’19 and organized several workshops at NeurIPS, ICML, CVPR, and ICCV on autonomous driving, 3D vision and robotics, machine learning systems, and adversarial machine learning. He holds a PhD in computer science from Cornell University. He is a fellow of ACM and IEEE.