Public Cloud Data Streaming Comparison: Alibaba Cloud, AWS, Azure, Google Cloud, IBM Cloud

April 25, 2020

Streaming data is becoming the next wave in the data analytics and machine learning landscape. The key reason is that processing large volumes of data is no longer sufficient: a business must also process that data quickly and turn it into real-time insights so it can react to a changing environment as it happens.

The shift to cloud computing requires streaming data processing engines to be highly scalable and fault-tolerant. Cloud-based data stream processing systems, in particular, are designed to scale dynamically to hundreds of computing nodes and to cope with diverse workloads automatically.

Recognizing the importance of data streaming across a growing variety of use cases, organizations are adopting hybrid platforms so that they can leverage the advantages of both batch and streaming data analytics.

To help enterprises determine the best data streaming services, we have compiled a list of the most feature-rich tools for you and your business.

Alibaba Cloud

Alibaba Cloud DataHub is a real-time data distribution platform designed to process streaming data. It lets you publish, subscribe to, and distribute streaming data, which makes it easy to analyze that data and build applications on top of it.

Based on Alibaba Cloud’s Apsara platform, DataHub delivers high availability, low latency, high scalability, and high throughput. Seamlessly connected to Alibaba Cloud’s stream computing engine, StreamCompute, DataHub allows you to use SQL to analyze streaming data. It can also distribute streaming data to various cloud products, such as MaxCompute (formerly known as ODPS) and OSS.

The figure below shows the architecture of Alibaba's big data demo system.


Source: Alibaba Cloud

In the figure, the architecture comprises a data source system, a data warehouse, a big data platform, a web/app platform, process scheduling, data processing, and a real-time data streaming platform. Real-time data is processed through DataHub and StreamCompute.

This setup produces a variety of processing results in real time, including real-time charts, statistics, and other information. Overall, Alibaba's DataHub is a strong choice if you need to stream complex data.

| Concepts | Alibaba Cloud DataHub |
|---|---|
| Data Warehouse | MaxCompute |
| Data Retention | Default – 24 hours |
| SDK Support | MaxCompute Tunnel SDK |
| Configuration | Writer plug-in |
| Real-time Store | ApsaraDB |
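
To make the publish/subscribe model and the 24-hour default retention from the table concrete, here is a minimal in-memory sketch in Python. It is not the DataHub SDK: the `Topic` class, its methods, and the subscriber names are hypothetical. The sketch assumes only that records older than the retention window are discarded whether or not every subscriber has read them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

RETENTION_SECONDS = 24 * 60 * 60  # 24-hour default retention listed for DataHub


@dataclass
class Topic:
    """Hypothetical in-memory topic; not the DataHub SDK."""
    records: List[Tuple[float, str]] = field(default_factory=list)  # (publish time, payload)
    offsets: Dict[str, int] = field(default_factory=dict)  # subscriber -> next index to read

    def publish(self, payload: str, now: float) -> None:
        """Append a record; assumes records arrive in time order."""
        self.records.append((now, payload))

    def subscribe(self, name: str) -> None:
        self.offsets.setdefault(name, 0)

    def expire(self, now: float) -> None:
        """Discard records older than the retention window, read or not."""
        cutoff = now - RETENTION_SECONDS
        dropped = 0
        while dropped < len(self.records) and self.records[dropped][0] < cutoff:
            dropped += 1
        del self.records[:dropped]
        for name in self.offsets:  # keep offsets pointing at the same records
            self.offsets[name] = max(0, self.offsets[name] - dropped)

    def poll(self, name: str) -> List[str]:
        """Return every retained record this subscriber has not read yet."""
        batch = [payload for _, payload in self.records[self.offsets[name]:]]
        self.offsets[name] = len(self.records)
        return batch


if __name__ == "__main__":
    topic = Topic()
    topic.subscribe("stream_compute")   # hypothetical subscriber names
    topic.subscribe("maxcompute_sink")
    topic.publish("order-1", now=0)
    topic.publish("order-2", now=3600)
    print(topic.poll("stream_compute"))   # ['order-1', 'order-2']
    topic.expire(now=RETENTION_SECONDS + 1)
    print(topic.poll("maxcompute_sink"))  # ['order-2']: order-1 expired unread
```

Because each subscriber keeps its own offset in this sketch, a consumer that falls more than 24 hours behind loses the expired records, which is why the retention period matters when planning downstream jobs.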

Read reviews of Alibaba Cloud.

AWS

AWS Kinesis processes data in real time. Its key capability is ingesting and processing hundreds of terabytes of streaming data per hour. This simplifies the development of applications that make real-time decisions about business operations based on streaming data.

Kinesis is built around a few key concepts for stream storage, plus an API for implementing data producers and data consumers. Producers put records into a stream as they are generated, and consumers retrieve those records from the stream as they arrive.
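
In Kinesis, a producer attaches a partition key to each record. AWS documents that the key is hashed with MD5 into a 128-bit integer, and the record goes to the shard whose hash-key range contains that value. The sketch below imitates that routing locally, assuming the hash-key space is split evenly across shards (as for a new stream that has not been resharded). `shard_for` and `InMemoryStream` are hypothetical helpers, not part of any AWS SDK.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming the 128-bit hash-key
    space is split into equal, contiguous ranges, one per shard."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # min() folds the few leftover values at the top of the space into the last shard
    return min(hash_key // range_size, shard_count - 1)


class InMemoryStream:
    """Hypothetical local stand-in for a stream: one ordered list of records per shard."""

    def __init__(self, shard_count: int) -> None:
        self.shard_count = shard_count
        self.shards: Dict[int, List[str]] = defaultdict(list)

    def produce(self, partition_key: str, data: str) -> int:
        """Producer side: append the record to its shard and return the shard index."""
        shard = shard_for(partition_key, self.shard_count)
        self.shards[shard].append(data)
        return shard

    def consume(self, shard: int, start: int = 0) -> List[str]:
        """Consumer side: read a shard's records in arrival order from `start`."""
        return self.shards[shard][start:]


if __name__ == "__main__":
    stream = InMemoryStream(shard_count=4)
    for i in range(3):
        shard = stream.produce("sensor-42", f"reading-{i}")  # same key, same shard
    print(shard, stream.consume(shard))
```

Records that share a partition key always land on the same shard, so a consumer reading that shard sees them in the order they were produced.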

AWS charges per hour for each shard (the stream's unit of partitioning and throughput) and per volume of data put into the stream.
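
The two billing dimensions can be turned into a rough estimate. This sketch assumes the provisioned-capacity model, in which Kinesis meters ingested data as PUT payload units of 25 KB per record (a 5 KB record counts as one unit, a 45 KB record as two). Prices are parameters, and the figures in the example call are placeholders, not AWS's actual rates; check the Kinesis pricing page for current numbers.

```python
import math
from typing import Iterable

PUT_PAYLOAD_UNIT_KB = 25  # ingested data is metered in 25 KB chunks per record


def put_payload_units(record_sizes_kb: Iterable[float]) -> int:
    """Each record is rounded up to a whole number of 25 KB units."""
    return sum(math.ceil(size / PUT_PAYLOAD_UNIT_KB) for size in record_sizes_kb)


def estimate_cost(shards: int, hours: float, record_sizes_kb: Iterable[float],
                  price_per_shard_hour: float, price_per_million_units: float) -> float:
    """Shard-hour charge plus PUT payload charge; prices are caller-supplied."""
    shard_cost = shards * hours * price_per_shard_hour
    units = put_payload_units(record_sizes_kb)
    return shard_cost + units / 1_000_000 * price_per_million_units


if __name__ == "__main__":
    # Placeholder prices for illustration only, not real AWS rates.
    daily = estimate_cost(shards=2, hours=24, record_sizes_kb=[5] * 1_000_000,
                          price_per_shard_hour=0.01, price_per_million_units=0.02)
    print(round(daily, 2))
```

Rounding each record up to whole units means many tiny records cost more per byte than fewer, larger ones, which is one reason producers often batch small messages.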

The diagram below summarizes the key concepts of Amazon Kinesis.

Source: AWS

In terms of features, the AWS SDK for Kinesis supports Android, Java, Go, and .NET. In terms of performance, Kinesis writes each message synchronously to three different machines. However, its configuration options are limited to the retention period (in days) and the number of shards.

| Concepts | AWS Kinesis |
|---|---|
| Data Warehouse | Athena, Redshift |
| Data Retention | Default – 24 hours, 1-7 days (maximum 7 days) |
| SDK Support | AWS SDK (Android, Java, Go, .NET) |
| Real-time Store | Amazon DynamoDB |
| Cost | Pay-as-you-go (per shard-hour and per data volume) |

Read reviews of AWS Kinesis data streams.

Azure

Azure Stream Analytics is a fully managed event-processing engine for real-time analytics on one or more data streams from sources such as social media, sensors, web data sources, and other applications. It delivers low latency, high throughput, and high scalability.

Stream Analytics uses a pull-based communication model with built-in recovery and checkpointing, and it can protect data when a downstream system fails. It accepts two input types, stream data and reference data, from sources such as Azure Event Hubs and Azure Blob Storage.
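
The general idea of pull-based reading with checkpoints can be sketched as follows: the reader pulls a batch starting at its last checkpoint and moves the checkpoint forward only after the downstream sink accepts the batch, so a sink failure leads to a retry rather than data loss. `CheckpointedReader`, `flaky_sink`, and the batch size are hypothetical; this illustrates the pattern, not Azure's implementation.

```python
from typing import Callable, List


class CheckpointedReader:
    """Hypothetical pull-based reader: it pulls records starting at its last
    checkpoint and advances the checkpoint only after the sink succeeds."""

    def __init__(self, source: List[str], sink: Callable[[List[str]], None],
                 batch_size: int = 2) -> None:
        self.source = source
        self.sink = sink
        self.batch_size = batch_size
        self.checkpoint = 0  # index of the next record to pull

    def step(self) -> bool:
        """Pull and deliver one batch; return False once the source is drained."""
        batch = self.source[self.checkpoint:self.checkpoint + self.batch_size]
        if not batch:
            return False
        self.sink(batch)  # if the sink raises, the checkpoint stays where it was
        self.checkpoint += len(batch)
        return True


def demo() -> List[str]:
    delivered: List[str] = []
    failures_left = [1]  # the sink fails on its first call only

    def flaky_sink(batch: List[str]) -> None:
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise ConnectionError("downstream unavailable")
        delivered.extend(batch)

    reader = CheckpointedReader(["e1", "e2", "e3"], flaky_sink)
    while True:
        try:
            if not reader.step():
                break
        except ConnectionError:
            continue  # retry the same batch from the last checkpoint
    return delivered


if __name__ == "__main__":
    print(demo())  # ['e1', 'e2', 'e3']
```

Retrying from the checkpoint gives at-least-once delivery: if a sink had partially written a batch before failing, those records would be written again, so sinks in this pattern usually need to tolerate duplicates.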

The diagram below summarizes how Stream Analytics receives data, analyzes it, and sends it on for further action.


Source: Microsoft

As a Stream Analytics input, Event Hubs can ingest millions of events per second in various formats. Blob Storage can also store data and feed it into Stream Analytics for processing. Stream Analytics is currently billed on the volume of data processed and the number of streaming units used.

| Concepts | Azure Stream Analytics |
|---|---|
| Data Warehouse | Azure SQL |
| Data Retention | |
| SDK Support | Management .NET SDK |
| Real-time Store | Azure Cosmos DB |

Read reviews of Azure Stream Analytics.

Google Cloud

Cloud Dataflow is a managed data processing service that uses data pipelines to ingest, transform, and analyze both real-time and batch data. Based on Apache Beam, the service supports Python and Java jobs.

In Dataflow, events pass through three steps: validation, enrichment, and ingestion. The service can stream, process, and store over 120,000 events per second with very low latency, and every incoming event is validated and written to partitioned tables in BigQuery.
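
The three steps can be sketched in plain Python to show what each one does. A real Dataflow job would express them as Apache Beam transforms and write to BigQuery; here, to keep the example self-contained, the event fields (`user`, `ts`), the `COUNTRY_BY_USER` lookup table, and the dict of daily partitions standing in for a date-partitioned BigQuery table are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional

# Hypothetical lookup data used by the enrichment step
COUNTRY_BY_USER = {"u1": "NP", "u2": "SG"}


def validate(event: dict) -> Optional[dict]:
    """Step 1: drop events that lack a user or a numeric timestamp."""
    if "user" in event and isinstance(event.get("ts"), (int, float)):
        return event
    return None


def enrich(event: dict) -> dict:
    """Step 2: attach extra attributes from a lookup table."""
    return {**event, "country": COUNTRY_BY_USER.get(event["user"], "unknown")}


def partition_of(event: dict) -> str:
    """Daily partition key (YYYYMMDD) derived from the event timestamp in UTC."""
    return datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y%m%d")


def ingest(events: Iterable[dict]) -> Dict[str, List[dict]]:
    """Step 3: write valid, enriched events into per-day partitions."""
    table: Dict[str, List[dict]] = defaultdict(list)
    for raw in events:
        event = validate(raw)
        if event is None:
            continue  # invalid events never reach the table
        enriched = enrich(event)
        table[partition_of(enriched)].append(enriched)
    return dict(table)


if __name__ == "__main__":
    sample = [{"user": "u1", "ts": 0}, {"user": "u2", "ts": 86400}, {"ts": 5}]
    for day, rows in sorted(ingest(sample).items()):
        print(day, rows)
```

Partitioning by day keeps each write confined to one partition, which is the property that makes date-partitioned tables cheap to query by time range.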

The figure below shows how Dataflow handles stream and batch processing.


Source: Google

Google Cloud Dataflow is a great choice for organizations that want to run production-level data processing in the cloud. Users are charged in per-second increments based on their actual use of the service, and any additional Google Cloud resources consumed are billed under that service's own pricing.

| Concepts | Google Dataflow |
|---|---|
| Data Warehouse | BigQuery |
| Data Retention | |
| SDK Support | Apache Beam SDK |
| Real-time Store | Cloud Bigtable |
| Cost | Based on the actual use of Dataflow batch or streaming workers |

Read reviews of Google Cloud Dataflow.

IBM Cloud

IBM Streaming Analytics can manage high data rates and perform analysis with low latency. It can be used to ingest, analyze and monitor data coming from real-time data sources. With IBM Streams, companies can view information and events as they unfold.

The image below summarizes the architecture of IBM Streaming Analytics.


Source: IBM

The architecture takes a dynamic approach to resource allocation: organizations define the maximum number of nodes their environment may use, and the service scales up or down within that limit. As a result, a company pays only for the resources it uses, while still being able to monitor, manage, and make informed decisions with little effort.
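
The scaling rule described above, bounded by a user-defined node maximum, can be sketched as a small function. The per-node capacity, the minimum of one node, and the hourly load figures are hypothetical assumptions, not IBM's actual scaling policy; the point is that node-hours, and therefore cost, follow the load until the cap is reached.

```python
import math
from typing import List


def nodes_needed(events_per_second: float, capacity_per_node: float,
                 max_nodes: int, min_nodes: int = 1) -> int:
    """Scale to the current load, clamped to [min_nodes, max_nodes]."""
    if capacity_per_node <= 0 or max_nodes < min_nodes:
        raise ValueError("invalid scaling parameters")
    wanted = math.ceil(events_per_second / capacity_per_node)
    return max(min_nodes, min(max_nodes, wanted))


def node_hours(hourly_load: List[float], capacity_per_node: float, max_nodes: int) -> int:
    """Total node-hours consumed if the node count is re-evaluated every hour."""
    return sum(nodes_needed(load, capacity_per_node, max_nodes) for load in hourly_load)


if __name__ == "__main__":
    load = [500, 2500, 10_000]  # hypothetical events per second in three consecutive hours
    print([nodes_needed(x, 1000, max_nodes=5) for x in load])  # [1, 3, 5]
    print(node_hours(load, 1000, max_nodes=5))  # 9
```

The cap protects the budget during spikes (the third hour would otherwise need 10 nodes), at the cost of higher latency while load exceeds what the maximum number of nodes can handle.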

| Concepts | IBM Streaming Analytics |
|---|---|
| Data Warehouse | IBM Db2 Warehouse |
| Data Retention | |
| SDK Support | Eclipse SDK |
| Real-time Store | IBM Cloud Object Storage |
| Cost | Based on instance per hour |

Read reviews of IBM Streaming Analytics.

The time is NOW!

Streaming data architecture is constantly evolving. So, before picking any of these solutions, it is important to understand your existing systems in depth and get a clear picture of how they work. Keep in mind that each of these services is good at what it does in its own way.

The question, however, is which one is right for you. To answer it, compare the features of each service and choose the one that best fits your use case and available resources.

Brief comparison: Alibaba Cloud vs AWS vs Azure vs Google Cloud vs IBM Cloud

| Concepts | Alibaba Cloud | AWS | Azure | Google Cloud | IBM Cloud |
|---|---|---|---|---|---|
| Data Warehouse | MaxCompute | Athena, Redshift | Azure SQL | BigQuery | IBM Db2 Warehouse |
| Data Retention | Default – 24 hours | Default – 24 hours, 1-7 days (maximum 7 days) | | | |
| SDK Support | MaxCompute Tunnel SDK | AWS SDK (Android, Java, Go, .NET) | Management .NET SDK | Apache Beam SDK | Eclipse SDK |
| Configuration | Writer plug-in | Days/Shards | | | |
| Real-time Store | ApsaraDB | Amazon DynamoDB | Azure Cosmos DB | Cloud Bigtable | IBM Cloud Object Storage |
| Cost | Pay-As-You-Go | Pay-As-You-Go | Pay-As-You-Go | Based on the actual use of Dataflow batch or streaming workers | Based on instance per hour |


Ashok Kuikel

Hi, I am Ashok Kuikel, a WordPress developer for the WordPress community. I am also a cloud computing associate, an Alibaba Cloud MVP, and an ACA-certified cloud professional.

You can follow me on social media, GitHub, and my blog channels.
