Alibaba Cloud DataWorks – Data Integration

Alibaba Cloud Data Integration is a data synchronization platform that provides stable, efficient, and elastically scalable services. It is designed for fast, stable data migration and synchronization between heterogeneous data sources in complex network environments.

Offline (batch) data synchronization

The offline (batch) data channel abstracts the source and target databases and datasets into a set of data extraction plug-ins (Readers) and data writing plug-ins (Writers). On top of this framework, it defines a simplified intermediate data format, so that data can be transferred between any structured or semi-structured data sources.
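To make the Reader/Writer idea concrete, here is a minimal Python sketch. It is not the Data Integration implementation: the class names (`Reader`, `Writer`, `DelimitedLinesReader`, `ListWriter`) and the `run_sync` helper are hypothetical, and the "intermediate format" is assumed to be nothing more than a list of column values per row.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, Tuple

# Assumption: the intermediate format is one list of column values per row.
Record = List[str]


class Reader(ABC):
    """Hypothetical extraction plug-in: turns a source into records."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...


class Writer(ABC):
    """Hypothetical writing plug-in: stores records in a target."""

    @abstractmethod
    def write(self, records: Iterable[Record]) -> int:
        ...


class DelimitedLinesReader(Reader):
    """Reads semi-structured text lines and splits them into columns."""

    def __init__(self, lines: List[str], delimiter: str = ",") -> None:
        self.lines = lines
        self.delimiter = delimiter

    def read(self) -> Iterator[Record]:
        for line in self.lines:
            yield line.split(self.delimiter)


class ListWriter(Writer):
    """Collects rows in memory, standing in for a real target table."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, ...]] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.rows.append(tuple(record))
            count += 1
        return count


def run_sync(reader: Reader, writer: Writer) -> int:
    """Any reader can feed any writer because both share the record format."""
    return writer.write(reader.read())


reader = DelimitedLinesReader(["1,alice", "2,bob"])
writer = ListWriter()
synced = run_sync(reader, writer)
print(synced, writer.rows)
```

The point of the design is that adding a new source or target only requires one new plug-in, not one connector per source/target pair.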

Supported data source types

Data Integration supports the following kinds of data sources:

  • Text storage (FTP, SFTP, OSS, multimedia files)
  • Databases (RDS, DRDS, MySQL, PostgreSQL)
  • NoSQL (Memcache, Redis, MongoDB, HBase)
  • Big data (MaxCompute, AnalyticDB, HDFS)
  • MPP databases (HybridDB for MySQL)

Data Integration is a stable, efficient, and elastically scalable data synchronization platform that Alibaba Group provides to external users. It provides offline (batch) data access channels for Alibaba Cloud’s big data computing engines, including MaxCompute, AnalyticDB for MySQL 2.0, and Object Storage Service (OSS).

The following table lists the data source types that Data Integration supports:

| Data source category | Data source type | Extraction (reader) | Import (writer) | Supported methods | Supported deployment |
|---|---|---|---|---|---|
| Relational databases | MySQL | Yes | Yes | Wizard and script | Alibaba Cloud and on-premise |
| Relational databases | SQL Server | Yes | Yes | Wizard and script | Alibaba Cloud and on-premise |
| Relational databases | PostgreSQL | Yes | Yes | Wizard and script | Alibaba Cloud and on-premise |
| Relational databases | Oracle | Yes | Yes | Wizard and script | On-premise |
| Relational databases | DRDS | Yes | Yes | Wizard and script | Alibaba Cloud |
| Relational databases | DB2 | Yes | Yes | Script | On-premise |
| Relational databases | DM | Yes | Yes | Script | On-premise |
| Relational databases | RDS for PPAS | Yes | Yes | Script | Alibaba Cloud |
| MPP | HybridDB for MySQL | Yes | Yes | Wizard and script | Alibaba Cloud |
| MPP | HybridDB for PostgreSQL | Yes | Yes | Wizard and script | Alibaba Cloud |
| Big data storage | MaxCompute (corresponding data source name: MaxCompute) | Yes | Yes | Wizard and script | Alibaba Cloud |
| Big data storage | DataHub | No | Yes | Script | Alibaba Cloud |
| Big data storage | Elasticsearch | No | Yes | Script | Alibaba Cloud |
| Big data storage | AnalyticDB for MySQL 2.0 | Yes | Yes | Wizard and script | Alibaba Cloud |
| Unstructured storage | OSS | Yes | Yes | Wizard and script | Alibaba Cloud |
| Unstructured storage | HDFS | Yes | Yes | Script | On-premise |
| Unstructured storage | FTP | Yes | Yes | Wizard and script | On-premise |
| Message queue | LogHub | Yes | Yes | Wizard and script | Alibaba Cloud |
| NoSQL | HBase | Yes | Yes | Script | Alibaba Cloud and on-premise |
| NoSQL | MongoDB | Yes | Yes | Script | Alibaba Cloud and on-premise |
| NoSQL | Memcache | No | Yes | Script | Alibaba Cloud and on-premise |
| NoSQL | Table Store (corresponding data source name: OTS) | Yes | Yes | Script | Alibaba Cloud |
| NoSQL | OpenSearch | No | Yes | Script | Alibaba Cloud |
| NoSQL | Redis | No | Yes | Script | Alibaba Cloud and on-premise |
| Performance testing | Stream | Yes | Yes | Script | |

Note: Configuration details differ considerably between data sources, so look up the parameters that apply to your actual scenario. Detailed parameter descriptions are available on the data source configuration and job configuration pages.
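One way to use the table above is to check a planned source/target pair before configuring anything. The sketch below encodes a handful of rows from the table in Python. The `Capability` type and the `plan_sync` function are hypothetical names, and the rule that wizard mode is available only when both ends support it is an assumption made for illustration, not something the table states.

```python
from typing import Dict, NamedTuple


class Capability(NamedTuple):
    reader: bool  # can be used as a source
    writer: bool  # can be used as a target
    wizard: bool  # wizard mode is listed in addition to script mode


# A few rows copied from the table above; extend as needed.
CAPABILITIES: Dict[str, Capability] = {
    "MySQL": Capability(reader=True, writer=True, wizard=True),
    "DB2": Capability(reader=True, writer=True, wizard=False),
    "DataHub": Capability(reader=False, writer=True, wizard=False),
    "MaxCompute": Capability(reader=True, writer=True, wizard=True),
    "OSS": Capability(reader=True, writer=True, wizard=True),
    "Redis": Capability(reader=False, writer=True, wizard=False),
}


def plan_sync(source: str, target: str) -> str:
    """Return the usable development modes, or raise if the pair is invalid."""
    src = CAPABILITIES[source]
    dst = CAPABILITIES[target]
    if not src.reader:
        raise ValueError(f"{source} has no reader and cannot be a source")
    if not dst.writer:
        raise ValueError(f"{target} has no writer and cannot be a target")
    # Assumption: wizard mode needs wizard support on both ends.
    if src.wizard and dst.wizard:
        return "wizard or script"
    return "script"


print(plan_sync("MySQL", "MaxCompute"))
print(plan_sync("DB2", "MaxCompute"))
```

Asking for an unsupported direction, such as `plan_sync("Redis", "OSS")`, raises a `ValueError` because the table lists no reader for Redis.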

Sync task development

Sync tasks can be developed in either wizard mode or script mode.

  • Wizard: Provides a visual, step-by-step guide through the configuration of a data sync task. This mode is easy to learn, but lacks certain advanced functions.
  • Script: Lets you write the data sync task directly as a JSON script. It has a steeper learning curve and suits advanced users, but offers more diverse and flexible functions for fine-grained configuration.

Note

  • Code generated in wizard mode can be converted to script mode. The conversion is one-way: script mode code cannot be converted back to wizard mode, because the capabilities of script mode are a superset of those of wizard mode.
  • Always configure the data source and create the target table before writing code.
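To give a feel for what a JSON sync script looks like, the Python sketch below builds one and prints it. The layout (`job`, `setting`, `content`, `reader`, `writer`) and the plug-in names `mysqlreader` and `odpswriter` follow the open-source DataX job format as I understand it; the exact schema that DataWorks script mode expects may differ, so treat every field name here as an assumption and check the job configuration page. The `build_job` helper, the table names, and the `<...>` placeholders are made up for illustration.

```python
import json
from typing import Any, Dict, List


def build_job(source_table: str, target_table: str,
              columns: List[str], channels: int = 1) -> Dict[str, Any]:
    """Assemble a DataX-style job description (field names are assumptions)."""
    return {
        "job": {
            # Number of concurrent channels used for the transfer.
            "setting": {"speed": {"channel": channels}},
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "username": "<user>",
                            "password": "<password>",
                            "column": columns,
                            "connection": [
                                {
                                    "table": [source_table],
                                    "jdbcUrl": ["jdbc:mysql://<host>:3306/<db>"],
                                }
                            ],
                        },
                    },
                    "writer": {
                        "name": "odpswriter",
                        "parameter": {
                            "project": "<project>",
                            "table": target_table,
                            "column": columns,
                            "truncate": True,
                        },
                    },
                }
            ],
        }
    }


job = build_job("orders", "ods_orders", ["id", "amount"])
script = json.dumps(job, indent=2)
print(script)
```

Generating the script from code rather than editing JSON by hand keeps the reader and writer column lists in step, which is an easy thing to get wrong in a hand-written script.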

Network types

Networks are classified as classic networks, VPC networks, and local IDC networks (planned).

  • Classic network: A network deployed centrally on the Alibaba Cloud public infrastructure, and planned and managed by Alibaba Cloud. It suits customers who prioritize ease of use.
  • VPC network: An isolated network environment created on Alibaba Cloud. You have full control over the virtual network, including customizing the IP address range, partitioning network segments, and configuring routing tables and gateways.
  • Local IDC network: The network environment of your own server room, which is isolated from the Alibaba Cloud network.

Note:

  • Public network access is supported. For public network access, select the classic network as the network type. Keep in mind the public network bandwidth and the associated traffic charges. We do not recommend this configuration except in special cases.
  • Some network connections for data synchronization are still being planned. In the meantime, you can synchronize data by using a locally added resource together with script mode, or by using Shell together with DataX.
  • A Virtual Private Cloud (VPC) is an isolated network environment in which you can customize the IP address range, network segments, and gateways. For RDS for MySQL, RDS for SQL Server, and RDS for PostgreSQL instances in a VPC, Data Integration does not require you to purchase an extra ECS instance on the same network. Instead, the system guarantees interconnectivity by detecting the instance automatically through a reverse proxy. Other Alibaba Cloud databases in a VPC are also supported, including PPAS, OceanBase, Redis, MongoDB, Memcache, Table Store, and HBase. For these non-RDS data sources, however, an ECS instance on the same network is required to configure Data Integration sync tasks on the VPC network and ensure interconnectivity.
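The VPC rule in the last note can be written down as a small lookup. This Python sketch only restates the note: the three RDS engines named there are treated as reachable through the reverse proxy, and every other data source in a VPC is treated as needing an ECS instance on the same network. The function name `needs_ecs_on_same_vpc` is hypothetical, and how PPAS is classified is an assumption based on the note listing it among the other databases.

```python
# Engines the note names as reachable in a VPC without an extra ECS instance.
PROXY_SUPPORTED = {
    "RDS for MySQL",
    "RDS for SQL Server",
    "RDS for PostgreSQL",
}


def needs_ecs_on_same_vpc(data_source: str) -> bool:
    """For a data source inside a VPC, report whether an ECS instance
    on the same network is required to run sync tasks."""
    # Assumption: anything not named as proxy-supported needs the ECS.
    return data_source not in PROXY_SUPPORTED


for name in ["RDS for MySQL", "MongoDB", "HBase"]:
    print(name, needs_ecs_on_same_vpc(name))
```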

Limits

  • Supports synchronization of structured data (such as RDS and DRDS), semi-structured data, and unstructured data (such as OSS and TXT), provided that the data to be synchronized can be abstracted as structured data. That is, Data Integration can transmit any data that can be abstracted into a logical two-dimensional table. Synchronizing fully unstructured data, such as an MP3 file stored in OSS, to MaxCompute is not supported yet; this capability is still in development.
  • Supports data synchronization and exchange between data stores in the same region and across regions.

    Cross-region transmission over the classic network is available for certain regions, but is not guaranteed. If you need cross-region transmission and testing shows that the classic network cannot connect, consider using a public network connection instead.

  • Performs data synchronization (transmission) only. It does not provide a way to consume data streams.

Summary

This post took a closer look at Data Integration in Alibaba Cloud DataWorks, and at how its features can help kickstart your data processing and analytics workflow.
