Alibaba Cloud MaxCompute SQL

MaxCompute SQL is suitable for various scenarios. The massive data (GB, TB, or EB level) must be processed based on an offline batch calculation.

MaxCompute SQL is suitable for various scenarios. The massive data (GB, TB, or EB level) must be processed based on an offline batch calculation. It takes several seconds or even minutes to schedule after a job is submitted. Therefore, MaxCompute SQL is preferred for services that process tens of thousands of transactions per second.

The MaxCompute SQL syntax is similar to SQL and can be considered as a subset of standard SQL. However, the MaxCompute SQL must not be confused with a database. It does not have database characteristics including transactions, primary key constraints, indexes, and so on. The maximum size of SQL in MaxCompute is 2 MB.

Reserved words

MaxCompute SQL considers the keywords of SQL statement as reserved words. If you use keywords for name tables, columns, or partitions, you must escape the keywords with the `` symbol, otherwise an error is occurred. Reserved words are case insensitive and the most common words used are as follows:

 % & && ( ) * +
- . / ; < <= <>
ADD AFTER ALL
ALTER ANALYZE AND ARCHIVE ARRAY AS ASC
BEFORE BETWEEN BIGINT BINARY BLOB BOOLEAN BOTH DECIMAL 
BUCKET BUCKETS BY CASCADE CASE CAST CFILE
CHANGE CLUSTER CLUSTERED CLUSTERSTATUS COLLECTION COLUMN COLUMNS
COMMENT COMPUTE CONCATENATE CONTINUE CREATE CROSS CURRENT
CURSOR DATA DATABASE DATABASES DATE DATETIME DBPROPERTIES
DEFERRED DELETE DELIMITED DESC DESCRIBE DIRECTORY DISABLE
DISTINCT DISTRIBUTE DOUBLE DROP ELSE ENABLE END
ESCAPED EXCLUSIVE EXISTS EXPLAIN EXPORT EXTENDED EXTERNAL
FALSE FETCH FIELDS FILEFORMAT FIRST FLOAT FOLLOWING
FORMAT FORMATTED FROM FULL FUNCTION FUNCTIONS GRANT
GROUP HAVING HOLD_DDLTIME IDXPROPERTIES IF IMPORT IN
INDEX INDEXES INPATH INPUTDRIVER INPUTFORMAT INSERT INT
INTERSECT INTO IS ITEMS JOIN KEYS LATERAL
LEFT LIFECYCLE LIKE LIMIT LINES LOAD LOCAL
LOCATION LOCK LOCKS LONG MAP MAPJOIN MATERIALIZED
MINUS MSCK NOT NO_DROP NULL OF OFFLINE
ON OPTION OR ORDER OUT OUTER OUTPUTDRIVER
OUTPUTFORMAT OVER OVERWRITE PARTITION PARTITIONED PARTITIONPROPERTIES PARTITIONS
PERCENT PLUS PRECEDING PRESERVE PROCEDURE PURGE RANGE
RCFILE READ READONLY READS REBUILD RECORDREADER RECORDWRITER
REDUCE REGEXP RENAME REPAIR REPLACE RESTRICT REVOKE
RIGHT RLIKE ROW ROWS SCHEMA SCHEMAS SELECT
SEMI SEQUENCEFILE SERDE SERDEPROPERTIES SET SHARED SHOW
SHOW_DATABASE SMALLINT SORT SORTED SSL STATISTICS STATUS STORED
STREAMTABLE STRING STRUCT TABLE TABLES TABLESAMPLE TBLPROPERTIES
TEMPORARY TERMINATED TEXTFILE THEN TIMESTAMP TINYINT TO
TOUCH TRANSFORM TRIGGER TRUE TYPE UNARCHIVE UNBOUNDED UNDO
UNION UNIONTYPE UNIQUEJOIN UNLOCK UNSIGNED UPDATE USE
USING UTC UTC_TMESTAMP VIEW WHEN WHERE WHILE DIV

Type conversion

MaxCompute SQL allows conversion between data types. The conversion methods include explicit type conversion and implicit type conversion.

  • Explicit conversions: Uses CAST to convert a value type.
  • Implicit conversions: MaxCompute automatically performs implicit conversions while running based on the context environment and conversion rules. Implicit conversion scope includes various operators, built-in functions, and so on.

Explicit conversion

Explicit conversions use CAST to convert a value type to another. The following table lists the types that can be explicitly converted in MaxCompute SQL.

 
From/To BIGINT DOUBLE STRING DATETIME BOOLEAN DECIMAL FLOAT
BIGINT N/A Yes Yes No Yes Yes Yes
DOUBLE Yes N/A Yes No Yes Yes Yes
STRING Yes Yes N/A Yes Yes Yes Yes
DATETIME No No Yes N/A No No No
BOOLEAN Yes Yes Yes No N/A Yes Yes
DECIMAL Yes Yes Yes No Yes N/A Yes
FLOAT Yes Yes Yes No Yes Yes N/A

Y means can be converted. N means cannot be converted. – means conversion is not required.

Example:

select cast(user_id as double) as new_id from user;
select cast('2015-10-01 00:00:00' as datetime) as new_date from user;

Implicit conversion and scope

Implicit type conversion is an automatic type conversion performed by MaxCompute according to the usage context and type conversion rules. The following table lists the types that can be implicitly converted using MaxCompute.

 
BOOLEAN TINYINT SMALLINT INT BIGINT FLOAT double Decimal string varchar timestamp binary
boolean to Yes No No No No No No No No No No No
tinyint to No Yes Yes Yes Yes Yes Yes Yes Yes Yes No No
smallint to No No Yes Yes Yes Yes Yes Yes Yes Yes No No
int to No No Yes Yes Yes Yes Yes Yes Yes Yes No N/A
bigint to No No No No Yes Yes Yes Yes Yes Yes No No
float to No No No No Yes Yes Yes Yes Yes Yes No N/A
double to No No No No No No Yes Yes Yes Yes No No
decimal to No No No No No No No Yes Yes Yes No No
string to No No No No No No Yes Yes Yes Yes No No
varchar to No No No No Yes Yes Yes Yes No No N/A N/A
timestamp to No No No No No No No No Yes Yes Yes No
binary to No No No No No No No No No No No Yes

Y means can be converted. N means cannot be converted.

Common use:

select user_id+age+'12345',
concat(user_name,user_id,age)
from user;

 

  • Implicit conversions under special relational operators

    Special relational operators include LIKE, RLIKE, and IN.

    • The usage of LIKE and RLIKE is as follows:
      source like pattern; source rlike pattern;

      The following illustrates the notes for LIKE and RLIKE in implicit conversions:

      • The source and pattern parameters of LIKE and RLIKE can only be of the STRING type.
      • Other types can neither be involved in the operations nor be implicitly converted to the STRING type.
    • The usage of IN is as follows:

      key in (value1, value2, …)

      Implicit conversion rules of IN:

      • Data in the value column must be consistent.
      • To compare keys and values, if BIGINT, DOUBLE, and STRING types are compared, convert them to DOUBLE type. If the DATETIME and STRING types are compared, convert them to DATETIME type. Conversions between other types are not allowed.
  • Implicit conversions under arithmetic operators

    Arithmetic operators include addition (+), subtraction (-), multiplication (*), division (/), modulo (%), unary plus (+), and unary minus (-). Their implicit conversion rules are described as follows:

    • Only the STRING, BIGINT, DOUBLE, and DECIMAL types can be involved in the operation.
    • The STRING type are implicitly converted to the DOUBLE type before the operation.
    • When the BIGINT and DOUBLE types are involved in the operation, the BIGINT type is implicitly converted to the DOUBLE type.
    • The DATETIME and BOOLEAN types are not allowed in the arithmetic operation.
  • Logical operators include AND, OR, and NOT. Their implicit conversion rules are as follows:

    • Only the BOOLEAN type can be involved in the logical operation.
    • Other types are not allowed in the logical operation, and cannot be implicitly converted to other types.Implicit conversions under logical operators

 

Partitioned table

MaxCompute SQL supports partitioned tables. Specify the partition as it simplifies the operation. For example, improve SQL running efficiency, reduce the cost, and so on.

To improve MaxCompute’s processing efficiency, you can specify a partition when creating a table. Specifically, several fields in the table can be specified as partition columns. A partition is comparable in terms of functionality to a directory under a file system.

In MaxCompute, each field can be specified as a separate partition. Alternatively, you can specify multiple fields of the table as a partition whereby they function similarly to multi-level directories.

If the partitions to be accessed are specified when you use data, then only corresponding partitions are read and a full table scan is avoided, improving processing efficiency while reducing costs.

Partition Tables

MaxCompute 2.0 extends the support for partitioning types, currently, MaxCompute supports tinyint, smallint, Int, bigint, varchar, and string partition types.

Note In MaxCompute versions earlier than 2.0, only STRING partition type is supported. For historical reasons, although you can specify a partition type of bigint, however, except for the schema representation of the table as bigint, any other case is actually handled as a string. Examples:

create table parttest (a bigint) partitioned by (pt bigint);
insert into parttest partition(pt) select 1, 2 from dual;
insert into parttest partition(pt) select 1, 10 from dual;
select * from parttest where pt >= '2';

After the execution, the returned result is only one line, because 10 was treated as a STRING and compared with 2, meaning no result can be returned.

Restrictions

When using a partition, the following restrictions apply:

  • The maximum number of partition levels for a single table is 6 levels.
  • The maximum number of single table partitions is 60,000.
  • The maximum number of query partitions for a query is 10,000.

Examples:

-- create a two-level partition table with the date as the level one partition and the region as the level two partition
create table src (key string, value bigint) partitioned by (pt string,region string);

When querying, use the partition column as a filter condition in the WHERE condition filter:

select * from src where pt='20170601' and region='hangzhou'-- This example is the correct method of using WHERE conditional filter. When MaxCompute generates a query plan, only data of the region 'hangzhou' under the '20170601' partition is accessed.
select * from src where pt = 20170601;   -- This example is an incorrect method of using the WHERE conditional filter. In this example, the effectiveness of the partition filter cannot be guaranteed. pt is a STRING type. When the STRING type is compared with BIGINT type (20170601), MaxCompute converts both to DOUBLE type, and loss of precision occurs.

For some of the operation commands for MaxCompute, there is a difference in the syntax when you process the partition and non-partition tables.

UNION ALL

To be involved in a UNION ALL operation, the data type of columns, column numbers, and column names must be consistent, otherwise an error occurs.

 

Operators

Operators are used to perform program code operations. This article introduces four types of operators: relational operator, arithmetic operator, bit operator and logical operator.

Relational operators

 
Operator Description
A=B If A or B is NULL, NULL is returned. If A is equal to B, TRUE is returned; otherwise FALSE is returned.
A<>B If A or B is NULL, NULL is returned. If A is not equal to B, TRUE is returned; otherwise FALSE is returned.
A<B If A or B is NULL, NULL is returned. If A is less than B, TRUE is returned; otherwise FALSE is returned.
A<=B If A or B is NULL, NULL is returned. If A is not greater than B, TRUE is returned; otherwise FALSE is returned.
A>B If A or B is NULL, NULL is returned. If A is greater than B, TRUE is returned; otherwise FALSE is returned.
A>=B If A or B is NULL, NULL is returned; if A is not less than B, TRUE is returned; otherwise, FALSE is returned.
A IS NULL If A is NULL, TRUE is returned; otherwise, FALSE is returned.
A IS NOT NULL If A is NULL, TRUE is returned; otherwise FALSE is returned.
A LIKE B If A or B is NULL, NULL is returned. If String A matches the SQL simple regular B TRUE is returned; otherwise FALSE is returned. The ( %) character in B matches an arbitrary number of characters and the (_) character in B matches any character in A. To match (%) or(_), use by the escape characters (‘%’)’and (‘_’).

‘aaa’ like‘a_’= TRUE 
‘aaa’ like‘a%’ = TRUE
‘aaa’ like‘aab’= FALSE 
‘a%b’ like‘a\\%b’= TRUE 
‘axb’ like ‘a\\%b’= FALSE
A RLIKE B A is a string, and B is a string constant regular expression. If any substring of A matches the Java regular expression B, TRUE is returned; otherwise FALSE is returned. If expression B is empty, report an error and exit. If expression A or B is NULL, NULL is returned.
A IN B B is a set. If expression A is NULL, NULL is returned. If expression A is in expression B, TRUE is returned; otherwise FALSE is returned. If expression B has only one element NULL, that is, A IN (NULL), return NULL. If expression B contains NULL element, take NULL as the type of other elements in B set. B must be a constant and at least has one element; all types must be consistent.
BETWEEN AND The expression is A [NOT] BETWEEN B AND C. Empty if A, B, or C is empty. TRUE if A is larger than or equal to B and less than or equal to C; otherwise FALSE is returned.

The common use:

select * from user where user_id = '0001'; 
select * from user where user_name <> 'maggie'; 
select * from user where age > ‘50’; 
select * from user where birth_day >= '1980-01-01 00:00:00'; 
select * from user where is_female is null; 
select * from user where is_female is not null; 
select * from user where user_id in (0001,0010); 
select * from user where user_name like 'M%';

The Double values in MaxCompute are different in precision. For this reason, we do not recommend using the equal sign for comparison between two Double data. You can subtract two Double types, and then take the absolute value into consideration. When the absolute value is small enough, the two double values are considered equal.

Example:

abs(0.9999999999 - 1.0000000000) < 0.000000001
-- 0.9999999999 and 1.0000000000 have the precision of 10 decimal digits, while 0.000000001 has the precision of 9 decimal digits.
-- It is considered that 0.9999999999 is equal to 1.0000000000.

Arithmetic operators

 
Operator Description
A + B If expression A or B is NULL, NULL is returned; otherwise the result of A+B is returned.
A – B If expression A or B is NULL, NULL is returned; otherwise the result of A – B is returned.
A * B If expression A or B is NULL, NULL is returned; otherwise result of A * B is returned.
A / B If expression A or B is NULL, NULL is returned; otherwise the result of A / B is returned. If Expression A and B are BIGINT types, the result is DOUBLE type.
A % B If expression A or B is NULL, NULL is returned; otherwise the reminder result from dividing A by B is returned.
+A Result A is returned.
-A If expression A is NULL, NULL is returned; otherwise –A is returned.

The common use:

select age+10, age-10, age%10, -age, age*age, age/10 from user;

Bitwise operators

 
Operator Description
A & B Return the result of bitwise AND of A and B. For example: 1&2, return 0; 1&3, return 1; Bitwise AND of NULL and other values, all return NULL. Expression A and B must be BIGINT .
A | B Return the result of bitwise OR of A and B. For example: 1|2, return3. 1|3, return 3. Bitwise OR of NULL and other values, all return NULL. Expression A and B must be BIGINT type.

Logical operators

 
Operator Description
A and B TRUE and TRUE=TRUE
TRUE and FALSE=FALSE
FALSE and TRUE=FALSE
FALSE and NULL=FALSE
NULL and FALSE=FALSE
TRUE and NULL=NULL
NULL and TRUE=NULL
NULL and NULL=NULL
A or B TRUE or TRUE=TRUE
TRUE or FALSE=TRUE
FALSE or TRUE=TRUE
FALSE or NULL=NULL
NULL or FALSE=NULL
TRUE or NULL=TRUE
NULL or TRUE=TRUE
NULL or NULL=NULL
NOT A If A is NULL, NULL is returned.
If A is TRUE, FALSE is returned.
If A is FALSE, TRUE is returned.

 

Summary

In this blog, you’ve got to see a bit more about Alibaba Cloud MaxCompute SQL to take advantage of all of the features included in MaxCompute to help kickstart your data processing and analytics workflow.

Subscribe to our newsletter
Sign up here to get the latest news, updates and special offers delivered directly to your inbox.
You can unsubscribe at any time

Leave A Reply

Your email address will not be published.

This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Accept Read More