Alibaba Cloud MaxCompute SQL
MaxCompute SQL is suitable for various scenarios. The massive data (GB, TB, or EB level) must be processed based on an offline batch calculation.
MaxCompute SQL is suitable for various scenarios. The massive data (GB, TB, or EB level) must be processed based on an offline batch calculation. It takes several seconds or even minutes to schedule after a job is submitted. Therefore, MaxCompute SQL is preferred for services that process tens of thousands of transactions per second.
The MaxCompute SQL syntax is similar to SQL and can be considered as a subset of standard SQL. However, the MaxCompute SQL must not be confused with a database. It does not have database characteristics including transactions, primary key constraints, indexes, and so on. The maximum size of SQL in MaxCompute is 2 MB.
Reserved words
MaxCompute SQL considers the keywords of SQL statement as reserved words. If you use keywords for name tables, columns, or partitions, you must escape the keywords with the ``
symbol, otherwise an error is occurred. Reserved words are case insensitive and the most common words used are as follows:
% & && ( ) * +
- . / ; < <= <>
ADD AFTER ALL
ALTER ANALYZE AND ARCHIVE ARRAY AS ASC
BEFORE BETWEEN BIGINT BINARY BLOB BOOLEAN BOTH DECIMAL
BUCKET BUCKETS BY CASCADE CASE CAST CFILE
CHANGE CLUSTER CLUSTERED CLUSTERSTATUS COLLECTION COLUMN COLUMNS
COMMENT COMPUTE CONCATENATE CONTINUE CREATE CROSS CURRENT
CURSOR DATA DATABASE DATABASES DATE DATETIME DBPROPERTIES
DEFERRED DELETE DELIMITED DESC DESCRIBE DIRECTORY DISABLE
DISTINCT DISTRIBUTE DOUBLE DROP ELSE ENABLE END
ESCAPED EXCLUSIVE EXISTS EXPLAIN EXPORT EXTENDED EXTERNAL
FALSE FETCH FIELDS FILEFORMAT FIRST FLOAT FOLLOWING
FORMAT FORMATTED FROM FULL FUNCTION FUNCTIONS GRANT
GROUP HAVING HOLD_DDLTIME IDXPROPERTIES IF IMPORT IN
INDEX INDEXES INPATH INPUTDRIVER INPUTFORMAT INSERT INT
INTERSECT INTO IS ITEMS JOIN KEYS LATERAL
LEFT LIFECYCLE LIKE LIMIT LINES LOAD LOCAL
LOCATION LOCK LOCKS LONG MAP MAPJOIN MATERIALIZED
MINUS MSCK NOT NO_DROP NULL OF OFFLINE
ON OPTION OR ORDER OUT OUTER OUTPUTDRIVER
OUTPUTFORMAT OVER OVERWRITE PARTITION PARTITIONED PARTITIONPROPERTIES PARTITIONS
PERCENT PLUS PRECEDING PRESERVE PROCEDURE PURGE RANGE
RCFILE READ READONLY READS REBUILD RECORDREADER RECORDWRITER
REDUCE REGEXP RENAME REPAIR REPLACE RESTRICT REVOKE
RIGHT RLIKE ROW ROWS SCHEMA SCHEMAS SELECT
SEMI SEQUENCEFILE SERDE SERDEPROPERTIES SET SHARED SHOW
SHOW_DATABASE SMALLINT SORT SORTED SSL STATISTICS STATUS STORED
STREAMTABLE STRING STRUCT TABLE TABLES TABLESAMPLE TBLPROPERTIES
TEMPORARY TERMINATED TEXTFILE THEN TIMESTAMP TINYINT TO
TOUCH TRANSFORM TRIGGER TRUE TYPE UNARCHIVE UNBOUNDED UNDO
UNION UNIONTYPE UNIQUEJOIN UNLOCK UNSIGNED UPDATE USE
USING UTC UTC_TMESTAMP VIEW WHEN WHERE WHILE DIV
Type conversion
MaxCompute SQL allows conversion between data types. The conversion methods include explicit type conversion and implicit type conversion.
- Explicit conversions: Uses CAST to convert a value type.
- Implicit conversions: MaxCompute automatically performs implicit conversions while running based on the context environment and conversion rules. Implicit conversion scope includes various operators, built-in functions, and so on.
Explicit conversion
Explicit conversions use CAST to convert a value type to another. The following table lists the types that can be explicitly converted in MaxCompute SQL.
From/To | BIGINT | DOUBLE | STRING | DATETIME | BOOLEAN | DECIMAL | FLOAT |
---|---|---|---|---|---|---|---|
BIGINT | N/A | Yes | Yes | No | Yes | Yes | Yes |
DOUBLE | Yes | N/A | Yes | No | Yes | Yes | Yes |
STRING | Yes | Yes | N/A | Yes | Yes | Yes | Yes |
DATETIME | No | No | Yes | N/A | No | No | No |
BOOLEAN | Yes | Yes | Yes | No | N/A | Yes | Yes |
DECIMAL | Yes | Yes | Yes | No | Yes | N/A | Yes |
FLOAT | Yes | Yes | Yes | No | Yes | Yes | N/A |
Y means can be converted. N means cannot be converted. – means conversion is not required.
Example:
select cast(user_id as double) as new_id from user;
select cast('2015-10-01 00:00:00' as datetime) as new_date from user;
Implicit conversion and scope
Implicit type conversion is an automatic type conversion performed by MaxCompute according to the usage context and type conversion rules. The following table lists the types that can be implicitly converted using MaxCompute.
BOOLEAN | TINYINT | SMALLINT | INT | BIGINT | FLOAT | double | Decimal | string | varchar | timestamp | binary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
boolean to | Yes | No | No | No | No | No | No | No | No | No | No | No |
tinyint to | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
smallint to | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
int to | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | N/A |
bigint to | No | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
float to | No | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes | No | N/A |
double to | No | No | No | No | No | No | Yes | Yes | Yes | Yes | No | No |
decimal to | No | No | No | No | No | No | No | Yes | Yes | Yes | No | No |
string to | No | No | No | No | No | No | Yes | Yes | Yes | Yes | No | No |
varchar to | No | No | No | No | Yes | Yes | Yes | Yes | No | No | N/A | N/A |
timestamp to | No | No | No | No | No | No | No | No | Yes | Yes | Yes | No |
binary to | No | No | No | No | No | No | No | No | No | No | No | Yes |
Y means can be converted. N means cannot be converted.
Common use:
select user_id+age+'12345',
concat(user_name,user_id,age)
from user;
- Implicit conversions under special relational operators
Special relational operators include LIKE, RLIKE, and IN.
- The usage of LIKE and RLIKE is as follows:
source like pattern; source rlike pattern;
The following illustrates the notes for LIKE and RLIKE in implicit conversions:
- The source and pattern parameters of LIKE and RLIKE can only be of the STRING type.
- Other types can neither be involved in the operations nor be implicitly converted to the STRING type.
-
The usage of IN is as follows:
key in (value1, value2, …)
Implicit conversion rules of IN:
- Data in the value column must be consistent.
- To compare keys and values, if BIGINT, DOUBLE, and STRING types are compared, convert them to DOUBLE type. If the DATETIME and STRING types are compared, convert them to DATETIME type. Conversions between other types are not allowed.
- The usage of LIKE and RLIKE is as follows:
- Implicit conversions under arithmetic operators
Arithmetic operators include addition (+), subtraction (-), multiplication (*), division (/), modulo (%), unary plus (+), and unary minus (-). Their implicit conversion rules are described as follows:
- Only the STRING, BIGINT, DOUBLE, and DECIMAL types can be involved in the operation.
- The STRING type are implicitly converted to the DOUBLE type before the operation.
- When the BIGINT and DOUBLE types are involved in the operation, the BIGINT type is implicitly converted to the DOUBLE type.
- The DATETIME and BOOLEAN types are not allowed in the arithmetic operation.
-
Logical operators include AND, OR, and NOT. Their implicit conversion rules are as follows:
- Only the BOOLEAN type can be involved in the logical operation.
- Other types are not allowed in the logical operation, and cannot be implicitly converted to other types.Implicit conversions under logical operators
Partitioned table
MaxCompute SQL supports partitioned tables. Specify the partition as it simplifies the operation. For example, improve SQL running efficiency, reduce the cost, and so on.
To improve MaxCompute’s processing efficiency, you can specify a partition when creating a table. Specifically, several fields in the table can be specified as partition columns. A partition is comparable in terms of functionality to a directory under a file system.
In MaxCompute, each field can be specified as a separate partition. Alternatively, you can specify multiple fields of the table as a partition whereby they function similarly to multi-level directories.
If the partitions to be accessed are specified when you use data, then only corresponding partitions are read and a full table scan is avoided, improving processing efficiency while reducing costs.
MaxCompute 2.0 extends the support for partitioning types, currently, MaxCompute supports tinyint, smallint, Int, bigint, varchar, and string partition types.
Note In MaxCompute versions earlier than 2.0, only STRING partition type is supported. For historical reasons, although you can specify a partition type of bigint, however, except for the schema representation of the table as bigint, any other case is actually handled as a string. Examples:
create table parttest (a bigint) partitioned by (pt bigint);
insert into parttest partition(pt) select 1, 2 from dual;
insert into parttest partition(pt) select 1, 10 from dual;
select * from parttest where pt >= '2';
After the execution, the returned result is only one line, because 10 was treated as a STRING and compared with 2, meaning no result can be returned.
Restrictions
When using a partition, the following restrictions apply:
- The maximum number of partition levels for a single table is 6 levels.
- The maximum number of single table partitions is 60,000.
- The maximum number of query partitions for a query is 10,000.
Examples:
-- create a two-level partition table with the date as the level one partition and the region as the level two partition
create table src (key string, value bigint) partitioned by (pt string,region string);
When querying, use the partition column as a filter condition in the WHERE condition filter:
select * from src where pt='20170601' and region='hangzhou'; -- This example is the correct method of using WHERE conditional filter. When MaxCompute generates a query plan, only data of the region 'hangzhou' under the '20170601' partition is accessed.
select * from src where pt = 20170601; -- This example is an incorrect method of using the WHERE conditional filter. In this example, the effectiveness of the partition filter cannot be guaranteed. pt is a STRING type. When the STRING type is compared with BIGINT type (20170601), MaxCompute converts both to DOUBLE type, and loss of precision occurs.
For some of the operation commands for MaxCompute, there is a difference in the syntax when you process the partition and non-partition tables.
UNION ALL
To be involved in a UNION ALL operation, the data type of columns, column numbers, and column names must be consistent, otherwise an error occurs.
Operators
Operators are used to perform program code operations. This article introduces four types of operators: relational operator, arithmetic operator, bit operator and logical operator.
Relational operators
Operator | Description |
---|---|
A=B | If A or B is NULL, NULL is returned. If A is equal to B, TRUE is returned; otherwise FALSE is returned. |
A<>B | If A or B is NULL, NULL is returned. If A is not equal to B, TRUE is returned; otherwise FALSE is returned. |
A<B | If A or B is NULL, NULL is returned. If A is less than B, TRUE is returned; otherwise FALSE is returned. |
A<=B | If A or B is NULL, NULL is returned. If A is not greater than B, TRUE is returned; otherwise FALSE is returned. |
A>B | If A or B is NULL, NULL is returned. If A is greater than B, TRUE is returned; otherwise FALSE is returned. |
A>=B | If A or B is NULL, NULL is returned; if A is not less than B, TRUE is returned; otherwise, FALSE is returned. |
A IS NULL | If A is NULL, TRUE is returned; otherwise, FALSE is returned. |
A IS NOT NULL | If A is NULL, TRUE is returned; otherwise FALSE is returned. |
A LIKE B | If A or B is NULL, NULL is returned. If String A matches the SQL simple regular B TRUE is returned; otherwise FALSE is returned. The ( %) character in B matches an arbitrary number of characters and the (_) character in B matches any character in A. To match (%) or(_), use by the escape characters (‘%’)’and (‘_’).
|
A RLIKE B | A is a string, and B is a string constant regular expression. If any substring of A matches the Java regular expression B, TRUE is returned; otherwise FALSE is returned. If expression B is empty, report an error and exit. If expression A or B is NULL, NULL is returned. |
A IN B | B is a set. If expression A is NULL, NULL is returned. If expression A is in expression B, TRUE is returned; otherwise FALSE is returned. If expression B has only one element NULL, that is, A IN (NULL), return NULL. If expression B contains NULL element, take NULL as the type of other elements in B set. B must be a constant and at least has one element; all types must be consistent. |
BETWEEN AND | The expression is A [NOT] BETWEEN B AND C . Empty if A, B, or C is empty. TRUE if A is larger than or equal to B and less than or equal to C; otherwise FALSE is returned. |
The common use:
select * from user where user_id = '0001';
select * from user where user_name <> 'maggie';
select * from user where age > ‘50’;
select * from user where birth_day >= '1980-01-01 00:00:00';
select * from user where is_female is null;
select * from user where is_female is not null;
select * from user where user_id in (0001,0010);
select * from user where user_name like 'M%';
The Double values in MaxCompute are different in precision. For this reason, we do not recommend using the equal sign for comparison between two Double data. You can subtract two Double types, and then take the absolute value into consideration. When the absolute value is small enough, the two double values are considered equal.
Example:
abs(0.9999999999 - 1.0000000000) < 0.000000001
-- 0.9999999999 and 1.0000000000 have the precision of 10 decimal digits, while 0.000000001 has the precision of 9 decimal digits.
-- It is considered that 0.9999999999 is equal to 1.0000000000.
Arithmetic operators
Operator | Description |
---|---|
A + B | If expression A or B is NULL, NULL is returned; otherwise the result of A+B is returned. |
A – B | If expression A or B is NULL, NULL is returned; otherwise the result of A – B is returned. |
A * B | If expression A or B is NULL, NULL is returned; otherwise result of A * B is returned. |
A / B | If expression A or B is NULL, NULL is returned; otherwise the result of A / B is returned. If Expression A and B are BIGINT types, the result is DOUBLE type. |
A % B | If expression A or B is NULL, NULL is returned; otherwise the reminder result from dividing A by B is returned. |
+A | Result A is returned. |
-A | If expression A is NULL, NULL is returned; otherwise –A is returned. |
The common use:
select age+10, age-10, age%10, -age, age*age, age/10 from user;
Bitwise operators
Operator | Description |
---|---|
A & B | Return the result of bitwise AND of A and B. For example: 1&2, return 0; 1&3, return 1; Bitwise AND of NULL and other values, all return NULL. Expression A and B must be BIGINT . |
A | B | Return the result of bitwise OR of A and B. For example: 1|2, return3. 1|3, return 3. Bitwise OR of NULL and other values, all return NULL. Expression A and B must be BIGINT type. |
Summary
In this blog, you’ve got to see a bit more about Alibaba Cloud MaxCompute SQL to take advantage of all of the features included in MaxCompute to help kickstart your data processing and analytics workflow.