十年網(wǎng)站開(kāi)發(fā)經(jīng)驗(yàn) + 多家企業(yè)客戶 + 靠譜的建站團(tuán)隊(duì)
量身定制 + 運(yùn)營(yíng)維護(hù)+專業(yè)推廣+無(wú)憂售后,網(wǎng)站問(wèn)題一站解決
基于DataLakeAnalytics 的數(shù)據(jù)湖實(shí)踐是怎樣的,相信很多沒(méi)有經(jīng)驗(yàn)的人對(duì)此束手無(wú)策,為此本文總結(jié)了問(wèn)題出現(xiàn)的原因和解決方法,通過(guò)這篇文章希望你能解決這個(gè)問(wèn)題。
專注于為中小企業(yè)提供網(wǎng)站設(shè)計(jì)制作、成都做網(wǎng)站服務(wù),電腦端+手機(jī)端+微信端的三站合一,更高效的管理,為中小企業(yè)綏化免費(fèi)做網(wǎng)站提供優(yōu)質(zhì)的服務(wù)。我們立足成都,凝聚了一批互聯(lián)網(wǎng)行業(yè)人才,有力地推動(dòng)了上1000+企業(yè)的穩(wěn)健成長(zhǎng),幫助中小企業(yè)通過(guò)網(wǎng)站建設(shè)實(shí)現(xiàn)規(guī)模擴(kuò)充和轉(zhuǎn)變。
在開(kāi)始之前我們要有一個(gè) DLA 的賬號(hào),目前 DLA 正在公測(cè),直接申請(qǐng)?jiān)囉镁秃昧?。試用審批成功之后,你?huì)獲得一個(gè)用戶名和密碼, 然后在控制臺(tái)登錄就可以使用:
cdn.nlark.com/lark/0/2018/png/4867/1535975617974-a6185b92-9e74-4e6e-9730-e23858a136ad.png">
或者如果你是極客,更偏愛(ài)命令行,你也可以使用普通的 MySQL 客戶端就可以連接 DLA 了:
mysql -hservice.cn-shanghai.datalakeanalytics.aliyuncs.com -P10000 -u-p
在這篇文章里面,我會(huì)使用 MySQL 命令行給大家演示 DLA 的功能。
另外你還需要在您的OSS上準(zhǔn)備一些測(cè)試數(shù)據(jù), 我這里準(zhǔn)備的是著名的 TPCH 測(cè)試數(shù)據(jù)集。
DLA 是一個(gè)以 SQL 作為查詢語(yǔ)言的數(shù)據(jù)湖引擎,為了能夠讓 DLA 能夠?qū)?OSS 上的數(shù)據(jù)進(jìn)行查詢,我們需要以某種方式告訴 DLA 我們 OSS 數(shù)據(jù)的結(jié)構(gòu)。為了讓用戶使用更方便,DLA 使用了傳統(tǒng)的 數(shù)據(jù)庫(kù)
, 表
的概念來(lái)維護(hù)這些數(shù)據(jù)的元信息,也就說(shuō),OSS的文件結(jié)構(gòu)的數(shù)據(jù)映射到 DLA 變成了一個(gè)數(shù)據(jù)庫(kù)和一堆表。
以 TPCH
數(shù)據(jù)集來(lái)舉個(gè)例子,我們知道 TPCH 數(shù)據(jù)集里面包含了如下幾塊信息: 用戶(customer)
, 訂單(orders)
, 訂單的詳情(lineitem)
等等,這些數(shù)據(jù)整體屬于一塊業(yè)務(wù),我們建立一個(gè)數(shù)據(jù)庫(kù)來(lái)對(duì)應(yīng):
CREATE SCHEMA oss_tpch with DBPROPERTIES( CATALOG = 'oss', LOCATION = 'oss://public-datasets-cn-hangzhou/tpch/1x/' );
這每塊數(shù)據(jù)對(duì)應(yīng)到OSS上一個(gè)目錄的多個(gè)文件,拿 訂單
來(lái)說(shuō),它對(duì)應(yīng)的是 orders_text
目錄下面的 1 個(gè)文件(這個(gè)例子里面只有一個(gè)文件,實(shí)際使用中,這里可以有多個(gè)文件):
我們把這個(gè) orders_text
目錄映射到我們的數(shù)據(jù)庫(kù) oss_tpch
下面的一張表:
use oss_tpch; CREATE EXTERNAL TABLE IF NOT EXISTS orders ( O_ORDERKEY INT, O_CUSTKEY INT, O_ORDERSTATUS STRING, O_TOTALPRICE DOUBLE, O_ORDERDATE DATE, O_ORDERPRIORITY STRING, O_CLERK STRING, O_SHIPPRIORITY INT, O_COMMENT STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://public-datasets-cn-hangzhou/tpch/1x/orders_text/';
這樣我們就可以通過(guò) DLA 對(duì)OSS上的進(jìn)行數(shù)據(jù)分析了, 比如我們先來(lái)查個(gè)前十條看看:
mysql> select * from orders limit 10; +------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+---------------------------------------------------------------------------+ | o_orderkey | o_custkey | o_orderstatus | o_totalprice | o_orderdate | o_orderpriority | o_clerk | o_shippriority | o_comment | +------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+---------------------------------------------------------------------------+ | 1 | 3689999 | O | 224560.83 | 1996-01-02 | 5-LOW | Clerk#000095055 | 0 | nstructions sleep furiously among | | 2 | 7800163 | O | 75388.65 | 1996-12-01 | 1-URGENT | Clerk#000087916 | 0 | foxes. pending accounts at the pending, silent asymptot | | 3 | 12331391 | F | 255287.36 | 1993-10-14 | 5-LOW | Clerk#000095426 | 0 | sly final accounts boost. carefully regular ideas cajole carefully. depos | | 4 | 13677602 | O | 43119.84 | 1995-10-11 | 5-LOW | Clerk#000012340 | 0 | sits. slyly regular warthogs cajole. regular, regular theodolites acro | | 5 | 4448479 | F | 125809.76 | 1994-07-30 | 5-LOW | Clerk#000092480 | 0 | quickly. bold deposits sleep slyly. packages use slyly | | 6 | 5562202 | F | 56408.2 | 1992-02-21 | 4-NOT SPECIFIED | Clerk#000005798 | 0 | ggle. special, final requests are against the furiously specia | | 7 | 3913430 | O | 240358.24 | 1996-01-10 | 2-HIGH | Clerk#000046961 | 0 | ly special requests | | 32 | 13005694 | O | 136666.23 | 1995-07-16 | 2-HIGH | Clerk#000061561 | 0 | ise blithely bold, regular requests. quickly unusual dep | | 33 | 6695788 | F | 183460.23 | 1993-10-27 | 3-MEDIUM | Clerk#000040860 | 0 | uriously. furiously final request | | 34 | 6100004 | O | 52842.63 | 1998-07-21 | 3-MEDIUM | Clerk#000022278 | 0 | ly final packages. fluffily final deposits wake blithely ideas. spe | +------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+---------------------------------------------------------------------------+ 10 rows in set (0.21 sec)
我們?cè)賮?lái)看看用戶 36901
的前十條訂單:
mysql> select * from orders where o_custkey= '36901' limit 10; +------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+------------------------------------------------------------------+ | o_orderkey | o_custkey | o_orderstatus | o_totalprice | o_orderdate | o_orderpriority | o_clerk | o_shippriority | o_comment | +------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+------------------------------------------------------------------+ | 1243264 | 36901 | F | 103833.45 | 1992-03-23 | 2-HIGH | Clerk#000000922 | 0 | nts haggle. even, even theodolites are. blithely | | 1274530 | 36901 | O | 181977.58 | 1997-04-29 | 2-HIGH | Clerk#000000232 | 0 | bold foxes along the carefully expres | | 1599527 | 36901 | F | 322352.11 | 1993-10-16 | 2-HIGH | Clerk#000000674 | 0 | the slyly even dependencies. | | 1837477 | 36901 | F | 101653.62 | 1993-05-27 | 5-LOW | Clerk#000000891 | 0 | lyly special requests. express foxes sleep fu | | 1994082 | 36901 | O | 77952.78 | 1995-07-05 | 3-MEDIUM | Clerk#000000525 | 0 | luffily ironic courts. bold, e | | 2224802 | 36901 | F | 243852.76 | 1993-01-14 | 1-URGENT | Clerk#000000827 | 0 | sly final requests. pending, regular ideas among the furiously u | | 4957636 | 36901 | F | 5741.32 | 1992-05-20 | 5-LOW | Clerk#000000230 | 0 | ackages. fluffily even packages solve carefully dolphins. unusua | | 5078467 | 36901 | F | 119823.03 | 1994-04-29 | 4-NOT SPECIFIED | Clerk#000000402 | 0 | regular asymptotes cajo | | 5173859 | 36901 | F | 103624.02 | 1994-05-28 | 3-MEDIUM | Clerk#000000335 | 0 | regular dependencies poach quickly. unusu | | 5525574 | 36901 | O | 136098.0 | 1998-02-16 | 4-NOT SPECIFIED | Clerk#000000425 | 0 | cial pinto beans wake. slyly even warthogs use. bo | +------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+------------------------------------------------------------------+ 10 rows in set (1.07 sec)
再來(lái)查一查訂單量最多的前是個(gè)人:
mysql> select o_custkey, count(*) as cnt from orders group by o_custkey order by cnt desc limit 10; +-----------+------+ | o_custkey | cnt | +-----------+------+ | 3451 | 41 | | 102022 | 41 | | 102004 | 41 | | 79300 | 40 | | 117082 | 40 | | 122623 | 40 | | 69682 | 39 | | 143500 | 39 | | 142450 | 38 | | 53302 | 38 | +-----------+------+ 10 rows in set (2.69 sec)
恩,這些人就是我們要重點(diǎn)服務(wù)好的客戶啊,我們要把這些用戶的ID回寫到前臺(tái)的 RDS 數(shù)據(jù)庫(kù)里面讓我們的營(yíng)銷同學(xué)做一些針對(duì)性的營(yíng)銷活動(dòng),沒(méi)問(wèn)題,DLA支持把分析好的數(shù)據(jù)回流到RDS
要把分析好的數(shù)據(jù)回流到RDS我們首先一種機(jī)制來(lái)告訴 DLA 數(shù)據(jù)回流的目的地,得益于DLA統(tǒng)一的設(shè)計(jì),我們就像映射 OSS 的數(shù)據(jù)一樣,我們映射一個(gè) MySQL 數(shù)據(jù)庫(kù)進(jìn)來(lái)就好了,比如我們要把數(shù)據(jù)寫到如下的數(shù)據(jù)庫(kù)里面:
mysql -habcde.mysql.rds.aliyuncs.com -P3306 -uhello -pworld -Dmarketing
那么我們?cè)?DLA 里面建一個(gè)映射的庫(kù):
CREATE SCHEMA `mysql_marketing` WITH DBPROPERTIES ( CATALOG = 'mysql', LOCATION = 'jdbc:mysql://abcde.mysql.rds.aliyuncs.com:3306/marketing', USER='hello', PASSWORD='world', INSTANCE_ID = '', VPC_ID = ' ' );
這里需要解釋一下的是
VPC_ID
和INSTANCE_ID
, 我們知道為了安全的原因在阿里云上購(gòu)買的 RDS 我們一般都會(huì)把它放在一個(gè)單獨(dú)的VPC里面,以保證只有我們自己可以訪問(wèn),這里為了讓 DLA 能夠訪問(wèn)到我們的 MySQL 數(shù)據(jù)庫(kù)以進(jìn)行數(shù)據(jù)回流,我們需要告訴 DLA 這個(gè) RDS的相關(guān)信息。
其中 INSTANCE_ID
和 VPC_ID
在 RDS的詳情頁(yè)面都可以找到, 比如 VPC_ID
:
由于 RDS 的安全組會(huì)對(duì)訪問(wèn)的來(lái)源IP進(jìn)行控制,我們需要把DLA相關(guān)的地址段 100.104.0.0/16
IP地址段加入到你的RDS的白名單列表。
到這里為止,準(zhǔn)備工作就完成了,我們的 mysql 數(shù)據(jù)庫(kù)建好了。
我們要保存的結(jié)果很簡(jiǎn)單,就是下單量前 10
的用戶, 這個(gè)表在 MySQL 數(shù)據(jù)庫(kù)里面的建表語(yǔ)句如下:
create table top10_user ( custkey int, order_cnt bigint );
而為了把這個(gè)表映射進(jìn) DLA 我們建一個(gè)對(duì)應(yīng)的表,建表語(yǔ)句幾乎一樣:
use mysql_marketing; create external table top10_user ( custkey int, order_cnt bigint );
下面我們就可以把查出來(lái)的數(shù)據(jù)進(jìn)行回流了:
mysql> insert into mysql_marketing.top10_user -> select o_custkey, count(*) as cnt from oss_tpch.orders -> group by o_custkey order by cnt desc limit 10; +------+ | rows | +------+ | 10 | +------+ 1 row in set (4.71 sec) mysql> select * from mysql_marketing.top10_user; +---------+-----------+ | custkey | order_cnt | +---------+-----------+ | 143500 | 39 | | 102004 | 41 | | 53302 | 38 | | 3451 | 41 | | 122623 | 40 | | 129637 | 38 | | 102022 | 41 | | 117082 | 40 | | 69682 | 39 | | 79300 | 40 | +---------+-----------+ 10 rows in set (0.14 sec)
看完上述內(nèi)容,你們掌握基于DataLakeAnalytics 的數(shù)據(jù)湖實(shí)踐是怎樣的的方法了嗎?如果還想學(xué)到更多技能或想了解更多相關(guān)內(nèi)容,歡迎關(guān)注創(chuàng)新互聯(lián)行業(yè)資訊頻道,感謝各位的閱讀!