Presto
# Set up Presto
### Download the Presto server binary tar.gz
https://prestosql.io/download.html
Extract the archive and move it into place:
$ tar xf presto-server-0.239.3.tar.gz
$ mv presto-server-0.239.3 ~/Applications/
Create the following files and directories; their contents are shown below.
$ tree ~/Applications/presto-server-0.239.3/etc
├── catalog
│ ├── blackhole.properties
│ ├── tpcds.properties
│ ├── memory.properties
│ └── tpch.properties
├── config.properties
├── jvm.config
├── log.properties
└── node.properties
$ mkdir -p ~/Applications/presto-server-0.239.3/etc/catalog
$ cat ~/Applications/presto-server-0.239.3/etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
$ cat ~/Applications/presto-server-0.239.3/etc/jvm.config
-server
-Xmx4G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
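The memory settings in config.properties and jvm.config must nest: per-node query memory fits inside total per-node memory, which in turn fits inside the JVM heap. A quick sanity check, with the values copied from the two files above:

```shell
# query.max-memory-per-node (1GB) <= query.max-total-memory-per-node (2GB)
#   < JVM heap (-Xmx4G)
per_node_gb=1
total_per_node_gb=2
heap_gb=4

if [ "$per_node_gb" -le "$total_per_node_gb" ] && [ "$total_per_node_gb" -lt "$heap_gb" ]; then
  echo "memory settings consistent"
else
  echo "memory settings inconsistent" >&2
fi
```

If you raise the query limits, raise `-Xmx` accordingly; the server refuses to start when the limits exceed the heap.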
$ cat ~/Applications/presto-server-0.239.3/etc/log.properties
com.facebook.presto=INFO
$ cat ~/Applications/presto-server-0.239.3/etc/node.properties
node.environment=test
$ cat ~/Applications/presto-server-0.239.3/etc/catalog/tpcds.properties
connector.name=tpcds
$ cat ~/Applications/presto-server-0.239.3/etc/catalog/tpch.properties
connector.name=tpch
$ cat ~/Applications/presto-server-0.239.3/etc/catalog/memory.properties
connector.name=memory
$ cat ~/Applications/presto-server-0.239.3/etc/catalog/blackhole.properties
connector.name=blackhole
The blackhole connector acts as a sink for any data, similar to the null device in Unix operating systems, /dev/null. This allows you to use it as a target for any insert queries reading from other catalogs. Since it does not actually write anything, you can use this to measure read performance from those catalogs.
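All of the files above can also be created in one pass. A sketch, with the values copied from the listings above; `ETC` defaults to a scratch directory for illustration, so point it at the real install path when setting up for real:

```shell
# e.g. ETC=~/Applications/presto-server-0.239.3/etc
ETC="${ETC:-$(mktemp -d)/etc}"
mkdir -p "$ETC/catalog"

cat > "$ETC/config.properties" <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
EOF

echo 'node.environment=test' > "$ETC/node.properties"
echo 'com.facebook.presto=INFO' > "$ETC/log.properties"

# Each catalog file only needs a connector name.
for c in tpch tpcds memory blackhole; do
  echo "connector.name=$c" > "$ETC/catalog/$c.properties"
done

printf '%s\n' \
  -server \
  -Xmx4G \
  -XX:+UseG1GC \
  -XX:G1HeapRegionSize=32M \
  -XX:+UseGCOverheadLimit \
  -XX:+ExplicitGCInvokesConcurrent \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:+ExitOnOutOfMemoryError \
  > "$ETC/jvm.config"

ls "$ETC" "$ETC/catalog"
```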
Download presto-cli-340-executable.jar into the bin directory, rename it to presto, and make it executable with chmod +x.
Start the server
$ bin/launcher run
You should see messages like the following at the end. Keep this application running.
2020-08-27T23:38:13.085+0530 INFO main com.facebook.presto.metadata.StaticCatalogStore -- Loading catalog properties etc/catalog/tpcds.properties --
2020-08-27T23:38:13.085+0530 INFO main com.facebook.presto.metadata.StaticCatalogStore -- Loading catalog tpcds --
2020-08-27T23:38:13.087+0530 INFO main com.facebook.presto.metadata.StaticCatalogStore -- Added catalog tpcds using connector tpcds --
2020-08-27T23:38:13.098+0530 INFO main com.facebook.presto.security.AccessControlManager -- Loading system access control --
2020-08-27T23:38:13.120+0530 INFO main com.facebook.presto.security.AccessControlManager -- Loaded system access control allow-all --
2020-08-27T23:38:13.146+0530 INFO main com.facebook.presto.server.PrestoServer ======== SERVER STARTED ========
$ export PATH=~/Applications/presto-server-0.239.3/bin:$PATH
$ presto --execute "show catalogs"
"system"
"tpcds"
"tpch"
$ presto --catalog tpch --execute "show schemas"
"information_schema"
"sf1"
"sf100"
"sf1000"
"sf10000"
"sf100000"
"sf300"
"sf3000"
"sf30000"
"tiny"
$ presto --catalog tpch --schema sf100 --execute "show tables"
"customer"
"lineitem"
"nation"
"orders"
"part"
"partsupp"
"region"
"supplier"
$ presto --catalog tpch --schema sf100 --execute "select count(*) from customer"
"15000000"
$ presto
presto> show schemas from tpch;
Schema
--------------------
information_schema
sf1
sf100
sf1000
sf10000
sf100000
sf300
sf3000
sf30000
tiny
(10 rows)
Query 20200827_181437_00007_biq5n, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [10 rows, 119B] [80 rows/s, 959B/s]
This operation essentially measures read performance from the tpch catalog: it reads the 4.5 million order records in tpch.sf3.orders and sends them to blackhole. Since blackhole discards everything it receives, the run time reflects the read side. Using larger schemas such as tpch.sf100 increases the dataset size, allowing you to assess the performance of your Presto deployment.
CREATE SCHEMA blackhole.test;
CREATE TABLE blackhole.test.orders AS SELECT * from tpch.tiny.orders;
INSERT INTO blackhole.test.orders SELECT * FROM tpch.sf3.orders;
presto> CREATE SCHEMA blackhole.test;
CREATE SCHEMA
presto> CREATE TABLE blackhole.test.orders AS SELECT * from tpch.tiny.orders;
CREATE TABLE: 15000 rows
Query 20200827_184528_00004_m3sma, FINISHED, 1 node
Splits: 30 total, 30 done (100.00%)
0:00 [15K rows, 0B] [74.3K rows/s, 0B/s]
presto> INSERT INTO blackhole.test.orders SELECT * FROM tpch.sf3.orders;
INSERT: 4500000 rows
Query 20200827_184539_00005_m3sma, FINISHED, 1 node
Splits: 30 total, 30 done (100.00%)
0:07 [4.5M rows, 0B] [640K rows/s, 0B/s]
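The CLI's summary line is simple arithmetic: 4.5 million rows over roughly 7 seconds gives the reported rate (the actual wall time is slightly longer than the rounded 0:07, which is why the CLI shows ~640K rows/s):

```shell
# Throughput implied by the query summary above.
rows=4500000
seconds=7
echo $((rows / seconds))   # 642857 rows/s, which the CLI rounds to ~640K
```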
You can use the memory connector like a temporary database. All data is stored in memory in the cluster. Stopping the cluster destroys the data. Of course, you can also actively use SQL statements to remove data in a table or even drop the table altogether.
Using the memory connector is useful for testing queries or temporary storage.
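For example, a sketch of that workflow against the memory catalog configured earlier (this assumes the connector's built-in default schema; TPC-H nation always has 25 rows regardless of scale factor):

```sql
-- Stage a temporary copy, query it, then clean up.
CREATE TABLE memory.default.nation AS SELECT * FROM tpch.tiny.nation;
SELECT count(*) FROM memory.default.nation;  -- 25 rows
DROP TABLE memory.default.nation;
```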
See more examples at https://github.com/prestosql/presto-the-definitive-guide