Presto

# Set up Presto



### Download the Presto server tar.gz

Download the server tarball from https://prestosql.io/download.html.


Extract the file and move it into place:

```
$ tar xf presto-server-0.239.3.tar.gz
$ mv presto-server-0.239.3 ~/Applications/
```


Create the following directories and files; their contents are shown below.

```
$ tree ~/Applications/presto-server-0.239.3/etc
├── catalog
│   ├── blackhole.properties
│   ├── tpcds.properties
│   ├── memory.properties
│   └── tpch.properties
├── config.properties
├── jvm.config
├── log.properties
└── node.properties
```


```
$ mkdir -p ~/Applications/presto-server-0.239.3/etc/catalog

$ cat ~/Applications/presto-server-0.239.3/etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
```




```
$ cat ~/Applications/presto-server-0.239.3/etc/jvm.config
-server
-Xmx4G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
```




```
$ cat ~/Applications/presto-server-0.239.3/etc/log.properties
com.facebook.presto=INFO

$ cat ~/Applications/presto-server-0.239.3/etc/node.properties
node.environment=test
```


```
$ cat ~/Applications/presto-server-0.239.3/etc/catalog/tpcds.properties
connector.name=tpcds

$ cat ~/Applications/presto-server-0.239.3/etc/catalog/tpch.properties
connector.name=tpch

$ cat ~/Applications/presto-server-0.239.3/etc/catalog/memory.properties
connector.name=memory
```


The blackhole connector acts as a sink for any data, similar to the null device in Unix operating systems, /dev/null. This allows you to use it as a target for any insert queries reading from other catalogs. Since it does not actually write anything, you can use this to measure read performance from those catalogs.

```
$ cat ~/Applications/presto-server-0.239.3/etc/catalog/blackhole.properties
connector.name=blackhole
```


Download presto-cli-340-executable.jar into the bin directory, rename it to presto, and mark it executable with chmod +x.
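A sketch of those steps. A temporary directory stands in for the server's bin directory, and a `touch` stands in for the actual download, so the rename and chmod steps are runnable as-is:

```shell
# Stand-in for ~/Applications/presto-server-0.239.3/bin
cd "$(mktemp -d)"

# 1. Download presto-cli-340-executable.jar from the link on the
#    Presto download page into this directory. A placeholder file
#    stands in for the real download here:
touch presto-cli-340-executable.jar

# 2. Rename the jar to `presto`
mv presto-cli-340-executable.jar presto

# 3. Mark it executable so it can be invoked directly
chmod +x presto

ls -l presto
```

On a real install, replace the `touch` with the actual download and run `./presto --version` to confirm the CLI works.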



Start the server:

```
$ bin/launcher run
```


You should see the following messages at the end. Keep this process running.

```
2020-08-27T23:38:13.085+0530 INFO main com.facebook.presto.metadata.StaticCatalogStore -- Loading catalog properties etc/catalog/tpcds.properties --
2020-08-27T23:38:13.085+0530 INFO main com.facebook.presto.metadata.StaticCatalogStore -- Loading catalog tpcds --
2020-08-27T23:38:13.087+0530 INFO main com.facebook.presto.metadata.StaticCatalogStore -- Added catalog tpcds using connector tpcds --
2020-08-27T23:38:13.098+0530 INFO main com.facebook.presto.security.AccessControlManager -- Loading system access control --
2020-08-27T23:38:13.120+0530 INFO main com.facebook.presto.security.AccessControlManager -- Loaded system access control allow-all --
2020-08-27T23:38:13.146+0530 INFO main com.facebook.presto.server.PrestoServer ======== SERVER STARTED ========
```





```
$ export PATH=~/Applications/presto-server-0.239.3/bin:$PATH

$ presto --execute "show catalogs"
"system"
"tpcds"
"tpch"
```



```
$ presto --catalog tpch --execute "show schemas"
"information_schema"
"sf1"
"sf100"
"sf1000"
"sf10000"
"sf100000"
"sf300"
"sf3000"
"sf30000"
"tiny"
```


```
$ presto --catalog tpch --schema sf100 --execute "show tables"
"customer"
"lineitem"
"nation"
"orders"
"part"
"partsupp"
"region"
"supplier"
```




```
$ presto --catalog tpch --schema sf100 --execute "select count(*) from customer"
"15000000"
```


```
$ presto
presto> show schemas from tpch;
       Schema
--------------------
 information_schema
 sf1
 sf100
 sf1000
 sf10000
 sf100000
 sf300
 sf3000
 sf30000
 tiny
(10 rows)

Query 20200827_181437_00007_biq5n, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [10 rows, 119B] [80 rows/s, 959B/s]
```





This operation essentially measures read performance from the tpch catalog: it reads 4.5 million order records from tpch.sf3 and sends them to blackhole. Using larger schemas like tpch.sf100 increases the data-set size. This allows you to assess the performance of your Presto deployment.


```sql
CREATE SCHEMA blackhole.test;
CREATE TABLE blackhole.test.orders AS SELECT * from tpch.tiny.orders;
INSERT INTO blackhole.test.orders SELECT * FROM tpch.sf3.orders;
```



```
presto> CREATE SCHEMA blackhole.test;
CREATE SCHEMA

presto> CREATE TABLE blackhole.test.orders AS SELECT * from tpch.tiny.orders;
CREATE TABLE: 15000 rows

Query 20200827_184528_00004_m3sma, FINISHED, 1 node
Splits: 30 total, 30 done (100.00%)
0:00 [15K rows, 0B] [74.3K rows/s, 0B/s]

presto> INSERT INTO blackhole.test.orders SELECT * FROM tpch.sf3.orders;
INSERT: 4500000 rows

Query 20200827_184539_00005_m3sma, FINISHED, 1 node
Splits: 30 total, 30 done (100.00%)
0:07 [4.5M rows, 0B] [640K rows/s, 0B/s]
```




You can use the memory connector like a temporary database. All data is stored in memory in the cluster. Stopping the cluster destroys the data. Of course, you can also actively use SQL statements to remove data in a table or even drop the table altogether.

Using the memory connector is useful for testing queries or temporary storage.
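For example, a minimal sketch (the table name is arbitrary; `default` is the memory connector's built-in schema):

```sql
-- Stage intermediate results in cluster memory
CREATE TABLE memory.default.nations AS SELECT * FROM tpch.tiny.nation;

-- Query it like any other table
SELECT count(*) FROM memory.default.nations;  -- 25

-- Data is lost when the cluster stops, or drop it explicitly:
DROP TABLE memory.default.nations;
```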


More examples are available at https://github.com/prestosql/presto-the-definitive-guide.