Prerequisites:-
Download SQL Workbench
Download Redshift Driver
Once you have downloaded the above-mentioned prerequisites, let's move ahead.
First, we will obtain the JDBC URL.
- Log in to your AWS Redshift console.
- Click on the cluster you have created. If you have followed the last post, it will be "testdw".
- In the "Configuration" tab look for JDBC URL .
- Copy the JDBC URL and save it in notepad.
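The JDBC URL follows the pattern jdbc:redshift://<endpoint>:<port>/<database>. A hypothetical example (the endpoint, region, and database name here are placeholders; yours will differ):
jdbc:redshift://testdw.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev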
- In SQL Workbench, click on File > Connect window.
- In the bottom left of the "Select Connection Profile" window, click on "Manage Drivers".
- In the "Manage Drivers" window, click on the folder icon, browse to the location of the Redshift driver you downloaded earlier, and select it.
Name:- Redshiftdriver JDBC 4.2
Classname:- com.amazon.redshift.jdbc.Driver
- Click OK
- Now, in the "Select Connection Profile" window:
Driver:- Select the Redshift driver you added.
URL:- Enter the JDBC URL you saved earlier.
Username:- The DB username you mentioned during cluster creation.
Password:- The password of the DB user.
Check the Autocommit box.
- Finally, click on OK.
- If everything is configured correctly, you will get connected to the DB.
- Try executing the query:
select * from information_schema.tables;
- If your connection is successful, you will see results in the window.
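If the full listing is too noisy, a narrower variant of the query (a minimal sketch, assuming your tables live in the default public schema) filters out the system catalogs:
-- list only tables in the public schema (assumes default schema)
select table_name
from information_schema.tables
where table_schema = 'public';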
- Now we will load some sample data that is provided by AWS and kept on S3. In SQL Workbench, copy/paste the queries below and execute them to create the tables.
create table users(
userid integer not null distkey sortkey,
username char(8),
firstname varchar(30),
lastname varchar(30),
city varchar(30),
state char(2),
email varchar(100),
phone char(14),
likesports boolean,
liketheatre boolean,
likeconcerts boolean,
likejazz boolean,
likeclassical boolean,
likeopera boolean,
likerock boolean,
likevegas boolean,
likebroadway boolean,
likemusicals boolean);
create table venue(
venueid smallint not null distkey sortkey,
venuename varchar(100),
venuecity varchar(30),
venuestate char(2),
venueseats integer);
create table category(
catid smallint not null distkey sortkey,
catgroup varchar(10),
catname varchar(10),
catdesc varchar(50));
create table date(
dateid smallint not null distkey sortkey,
caldate date not null,
day character(3) not null,
week smallint not null,
month character(5) not null,
qtr character(5) not null,
year smallint not null,
holiday boolean default('N'));
create table event(
eventid integer not null distkey,
venueid smallint not null,
catid smallint not null,
dateid smallint not null sortkey,
eventname varchar(200),
starttime timestamp);
create table listing(
listid integer not null distkey,
sellerid integer not null,
eventid integer not null,
dateid smallint not null sortkey,
numtickets smallint not null,
priceperticket decimal(8,2),
totalprice decimal(8,2),
listtime timestamp);
create table sales(
salesid integer not null,
listid integer not null distkey,
sellerid integer not null,
buyerid integer not null,
eventid integer not null,
dateid smallint not null sortkey,
qtysold smallint not null,
pricepaid decimal(8,2),
commission decimal(8,2),
saletime timestamp);
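Before loading any data, it can help to verify that all seven tables were created. A quick check using the pg_table_def system view (note that it only lists tables on your search_path, which includes public by default):
-- list the tables just created in the public schema
select distinct tablename
from pg_table_def
where schemaname = 'public';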
Now load the sample data. Ensure that in the queries below you replace "<iam-role-arn>" with your IAM role's ARN, so your query looks like:
copy users from 's3://awssampledbuswest2/tickit/allusers_pipe.txt' credentials 'aws_iam_role=arn:aws:iam::123456789123:role/redshiftrole' delimiter '|' region 'us-west-2';
The full set of COPY commands for all seven tables:
copy users from 's3://awssampledbuswest2/tickit/allusers_pipe.txt' credentials 'aws_iam_role=<iam-role-arn>' delimiter '|' region 'us-west-2';
copy venue from 's3://awssampledbuswest2/tickit/venue_pipe.txt' credentials 'aws_iam_role=<iam-role-arn>' delimiter '|' region 'us-west-2';
copy category from 's3://awssampledbuswest2/tickit/category_pipe.txt' credentials 'aws_iam_role=<iam-role-arn>' delimiter '|' region 'us-west-2';
copy date from 's3://awssampledbuswest2/tickit/date2008_pipe.txt' credentials 'aws_iam_role=<iam-role-arn>' delimiter '|' region 'us-west-2';
copy event from 's3://awssampledbuswest2/tickit/allevents_pipe.txt' credentials 'aws_iam_role=<iam-role-arn>' delimiter '|' timeformat 'YYYY-MM-DD HH:MI:SS' region 'us-west-2';
copy listing from 's3://awssampledbuswest2/tickit/listings_pipe.txt' credentials 'aws_iam_role=<iam-role-arn>' delimiter '|' region 'us-west-2';
copy sales from 's3://awssampledbuswest2/tickit/sales_tab.txt' credentials 'aws_iam_role=<iam-role-arn>' delimiter '\t' timeformat 'MM/DD/YYYY HH:MI:SS' region 'us-west-2';
Once you have loaded the data, you can run sample queries like the ones below in SQL Workbench.
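For example, the queries below (a minimal sketch against the TICKIT tables created above; the date '2008-01-05' is just an arbitrary day within the loaded data set) sanity-check the load and total the tickets sold on one calendar day:
-- quick sanity check: confirm rows were loaded
select count(*) from users;
select count(*) from sales;
-- total tickets sold on a single calendar date
select sum(qtysold)
from sales, date
where sales.dateid = date.dateid
and caldate = '2008-01-05';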
Congrats! You have finally created a Redshift cluster, loaded data, and run queries on it.
Refer to this post if you want to reset the master user password.
Don't forget to clean up the cluster, or you will be billed.
- To delete the cluster, just click on the cluster (in our case it's testdw) in the AWS console.
- Click on the "Cluster" drop-down and select Delete.
Hope this guide was helpful to you! Do let me know in the comments section if you have any queries or suggestions.