Quantcast
Channel: BI Admin – BI Notes for SAS Software Users
Viewing all articles
Browse latest Browse all 18

Big Data Loading with PostgreSQL available in SAS 9.4

$
0
0

Author:

The recent release of SAS 9.4 was very exciting for me because it finally brings PostgreSQL support!  In my opinion, PostgreSQL is one of the best low-cost database platforms out there.  It has a ton of enterprise level features, it’s fast and highly configurable, and best of all its FREE.  Gotta love open source!  I’m very happy to see that SAS has invested in PostgreSQL as well.  From the looks of it, the Web Infrastructure Platform is now built on PostgreSQL by default in SAS 9.4.  

PostgreSQL Elephant

 

Bulk Loading Big Data with PostgreSQL

For loading large quantities of BIG data into traditional databases, I highly suggest using the bulk load option.  Many SAS Access engines support bulk loading.  In essence, SAS doesn’t actually load the data directly into the target database.  Data is staged into a flat file, then SAS uses the native database client to bulk load into the target database.   In this post I will focus only on PostgreSQL.

Configure PostgreSQL Client on Compute Tier

Before you can bulk load with PostgreSQL, SAS needs to have access to a PostgreSQL client installed where ever SAS is running (typically a server, aka compute tier).  For PostgreSQL, the COPY command is used for bulk loading.  If you’re like me, you want to know what runs behind the scenes, well that’s it!

If you love and run SAS on Ubuntu, its easy to get the PostgresSQL client with this command :-)

sudo apt-get install postgresql-client

For other distributions of Linux/Unix operating systems and Windows, the PostgreSQL Client can be downloaded and installed by your IT systems administrator.  

Bulk Loading with Insert or Append

In this example snippet of code, I’m inserting random sample data used to demonstrate concepts in my SAS Global Forum 2013 paper.  Prior steps in the code stage up the data to fit the schema of the target table (fact_transactions).

proc sql;
insert into defiant.fact_transactions
(bulkload=yes
bl_datafile="/tmp/data"
bl_logfile="/tmp/bulkload_log"
bl_delete_datafile=NO)
select * from stg_fact_transactions;
quit;
Common Bulk Loading Parameters
  • bulkload – obviously tells SAS to use the bulkloader (YES/NO)
  • bl_datafile – specify a filename on the compute tier which is used to stage the data for bulk loading
  • bl_logfile – specify a log file in case the native bulk load utility from the ‘psql’ command returns anything
  • bl_delete_datafile – tells SAS to delete the temporary data after everything is done

You can also use PROC APPEND to accomplish the same.  Both methods accomplish the same thing in about the same amount of time.  Use your own discretion or preference.

proc append base=defiant.fact_transactions
( bulkload=yes
bl_datafile="/tmp/data"
bl_logfile="/tmp/bulkload_log"
BL_LOAD_METHOD=APPEND
)
data=stg_fact_transactions;
run;

I must give the usual disclaimer, mileage may vary for different environments.  Depending on how much data is being loaded and how the target database is configured, I’m 99.9% confident bulk loading will improve load times for BIG DATA.  Check out postgresql.org for more performance suggestions on populating a database.


Never miss a post!

Get the latest BI Notes post in your Inbox when a new post is released! Click here for free subscription. You email address is kept private and you can unsubscribe anytime. Go ahead ... join us!

 
 

Post Big Data Loading with PostgreSQL available in SAS 9.4 appeared first on BI Notes for SAS® Users. Written by . Go to BI Notes for SAS® Users to subscribe.


Viewing all articles
Browse latest Browse all 18

Trending Articles