HOME  |    TRAINING  |   FREE TUTORIALS   |   JOBS
Find out more about our new RSS feed.
FREE Tutorial
PROFESSIONAL LINUX PROGRAMMING PART 3 - ECPG

CATEGORY
SEARCH OUR OTHER TUTORIALS

DESCRIPTION

Now it's time to look at the alternative way of combining SQL and C, by embedding SQL statements in the C code, and then pre-processing them into something the C compiler can understand, before invoking the C compiler. There is still a library to interface C calls to the database, but the details are hidden away behind a pre-processor.
Click here to be kept informed of our new Tutorials.


This free tutorial is a sample from the book Professional Linux Programming.


PostgreSQL's ecpg follows the ANSI standard for embedding SQL in C code, and what follows will be familiar to programmers who have used systems such as Oracle's PRO*C or Informix's ESQL-C. At the time of writing some of the less used features of embedded SQL are not supported, and the standard documentation for ecpg that ships with PostgreSQL is somewhat limited.

Since we have now worked through many of the basics of SQL, this section will actually be quite short. The first problem that has to be tackled is how to delimit sections in the file that the ecpg pre-processor needs to process. This is done with the special sequence in the source that starts 'exec sql', then contains the SQL you want to execute, and ends with a ';'. Depending on the exact syntax, as we shall see in a moment, this can either be a single line that needs to be processed, or it can be used to mark a section that needs pre-processing.

If we want to write a simple C program that performs a single UPDATE statement in the middle of some C code, we need to do only one thing in the source code - embed the UPDATE SQL statement.

What could be easier? Let's write a very simple C program with some embedded SQL that updates a table. By convention these have a file extension of pgc. Here is upd1.pgc:

#include <stdlib.h>

exec sql include sqlca;

main() {

exec sql connect to 'rick@gw1';

exec sql BEGIN WORK;

exec sql UPDATE children SET fname = 'Gavin' WHERE childno = 9;

exec sql COMMIT WORK;

exec sql disconnect all;

return EXIT_SUCCESS;

}

At first sight, this hardly looks like C at all. However, if you ignore the lines that start exec sql, you can see it is just a minimal C program. To compile this program we need a two-stage process. First we must run the ecpg pre-processor, then we compile the resulting C file, linking it with the ecpg library. To compile this you may need to add a -I option to ecpg, to tell it where to look for the include file, depending on your installation. For this program, upd1.pgc, the commands are:

$ ecpg -t -I/usr/include/pgsql upd1.pgc
$ gcc -o upd1 -I/usr/include/pgsql upd1.c -lecpg -lpq

The ecpg command pre-processes the file, leaving a .c file, which we then compile in the normal way, linking with two PostgreSQL libraries. The '-t' on the command line for ecpg tells ecpg that we wish to manage our own transactions with explicit BEGIN WORK and COMMIT WORK statements in the source file. By default ecpg will automatically start a transaction when you connect to the database. There is nothing wrong with this, it's just that the authors prefer to explicitly define transaction blocks.

You will notice the connect string is 'rick@gw1'. This requests a connection to the database 'rick' on server 'gw1'. No password is needed since that's a local machine, and I am already logged in as user rick. However in the general case you can specify the connection in a URL style format, in which case the format is

<protocol>:<service>://<machine>:<port>/<dbname> as 
<connection name> as <login name> using <password for login>

A concrete example makes this much clearer. Suppose we want to connect using tcp to the postgresql service on the dbs6 machine, port 5432, connecting to the database rick, using the database login name neil, who has a password secret. The connect line we would put in our program would be:

exec sql connect to tcp:postgresql://dbs6:5432/rick as 
connect_2 user neil using secret;

If we want to separate out the different elements, then we can use the same style of connect request, but using ''host variables'', which you will notice always start with a ':'. We will see more about host variables later in the chapter; for now just imagine them as normal C variables.

exec sql BEGIN DECLARE SECTION;
   char connect_str[256];
   char as_str[25];
   char user_str[25];
   char using_str[25];

exec sql END DECLARE SECTION;

   strcpy(connect_str, "tcp:postgresql://localhost:5432/rick");
   strcpy(as_str, "connect_2");
   strcpy(user_str, "neil");
   strcpy(using_str, "secret");

exec sql connect to :connect_str as :as_str user 
:user_str using :using_str ;

  if (sqlca.sqlcode != 0) {
   pg_print_debug(__FILE__, __LINE__, sqlca, "Connect failed");
   return DVD_ERR_BAD_DATABASE;
  }

Now we have seen the basics, let's look in slightly more detail at what ecpg does.

The first feature that we almost always need when writing an ecpg program is to include a header file that gives us access to errors and status information from PostgreSQL. Since we need this file to be pre-processed by the ecpg processor, before the C compiler runs, a normal include will not do. What we need is to use the exec sql include command. Since there is just a single file called sqlca, which we almost always need to include, pgc files usually start with:

exec sql include sqlca;

This causes the ecpg command to include the file sqlca.h, which is (by default) found in the /usr/include/pgsql directory, though depending on your installation this may of course be different. This important include file declares an sqlca structure, and variable of the same name, that allows us to determine results from our SQL statements. The sqlca structure is a standard structure used when embedding SQL in C code, though implementations vary slightly. For our install of PostgreSQL the structure is declared to be:

  struct sqlca
  {
     char      sqlcaid[8];
     long      sqlabc;
     long      sqlcode;
     struct
     {
        int      sqlerrml;
        char      sqlerrmc[70];
     } sqlerrm;
     char      sqlerrp[8];
     long      sqlerrd[6];
     char      sqlwarn[8];
     char      sqlext[8];
  };

Actually interpreting the contents of sqlca can seem a little odd. The implementation of ecpg that comes with PostgreSQL does not implement as much of the sqlca functionality as some commercial packages such as Oracle. This means some members of the structure are unused, however all the important functions are implemented, so it is perfectly usable.

When processing an sqlca structure you first need to check sqlca.sqlcode. If it is less than zero then something serious went wrong, if it's zero all is well, and if it's 100 then no data was found, but that was not an error.

When an INSERT, UPDATE or SELECT statement succeeds, sqlca.sqlerrd[2] will contain the number of rows that were affected.

If sqlca.sqlwarn[0] is 'W', then a minor error occurred, usually data was retrieved successfully, but was not transferred correctly into a host variable (we will meet these later in the chapter).

When an error occurs sqlca.sqlerrm.sqlerrmc contains a string describing the error.

Commercial packages use more fields, that can tell you a notional 'cost' and other information, but these are not currently supported in PostgreSQL. However since such information is only occasionally useful, it's omission is not generally missed.

Let's just summarize that explanation:

Let's try this out, by modifying our upd1.pgc file to include sqlca, and also deliberately making it fail, by using an invalid table name:

#include <stdlib.h>
#include <stdio.h>

exec sql include sqlca;
main() {

exec sql connect to 'rick@gw1';

exec sql BEGIN WORK;

exec sql UPDATE XXchildren SET fname = 'Emma' WHERE age = 0;

printf("error code %d, message %s, rows %d, warning %c\n", 
  sqlca.sqlcode, sqlca.sqlerrm.sqlerrmc, sqlca.sqlerrd[2], 
  sqlca.sqlwarn[0]);

exec sql COMMIT WORK;

exec sql disconnect all;

return EXIT_SUCCESS;

}

This is upd2.pgc. The highlighted lines show the important changes. Compile it as before:

$ ecpg -t -I/usr/include/pgsql upd2.pgc
$ gcc -g -o upd2 -I /usr/include/pgsql/ upd2.c -lecpg -lpq

This time when we run it, an error is generated:

error code -400, message Postgres error: ERROR: xxchildren: 
Table does not exist.
line 10., rows 0, warning

As you can see, it's a little basic but does the job.

Now we have seen the basics, we can get to important issue - how do we access data that SQL statements embedded in .pgc files return?

The answer is actually quite simple, and relies on variables called host variables, which are accessible to both the statements delimited by exec sql ... ; and to the ordinary C compiler.

We do this by having a declare section, usually near the start of the file, that is processed by both the ecpg processor, and the C compiler. This is achieved by declaring C variables inside a special declare section, which also tells the ecpg processor to process them. We use the delimiting statements:

exec sql begin declare section;

and

exec sql end declare section;

Suppose we wanted to declare two variables, child_name and child_age, that are intended to be accessible in both the embedded SQL and in the C code for use in the rest of the program.
What we need is:

exec sql begin declare section;

int child_age;
VARCHAR child_name[50];

exec sql end declare section;

You will notice two odd things here, firstly the 'magic number' 50 as a string length, and secondly that VARCHAR is not a normal C type. We are forced to use literal numbers here, because this section of code is being processed by ecpg before the C compiler runs, so it is not possible to use either a #define or a constant. The reason for VARCHAR is because the SQL type of the fname column in children is not a type that maps directly to a C type. We must use the PostgreSQL type in our declaration, which is then converted into a legal C structure by the ecpg pre-processor, before the C compiler sees it. The result of this line in the source file is to create a structure called child_name, with two members, a char array 'arr', and an integer len, to store the length. So what the C compiler sees from this one line is actually:

struct varchar_child_name {int len; char arr[50];} child_name;

Now we have two variables, visible both in SQL and in C. We use a slight extension of the SQL syntax, the 'into' keyword, to retrieve data from the table into named variables, which are denoted by having a ':' prepended to the name. This is so they cannot be confused with values or table names. Notice this 'into' is not the same as the extension some vendors support to allow interactive selecting of data from one table into another. The 'into' keyword has a slightly different meaning when using embedded SQL.

exec sql SELECT fname into :child_name FROM 
children WHERE age = :child_age;

The epgc pre-processor converts this to C, which we compile in the normal way. So our complete code is now in selp1.pgc, and looks like this:

#include <stdlib.h>
#include <stdio.h>

exec sql include sqlca;

exec sql begin declare section;

int child_age;
VARCHAR child_name[50];

exec sql end declare section;

main() {

exec sql connect to 'rick@gw1';

exec sql BEGIN WORK;

child_age = 14;

exec sql SELECT fname into :child_name FROM children WHERE age =
     :child_age; 

printf("error code %d, message %s, rows %d, warning %c\n", 
    sqlca.sqlcode,sqlca.sqlerrm.sqlerrmc, sqlca.sqlerrd[2], 
    sqlca.sqlwarn[0]);

if (sqlca.sqlcode == 0) {
  printf("Child's name was %s\n", child_name.arr);
}

exec sql COMMIT WORK;

exec sql disconnect all;

return EXIT_SUCCESS;
}

The important changes are highlighted. Notice we need to use child_name.arr to access the returned data. However you only need to use VARCHAR declarations when you want to get data out of the database - when you want to store data into a VARCHAR field you should use a NULL terminated C string in the normal way.

However there is a potential problem with this program. You will see that we had to declare our child_name VARCHAR to be a fixed size, even though we could not know in advance how large the answer might have been. What will happen if we make child_name only 3 long, and the name stored in the database is longer than this? In this case ecpg will only retrieve the first 3 characters, and will set the warning flag. If we change the declaration to VARCHAR child_name[3] and run the program we get:

error code 0, message , rows 1, warning W
Child's name was Jen

(You may also see some corruption, we will explain why in a moment.)

Continued...


NEXT PAGE



10 RELATED COURSES AVAILABLE
LINUX FUNDAMENTALS
This course covers the competencies and skills identified as key to intending Linux users and developers. The cou....
LINUX USER INTRODUCTION
This course covers the competencies and skills identified as key to intending Linux users. The course aims are to....
LINUX OVERVIEW
To provide technical users new to Linux with a sound appreciation of the operating system. The course provides re....
UNIX NETWORKING ADMINISTRATION INTRODUCTION
Following the UNIX International Courseware Accreditation standards, this course covers the competencies and the ....
UNIX SYSTEM ADMINISTRATION SVR3
This course covers the competencies and skills identified as key to Unix System administrators. The course intro....
 
1 RELATED JOBS AVAILABLE
UNIX, LINUX (HIGH AVAILABILITY CLUSTERS) & QNX SYSTEMS ADMIN N/WEST
Computer Futures Solutions are seeking an experienced Linux/Unix/QNX Systems Administrator with a knowledge of Ne....
CONTACT US
Thursday 24th July 2008  © COPYRIGHT 2008 - VISUALSOFT