Babelfish for PostgreSQL, what the heck is that?

A short answer is: Babelfish is an extension for PostgreSQL, that allows your SQL Server applications to speak to PostgreSQL in their native language.

The next question you probably have is: “Why would I ever want my applications to do that?” And it’s a valid question, to which the answer would be, to modernize your application out of SQL Server to an open-source PaaS database.

Tower shaped databases of babel.
Databases of Babel

There are two things to keep in mind when it comes to Babelfish for PostgreSQL. First is, that it is not a database migration tool. And I think this is one of the biggest misunderstandings around the whole feature. What the Babelfish for PostgreSQL will do for you, is to act as a translator between your SQL Server based application and the PostgreSQL database. To quote the project directly (also highlighting a few words, to help you set your expectations).

Babelfish for PostgreSQL is an open-source project available under the Apache 2.0 and PostgreSQL licenses. It provides the capability for PostgreSQL to understand queries from applications written for Microsoft SQL Server. Babelfish understands the SQL Server wire-protocol and T-SQL, the Microsoft SQL Server query language, so you don’t have to switch database drivers or re-write all of your application queries. With Babelfish, applications currently running on SQL Server can now run directly on PostgreSQL with fewer code changes.

Babelfish (babelfishpg.org)

The second thing to remember is, that it’s not AWS only. Babelfish is an open-source project, and will run on any other cloud too. However, for AWS benefit, I do think that they’ve made enabling the feature very effortless with their Aurora. Which is also the reason I am using it for the screenshots in this post.

Let’s dive into the details then.

Setting up Babelfish for PostgreSQL in Aurora

Setting up the Babelfish for Aurora PostgreSQL is as straightforward as it can get, it’s a simple checkbox in AWS Portal and we’re good.

AWS portal UI for enabling Babelfish during setup.
Babelfish settings for Amazon Aurora

What actually happens behind the scenes is, that we add a secondary endpoint for our PostgreSQL cluster for TDS. The below image from Amazon documentation describes this setup quite well.

Babelfish for Amazon Aurora picture showing two endpoints.
Babelfish with Amazon Aurora (source: aws.amazon.com)

We can also confirm this by looking at our RDS resources in the Portal, it indicates that we are indeed listening for incoming connections on 2 separate TCP ports. 1433 that is the default port used by SQL Server, and 5432 that is the default port for PostgreSQL.

Two endpoint ports for Amazon Aurora

Connecting to Babelfish for PostgreSQL database

This is the part where things get slightly more complicated, and it’s probably easier to show this with a picture than try to describe it. I am using Azure Data Studio here because it allows me to have connections to both endpoints, PostgreSQL and SQL Server.

The first thing you likely notice is, that while I am connecting to the same server, I am seeing an entirely different set of databases.

In PostgreSQL, I have:

  • babelfish_db (created by Babelfish)
  • postgres (included with Aurora)

And in SQL Server I have:

  • mytestbabeldb1 (which I created manually)
View of Babelfish databases when connecting to PostgreSQL and SQL Server endpoints
View on SQL Server and Postgres

To provide better visual evidence of what’s happening here, I am quickly switching to SQL Server Management studio. On the right side, you’ll see my connections from SSMS to PostgreSQL.

Connection from SQL Server Management Studio
View from SSMS
How the connections appear from PostgreSQL side.
My connections in PostgreSQL

Working with Babelfish for PostgreSQL

So, at first look it appears that I am connecting to a database that doesn’t really exist. The babelfish_db database is used by PostgreSQL to emulate a multi-database SQL Server instance. What this means is, that any database you create using the TDS connection, is actually created inside the babelfish_db. This also includes SQL Server system databases (master, msdb and tempdb). And If you’re wondering why I didn’t mention model, well…

The model database doesn’t live here anymore.

Not that it would be needed, as the Babelfish does all the work for setting up your SQL Server databases.

Picture of system databases, missing the model database.
SQL Server system databases

So yes, having tempdb (and all the other system databases) inside my user database sure sounds spiffy. What could possibly go wrong? But how does that happen then? Simply by switching the databases into schemas (sounds like something Oracle would do?), as can be seen in the following screenshot.

There are separate schemas for:

  • dbo
  • master_dbo
  • msdb_dbo
  • tempdb_dbo
View of SQL Server schemas inside the babelfish_db database.
The schemas inside babelfish_db

Now to confirm this, I run this code, to create a table in tempdb:

USE tempdb
CREATE TABLE temp_table (
   Name VARCHAR(50),
   Age INT

Image of temp_table being created inside the tempdb_dbo schema in babelfish_db
Table inside the tempdb_dbo schema

And sure enough, I see a table created inside the tempdb_dbo schema, in the babelfish_db. As I was playing around testing with tempdb, one of the first things I noticed was, that it wouldn’t support global temp tables, for one.

The test tables I created inside mytestbabeldb1 (visible in SQL Server) all ended up under the dbo-schema.

What about the use cases?

After the initial reveal of the Babelfish, my interest in actually trying this somewhere outside my blog post demos, has gone down quite a bit. I was thinking about the use cases, and to me, I see Babelfish having a single, good one.

Modernization accelerator. When you have multiple non-business critical applications connecting to a single SQL Server database (a common pattern in legacy apps), and you want to start moving the applications fully to PostgreSQL. Babelfish would allow you to modernize your applications to use native PostgreSQL connection libraries and queries, one by one, while still using the same database.

This path, where you would gradually remove SQL Server dependencies, is really the only scenario, that I could think of, that was making sense. I wouldn’t consider Babelfish enabled PostgreSQL being the final destination for my SQL Server databases and applications, but the stepping stone to get to native PostgreSQL.

Wrapping it up

All in all, I think Babelfish is an interesting project. While I was saying that I wouldn’t really be interested in trying it out, in a production environment just yet, I’ll definitely keep my eye on the project. It’ll be fascinating to see which direction the development goes, and if at some point it’ll feel a more mature option for database modernization. Let’s just hope that the development of Babelfish goes forward more smoothly, than the fables that tell of the Tower of Babel.

To me, it isn’t just there yet. Or maybe I was just having unfortunate luck, and managing to hit just those features that were not supported (it’s a long list), while I was testing it out…

Published by

Leave a Reply

%d bloggers like this: