Fun with Filegroups

Published by

on

Picture mentioning databases and filegroups.
Databases and filegroups.

Filegroups are an interesting and useful (if not actually all that fun, despite the title) concept within SQL Server. They provide not just a way to group objects like tables together, but also provide a mechanism to help us to speed up the backup, restore, and recovery of the databases. And occasionally, they can even give us a little help with the performance or to migrate data to disks that have more space available.

In this post, focus will be in archiving a table that we no longer need by making a read-only copy of it. In a real-world scenario we’d also consider putting that to a slower disk, if it’s not frequently used, but my demo environment only has a single data disk.

Filegroups

By default, all databases have two filegroups, primary and log ones. Primary one has few system objects needed to start up the database and pointers to other files on the database. Log filegroup has, unsurprisingly, the transaction log. On top of that, you can add secondary filegroups to hold user objects and data. One nice details is that each filegroup can have multiple data files, which can be placed on separate drives, like in our demo StackOverflow database.

Finding your filegroups

Filegroups and files can be easily added to an existing database, which opens up some interesting possibilities, such as migrating data from one disk to another or just to archive it as a read-only copy. For the demo purpose, we’ll use the database from the StackOverflow, which is public available and just happens to have a number of files already in the primary filegroup. To see the setup we’re starting with, we can use the following query.

SELECT F.name AS [Filegroup]
     , DF.name AS [Logical Filename]
     , DF.physical_name AS [Physical Filename]
     , DF.type_desc AS [File Type]
     , DF.state_desc AS [Status]
FROM sys.filegroups AS F
    INNER JOIN sys.database_files AS DF
        ON DF.data_space_id = F.data_space_id;
Our database filegroups.

So, we have 4 datafiles in a single filegroup (called primary, which is the default) in our database. I think it’s quite safe to assume that all our tables will reside in the primary at this point, but there’s also a query you can use to check that.

SELECT T.name AS [Table Name]
     , DS.name AS [Filegroup Name]
FROM sys.tables T
    INNER JOIN sys.indexes I
        ON T.object_id = I.object_id
    INNER JOIN sys.filegroups DS
        ON I.data_space_id = DS.data_space_id
    INNER JOIN sys.partitions P
        ON I.object_id = P.object_id
           AND I.index_id = P.index_id;

No surprises there, since we only have a single filegroup.

Tables in their filegroup.

Let’s say that we’d like to archive the Badges table and in the process make it a read-only. We would do this by first creating a new filegroup and then proceed to add a new datafile into it.

Adding a filegroup and a datafile

Adding a filegroup and a datafile is actually quite straightforward. First you create the filegroup, then you add a datafile to that filegroup. In the example below, we’ll create a filegroup called ReadOnlyFG and then add a file called StackOverflow2013_Archive.MDF to it.

ALTER DATABASE StackOverflow2013 ADD FILEGROUP ReadOnlyFG;

ALTER DATABASE StackOverflow2013
ADD FILE
    (
        NAME = ArchivedTables
      , FILENAME = 'F:\Data\StackOverflow2013_Archive.MDF'
    )
TO FILEGROUP ReadOnlyFG;

Running the first query from the post again now shows our brand new read-only filegroup.

Newly added FG is visible.

Moving a table

So far, so good. However, moving a table to a new filegroup isn’t as obvious as you might think, mostly because there’s no ALTER TABLE or ALTER DATABASE command that would actually allow you to move a table to another filegroup. Defining the filegroup a table should be in can only be done during the table creation, which of course isn’t very handy in our scenario where the table already exists. There is a way around this, without need for creating a new table and moving data there, by using clustered indexes. What makes this is possible is the fact that clustered indexes actually hold all the data.

So let’s start by looking at the Badges table to determine if it has a clustered index.

SELECT DB_NAME() AS [Database Name]
     , O.name AS [Table Name]
     , I.name AS [Index Name]
	 , I.type_desc AS [Index Type]
FROM sys.indexes AS I
    INNER JOIN sys.objects AS O
        ON O.object_id = I.object_id
WHERE O.type = 'U' AND O.name = N'Badges'
Badges being moved!

And we indeed have one. After that, all we have to do is to re-create the index using DROP_EXISTING, and we should be good.

CREATE UNIQUE CLUSTERED INDEX PK_Badges_Id ON dbo.Badges(Id)
WITH (DROP_EXISTING=ON, ONLINE=ON) ON [ReadOnlyFG]

And this is where I’ll advise some caution. Depending on the amount of data, this can take quite a bit of time. Once we’re completed with the command, let’s see if something has changed.

Badges moving to ReadOnlyFG.

And it has. We’ve now have moved the data to read-only filegroup. Now all that remains is to make it so. This is pretty quickly done, using the following ALTER DATABASE command.

ALTER DATABASE StackOverflow2013 MODIFY FILEGROUP [ReadOnlyFG] READ_ONLY;

And we’re done. Let’s see what happens if we try to update a row in Badges table. Let’s just pick the lucky winner from the top of the list and see if we can change their name.

SELECT TOP 1 * FROM dbo.Badges AS B
Finding the candidate.
UPDATE dbo.Badges SET Name = N'Teacher2' WHERE Id = 82946;
And failing to update it, as planned.

And as expected, it failed. With this result, we can also wrap up this post. I’ll be doing another filegroup related post in the future when I’ll be diving into the wonderful world of backups!

2 responses to “Fun with Filegroups”

  1. Purusharth Pathak Avatar

    It’s a great article Mika. (y)

    I recently got such requirements where I had to partition table with some new file groups based on partition length. ( like some years divided into a month ) , had to make such script with parameters as per requirement.
    I made it in stored procs. The implementation was exactly as per this article.
    The only difficulty was to roll back ( just in case) needed. Also, felt some index dependency ( to make one if not there) in order to move data from the heap to BTree( index).

    1. Mika Sutinen Avatar

      Glad you liked it 🙂

Leave a Reply

WordPress.com.

Discover more from SQLStarters

Subscribe now to keep reading and get access to the full archive.

Continue reading