Precision.BI

June 2010


Table of Contents

Client Spotlight: Sansum Clinic

Idaho Update

Preparing for Idaho

Client Access Center and Web Presentation Center - Security Best Practices

Increasing Revenue - Are Ancillary Services the Answer?

E&M Coding Curves

PBI Training in the 3rd Quarter

What's New - New Website Design, Twitter and a Feature on Microsoft Channel 9

SQL Tips and Tricks

 


Contact Us

Main Office Phone Number
610-927-9344

Main Fax Number
610-927-9343

Support Email
customer.care@precisionbi.com

Website
www.precisionbi.com

Comments or questions about the newsletter – contact Brenda Millar at bmillar@precisionbi.com

 

 

 

 

 

SQL Tips & Tricks: TOP N Rows by Group

By Neil Putt

In this article, I will outline the steps required to get the TOP N rows for a grouped dataset.  Also, be sure to see the Bonus SQL Section at the bottom of this article for a description of the NULLIF function.

For the following code samples, I will be using the Service Analysis table in the Flowcast Data Model for the data source; however, the same concept could be used with a different data source (i.e., the Fact Charge table in the GPMS Data Model).

Let’s start with a simple query to return the Top 25 CPT Codes based on the (Charge) Amount for the Current Fiscal Year.

SELECT TOP 25
ServCptId,
SUM(ServAmt) AS ServAmt
FROM fact_service_analysis
LEFT JOIN dim_pdmap
ON dim_pdmap.DpPk = fact_service_analysis.ServPdId
WHERE dim_pdmap.DpFiscYearDiff = 0
GROUP BY ServCptId
ORDER BY SUM(ServAmt) DESC

Logically, you might think that if we add the Billing Area to the GROUP BY statement, the query would return the TOP 25 CPT Codes by Billing Area.

SELECT TOP 25
ServBillingAreaId,
ServCptId,
SUM(ServAmt) AS ServAmt
FROM fact_service_analysis
LEFT JOIN dim_pdmap
ON dim_pdmap.DpPk = fact_service_analysis.ServPdId
WHERE dim_pdmap.DpFiscYearDiff = 0
GROUP BY ServBillingAreaId, ServCptId
ORDER BY SUM(ServAmt) DESC

However, it still only returns 25 rows of data, broken down by Billing Area and CPT Code; not the TOP 25 CPT Codes for each Billing Area that we had hoped for.

To get the Top 25 CPT Codes by Billing Area based on the (Charge) Amount for the Current Fiscal Year, we need to create two new tables; one to hold the aggregated source data for the TOP N query and one to hold the results. Ideally, the code for this process will be put into a stored procedure and added to the end of the data load. The results table would then be added to the Metadata and exposed in the Treeview for use in Precision.BI.

The two tables that we will be creating are: fact_cpt_billingarea_temp and fact_cpt_billingarea.  Because we will be using a SELECT INTO to populate them, we need to first delete them, if they exist.

IF EXISTS (SELECT * FROM dbo.sysobjects
WHERE id = object_id(N'fact_cpt_billingarea_temp')
AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
DROP TABLE fact_cpt_billingarea_temp

IF EXISTS (SELECT * FROM dbo.sysobjects
WHERE id = object_id(N'fact_cpt_billingarea')
AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
DROP TABLE fact_cpt_billingarea

Next, we need to aggregate the Service Analysis data at the Billing Area, CPT Code Level for the Current Fiscal Year into our source data table and add an IDENTITY column as a Primary Key.

SELECT
IDENTITY(int,1,1) AS Pk,
ServBillingAreaId,
ServCptId,
SUM(ServAmt) AS ServAmt
INTO fact_cpt_billingarea_temp
FROM fact_service_analysis
LEFT JOIN dim_pdmap
ON dim_pdmap.DpPk = fact_service_analysis.ServPdId
WHERE dim_pdmap.DpFiscYearDiff = 0
GROUP BY ServBillingAreaId, ServCptId

Here is the final query to get the Top 25 CPT Codes by Billing Area based on the (Charge) Amount for the Current Fiscal Year. We will then dissect it to help explain its components.

SELECT
c.ServBillingAreaId,
d.RankNum,
c.ServCptId,
c.ServAmt
FROM fact_cpt_billingarea_temp AS c
INNER JOIN (SELECT a.Pk, COUNT(*) AS RankNum
FROM fact_cpt_billingarea_temp AS a
INNER JOIN fact_cpt_billingarea_temp AS b
ON (a.ServBillingAreaId = b.ServBillingAreaId)
AND (a.ServAmt <= b.ServAmt)
GROUP BY a.Pk
HAVING COUNT(*) <= 25) AS d
ON (c.Pk = d.Pk)
ORDER BY c.ServBillingAreaId, d.RankNum

We will start with the self-join in the subquery, with some extra columns for explanation.

SELECT a.ServCptId, a.ServAmt, b.ServCptId, b.ServAmt
FROM fact_cpt_billingarea_temp AS a
INNER JOIN fact_cpt_billingarea_temp AS b
ON (a.ServBillingAreaId = b.ServBillingAreaId)

This query results in a cartesian product, with each CPT Code paired with every other CPT Code (including itself) with the same Billing Area.

Because we are only interested in the TOP 25 CPT Codes based on the Amount, we only want to return CPT Codes paired with other CPT Codes that have a greater than or equal Amount. Thus, the top ranked CPT Code will have only one record in the result; because only one CPT Code (itself) will have an Amount that is greater than or equal to its own to be paired with; and the second ranked CPT Code will have two records in the result, and so on. It is very important to understand this concept, because this is what allows the query to return the correct results.

This also greatly reduces the number of row returned.

SELECT a.ServCptId, a.ServAmt, b.ServCptId, b.ServAmt
FROM fact_cpt_billingarea_temp AS a
INNER JOIN fact_cpt_billingarea_temp AS b
ON (a.ServBillingAreaId = b.ServBillingAreaId)
AND (a.ServAmt <= b.ServAmt)

We get each CPT Codes rank within each Billing Area by adding a GROUP BY statement and counting the rows.

SELECT a.ServBillingAreaId, a.ServCptId, a.ServAmt, COUNT(*) AS RankNum
FROM fact_cpt_billingarea_temp AS a
INNER JOIN fact_cpt_billingarea_temp AS b
ON (a.ServBillingAreaId = b.ServBillingAreaId)
AND (a.ServAmt <= b.ServAmt)
GROUP BY a.ServBillingAreaId, a.ServCptId, a.ServAmt

We limit the results to the TOP 25 by adding a HAVING statement.

SELECT a.ServBillingAreaId, a.ServCptId, a.ServAmt, COUNT(*) AS RankNum
FROM fact_cpt_billingarea_temp AS a
INNER JOIN fact_cpt_billingarea_temp AS b
ON (a.ServBillingAreaId = b.ServBillingAreaId)
AND (a.ServAmt <= b.ServAmt)
GROUP BY a.ServBillingAreaId, a.ServCptId, a.ServAmt
HAVING COUNT(*) <= 25

Since each record in the fact_cpt_billingarea_temp table has a unique Primary Key for the combination Billing Area, CPT Code and Amount, we can replace these fields in the SELECT and GROUP BY statements with the Primary Key. Now, our query is much simpler and should perform much faster

SELECT a.Pk, COUNT(*) AS RankNum
FROM fact_cpt_billingarea_temp AS a
INNER JOIN fact_cpt_billingarea_temp AS b
ON (a.ServBillingAreaId = b.ServBillingAreaId)
AND (a.ServAmt <= b.ServAmt)
GROUP BY a.Pk
HAVING COUNT(*) <= 25

Finally, because we need to get the Billing Area, CPT Code and Amount from the fact_cpt_billingarea_temp table, we self-join the results of the subquery back to that table on the Primary Key and store the results of that query in the fact_cpt_billingarea table. This table would then have to be added to the Metadata (including its relationships to the Billing Area and CPT Code tables) and exposed in the Treeview.

SELECT
c.ServBillingAreaId,
d.RankNum,
c.ServCptId,
c.ServAmt
INTO fact_cpt_billingarea
FROM fact_cpt_billingarea_temp AS c
INNER JOIN (SELECT a.Pk, COUNT(*) AS RankNum
FROM fact_cpt_billingarea_temp AS a
INNER JOIN fact_cpt_billingarea_temp AS b
ON (a.ServBillingAreaId = b.ServBillingAreaId)
AND (a.ServAmt <= b.ServAmt)
GROUP BY a.Pk
HAVING COUNT(*) <= 25) AS d
ON (c.Pk = d.Pk)
ORDER BY c.ServBillingAreaId, d.RankNum

Or, if we want to create a standalone results table and eliminate the need to create the relationships to the Billing Area and CPT Code tables in the Metadata, we can join to those tables as part of the query and include the Billing Area Name and the actual CPT Code.

SELECT
dim_BillingArea.BillingAreaName,
d.RankNum,
dim_cpt.CptCode,
c.ServAmt
INTO fact_cpt_billingarea
FROM fact_cpt_billingarea_temp AS c
INNER JOIN (SELECT a.Pk, COUNT(*) AS RankNum
FROM fact_cpt_billingarea_temp AS a
INNER JOIN fact_cpt_billingarea_temp AS b
ON (a.ServBillingAreaId = b.ServBillingAreaId)
AND (a.ServAmt <= b.ServAmt)
GROUP BY a.Pk
HAVING COUNT(*) <= 25) AS d
ON (c.Pk = d.Pk)
LEFT JOIN dim_BillingArea
      ON dim_BillingArea.BillingAreaPk = c.ServBillingAreaId
LEFT JOIN dim_cpt
      ON dim_cpt.CptPK = c.ServCptId
ORDER BY dim_BillingArea.BillingAreaName, d.RankNum

The above example could easily be modified to return some other TOP N Rows by Group for some other timeframe. For example, TOP 10 CPT Codes by Provider for the Previous Month or TOP 5 Providers by Division for the Previous Year.

BONUS SQL

NULLIF ( expression , expression )

NULLIF returns the first expression if the two expressions are not equal. If the expressions are equal, NULLIF returns a null value of the type of the first expression.

A practical use for the NULLIF function is to check for zero and eliminate Divide by Zero Errors without using a CASE statement. In this example, the denominator will be set to NULL if the CPT Relative Value is equal to zero.

SUM([fact_transac].[trntotalpmt]/(NULLIF([dim_cpt_fact_transac_TrnCptId].[CptRelativeValue],0)))

If you are interested in achieving the TOP N Rows by Group results for use in Precision.BI, but do not have the internal resources that can create the required SQL code; we can help set this up for you. We would create the stored procedure that generates the new tables and add the new fields to the Metadata and the Treeview. For assistance, please contact Karen Basham in our Consulting Department at kbasham@precisionbi.com.

If you have any questions about the TOP N Rows by Group or the NULLIF function outlined in this article, I can be reached at nputt@precisionbi.com.

 

 

 
 
Origin Healthcare Solutions