/ SQL

Splitting array/string into rows in Amazon Redshift or MySQL

We've worked with a lot of customers who writes SQL on a regular basis. And from time to time they come to us with SQL question - that we thought would be interesting to share with others!

Today, one of our customers came to us with a question - They were having problems unnesting a string/array into rows in Amazon Redshift.

Question - Splitting comma-delimited string into rows in Redshift

Suppose we have a books table in Amazon Redshift that has these data:

tags (varchar)
-----
A
A, B
C, D
E

And we want to split/unnest/explode it so that it becomes like this

tag
-----
A
A
B
C
D
E

In Postgres, we can immediately use unnest:

select TRIM( UNNEST( STRING_TO_ARRAY(tags, ',') ) ) from books;

But how do we do this in Amazon Redshift (that doesn't support unnest)?

String to Array in Amazon Redshift

The solution (or workaround) is trying to split the string into multiple part:

with NS AS (
  select 1 as n union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5 union all
  select 6 union all
  select 7 union all
  select 8 union all
  select 9 union all
  select 10
)
select
  TRIM(SPLIT_PART(B.tags, ',', NS.n)) AS tag
from NS
inner join books B ON NS.n <= REGEXP_COUNT(B.tags, ',') + 1

The above query will give you the exact results as using unnest in Postgres.

Here, the NS (number sequence) is a CTE that returns a list of number from 1 to N, here we have to make sure that our max number is greater than the size of our maximum tags, so you can try adding more numbers to the list depending on your context.

Similar problem in MySQL

Similar approach can be applied if you're using MySQL, though you have to change up the syntax/functions a little bit:

select
  TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(B.tags, ',', NS.n), ',', -1)) as tag
from (
  select 1 as n union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5 union all
  select 6 union all
  select 7 union all
  select 8 union all
  select 9 union all
  select 10
) NS
inner join books B ON NS.n <= CHAR_LENGTH(B.tags) - CHAR_LENGTH(REPLACE(B.tags, ',', '')) + 1

A few notes:

  • MySQL doesn't have CTE, so you have to write NS as a subquery
  • MySQL doesn't have SPLIT_PART, so we use 2 nested SUBSTRING_INDEX to achieve the same thing
  • MySQL doesn't have REGEXP_COUNT, so we have to work around a bit to count the number of , in the string.

What if I have an Redshift JSON array instead?

Okay, what if your tags column is actually a string that contains a JSON string?

tags (string with json)
-----
['A']
['A', 'B']
['C', 'D']
['E']

One approach is you can just treat them as string, and apply the same string_split above with some string cleaning (remove quotes, square brackets). Or you can use some JSON array functions:

with NS AS (
  select 1 as n union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5 union all
  select 6 union all
  select 7 union all
  select 8 union all
  select 9 union all
  select 10
)
select
  TRIM(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(B.tags, NS.n - 1)) AS val
from NS
inner join books B ON NS.n <= JSON_ARRAY_LENGTH(B.tags)

Conclusion

Working with array in Redshift and MySQL is usually a pain, but most of the times there's always creative workaround that helps you achieve the same purpose!

I've compiled the list of examples with sample data here for your reference (you can copy and paste directly into your SQL terminal)


— — — — — — — — — — — —

Having problems finding a simple and affordable data reporting system for your startups? Check us out at holistics.io.

Huy Nguyen

Huy Nguyen

Original creator and cofounder of Holistics, a data platform for tech companies. Holistics’ customers are tech startups like Grab, Traveloka, ShopBack, 99co, Tech In Asia and alike.

Read More