SQL-WTF TIL (Snowflake edition)

WHAT FIELDS WILL I SEE AS A RESULT OF A SELECT QUERY THAT RETRIEVES MULTIPLE FIELDS WITH IDENTICAL NAMES?

Hold on before answering; it's not all that obvious 🙂

Let me explain what the answer depends on:

The type of client from which the query is made.
- Whether we are using USING or not.
- Whether there is data in the sources or not.
- Whether we are looking directly at the result of the executed query or at the result from the history using RESULT_SCAN.

Cool, right? Now, I'll briefly explain each point 🙂

Client Type
Database client creators love adding spices to the work process of data engineers 🙂 All examples discussed were executed in Snowsight (in SnowSQL, the result will be different). To be fair, it's not always clear which part of the described chaos is attributed to the client and which to the database itself. But we won't delve into that now…

The creators decided that when more than one column with the same name is encountered in the result, it would be a good idea to rename them! A, A1, A2… The idea is great, but how do I now understand if these are actually existing columns or not?

In reality, such columns do not exist. You cannot refer to them in SQL.

USING
USING is syntactic sugar to shorten the ON clause. But it affects not only ON, which might seem logical, but also the list of columns.

CREATE TABLE T1 (A INT, B INT);
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A);

It is implied that the result should be identical to:

SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 ON T1.A = T1_1.A

and we should get A, B, A, B (or the renamed version described above).

But no! USING removes one column A. Really, why do you need 2 identical columns that will break everything for you later? So, it should be A, B, B.

Everything is fine here. But what if we continue to use JOIN with USING?

SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A)
LEFT JOIN T1 as T1_2 USING(A)
...

Do you think that each time another column B will be added? Nope! 🙂 All subsequent JOIN USING with identically named columns will simply be ignored! Moreover, this will happen until a JOIN with ON is encountered!


SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A)
LEFT JOIN T1 as T1_2 USING(A)
LEFT JOIN T1 as T1_3 ON T1.A = T1_3.A

will return A B B2 A2 B_3

Empty and Non-Empty Tables
Yes, Snowflake (Snowsight) renames columns differently depending on whether there is data in the tables or not!


-- EMPTY TABLES
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A_2 B_2

-- TABLE T1 IS NOT EMPTY
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A B


RESULT_SCAN
This function returns the result of a query by its UUID in QUERY_HISTORY:


SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A_2 B_2

SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
-- A B A_1 B_1


But now, A1 and B1 are actual columns!


SELECT A_1 FROM
(
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
);
-- OK


More examples are here: Github
Thumbs up here: Linkedin
Media is too big
VIEW IN TELEGRAM
We at dwh.dev decided to make some vertical videos (how do you do, fellow kids?).
What do you think?
Like it on YouTube, please.
Please open Telegram to view this post
VIEW IN TELEGRAM
Friends from Luna Park are looking for a Data Engineer — could be you or someone you know (could be DS, Python dev or ML engineer if you ask me)

Palabra.ai — the first ever real-time voice interpreter: starting as a tiny team of five engineers, they built a speech-to-text solution that works 50 times faster than OpenAI's Whisper!
The interpreter prototype is releasing in a month and will work in Zoom with a two-second delay — right now it’s one of a kind!

They are now looking for a senior engineer, who will write and optimize complex multi-node data pipelines, work on large-scale data scraping and manage datasets that include hundreds of thousands of hours of audio data.

Key requirements:
🟣5+ years of industry experience as a software developer/data engineer;
🟡Python skills (including modern backend frameworks, low-level asyncio, multithreading/multiprocessing);
🔵A good grasp on neural networks and related tools (NumPy, PyTorch, audio-related libraries such as torchaudio, librosa, etc);
🟢Experience in deploying, orchestrating, and scaling multi-node pipelines in the cloud.

Nice-to-haves:
🔘Compilated languages such as Go, C/C++, and Rust;
🔘CUDA or other GPU-related frameworks experience;
🔘Audio processing-related experience.

Salary is $70k-100k, plus equity up to 1%. It’s a fully remote position.

To apply or learn more, reach out to my Luna Park buddy Fedya @owlkov
Please open Telegram to view this post
VIEW IN TELEGRAM
A friend of mine is on a product hunt today!
Forwarded from Roman Agabekov
Hey friends,

We built a database performance management tool Releem that helps engineers to keep database servers fast, secure and reliable.

Why Releem:
- Automatic performance tuning
- Fast slow query identification
- Monitoring & tuning in one place
- MySQL / MariaDB & AWS RDS.
- Open-source agent

After few years development we launched on Product Hunt.

If you interested in such solutions, have some questions or feedback appreciate your support❤️ here https://www.producthunt.com/posts/releem
2024/06/08 06:55:21
Back to Top
HTML Embed Code: