Extracting Key-worlds out of string and show them in another column

Multi tool use


Extracting Key-worlds out of string and show them in another column
I need to write a query to extract specific names out of String and have them show in another column for example a column has this field
Column:
Row 1: jasdhj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj
Row 2:jasnasc8918212,ahsahkdjjMina67,
Row 3:kasdhakshd,asda,asdasd,121,121,Sina878788kasas
Key Words: Amin,Mina,Sina
How could I have these key words in another column? I dont want to insert another column but if that's the only solution let me know.
Any help appreciated!
what the logic for extracting those keywords? do you have those keywords list pre-defined and just need to detect which ones present in which row?
– Mikhail Berlyant
2 days ago
2 Answers
2
Below is for BigQuery Standard SQL
#standardSQL
WITH keywords AS (
SELECT keyword
FROM UNNEST(SPLIT('Amin,Mina,Sina')) keyword
)
SELECT str, STRING_AGG(keyword) keywords_in_str
FROM `project.dataset.table`
CROSS JOIN keywords
WHERE REGEXP_CONTAINS(str, CONCAT(r'(?i)', keyword))
GROUP BY str
You can test, play with above using dummy data from your question as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'jasdhMINAj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj' str UNION ALL
SELECT 'jasnasc8918212,ahsahkdjjMina67,' UNION ALL
SELECT 'kasdhakshd,asda,asdasd,121,121,Sina878788kasas'
), keywords AS (
SELECT keyword
FROM UNNEST(SPLIT('Amin,Mina,Sina')) keyword
)
SELECT str, STRING_AGG(keyword) keywords_in_str
FROM `project.dataset.table`
CROSS JOIN keywords
WHERE REGEXP_CONTAINS(str, CONCAT(r'(?i)', keyword))
GROUP BY str
with results as
Row str keywords_in_str
1 jasdhMINAj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj Amin,Mina
2 jasnasc8918212,ahsahkdjjMina67, Mina
3 kasdhakshd,asda,asdasd,121,121,Sina878788kasas Sina
But the column has hundreds of row and I can't add all the row in select. So is there anyway I can write it to look in each row instead of adding each string in SELECT 'jasdhMINAj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj' str UNION ALL SELECT 'jasnasc8918212,ahsahkdjjMina67,' UNION ALL SELECT 'kasdhakshd,asda,asdasd,121,121,Sina878788kasas'
– Amin
2 days ago
see update. so hopefully it is more clear for you now how to use answer. just replace
project.dataset.table
with your real table reference– Mikhail Berlyant
2 days ago
project.dataset.table
as FYI: you can mark accepted answer by using the tick on the left of the posted answer, below the voting. see meta.stackexchange.com/questions/5234/… for why it is important.
– Mikhail Berlyant
2 days ago
It works thank you so much
– Amin
2 days ago
when I'm adding my criteria in this query I'm getting the error that Unrecognized name: str at [8:23]. I dont know whats the problem with the query. could you please advise! WITH keywords AS ( SELECT keyword FROM UNNEST(SPLIT('a,b,s')) keyword ) SELECT STRING_AGG(keyword) keywords_in_str FROM
xxx
CROSS JOIN keywords WHERE REGEXP_CONTAINS(keyword, CONCAT(r'(?i)')) GROUP BY str– Amin
yesterday
xxx
to count the no of keywords
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'jasdhMINAj31e31jh123hkkj,12l1,3jjds,Amin,02323rdcsnj' str UNION ALL
SELECT 'jasnasc8918212,ahsahkdjjMina67,' UNION ALL
SELECT 'kasdhakshd,asda,asdasd,121,121,Sina878788kasas'
)
select str,array(select as struct countif(lower(x) ="amin") amin,countif(lower(x) ="mina") mina,countif(lower(x)="sina") sina from unnest(x)x)keyword from
(select str,regexp_extract_all(str,"(?i)(Amin|Mina|Sina)")x from `project.dataset.table`)
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
You should improve the formatting of you row data. It's currently quite hard to understand.
– Gregor Koukkoullis
2 days ago