php preg_replace assign different replacement pattern for each capturing group




I'm trying to perform a MySQL full-text search in boolean mode, and I need to prepare the search text before I build the MySQL query.



To achieve this, I thought I could use the PHP function preg_replace and replace each capturing group with its own specific pattern.


Text between quotes, such as "hello world", should get a leading +: +"hello world"

Every other word should get a leading + and a trailing *: +how*



regex pattern


["']+([^"']+)["']+|([^\s"']+)



substitution pattern


+"\1" +\2*



EXAMPLE



For the following input:


"hello world" how are you?



It should return:


+"hello world" +how* +are* +you?*



But instead, it returns something 'wrong':


+"hello world" +* +"" +how* +"" +are* +"" +you?*



I'm aware that the replacement pattern +"\1" +\2* can never work, since nothing in it says that +"..." should apply only to the first capturing group and +...* only to the second one.
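The behaviour described above can be reproduced with a short sketch. It assumes the pattern and replacement from the question with their backslashes in place; the variable names are only illustrative. Because every match goes through the same template, the group that did not take part in the match is substituted as an empty string.

```php
<?php
// Pattern from the question: quoted text in group 1, bare words in group 2.
$pattern = '~["\']+([^"\']+)["\']+|([^\s"\']+)~';
$input   = '"hello world" how are you?';

// One template for all matches: the unmatched group expands to ''.
$result = preg_replace($pattern, '+"\1" +\2*', $input);

echo $result, PHP_EOL;
// +"hello world" +* +"" +how* +"" +are* +"" +you?*
```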





Test online regex.



PHP code


$query = preg_replace('~["\']+([^"\']+)["\']+|([^\s"\']+)~', '+"\1" +\2*', $query);



Is there a way to achieve this in PHP? Thank you in advance.



EDIT / SOLUTION



Thanks to @revo's suggestion to use the PHP function preg_replace_callback, I managed to assign a replacement pattern to each search pattern with the extended function preg_replace_callback_array. Please note that this function requires PHP >= 7.
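As a minimal sketch of that idea, the example below pairs each pattern with its own callback. The helper name to_boolean_terms is hypothetical and not part of the plugin; the two patterns are simplified versions of the ones used in the full function further down.

```php
<?php
// Hypothetical helper: one callback per pattern, applied in array order.
function to_boolean_terms(string $query): string
{
    return preg_replace_callback_array([
        // Bare words: match only when an even number of double quotes
        // follows, i.e. when the word lies outside a quoted phrase.
        '~[^\s"]+(?=(?:[^"]*"[^"]*")*[^"]*\Z)~' => function (array $m): string {
            return '+' . $m[0] . '*';
        },
        // Quoted phrases: keep the phrase together and prefix it with +.
        '~"+([^"]+)"+~' => function (array $m): string {
            return '+"' . $m[1] . '"';
        },
    ], $query);
}

echo to_boolean_terms('"hello world" how are you?'), PHP_EOL;
// +"hello world" +how* +are* +you?*
```

The order of the array matters here: each pattern runs on the result of the previous one, so handling the bare words first keeps the + that is added in front of a quoted phrase from being treated as a word itself.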





Here is the final version of the function I use to perform a FULLTEXT search via MATCH (...) AGAINST (...) IN BOOLEAN MODE. The function is declared within the class dbReader in a WordPress plugin. Maybe it can be useful for someone.




// Return at most 100 ids of products matching $query in name or description,
// searching for each word using MATCH AGAINST in BOOLEAN MODE
public function search_products($query) {

    // A closure is used instead of a nested named function: a named function
    // declared inside a method triggers a "Cannot redeclare" fatal error
    // the second time the method is called
    $replace_callback = function($m, $f) {
        return sprintf($f, isset($m[1]) ? $m[1] : $m[0]);
    };

    // Replace single quotes by double quotes in strings between quotes:
    // iPhone '8 GB' => iPhone "8 GB"
    // Apple's iPhone 8 '32 GB' => Apple's iPhone 8 "32 GB"
    // This is necessary later when the matches are divided into two groups:
    // 1. Strings not between double quotes
    // 2. Strings between double quotes
    $query = preg_replace("~(\s*)'+([^']+)'+(\s*)~", '$1"$2"$3', $query);

    // Do some magic to take numbers with their units as one word
    // iPhone 8 64 GB => iPhone 8 "64 GB"
    $pattern = array(
        '(\b[.,0-9]+)\s*(gb\b)',
        '(\b[.,0-9]+)\s*(mb\b)',
        '(\b[.,0-9]+)\s*(mm\b)',
        '(\b[.,0-9]+)\s*(mhz\b)',
        '(\b[.,0-9]+)\s*(ghz\b)'
    );
    array_walk($pattern, function(&$value) {
        // Surround with double quotes only if the user isn't doing manual grouping
        $value = '~'.$value.'(?=(?:[^"]*"[^"]*")*[^"]*\Z)~i';
    });
    $query = preg_replace($pattern, '"$1 $2"', $query);

    // Prepare query string for a "match against" in "boolean mode"
    $patterns = array(
        // 1. All strings not surrounded by double quotes
        '~([^\s"]+)(?=(?:[^"]*"[^"]*")*[^"]*\Z)~' => function($m) use ($replace_callback) {
            return $replace_callback($m, '+%s*');
        },

        // 2. All strings between double quotes
        '~"+([^"]+)"+~' => function($m) use ($replace_callback) {
            return $replace_callback($m, '+"%s"');
        }
    );

    // Replace every single word by a boolean expression: +some* +word*
    // Respect quoted strings: +"iPhone 8"
    // preg_replace_callback_array needs PHP Version >= 7
    $query = preg_replace_callback_array($patterns, $query);

    $fulltext_fields = array(
        'title' => array(
            'importance' => 1.5,
            'table' => 'p',
            'fields' => array(
                'field1',
                'field2',
                'field3',
                'field4'
            )
        ),
        'description' => array(
            'importance' => 1,
            'table' => 'p',
            'fields' => array(
                'field5',
                'field6',
                'field7',
                'field8'
            )
        )
    );
    $select_match = $match_full = $priority_order = "";

    $args = array();
    foreach ($fulltext_fields as $index => $obj) {
        $match = $obj['table'].".".implode(", ".$obj['table'].".", $obj['fields']);
        $select_match .= ", MATCH ($match) AGAINST (%s IN BOOLEAN MODE) AS {$index}_score";
        $match_full .= ($match_full!=""?", ":"").$match;
        $priority_order .= ($priority_order==""?"ORDER BY ":" + ")."({$index}_score * {$obj['importance']})";
        array_push($args, $query);
    }
    $priority_order .= $priority_order!=""?" DESC":"";

    // User input $query is passed as %s parameter to db->prepare() in order to avoid SQL injection
    array_push($args, $query, $this->model_name, $this->view_name);

    return $this->db->get_col(
        $this->db->prepare(
            "SELECT p.__pk $select_match
            FROM ankauf_... AND
            MATCH ($match_full) AGAINST (%s IN BOOLEAN MODE)
            INNER JOIN ...
            WHERE
            m.bezeichnung=%s AND
            a.bezeichnung=%s
            $priority_order
            LIMIT 100
            ;",
            $args
        )
    );
}
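The unit-grouping step of the function above can be tried in isolation. The sketch below is only an illustration: the helper name quote_units is hypothetical, and it assumes the same unit list and the same "even number of quotes ahead" lookahead as the function.

```php
<?php
// Hypothetical helper mirroring the unit-grouping step of search_products().
function quote_units(string $query): string
{
    $units = ['gb', 'mb', 'mm', 'mhz', 'ghz'];

    // Build one pattern per unit. The lookahead only allows a match when an
    // even number of double quotes follows, so text the user has already
    // quoted by hand is left alone.
    $patterns = array_map(function (string $unit): string {
        return '~(\b[.,0-9]+)\s*(' . $unit . '\b)(?=(?:[^"]*"[^"]*")*[^"]*\Z)~i';
    }, $units);

    return preg_replace($patterns, '"$1 $2"', $query);
}

echo quote_units('iPhone 8 64 GB'), PHP_EOL; // iPhone 8 "64 GB"
echo quote_units('iPhone "64 GB"'), PHP_EOL; // iPhone "64 GB" (unchanged)
```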





You need preg_replace_callback.
– revo
Jul 17 at 16:35







Now I'd like to be sure that this query is not open to SQL injection. Do you think db->prepare("... AGAINST (%s IN BOOLEAN MODE) ...") is enough to prevent it? Thank you in advance for your advice.
– Gerard Fígols
6 hours ago






1 Answer



You have to use preg_replace_callback:




$str = '"hello world" how are you?';

echo preg_replace_callback('~("[^"]+")|\S+~', function($m) {
    // $m[1] is only set when the quoted alternative matched
    return isset($m[1]) ? "+" . $m[1] : "+" . $m[0] . "*";
}, $str);



Output:


+"hello world" +how* +are* +you?*
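The isset($m[1]) check works because the match array passed to the callback leaves out a trailing group that did not take part in the match. A small sketch that records the size of that array for each token (the $counts variable is only illustrative):

```php
<?php
$counts = [];

preg_replace_callback('~("[^"]+")|\S+~', function (array $m) use (&$counts): string {
    // Remember how many entries the match array has for this token.
    $counts[$m[0]] = count($m);
    return $m[0]; // leave the text unchanged
}, '"hello world" how are you?');

var_export($counts);
// "hello world" => 2 (full match plus group 1)
// how, are, you? => 1 each (full match only)
```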



Live demo





I accepted your answer but I couldn't mark it as 'useful' (up vote) because I need at least 15 reputation points.
– Gerard Fígols
6 hours ago





