PHP preg_replace: assign a different replacement pattern to each capturing group
I'm trying to perform a MySQL full-text search in boolean mode, and I need to prepare the search text before I build the MySQL query.
In order to achieve this, I thought I could use the PHP function preg_replace and replace each capturing group with its own specific pattern: a string between quotes, such as "hello world", should only be prefixed with + (giving +"hello world"), while every other word should be prefixed with + and suffixed with * (so how becomes +how*).
Regex pattern:
["']+([^"']+)["']+|([^\s"']+)
Substitution pattern:
+"\1" +\2*
EXAMPLE
For the following input:
"hello world" how are you?
It should return:
+"hello world" +how* +are* +you?*
But instead, it returns something 'wrong':
+"hello world" +* +"" +how* +"" +are* +"" +you?*
I'm aware that the replacement pattern +"\1" +\2* will never work, since nothing in it says that +"..." should only apply to the first capturing group and +...* only to the second one.
Test online regex.
PHP code
$query = preg_replace('~["\']+([^"\']+)["\']+|([^\s"\']+)~', '+"\1" +\2*', $query);
Is there a way to achieve this in PHP? Thank you in advance.
EDIT / SOLUTION
Thanks to @revo's suggestion to use the PHP function preg_replace_callback, I managed to assign a separate replacement to each search pattern with the related function preg_replace_callback_array. Please note that this function requires PHP >= 7.
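A minimal sketch of the mechanism on the sample input from the question, outside of any class. The two patterns are simplified versions of the ones used in the final function; the array keys are regexes and each value is the callback used only for that regex. The patterns are applied in array order, each one to the output of the previous one.

```php
<?php
// Minimal sketch: one callback per pattern with preg_replace_callback_array (PHP >= 7.0).
// The patterns run in array order; each one sees the output of the previous one.
$query = '"hello world" how are you?';

$result = preg_replace_callback_array([
    // Bare words: the lookahead requires an even number of double quotes
    // after the word, i.e. the word is not inside a quoted phrase.
    '~[^\s"]+(?=(?:[^"]*"[^"]*")*[^"]*\z)~' => function ($m) {
        return '+' . $m[0] . '*';
    },
    // Quoted phrases only get the leading +.
    '~"[^"]+"~' => function ($m) {
        return '+' . $m[0];
    },
], $query);

echo $result, PHP_EOL; // +"hello world" +how* +are* +you?*
```

Because the first pattern never adds double quotes, the second pattern only ever sees the quoted phrases that were already in the input.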
Here I post the final version of the function I use to perform a FULLTEXT search via MATCH (...) AGAINST (...) IN BOOLEAN MODE. The function is declared within the class dbReader in a WordPress plugin. Maybe it can be useful for someone.
// Return maximum 100 ids of products matching $query in
// name or description searching for each word using MATCH AGAINST in BOOLEAN MODE
public function search_products($query) {
    // Helper as a closure: a named function declared inside a method becomes global,
    // so calling this method twice would fail with "Cannot redeclare replace_callback()".
    $replace_callback = function ($m, $f) {
        return sprintf($f, isset($m[1]) ? $m[1] : $m[0]);
    };

    // Replace single quotes by double quotes in strings between quotes:
    // iPhone '8 GB' => iPhone "8 GB"
    // Apple's iPhone 8 '32 GB' => Apple's iPhone 8 "32 GB"
    // The opening quote must follow whitespace (or the start of the string) and the
    // closing quote must precede whitespace (or the end), so apostrophes such as
    // Apple's are left alone.
    // This is necessary later when the matches are divided in two groups:
    // 1. Strings not between double quotes
    // 2. Strings between double quotes
    $query = preg_replace("~(^|\s)'+([^']+)'+(?=\s|$)~", '$1"$2"', $query);

    // Do some magic to take numbers with their units as one word
    // iPhone 8 64 GB => iPhone 8 "64 GB"
    $pattern = array(
        '(\b[.,0-9]+)\s*(gb\b)',
        '(\b[.,0-9]+)\s*(mb\b)',
        '(\b[.,0-9]+)\s*(mm\b)',
        '(\b[.,0-9]+)\s*(mhz\b)',
        '(\b[.,0-9]+)\s*(ghz\b)'
    );
    array_walk($pattern, function (&$value) {
        // Surround with double quotes only if the user isn't doing manual grouping
        $value = '~'.$value.'(?=(?:[^"]*"[^"]*")*[^"]*\Z)~i';
    });
    $query = preg_replace($pattern, '"$1 $2"', $query);

    // Prepare query string for a "match against" in "boolean mode"
    $patterns = array(
        // 1. All strings not surrounded by double quotes
        '~([^\s"]+)(?=(?:[^"]*"[^"]*")*[^"]*\Z)~' => function ($m) use ($replace_callback) {
            return $replace_callback($m, '+%s*');
        },
        // 2. All strings between double quotes
        '~"+([^"]+)"+~' => function ($m) use ($replace_callback) {
            return $replace_callback($m, '+"%s"');
        }
    );
    // Replace every single word by a boolean expression: +some* +word*
    // Respect quoted strings: +"iPhone 8"
    // preg_replace_callback_array needs PHP Version >= 7
    $query = preg_replace_callback_array($patterns, $query);

    $fulltext_fields = array(
        'title' => array(
            'importance' => 1.5,
            'table' => 'p',
            'fields' => array(
                'field1',
                'field2',
                'field3',
                'field4'
            )
        ),
        'description' => array(
            'importance' => 1,
            'table' => 'p',
            'fields' => array(
                'field5',
                'field6',
                'field7',
                'field8'
            )
        )
    );

    $select_match = $match_full = $priority_order = "";
    $args = array();
    foreach ($fulltext_fields as $index => $obj) {
        $match = $obj['table'].".".implode(", ".$obj['table'].".", $obj['fields']);
        $select_match .= ", MATCH ($match) AGAINST (%s IN BOOLEAN MODE) AS {$index}_score";
        $match_full .= ($match_full != "" ? ", " : "").$match;
        $priority_order .= ($priority_order == "" ? "ORDER BY " : " + ")."({$index}_score * {$obj['importance']})";
        array_push($args, $query);
    }
    $priority_order .= $priority_order != "" ? " DESC" : "";

    // User input $query is passed as %s parameter to db->prepare() in order to avoid SQL injection
    array_push($args, $query, $this->model_name, $this->view_name);

    return $this->db->get_col(
        $this->db->prepare(
            "SELECT p.__pk $select_match
            FROM ankauf_... AND
            MATCH ($match_full) AGAINST (%s IN BOOLEAN MODE)
            INNER JOIN ...
            WHERE
            m.bezeichnung=%s AND
            a.bezeichnung=%s
            $priority_order
            LIMIT 100
            ;",
            $args
        )
    );
}
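To check the string rewriting without a database, the preprocessing can be pulled out into a plain function. This is a sketch under stated assumptions: prepare_boolean_query() is a hypothetical name that does not exist in the plugin, the single-quote step is left out, and the five unit patterns are merged into one alternation instead of being applied one by one.

```php
<?php
// Hypothetical helper: prepare_boolean_query() is not part of the original plugin.
// It repeats only the string-rewriting steps of search_products(),
// without the single-quote step and without the database query.
function prepare_boolean_query(string $query): string
{
    // Number + unit outside double quotes => one quoted phrase:
    // 'iPhone 8 64 GB' => 'iPhone 8 "64 GB"'.
    // Assumption: the five unit patterns are merged into one alternation.
    $query = preg_replace(
        '~(\b[.,0-9]+)\s*((?:gb|mb|mm|mhz|ghz)\b)(?=(?:[^"]*"[^"]*")*[^"]*\z)~i',
        '"$1 $2"',
        $query
    );

    // Bare words => +word*, quoted phrases => +"phrase" (PHP >= 7.0).
    return preg_replace_callback_array([
        '~([^\s"]+)(?=(?:[^"]*"[^"]*")*[^"]*\z)~' => function ($m) {
            return '+' . $m[1] . '*';
        },
        '~"+([^"]+)"+~' => function ($m) {
            return '+"' . $m[1] . '"';
        },
    ], $query);
}

echo prepare_boolean_query('iPhone 8 64 GB'), PHP_EOL;
// +iPhone* +8* +"64 GB"
```

The unit step runs first on purpose: it creates the quoted phrase "64 GB", which the second callback then turns into a single +"64 GB" term instead of the two terms +64* +GB*.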
1 Answer
You have to use preg_replace_callback:
$str = '"hello world" how are you?';
echo preg_replace_callback('~("[^"]+")|\S+~', function($m) {
return isset($m[1]) ? "+" . $m[1] : "+" . $m[0] . "*";
}, $str);
Output:
+"hello world" +how* +are* +you?*
Live demo
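The same idea also works with the question's original two-group pattern: instead of one fixed replacement string, the callback checks which group actually matched. This is a sketch; it relies only on preg_replace_callback and the null coalescing operator (PHP >= 7.0).

```php
<?php
// Sketch: keep the question's original two-group pattern and let the callback
// choose the replacement based on which group matched.
$query = '"hello world" how are you?';

$result = preg_replace_callback('~["\']+([^"\']+)["\']+|([^\s"\']+)~', function ($m) {
    // A group that did not take part in the match is either missing from $m
    // or an empty string, so test group 2 for a non-empty value.
    if (($m[2] ?? '') !== '') {
        return '+' . $m[2] . '*';   // unquoted word
    }
    return '+"' . $m[1] . '"';      // quoted phrase (single or double quotes)
}, $query);

echo $result, PHP_EOL; // +"hello world" +how* +are* +you?*
```

A side effect of this variant is that a phrase in single quotes is also emitted with double quotes, which is the form the boolean-mode phrase operator expects.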
You need preg_replace_callback. – revo, Jul 17 at 16:35