mb_strtok implementation in PHP: string tokenizer for multibyte strings

This is a simple function that implements a kind of mb_strtok() in PHP. As you may be aware, PHP has no mb_strtok function for multibyte strings (such as Unicode strings), so this is my attempt to solve the problem. There is still a bug: the program halts if the input text is too long (how long, I am not sure yet). Maybe you can improve it to give a better result.

Thank you and happy coding.

The PHP code, mb_strtok.php:

<html>
<head>
<title>String token for MB_STRING</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<h1>String token for MB_STRING</h1>
<h2><a href="http://kerul.net">kerul.net</a></h2>

<form method="GET" ACTION="">
Input text <br>
<textarea name="txtinput" cols=30 rows=10></textarea>
<br>
<input type="submit" >
</form>

<?php
$in=isset($_GET["txtinput"]) ? $_GET["txtinput"] : "";//empty until the form is submitted
$inputlen=mb_strlen($in, 'UTF-8');
echo ("Input length: $inputlen characters. <br>\n");

$tokens=mb_strtok(" \n\t?'.", $in);//delimiters: space, newline, tab, ?, ' and .
echo ("List of TOKENS<br>\n");
//echo $tokens;
for($i=0; $i<count($tokens); $i++){
echo ("[$i] -> ".htmlspecialchars($tokens[$i], ENT_QUOTES, 'UTF-8') ." <br> \n");//escape user input before printing
}

function mb_strtok($delimiters, $str=NULL)
{
static $pos = 0; // Keep track of the position on the string for each subsequent call.
static $string = "";
static $listtoken=array();
// If a new string is passed, reset the static parameters.
if($str!==NULL)
{
$pos = 0;
$string = $str;
$listtoken = array();//start a fresh token list for the new string
}

// Initialize the token.
$token = "";

$length = mb_strlen($string,'UTF-8');//measure once instead of on every iteration
while ($pos < $length)//loop till end of input string
{

$char = mb_substr($string, $pos, 1, 'UTF-8');//fetch one character, pos = char position
$pos++;
//echo ("Char at $pos => $char <br>\n");//trace character at position

if(mb_strpos($delimiters, $char, 0, 'UTF-8')===FALSE)//if character is not a delimiter
{
$token .= $char;//put character in the token node
}
else
{
//on reaching a delimiter, push token to listtoken
array_push($listtoken, $token);
$token="";//clear the token node
}
}
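// push the last token, which has no trailing delimiter
if ($token!=="")
{
array_push($listtoken, $token);
}
// return the list of tokens (an empty array when nothing was found)
return $listtoken;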
}
?>
</body>
</html>
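
For comparison, the same kind of splitting can be done in a single pass with PHP's built-in preg_split() and the u (UTF-8) pattern modifier. This is only a sketch, not the method used in the listing above: the helper name mb_split_tokens is made up for illustration, and it drops empty tokens. Because it avoids one mb_substr() call per character, it may also cope better with long input, though I have not measured that.

```php
<?php
// Hypothetical helper (not part of the original listing): split a UTF-8
// string on any character listed in $delimiters.
function mb_split_tokens(string $delimiters, string $str): array
{
    // With no delimiters there is nothing to split on.
    if ($delimiters === '') {
        return $str === '' ? array() : array($str);
    }
    // Escape the delimiters so each one is matched literally inside [...].
    $class = preg_quote($delimiters, '/');
    // 'u' treats pattern and subject as UTF-8; '+' collapses delimiter runs.
    $tokens = preg_split('/[' . $class . ']+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on failure, for example on invalid UTF-8.
    return $tokens === false ? array() : $tokens;
}

$tokens = mb_split_tokens(" \n\t?'.", "Selamat pagi. Apa khabar?");
foreach ($tokens as $i => $token) {
    echo "[$i] -> $token\n";
}
// prints:
// [0] -> Selamat
// [1] -> pagi
// [2] -> Apa
// [3] -> khabar
```

The delimiter string is the same one passed to mb_strtok() above, so the two approaches can be tried on the same input.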


Here is a second version. This time the separators (. , ; :) are also stored in the list of tokens ($listtoken).


<html>
<head>
<title>String token for MB_STRING</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<h1>String token for MB_STRING</h1>
<h2><a href="http://kerul.net">kerul.net</a></h2>

<form method="GET" ACTION="">
Input text <br>
<textarea name="txtinput" cols=30 rows=10></textarea>
<br>
<input type="submit" >
</form>

<?php
$in=isset($_GET["txtinput"]) ? $_GET["txtinput"] : "";//empty until the form is submitted
$inputlen=mb_strlen($in, 'UTF-8');
echo ("Input length: $inputlen characters. <br>\n");

$tokens=mb_strtok(" \n\t\f", $in);//delimit by whitespace only
echo ("List of TOKENS<br>\n");
//echo $tokens;
for($i=0; $i<count($tokens); $i++){
echo ("[$i] -> ".htmlspecialchars($tokens[$i], ENT_QUOTES, 'UTF-8') ." <br> \n");//escape user input before printing
}

function mb_strtok($delimiters, $str=NULL)
{
static $pos = 0; // Keep track of the position on the string for each subsequent call.
static $string = "";
static $listtoken=array();
// If a new string is passed, reset the static parameters.
if($str!==NULL)
{
$pos = 0;
$string = $str;
$listtoken = array();//start a fresh token list for the new string
}

// Initialize the token.
$token = "";

$length = mb_strlen($string,'UTF-8');//measure once instead of on every iteration
while ($pos < $length)//loop till end of input string
{

$char = mb_substr($string, $pos, 1, 'UTF-8');//fetch one character, pos = char position

echo ("Char at $pos => $char <br>\n");//trace character at position


if(mb_strpos($delimiters, $char, 0, 'UTF-8')===FALSE)//if character is not a delimiter
{
if($char=="." || $char==";"||$char==":"||$char==","){
//a separator ends the current token and is stored as a token itself
if($token!==""){
echo "Token detected $token <br>\n";
array_push($listtoken, $token);
}
array_push($listtoken, $char);
$token="";//clear the token node
}else{
$token .= $char;//put character in the token node
}
}
else
{
//on reaching a delimiter, push the pending token (if any) to listtoken
if($token!==""){
echo "Token detected $token <br>\n";
array_push($listtoken, $token);
}
$token="";//clear the token node
}
$pos++;
}
//push the last token, which has no trailing delimiter
if ($token!=="")
{
array_push($listtoken, $token);
}
// return the list of tokens
return $listtoken;

}
?>
</body>
</html>
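
The idea of keeping the separators as tokens can also be sketched with preg_split() and its PREG_SPLIT_DELIM_CAPTURE flag, which returns whatever the capture group matched alongside the pieces. This is an alternative illustration, not the code above: the helper name mb_split_keep is hypothetical, it assumes $keep is a non-empty string of punctuation characters, and it splits on any whitespace run matched by \s.

```php
<?php
// Hypothetical helper (not part of the original listing): split on
// whitespace and keep every character in $keep as a token of its own.
// Assumes $keep is non-empty; an empty $keep yields an invalid pattern.
function mb_split_keep(string $keep, string $str): array
{
    // Group 1 captures one kept punctuation character; the alternative
    // matches a whitespace run, which is discarded.
    $pattern = '/([' . preg_quote($keep, '/') . '])|\s+/u';
    $flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
    $tokens = preg_split($pattern, $str, -1, $flags);
    // preg_split() returns false on failure, for example on invalid UTF-8.
    return $tokens === false ? array() : $tokens;
}

print_r(mb_split_keep(".,;:", "Hello, world. Foo;bar"));
// tokens in order: Hello  ,  world  .  Foo  ;  bar
```

Each separator comes out right after the word it follows, so the original order of the text is preserved in the token list.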


mb_strtok implementation in PHP for Arabic unicode strings


Modified from the code at http://www.anastis.gr/mb_strtok-a-php-implementation/
