
mb_strtok implementation in PHP: String tokenizer for Multibyte strings

This is a simple function that implements a kind of mb_strtok() in PHP. As you may already know, PHP's strtok() works on bytes, and there is no mb_strtok() counterpart for multibyte strings (such as UTF-8 Unicode text). So this is my attempt to solve the problem. However, there is still a bug: the program halts if the input text is too long (how long? I'm not sure yet). Maybe you can improve it to get better results.
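Here is a small sketch (not part of the original post) that shows why byte-based strtok() fails on UTF-8 text. strtok() treats every byte of the delimiter string as a separate delimiter. The Arabic comma U+060C is encoded as the two bytes D8 8C, and many Arabic letters also start with the byte D8, so the letters themselves get cut in half. The example input is an assumption chosen for illustration: alif, Arabic comma, ba.

```php
<?php
// Input: Arabic alif (U+0627), Arabic comma (U+060C), Arabic ba (U+0628).
// We would like two tokens: the alif and the ba.
$text = "\u{0627}\u{060C}\u{0628}";

// strtok() splits on any single byte from the delimiter string,
// here the bytes D8 and 8C that make up U+060C.
$tokens = [];
$tok = strtok($text, "\u{060C}");
while ($tok !== false) {
    $tokens[] = $tok;
    $tok = strtok("\u{060C}");
}

// Each token is a lone continuation byte (A7, A8), i.e. broken UTF-8.
foreach ($tokens as $t) {
    echo bin2hex($t), ' ', var_export(mb_check_encoding($t, 'UTF-8'), true), "\n";
}
```

This prints `a7 false` and `a8 false`: the leading D8 byte of each letter was consumed as a delimiter.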

Thank you and happy coding.


The PHP code, mb_strtok.php:

<html>
<head>
<title>String token for MB_STRING</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<h1>String token for MB_STRING</h1>
<h2><a href="http://kerul.net">kerul.net</a></h2>

<form method="GET" ACTION="">
Input text <br>
<textarea name="txtinput" cols=30 rows=10></textarea>
<br>
<input type="submit" >
</form>

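<?php
// Read the submitted text; default to an empty string on the first page load.
$in = isset($_GET["txtinput"]) ? $_GET["txtinput"] : "";
$inputlen = mb_strlen($in, 'UTF-8');
echo ("Input length: $inputlen characters. <br>\n");

// Delimiters: space, newline, carriage return, tab, ? ' and .
// Double quotes are required so that \n, \r and \t become real control characters
// ("/n/t" would make the letters n and t delimiters).
$tokens = mb_strtok(" \n\r\t?'.", $in);
echo ("List of TOKENS<br>\n");
for ($i = 0; $i < count($tokens); $i++) {
    // Escape each token so the user's input cannot inject HTML into the page.
    echo ("[$i] -> " . htmlspecialchars($tokens[$i], ENT_QUOTES, 'UTF-8') . " <br> \n");
}

function mb_strtok($delimiters, $str = NULL)
{
    static $pos = 0; // Keep track of the position (in characters) for each subsequent call.
    static $string = "";
    static $listtoken = array();
    // If a new string is passed, reset the static variables (including the token list).
    if ($str !== NULL)
    {
        $pos = 0;
        $string = $str;
        $listtoken = array();
    }

    // Initialize the token.
    $token = "";
    $len = mb_strlen($string, 'UTF-8'); // compute the length once, not on every iteration

    while ($pos < $len) // loop till end of input string
    {
        $char = mb_substr($string, $pos, 1, 'UTF-8'); // fetch one character at position $pos
        $pos++;

        if (mb_strpos($delimiters, $char, 0, 'UTF-8') === FALSE) // character is not a delimiter
        {
            $token .= $char; // append the character to the current token
        }
        else
        {
            // Reached a delimiter: store the token, skipping empty ones from consecutive delimiters.
            if ($token !== "") {
                array_push($listtoken, $token);
            }
            $token = ""; // clear the token
        }
    }
    // The last token is not followed by a delimiter, so store it here.
    if ($token !== "") {
        array_push($listtoken, $token);
    }
    // Return the list of tokens (an empty array if there were none).
    return $listtoken;
}
?>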
</body>
</html>
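On the "halts on long input" problem: this is not confirmed in the original post, but two plausible causes are worth checking. First, the form uses GET, so the whole text travels in the URL, and many browsers and web servers cap URL length at a few kilobytes. Second, for a variable-width encoding like UTF-8, mb_substr() generally has to walk from the start of the string to reach character $pos, so calling it once per character makes the loop roughly quadratic and can hit PHP's max_execution_time. The sketch below avoids the second issue by splitting the input into characters once with mb_str_split() (assumes PHP 7.4 or later). The function name mb_tokenize is hypothetical, and unlike the original it is stateless and simply returns an array.

```php
<?php
/**
 * Hypothetical helper (not from the original post): split a UTF-8 string into
 * tokens, treating every character of $delimiters as a separator.
 * Empty tokens are skipped. Requires PHP 7.4+ for mb_str_split().
 */
function mb_tokenize(string $delimiters, string $str): array
{
    // Lookup set of delimiter characters (whole characters, not bytes).
    $isDelimiter = array_flip(mb_str_split($delimiters, 1, 'UTF-8'));

    $tokens = [];
    $token = "";
    // Split the input into characters once, instead of calling mb_substr() per position.
    foreach (mb_str_split($str, 1, 'UTF-8') as $char) {
        if (isset($isDelimiter[$char])) {
            if ($token !== "") {
                $tokens[] = $token;
            }
            $token = "";
        } else {
            $token .= $char;
        }
    }
    if ($token !== "") {
        $tokens[] = $token; // the last token has no trailing delimiter
    }
    return $tokens;
}

// Usage: two Arabic words separated by an Arabic comma (U+060C) and a space.
$words = mb_tokenize(" \n\r\t\u{060C}",
    "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}\u{060C} \u{0639}\u{0627}\u{0644}\u{0645}");
print_r($words);
```

Because the Arabic comma is compared as a whole character, the letters are no longer cut apart as they are with strtok().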


Here is another version. This time the punctuation separators (. , ; :) are also stored in the token list ($listtoken), while whitespace only separates tokens.


<html>
<head>
<title>String token for MB_STRING</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<h1>String token for MB_STRING</h1>
<h2><a href="http://kerul.net">kerul.net</a></h2>

<form method="GET" ACTION="">
Input text <br>
<textarea name="txtinput" cols=30 rows=10></textarea>
<br>
<input type="submit" >
</form>

<?php
// Read the submitted text; default to an empty string on the first page load.
$in = isset($_GET["txtinput"]) ? $_GET["txtinput"] : "";
$inputlen = mb_strlen($in, 'UTF-8');
echo ("Input length: $inputlen characters. <br>\n");

$tokens = mb_strtok(" \n\r\t\f", $in); // delimit by whitespace only (real control characters, not "/n/t/f")
echo ("List of TOKENS<br>\n");
for ($i = 0; $i < count($tokens); $i++) {
    // Escape each token so the user's input cannot inject HTML into the page.
    echo ("[$i] -> " . htmlspecialchars($tokens[$i], ENT_QUOTES, 'UTF-8') . " <br> \n");
}

function mb_strtok($delimiters, $str = NULL)
{
    static $pos = 0; // Keep track of the position (in characters) for each subsequent call.
    static $string = "";
    static $listtoken = array();
    // If a new string is passed, reset the static variables (including the token list).
    if ($str !== NULL)
    {
        $pos = 0;
        $string = $str;
        $listtoken = array();
    }

    // Initialize the token.
    $token = "";
    $len = mb_strlen($string, 'UTF-8'); // compute the length once

    while ($pos < $len) // loop till end of input string
    {
        $char = mb_substr($string, $pos, 1, 'UTF-8'); // fetch one character at position $pos
        //echo ("Char at $pos => $char <br>\n"); // trace character at position

        if (mb_strpos($delimiters, $char, 0, 'UTF-8') === FALSE) // character is not a whitespace delimiter
        {
            if ($char == "." || $char == ";" || $char == ":" || $char == ",") {
                // Punctuation: store the word before it first, then the punctuation mark itself.
                if ($token !== "") {
                    array_push($listtoken, $token);
                }
                array_push($listtoken, $char);
                $token = ""; // clear the token
            } else {
                $token .= $char; // append the character to the current token
            }
        }
        else
        {
            // Reached a whitespace delimiter: store the token collected so far.
            if ($token !== "") {
                array_push($listtoken, $token);
            }
            $token = ""; // clear the token
        }
        $pos++;
    }
    // Store the last token, which is not followed by a delimiter.
    if ($token !== "") {
        array_push($listtoken, $token);
    }
    // Return the list of tokens (an empty array if there were none).
    return $listtoken;
}
?>
</body>
</html>
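The same "keep the punctuation" behaviour can also be sketched with a single preg_split() call instead of a character loop. This is an alternative technique, not the method used above: the /u modifier makes PCRE treat the pattern and subject as UTF-8, PREG_SPLIT_DELIM_CAPTURE returns the captured punctuation as tokens, and PREG_SPLIT_NO_EMPTY drops the empty pieces left between adjacent separators. The function name mb_tokenize_keep_punct is hypothetical.

```php
<?php
/**
 * Hypothetical helper (not from the original post): tokenize a UTF-8 string on
 * whitespace, keeping the punctuation marks . , ; : as separate tokens.
 */
function mb_tokenize_keep_punct(string $str): array
{
    // Whitespace runs are dropped; the captured group ([.,;:]) is returned as a token.
    $parts = preg_split('/\s+|([.,;:])/u', $str, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on failure, e.g. when the input is not valid UTF-8.
    return $parts === false ? [] : $parts;
}

$result = mb_tokenize_keep_punct("Hello, world. Selamat pagi; kawan");
print_r($result);
```

For this input the result is Hello, ",", world, ".", Selamat, pagi, ";", kawan. The regex engine walks the string in one pass, which may also help with long input.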


mb_strtok implementation in PHP for Arabic Unicode strings


Modified from the code at http://www.anastis.gr/mb_strtok-a-php-implementation/
