Jul
07
2017
PHP Sorting Iterators

Iterator

A few months ago, I wrote an article on RecursiveDirectoryIterator; you can find it here: PHP Recursive Directory File Listing. If you run the code example, you'll see that the output is not sorted.

Object

A RecursiveDirectoryIterator is actually an object: a special object that lets us iterate over a sequence (collection) of data. PHP's array sorting functions (sort(), ksort(), krsort(), usort() and friends) expect an array, so we cannot sort an iterator with them directly. Let me give you an example:

$Iterator = new RecursiveDirectoryIterator('./');
foreach ($Iterator as $file)
    var_dump($file);

output (one of the entries):

object(SplFileInfo)#7 (2) {
  ["pathName":"SplFileInfo":private]=>
  string(12) "./index.html"
  ["fileName":"SplFileInfo":private]=>
  string(10) "index.html"
}

As you can see, each element the iterator hands us is an object of the SplFileInfo class.

Internet Answers

Unfortunately, Stack Overflow and other online results provide the most complicated answers on this matter. Of course, this is not Stack Overflow's fault, and it really is not an easy subject to discuss or understand, but personally I don't get the extra fuss (complexity) in some of the responses.

Back to basics

So let us go back a few steps and understand what an iterator really is. An iterator is an object that we can iterate over! That means we can use a loop to walk through its data. Reading the above output, you can (hopefully) get a better idea.
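To make "an object that we can iterate" a bit more concrete, here is a minimal sketch of a hand-written iterator. CountdownIterator is a made-up class, not something PHP ships; it implements PHP's built-in Iterator interface, and foreach simply calls its rewind(), valid(), current(), key() and next() methods. RecursiveDirectoryIterator speaks the same protocol, only its data comes from the filesystem. The return type declarations assume PHP 7.1 or later.

```php
<?php
// CountdownIterator is a made-up example class (not part of PHP):
// it yields $start, $start - 1, ..., 1 with keys 0, 1, 2, ...
class CountdownIterator implements Iterator
{
    private $start;
    private $position = 0;

    public function __construct(int $start)
    {
        $this->start = $start;
    }

    // foreach calls rewind() once, before the first element
    public function rewind(): void
    {
        $this->position = 0;
    }

    // Is there an element at the current position?
    public function valid(): bool
    {
        return $this->position < $this->start;
    }

    // The value foreach assigns to $val
    public function current(): int
    {
        return $this->start - $this->position;
    }

    // The key foreach assigns to $key
    public function key(): int
    {
        return $this->position;
    }

    // Advance to the next element
    public function next(): void
    {
        $this->position++;
    }
}

foreach (new CountdownIterator(3) as $key => $val) {
    echo $key . ":" . $val . "\n";   // prints 0:3, then 1:2, then 2:1
}
```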

We can also loop over the iterator as if it were a simple array.

e.g.:

$It = new RecursiveDirectoryIterator('./');
foreach ($It as $key=>$val)
    echo $key.":".$val."\n";

output:

./index.html:./index.html

Arrays

It is difficult to sort Iterators, but it is really easy to sort arrays!
We just need to convert the Iterator into an Array:

// Copy the iterator into an array
$array = iterator_to_array($Iterator);

that’s it!
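One detail worth knowing here: iterator_to_array() keeps the iterator's keys by default (its second parameter, preserve_keys, is true), and entries with the same key overwrite each other. With RecursiveDirectoryIterator's default keys, the full path names, every key is unique, so nothing is lost; with FilesystemIterator::KEY_AS_FILENAME, however, same-named files in different subdirectories would collide. Passing false avoids the collision but also throws away the keys that krsort() sorts on. A small sketch with a made-up generator:

```php
<?php
// Hypothetical generator that yields the same key twice, the way two files
// called report.txt in different directories would collide if the keys were
// plain file names instead of full path names.
function entries()
{
    yield 'report.txt' => '/tmp/a/report.txt';
    yield 'report.txt' => '/tmp/b/report.txt';
}

$kept       = iterator_to_array(entries());        // preserve_keys defaults to true
$renumbered = iterator_to_array(entries(), false); // keys dropped, values renumbered 0, 1

var_dump(count($kept));       // int(1): the second entry overwrote the first
var_dump(count($renumbered)); // int(2)
```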

Sorting

For my needs, I have to reverse-sort the array by key (the path name of each file in the recursive directory listing), so my sorting looks like:

krsort( $array );

easy, right?

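Just remember that krsort() sorts the array in place (it takes the array by reference and returns only a success flag), so you cannot call it directly on the result of iterator_to_array(); the array has to be stored in a variable first. You need to take two steps, and that is ok.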
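To see why the two steps are needed, here is a small sketch with an in-memory ArrayIterator (made-up keys) standing in for the directory iterator: krsort() receives its argument by reference and returns a boolean, so the sorted result only ever exists in the variable you passed in.

```php
<?php
// An in-memory ArrayIterator standing in for the directory iterator;
// its keys are path names, like RecursiveDirectoryIterator's default keys.
$Iterator = new ArrayIterator([
    './b.html' => 'b',
    './a.html' => 'a',
    './c.html' => 'c',
]);

// Step 1: keep the copy in a variable ...
$array = iterator_to_array($Iterator);

// Step 2: ... because krsort() sorts that variable in place (by reference).
// Its return value is only a success flag, not the sorted array.
$flag = krsort($array);

var_dump($flag);              // bool(true)
print_r(array_keys($array));  // keys now in order ./c.html, ./b.html, ./a.html
```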

Convert to Iterator

After sorting, we need to convert the array back into an iterator object:

// Convert Array to an Iterator
$Iterator = new ArrayIterator($array);

and that’s it !

Full Code Example

The entire code in one block:

<?php
// ebal, Fri, 07 Jul 2017 22:01:48 +0300

// Directory to Recursive search
$dir = "/tmp/";

// Iterator Object
$files =  new RecursiveIteratorIterator(
          new RecursiveDirectoryIterator($dir)
          );

// Convert to Array
$Array = iterator_to_array ( $files );
// Reverse Sort by key the array
krsort ( $Array );
// Convert to Iterator
$files = new ArrayIterator( $Array );

// Print the file name
foreach($files as $name => $object)
    echo "$name\n";

?>
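Sorting by key is not the only option. As a sketch, assuming PHP 7+ (for the <=> operator) and a throw-away demo tree with made-up file names and timestamps, uasort() can order the SplFileInfo values with a comparison function (newest modification time first here) while keeping the path-name keys:

```php
<?php
// Throw-away demo tree (hypothetical file names and timestamps).
$dir = sys_get_temp_dir() . '/sort_demo_' . uniqid();
mkdir("$dir/sub", 0777, true);
$mtimes = [
    "$dir/old.txt"     => 1000000000,
    "$dir/sub/mid.txt" => 1100000000,
    "$dir/new.txt"     => 1200000000,
];
foreach ($mtimes as $path => $mtime) {
    touch($path, $mtime);   // creates the file with a fixed modification time
}
clearstatcache();

$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
);

// Same round trip as above, but uasort() compares the SplFileInfo values
// (newest first) and keeps the path-name keys.
$array = iterator_to_array($files);
uasort($array, function (SplFileInfo $a, SplFileInfo $b) {
    return $b->getMTime() <=> $a->getMTime();
});
$files = new ArrayIterator($array);

$order = [];
foreach ($files as $name => $object) {
    $order[] = $object->getFilename();
    echo "$name\n";
}

// Remove the demo tree again
foreach (array_keys($mtimes) as $path) {
    unlink($path);
}
rmdir("$dir/sub");
rmdir($dir);
```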
Tag(s): php, iterator
Jan
29
2017
PHP Recursive Directory File Listing

Iterators

PHP has an iterator that you can use to recursively walk through a directory. Its name is RecursiveDirectoryIterator, and below is a simple test:


<?php

    $Contentpath = realpath('/tmp/');
    $Directory = new RecursiveDirectoryIterator($Contentpath);
    $Iterator  = new RecursiveIteratorIterator($Directory);

    foreach($Iterator as $name => $object){
        echo "$name\n";
    }

?>

the result is something like this:


# php test.php
/tmp/.
/tmp/..
/tmp/sess_td0p1cuohquk966fkit13fhi36
/tmp/sess_et3360aidupdnnifct0te2kr31
/tmp/sess_44rrgbn1em051u64bm49c6pmd2
/tmp/sess_42f9e0mhps120a72kco9nsbn81
/tmp/fresh.log
/tmp/.ICE-unix/.
/tmp/.ICE-unix/..

Filter

One of the benefits of this iterator is that you can extend the RecursiveFilterIterator class to filter out unwanted entries. Here is an example of such an extension:


<?php
    $Contentpath = realpath('./');
    $Directory = new RecursiveDirectoryIterator($Contentpath);

    class MyRecursiveFilterIterator extends RecursiveFilterIterator {
        public function accept() {
            return $this->current()->getFilename();
        }
    }   

    $MyFilter  = new MyRecursiveFilterIterator($Directory);
    $Iterator  = new RecursiveIteratorIterator($MyFilter);

    foreach($Iterator as $name => $object){
        echo "$name\n";
    }

?>

In the above example, we did not exclude or filter anything,
but our RecursiveIteratorIterator is now passing every entry through our MyRecursiveFilterIterator!

TXT

Let's filter out everything except text files.


<?php
    $Contentpath = realpath('./');
    $Directory = new RecursiveDirectoryIterator($Contentpath);

    class MyRecursiveFilterIterator extends RecursiveFilterIterator {
        public function accept() {
            $file_parts = pathinfo($this->current()->getFilename());

            if ( $file_parts['extension'] == 'txt' ) {
                return $this->current()->getFilename();
            }

        }
    }

    $MyFilter = new MyRecursiveFilterIterator($Directory);
    $Iterator = new RecursiveIteratorIterator($MyFilter);

    foreach($Iterator as $name => $object){
        echo "$name\n";
    }
?>

There is a little caveat in the above example!

The above piece of code seems to work just fine for a single directory of text files, but when you run it against a directory tree with subdirectories, you get notices like this one:


PHP Notice:  Undefined index: extension

That's because accept() is also called for directories, and pathinfo() returns no 'extension' key for a name without a dot (the same happens for files without an extension, such as a Makefile)!!!

Directories

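So, we need to treat directories separately: return true for them before calling pathinfo(), so that the RecursiveIteratorIterator can still descend into them (in its default LEAVES_ONLY mode, the directories themselves are not printed):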


<?php
    $Contentpath = realpath('./');
    $Directory = new RecursiveDirectoryIterator($Contentpath);

    class MyRecursiveFilterIterator extends RecursiveFilterIterator {
        public function accept() {

            if ( $this->current()->isDir() )
                return true;

            $file_parts = pathinfo($this->current()->getFilename());

            if ( $file_parts['extension'] == 'txt' ) {
                return $this->current()->getFilename();
            }

        }
    }

    $MyFilter = new MyRecursiveFilterIterator($Directory);
    $Iterator = new RecursiveIteratorIterator($MyFilter);

    foreach($Iterator as $name => $object){
        echo "$name\n";
    }
?>

pretty close.
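Why do directories need to return true? A RecursiveFilterIterator hides every entry that accept() rejects, directories included, so the RecursiveIteratorIterator never gets the chance to descend into a rejected directory and the files inside it are silently skipped. A small sketch comparing both kinds of filter on a throw-away tree (the class names and file names are made up; it uses the SKIP_DOTS flag from the next section, and SplFileInfo::getExtension() instead of pathinfo()):

```php
<?php
// Throw-away demo tree: notes.txt at the top, deep.txt one level down.
$root = sys_get_temp_dir() . '/filter_demo_' . uniqid();
mkdir("$root/sub", 0777, true);
touch("$root/notes.txt");
touch("$root/sub/deep.txt");

// Accepts *.txt entries only, so directories are rejected too.
class TxtOnlyFilter extends RecursiveFilterIterator
{
    public function accept(): bool
    {
        return $this->current()->getExtension() === 'txt';
    }
}

// Same check, but directories are always accepted so the iterator can descend.
class TxtAndDirsFilter extends RecursiveFilterIterator
{
    public function accept(): bool
    {
        return $this->current()->isDir()
            || $this->current()->getExtension() === 'txt';
    }
}

// Sorted file names that come out of the given filter class
function txt_files_found(string $filterClass, string $root): array
{
    $dir = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
    $names = [];
    foreach (new RecursiveIteratorIterator(new $filterClass($dir)) as $info) {
        $names[] = $info->getFilename();
    }
    sort($names);
    return $names;
}

$withoutDirs = txt_files_found(TxtOnlyFilter::class, $root);    // only notes.txt: sub/ was never entered
$withDirs    = txt_files_found(TxtAndDirsFilter::class, $root); // deep.txt and notes.txt

// Remove the demo tree again
unlink("$root/sub/deep.txt");
unlink("$root/notes.txt");
rmdir("$root/sub");
rmdir($root);
```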

Dots

Pretty close indeed, but we are not excluding the DOT directories:


.
..

FilesystemIterator

The FilesystemIterator class documentation shows a flag that does exactly that:

const integer SKIP_DOTS = 4096 ;

and you can use it with RecursiveDirectoryIterator, because RecursiveDirectoryIterator actually extends FilesystemIterator:

 RecursiveDirectoryIterator extends FilesystemIterator implements SeekableIterator , RecursiveIterator 
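Both claims are easy to check. A quick sketch on a throw-away directory holding one made-up file, a.txt (leaf_names() is just a helper for the demo):

```php
<?php
// RecursiveDirectoryIterator inherits the flag from FilesystemIterator
var_dump(is_subclass_of('RecursiveDirectoryIterator', 'FilesystemIterator')); // bool(true)
var_dump(RecursiveDirectoryIterator::SKIP_DOTS);                              // int(4096)

// Throw-away demo directory holding a single made-up file
$root = sys_get_temp_dir() . '/dots_demo_' . uniqid();
mkdir($root);
touch("$root/a.txt");

// File names the RecursiveIteratorIterator returns for the given flags, sorted
function leaf_names(string $root, int $flags): array
{
    $names = [];
    $it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($root, $flags));
    foreach ($it as $info) {
        $names[] = $info->getFilename();
    }
    sort($names);
    return $names;
}

$withDots    = leaf_names($root, 0);                                     // ., .. and a.txt
$withoutDots = leaf_names($root, RecursiveDirectoryIterator::SKIP_DOTS); // only a.txt

// Remove the demo directory again
unlink("$root/a.txt");
rmdir($root);
```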

so our code becomes:


<?php
    $Contentpath = realpath('./');
    $Directory = new RecursiveDirectoryIterator($Contentpath,RecursiveDirectoryIterator::SKIP_DOTS);

    class MyRecursiveFilterIterator extends RecursiveFilterIterator {
        public function accept(): bool {

            // accept directories, so the iterator can descend into them
            if ( $this->current()->isDir() )
                return true;

            $file_parts = pathinfo($this->current()->getFilename());

            // isset() avoids the "Undefined index" notice for files without an extension
            return isset($file_parts['extension']) && $file_parts['extension'] == 'txt';
        }
    }

    $MyFilter = new MyRecursiveFilterIterator($Directory);
    $Iterator = new RecursiveIteratorIterator($MyFilter);

    foreach($Iterator as $name => $object){
        echo "$name\n";
    }
?>

That’s It !
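A variation on the final filter, sketched for a Unix-like system: RecursiveCallbackFilterIterator takes a closure instead of a subclass, and SplFileInfo::getExtension() returns an empty string for names without a dot, so a file like Makefile is simply rejected instead of raising a notice. The demo tree and its file names are made up.

```php
<?php
// Throw-away demo tree with made-up names, including a file without a dot.
$root  = sys_get_temp_dir() . '/callback_demo_' . uniqid();
$names = ['readme.txt', 'Makefile', 'docs/guide.txt', 'docs/logo.png'];
mkdir("$root/docs", 0777, true);
foreach ($names as $name) {
    touch("$root/$name");
}

$Directory = new RecursiveDirectoryIterator($root, RecursiveDirectoryIterator::SKIP_DOTS);

// The closure plays the role of accept(): keep directories, so the iterator
// can descend into them, and files whose extension is exactly "txt".
// getExtension() returns '' for "Makefile", so no notice is raised.
$MyFilter = new RecursiveCallbackFilterIterator($Directory, function (SplFileInfo $current) {
    return $current->isDir() || $current->getExtension() === 'txt';
});

$found = [];
foreach (new RecursiveIteratorIterator($MyFilter) as $name => $object) {
    echo "$name\n";
    $found[] = substr($name, strlen($root) + 1);   // path relative to $root
}
sort($found);

// Remove the demo tree again
foreach ($names as $name) {
    unlink("$root/$name");
}
rmdir("$root/docs");
rmdir($root);
```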

Aug
04
2016
Open compressed file with gzip zcat perl php lua python

I have a compressed file with:


250.000.000 lines
Compressed file size: 671M
Uncompressed size: 6,5G

I need to extract a plethora of things and verify some others.

I don't want to use bash, but something more elegant, like python or lua.

Looking through “The-Internet”, I’ve created some examples for the single purpose of educating myself.

So here are my results.
BE AWARE they are far-far-far away from perfect in code or execution.

Roughly sorted by execution time (fastest first), with each language's iterations kept together:

pigz

pigz - Parallel gzip - Zlib



# time pigz  -p4 -cd  2016-08-04-06.ldif.gz &> /dev/null 

real    0m9.980s
user    0m16.570s
sys 0m0.980s

gzip

gzip 1.8



# time /bin/gzip -cd 2016-08-04-06.ldif.gz &> /dev/null

real    0m23.951s
user    0m23.790s
sys 0m0.150s

zcat

zcat (gzip) 1.8



# time zcat 2016-08-04-06.ldif.gz &> /dev/null

real    0m24.202s
user    0m24.100s
sys 0m0.090s

Perl

Perl v5.24.0

code:



#!/usr/bin/perl

open (FILE, '/bin/gzip -cd 2016-08-04-06.ldif.gz |') or die "cannot run gzip: $!";

while (my $line = <FILE>) {
  print $line;
}

close FILE;

time:


# time ./dump.pl &> /dev/null

real    0m49.942s
user    1m14.260s
sys 0m2.350s

PHP

PHP 7.0.9 (cli)

code:


#!/usr/bin/php

<?php

  $fp = gzopen("2016-08-04-06.ldif.gz", "r");

  while (($buffer = fgets($fp, 4096)) !== false) {
        echo $buffer;
  }

  gzclose($fp);

?>

time:


# time php -f dump.php &> /dev/null

real    1m19.407s
user    1m4.840s
sys 0m14.340s
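Since the real goal is to extract things from each line rather than echo them, here is a sketch of a per-line loop that does some work: counting LDIF entries (lines starting with dn:). It reads through PHP's compress.zlib:// stream wrapper, another way to get the same gzip decoding as gzopen(). One thing to watch in the code above: fgets($fp, 4096) returns at most 4095 bytes per call, so a longer LDIF line (a long base64 value, for example) arrives in pieces. That is harmless when just echoing, but not when parsing; calling fgets() without a length reads up to the end of the line. The function name and the sample data are made up, and I have not timed this against the real file.

```php
<?php
// Count LDIF entries (lines that start with "dn:") in a gzip file, reading it
// line by line through the compress.zlib:// stream wrapper.
function count_ldif_entries(string $gzFile): int
{
    $fp = fopen('compress.zlib://' . $gzFile, 'rb');
    if ($fp === false) {
        throw new RuntimeException("cannot open $gzFile");
    }
    $count = 0;
    // fgets() without a length reads up to the end of the line, however long
    while (($line = fgets($fp)) !== false) {
        if (strncmp($line, 'dn:', 3) === 0) {
            $count++;
        }
    }
    fclose($fp);
    return $count;
}

// Small made-up sample file to try it on
$sample = tempnam(sys_get_temp_dir(), 'ldif');
file_put_contents($sample, gzencode("dn: uid=a,dc=example,dc=org\ncn: a\n\ndn: uid=b,dc=example,dc=org\ncn: b\n"));
$entries = count_ldif_entries($sample);
echo $entries, "\n";   // 2
unlink($sample);
```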

PHP - Iteration #2

PHP 7.0.9 (cli)

Impressed by the php results, I took the perl approach in the code (reading from a gzip pipe):



<?php

  $fp = popen("/bin/gzip -cd 2016-08-04-06.ldif.gz", "r");

  while (($buffer = fgets($fp, 4096)) !== false) {
        echo $buffer;
  }

  pclose($fp);

?>

time:


# time php -f dump2.php &> /dev/null 

real    1m6.845s
user    1m15.590s
sys 0m19.940s

not bad !

Lua

Lua 5.3.3

code:


#!/usr/bin/lua

local gzip = require 'gzip'

local filename = "2016-08-04-06.ldif.gz"

for l in gzip.lines(filename) do
  print(l)
end

time:


# time ./dump.lua &> /dev/null

real    3m50.899s
user    3m35.080s
sys 0m15.780s

Lua - Iteration #2

Lua 5.3.3

I was depressed to see that php is faster than lua!!
Depressed I say !

So here is my next iteration on lua:

code:


#!/usr/bin/lua

local file = assert(io.popen('/bin/gzip -cd 2016-08-04-06.ldif.gz', 'r'))

while true do
        line = file:read()
        if line == nil then break end
        print (line)
end
file:close()

time:


# time ./dump2.lua &> /dev/null 

real    2m45.908s
user    2m54.470s
sys 0m21.360s

One minute faster than before, but still too slow !!

Lua - Zlib

Lua 5.3.3

My next iteration with lua is using zlib :

code:



#!/usr/bin/lua

local zlib = require 'zlib'
local filename = "2016-08-04-06.ldif.gz"

local block = 64
local d = zlib.inflate()

local file = assert(io.open(filename, "rb"))
while true do
  bytes = file:read(block)
  if not bytes then break end
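  -- note: print() also writes d()'s extra return values (eof, bytes in,
  -- bytes out) plus a newline per block; io.write((d(bytes))) writes only the data
  print (d(bytes))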
end

file:close()

time:



# time ./dump.lua  &> /dev/null 

real    0m41.546s
user    0m40.460s
sys 0m1.080s

Now, that's what I am talking about !!!

Playing with the read block size (the block variable, 64 bytes here) can make your code faster or slower.
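PHP (7.0 and later) has an incremental zlib API as well, inflate_init() and inflate_add(), so the same block-by-block approach can be sketched in PHP. The function name gunzip_to_stream and the default block size are my own choices; the demo compresses some made-up LDIF text with gzencode() and checks that it comes back intact.

```php
<?php
// Stream-decompress a gzip file block by block with PHP's incremental zlib
// API (inflate_init / inflate_add, PHP 7.0+), similar to the Lua zlib loop.
// Writes the output to the stream $out and returns the number of bytes written.
function gunzip_to_stream(string $filename, $out, int $block = 65536): int
{
    $in = fopen($filename, 'rb');
    if ($in === false) {
        throw new RuntimeException("cannot open $filename");
    }
    $ctx = inflate_init(ZLIB_ENCODING_GZIP);   // expect a gzip header and trailer
    $written = 0;
    while (($chunk = fread($in, $block)) !== false && $chunk !== '') {
        $data = inflate_add($ctx, $chunk, ZLIB_SYNC_FLUSH);
        if ($data === false) {
            fclose($in);
            throw new RuntimeException("corrupt gzip data in $filename");
        }
        $written += (int) fwrite($out, $data);
    }
    fclose($in);
    return $written;
}

// Demo with made-up LDIF text, compressed to a temporary file
$original = str_repeat("dn: uid=user,ou=people,dc=example,dc=org\ncn: user\n\n", 1000);
$gzFile = tempnam(sys_get_temp_dir(), 'gz');
file_put_contents($gzFile, gzencode($original));

$memory = fopen('php://memory', 'w+b');
$bytes = gunzip_to_stream($gzFile, $memory, 64);   // 64-byte blocks, like the Lua example
rewind($memory);
$decoded = stream_get_contents($memory);
unlink($gzFile);
```

Against the real file, from the CLI, it would be called as gunzip_to_stream('2016-08-04-06.ldif.gz', STDOUT); I have not timed that, so I make no claims about its speed.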

Python v3

Python 3.5.2

code:


#!/usr/bin/python

import gzip

filename='2016-08-04-06.ldif.gz'
with gzip.open(filename, 'r') as f:
    for line in f:
        print(line,)

time:


# time ./dump.py &> /dev/null

real    13m14.460s
user    13m13.440s
sys 0m0.670s

Not enough tissues in the whole damn world!

Python v3 - Iteration #2

Python 3.5.2

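But wait ... a moment ... the default mode for gzip.open is 'rb'
(read binary), and 'r' means binary too, so every line was a bytes object and print() was writing its b'...' representation.

Let's try this once more with 'rt' (read text) mode: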

code:


#!/usr/bin/python

import gzip

filename='2016-08-04-06.ldif.gz'
with gzip.open(filename, 'rt') as f:
    for line in f:
        print(line, end="")

time:


# time ./dump.py &> /dev/null 

real    5m33.098s
user    5m32.610s
sys 0m0.410s

Only one super tiny change, and the run time was cut by more than half!!!
But still way too slow.

Python v3 - Iteration #3

Python 3.5.2

Let's try a third iteration with popen this time.

code:


#!/usr/bin/python

import os

cmd = "/bin/gzip -cd 2016-08-04-06.ldif.gz"
f = os.popen(cmd)
for line in f:
  print(line, end="")
f.close()

time:


# time ./dump2.py &> /dev/null 

real    6m45.646s
user    7m13.280s
sys 0m6.470s

Python v3 - zlib Iteration #1

Python 3.5.2

Let's try a zlib iteration this time.

code:



#!/usr/bin/python

import zlib

d = zlib.decompressobj(zlib.MAX_WBITS | 16)
filename='2016-08-04-06.ldif.gz'

with open(filename, 'rb') as f:
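    for line in f:
        # note: in binary mode each "line" is an arbitrary slice of the compressed
        # data, and print() writes the repr (b'...') of each decompressed chunk
        print(d.decompress(line))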

time:


# time ./dump.zlib.py &> /dev/null 

real    1m4.389s
user    1m3.440s
sys 0m0.410s

finally some proper values with python !!!

Specs

All the tests ran on this machine:


4 x Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
8G RAM

Conclusions

Ok, I Know !

The shell-pipe approach (using gzip to open the compressed file) is not fair to all the other code snippets above.
But ... who cares?

I need something that runs fast as hell and does smart things with that data.

Get in touch

As I am not a developer, I know that you people know how to do these things even better!

So I would love to hear any suggestions or even criticism on the above examples.

I will update/report everything that passes the "I think I know what this code does" rule and ... be gentle with me ;)

PLZ use my email address: evaggelos [ _at_ ] balaskas [ _dot_ ] gr

to send me any suggestions

Thanks !

Tag(s): php, perl, python, lua, pigz
Jun
09
2015
PHP rants

- or how I spent a morning fixing something that didn't need fixing!!!

At work, we have a PHP application that automates user blacklisting via an API (we have a very large mail infrastructure). We use this tool to manipulate LDAP attributes and to insert/select data in a MySQL database. Of course, our abuse department also uses that web tool for manual inserts/edits/de-blacklisting and for history searches on customer complaints.

We are in the middle of making some back-end changes, and a few (fewer than ten) changes must be made to this tool as well. Nothing fancy whatsoever, we just want to change the reading point from place A to place B.

Our web app was custom-built internally by a former colleague who now works for another company, so I took charge of this easy and simple task.

Five minutes later, all the changes were made. I hg pushed the changes and started using the development environment to test them.

And boom.jpg nothing is working !!!!

What-the-feck ?

I did an hg diff and saw the SEVEN (7) tiny changes in the code.

To clear things up, the changes were of the following form:


// read from ldap the attribute Profile
$attr_old = array ("Profile" );

// write to mysql the value of Profile
$old_profile = $entries [$i] ["Profile"] [0];

After almost a full hour (I was banging my head against the wall by then), I tried to var_dump all the arrays.

And WHAT I saw was unreal!!!

The code reads the LDAP attribute Profile from LDAP as Profile.

BUT

when I var_dumped $entries, I saw that PHP returns all the attribute names (the array keys) in lowercase.


so Profile becomes profile

I still don't know/understand why this is happening!
I just made two more tiny changes, so that MySQL now inserts


$entries [$i] ["profile"] [0];

and not the wrong one:


$entries [$i] ["Profile"] [0];

and everything is OK now.
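For the record, the PHP manual entry for ldap_get_entries() documents this behaviour: the attribute index is converted to lowercase, because attribute names are case-insensitive for directory servers but not as PHP array keys. A small defensive helper (entry_first_value is a made-up name), sketched against mock data shaped like ldap_get_entries() output so it runs without an LDAP server; in the real code it would be called on $entries[$i]:

```php
<?php
// entry_first_value() is a made-up helper: it returns the first value of an
// attribute from one entry of ldap_get_entries() output, whose attribute
// keys are lowercase, or null when the entry has no such attribute.
function entry_first_value(array $entry, string $attribute): ?string
{
    $key = strtolower($attribute);   // "Profile" -> "profile"
    return isset($entry[$key][0]) ? $entry[$key][0] : null;
}

// Mock data shaped like ldap_get_entries() output (no LDAP server needed)
$entries = [
    'count' => 1,
    0 => [
        'profile' => ['count' => 1, 0 => 'premium'],
        0         => 'profile',
        'count'   => 1,
        'dn'      => 'uid=jdoe,ou=people,dc=example,dc=org',
    ],
];

$old_profile = entry_first_value($entries[0], 'Profile');
var_dump($old_profile);                            // string(7) "premium"
var_dump(entry_first_value($entries[0], 'mail'));  // NULL
```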

Tag(s): php
Dec
24
2013
Failures on update

A colleague of mine wants to add a new vhost on one of our apache web servers.

Running:


  /etc/init.d/httpd configtest 

he noticed that php_admin_flag had produced an error message. We commented this flag out and tried to restart the web server. Unfortunately, httpd didn't come up.

Searching through the logs, I found these:


Dec 14 14:33:54 Erased: php-snmp
Dec 14 14:33:54 Erased: php-mbstring
Dec 14 14:33:54 Erased: php-pear
Dec 14 14:33:55 Erased: php-common
Dec 14 14:33:55 Erased: php-mcrypt
Dec 14 14:33:55 Erased: php-gd
Dec 14 14:33:55 Erased: php-mysql
Dec 14 14:33:55 Erased: php-cli
Dec 14 14:33:55 Erased: php-pgsql
Dec 14 14:33:55 Erased: php-ldap
Dec 14 14:33:55 Erased: php
Dec 14 14:33:55 Erased: php-devel
Dec 14 14:33:56 Erased: php-pdo
Dec 14 14:34:17 Installed: php53-common-5.3.3-22.el5_10.x86_64
Dec 14 14:34:17 Installed: php53-pdo-5.3.3-22.el5_10.x86_64
Dec 14 14:34:27 Installed: libc-client-2004g-2.2.1.x86_64
Dec 14 14:34:28 Installed: php53-mcrypt-5.3.3-1.el5.x86_64
Dec 14 14:34:28 Installed: php53-mysql-5.3.3-22.el5_10.x86_64
Dec 14 14:34:28 Installed: php53-ldap-5.3.3-22.el5_10.x86_64
Dec 14 14:34:28 Installed: php53-mbstring-5.3.3-22.el5_10.x86_64
Dec 14 14:34:28 Installed: php53-gd-5.3.3-22.el5_10.x86_64
Dec 14 14:34:28 Installed: php53-xml-5.3.3-22.el5_10.x86_64
Dec 14 14:34:28 Installed: php53-imap-5.3.3-22.el5_10.x86_64
Dec 14 14:34:28 Installed: php53-snmp-5.3.3-22.el5_10.x86_64
Dec 14 14:34:28 Installed: php53-pgsql-5.3.3-22.el5_10.x86_64
Dec 14 14:34:28 Installed: php53-cli-5.3.3-22.el5_10.x86_64

If you haven't noticed the horror yet, let me explain it to you:

There is NO php on the system!

A couple of weeks ago, another colleague did a not-so-successful update on this server.

blah blah blah
blah blah blah
blah blah blah

and, long story short:


yum install php53.x86_64

worked its magic.

So keep in mind that after a yum update, you have to manually restart the running services and check that everything works properly, OR someone like me will try to destroy your Christmas plans as revenge!

Tag(s): centos, apache, php, update
Sep
02
2011
How a memory leak can destroy your evening.

Yesterday evening I had the pleasure of watching my apache crash until the entire memory of my VPS was consumed.
I had the opportunity to see a memory leak and drink a couple of beers with good friends.
Friends who can support you (psychologically) until you find the bug (is it one?) and fix it.

So let's begin our journey:

My blog engine (flatpress) has an identi.ca/twitter plugin for posting entries on my blog.
I've connected it with my identi.ca account, and I've done a little hack to add a microblogging category that separates my rss feed from my blogging rss feed (category=1).

So the main problem was (is) that the identica.png image doesn't get the correct file path from the php variable.
It should be something like this:

blog/fp-plugins/identicaconnect/res/identica.png

but it seems to be:

https://balaskas.gr/blog/https://balaskas.gr/blog/blog/fp-plugins/identicaconnect/res/identica.png

That would be easy to fix, right?
That was what i thought too.

But in the process of fixing it, I saw the following error in my apache logs:

“PHP Notice: Undefined index: PATH_INFO”

I fired up my phpinfo() page and saw that there wasn't any value for $_SERVER['PATH_INFO'].
In fact, there wasn't any $_SERVER['PATH_INFO'] in the PHP Variables section at all!!!

WTF ?

While I was searching Google for an answer, I noticed that my site was inaccessible.

pgrep httpd | wc -l

showed me about 200 apache processes, and the number was rising really fast.

dmesg complained about resources, and at that moment my VPS crashed for the first time, with a memory leak message on the console!!!

My previous apache installation was: httpd 2.0.64 + php-5.3.3 + suhosin-patch-5.3.3-0.9.10.patch + mod_evasive + eaccelerator-0.9.6.1, and my custom apache compilation options were:

./configure \
        --enable-dav \
        --enable-rewrite \
        --enable-ssl \
        --enable-so \
        --enable-proxy \
        --enable-headers \
        --enable-deflate \
        --enable-cache \
        --enable-disk-cache

my php compilation options were:

./configure \
        --with-zlib \
        --with-openssl \
        --with-gd \
        --enable-mbstring \
        --with-apxs2=/usr/local/apache2/bin/apxs \
        --with-mysql \
        --with-mcrypt \
        --with-curl

When I saw the memory leak, my first (and only) thought was: killapache.pl!

In a heartbeat, I was compiling httpd-2.2.20 + php-5.3.8 + suhosin-patch-5.3.7-0.9.10.patch + eaccelerator-0.9.6.1 + mod_evasive; I moved my /usr/local/apache2 folder to apache2.bak and installed the newest (and hopefully most secure) versions of apache & php.

I have documented my whole installation process pretty well, and I keep comments for every line I have ever changed in a configuration file. So setting up httpd 2.2.20 was indeed a matter of minutes.

I was feeling lucky and confident.

I started apache and fired up my blog.
I was tailing error logs too.

BUM !!!!

apache had just crashed again !!!!

WTF^2 ?

How can a null php variable crash apache with a memory leak and spawn about a million processes?
After debugging it, I fixed it by just putting an isset() check in front of the $_SERVER['PATH_INFO'] php variable!!!!
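The fix boils down to never reading $_SERVER['PATH_INFO'] without checking it first; the web server typically sets it only when the URL carries extra path information after the script name (e.g. /index.php/some/path). A minimal sketch, with a made-up helper name and assuming PHP 7+ for the type declarations (on PHP 5.3, drop them):

```php
<?php
// path_info() is a made-up helper: it returns PATH_INFO from a
// $_SERVER-like array, or $default when the server did not set it,
// so there is no "Undefined index: PATH_INFO" notice either way.
function path_info(array $server, string $default = ''): string
{
    return isset($server['PATH_INFO']) ? (string) $server['PATH_INFO'] : $default;
}

$missing = path_info(['SCRIPT_NAME' => '/blog/index.php']);
$present = path_info(['SCRIPT_NAME' => '/blog/index.php', 'PATH_INFO' => '/2011/09/02']);

var_dump($missing);   // string(0) ""
var_dump($present);   // string(11) "/2011/09/02"

// In real code: $pathInfo = path_info($_SERVER);
```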

Too much trouble for fixing (I didn't) the path of an image on my blog.

So my question is this:

  • Is this an apache bug?
  • Is this a php bug? or
  • Is it a software bug (flatpress)?
Tag(s): httpd, php