satori code

Michael Trelinski's Technology Blog

Archive for the ‘Lucene’ Category

Lucene and PHP

Thursday, September 3rd, 2009

Today I realized the strengths and speed of Lucene, but I also realized something else: Zend_Search_Lucene is painfully slow compared to the Java implementation, and it needs a ton of memory to run Lucene searches. The ease of development, in this case, is not worth the hassle (and the memory leaks).

run time

The great alternative, albeit a pain in the ass to implement (it is hard to debug and does not play nicely with IDEs), is running the Lucene search through the PHP/Java Bridge. With Zend’s implementation, a sample search against a Lucene index containing 29,000 documents took an average of 0.30 milliseconds (varying by 0.1 milliseconds) and required 50 MB of memory allotted to the PHP script. I was then able to swap out the Zend-specific objects and replace them with PHP/Java Bridge SOAP/Java methods. I really didn’t think I had much to gain by doing this other than cutting down on my memory usage. Since the bridge has to make a socket connection and then make calls into the Java Lucene jar file, I expected to lose time… I was totally wrong on this one. The same query, run through the PHP/Java Bridge against a vanilla Tomcat server running in local mode, took a consistent 0.01 milliseconds with almost no variance.
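For anyone who wants to repeat this kind of comparison, a minimal sketch of a timing harness in Java follows. It is not the code behind the numbers above: the `QueryTimer` class and its method names are hypothetical, and the workload in `main` is only a placeholder where a real test would call the search being measured.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original benchmark: times a task
// repeatedly and reports the mean and the largest deviation from the mean.
class QueryTimer {

    // Runs the task `runs` times and returns each elapsed time in milliseconds.
    static double[] timeMillis(Runnable task, int runs) {
        double[] samples = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return samples;
    }

    // Arithmetic mean of the samples (0.0 for an empty array).
    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    // Largest absolute distance of any sample from the mean.
    static double maxDeviation(double[] samples) {
        double m = mean(samples);
        return Arrays.stream(samples).map(s -> Math.abs(s - m)).max().orElse(0.0);
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would run the search here.
        Runnable task = () -> Math.sqrt(12345.678);
        double[] samples = timeMillis(task, 100);
        System.out.printf("mean %.4f ms, max deviation %.4f ms%n",
                mean(samples), maxDeviation(samples));
    }
}
```

Timings this small are sensitive to JIT warm-up and timer resolution, so it is worth discarding the first few runs and averaging over many iterations before trusting the result.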

server load

How do they compare side by side under a load test? This one shocked me too: at 50 MB per PHP script, the Zend_Search_Lucene implementation would theoretically max out a server with 4 GB of memory at 80 concurrent connections (80 × 50 MB = 4,000 MB). By today’s standards, 80 concurrent users shouldn’t max out a server’s memory when the index holds 29,000 documents (with no extraneous binary storage) and takes up ~15 MB on disk. That is totally unacceptable. The PHP/Java Bridge implementation survived well over 1,000 concurrent connections without maxing out the memory or the processor. The only problem I ran into was “too many open files”, which a “ulimit” modification (with a Tomcat restart) solved. However, I do not expect thousands of connections to be open to this one server at the same time. I just wanted to see how much it could handle… and it turned out to be quite a lot!
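The memory ceiling for the Zend implementation is simple arithmetic, sketched below in Java. It assumes the worst case, where every concurrent request holds the full 50 MB allotted to its PHP script at the same moment; the `MemoryEstimate` class and its method names are made up for this illustration.

```java
// Back-of-the-envelope estimate. Assumes each concurrent request holds its
// full per-script memory allotment at the same time (worst case).
class MemoryEstimate {

    // Total memory, in MB, needed to serve the given number of connections.
    static long requiredMb(long connections, long perRequestMb) {
        return connections * perRequestMb;
    }

    // Number of concurrent requests that fit into the server's memory.
    static long maxConcurrent(long serverMemoryMb, long perRequestMb) {
        return serverMemoryMb / perRequestMb;
    }

    public static void main(String[] args) {
        System.out.println(requiredMb(80, 50));       // prints 4000
        System.out.println(maxConcurrent(4096, 50));  // prints 81
    }
}
```

The estimate ignores the memory the operating system and web server need for themselves, so the practical limit would sit somewhat below the computed figure.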

overall

|  | Zend_Search_Lucene | PHP/Java Bridge with Lucene |
| --- | --- | --- |
| Query execution time | 0.30 ms | 0.01 ms |
| Memory usage | High | Low |
| Ease of implementation | Easy | Moderate (difficult to debug) |
| Recommended use | Internal apps / speed of development | Production apps / speed-critical applications |