Today I came to appreciate the strength and speed of Lucene, but I also realized something else: Zend_Search_Lucene is painfully slow compared to the original Java implementation, and it needs a ton of memory to perform Lucene searches. In this case, the ease of development is not worth the hassle (or the memory leaks).
The great alternative, albeit a pain in the ass to implement (it is hard to debug and doesn't play nicely with IDEs), is to run the Lucene search through the PHP/Java Bridge. With Zend's implementation, a sample search against a Lucene index containing 29,000 documents took an average of 0.30 milliseconds (give or take 0.1 milliseconds) and required 50 MB of memory to be allotted to the PHP script. I then swapped out the Zend-specific objects for PHP/Java Bridge SOAP/Java method calls. I didn't expect to gain much from this beyond lower memory usage: since every query has to open a socket connection and then call into the Java Lucene jar file, I assumed I would lose time. I was completely wrong. Running the same query through the PHP/Java Bridge against a vanilla Tomcat server in local mode took a consistent 0.01 milliseconds with almost no variance.
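Figures like "an average of 0.30 milliseconds, give or take 0.1" come from repeating the same query many times and summarizing the samples. A minimal sketch of such a timing harness in Python, assuming a hypothetical zero-argument `run_query` callable that wraps whichever backend is being measured (the function names and the repetition count are illustrative, not the original test setup):

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_query(run_query: Callable[[], object], repetitions: int = 100) -> Tuple[float, float]:
    """Call run_query repeatedly and return (mean, standard deviation) in milliseconds.

    run_query is a hypothetical callable that performs one search against the
    backend under test (e.g. Zend_Search_Lucene or the PHP/Java Bridge).
    """
    if repetitions < 2:
        raise ValueError("need at least two repetitions to measure variance")
    samples_ms: List[float] = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        # Convert elapsed seconds to milliseconds.
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


if __name__ == "__main__":
    # Stand-in workload; replace with a real call into the search backend.
    mean_ms, stdev_ms = time_query(lambda: sum(range(10_000)), repetitions=50)
    print(f"mean {mean_ms:.3f} ms, stdev {stdev_ms:.3f} ms")
```

Reporting the standard deviation next to the mean turns a phrase like "almost no variance" into a number that can be compared across both implementations.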
How do they compare side by side under a load test? This one shocked me too: the Zend_Search_Lucene implementation would theoretically exhaust the 4 GB of memory on the server at 80 concurrent connections (80 connections × 50 MB per PHP script ≈ 4 GB). By today's standards, 80 concurrent users should not max out a server's memory when the index holds only 29,000 documents (with no extraneous binary storage) and takes up ~15 MB on disk. That is totally unacceptable. The PHP/Java Bridge implementation survived a load of well over 1,000 concurrent connections without maxing out either the memory or the processor. The only problem I ran into was a “too many open files” error, which raising the open-file limit with “ulimit” (and restarting Tomcat) solved. I don't expect thousands of simultaneous connections to this one server; I just wanted to see how much it could handle, and it turned out to be quite a lot!
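The 80-connection ceiling is simple division: server memory over the per-script memory limit. A back-of-the-envelope sketch in Python, treating 4 GB as 4000 MB and assuming the worst case where every concurrent request holds its full 50 MB for its whole lifetime (the helper names are hypothetical). It also reads the per-process open-file limit that “ulimit -n” controls, since that limit is what produced the “too many open files” error; the `resource` module is Unix-only.

```python
import resource
from typing import Tuple


def max_concurrent_requests(total_memory_mb: int, per_request_mb: int) -> int:
    """Worst-case number of requests that fit in memory at once.

    Assumes each request holds per_request_mb for its whole duration and
    ignores memory used by the OS, web server, and other processes.
    """
    if per_request_mb <= 0:
        raise ValueError("per_request_mb must be positive")
    return total_memory_mb // per_request_mb


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits for this process (Unix only).

    The soft limit is the value `ulimit -n` reports; every open socket
    counts against it, so many concurrent connections can hit it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    # 4 GB server (taken as 4000 MB) with a 50 MB PHP memory limit per request.
    print(max_concurrent_requests(4000, 50))  # 80
    soft, hard = open_file_limits()
    print(f"open files: soft={soft}, hard={hard}")
```

The estimate is deliberately pessimistic; real PHP processes may not reach their full memory limit, but the limit is what the server has to budget for.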
| | Zend_Search_Lucene | PHP/Java Bridge with Lucene |
|---|---|---|
| Query execution time | 0.30 ms | 0.01 ms |
| Ease of implementation | Easy | Moderate (difficult to debug) |
| Recommended use | Internal apps / speed of development | Production apps / speed-critical applications |