Lucene and PHP
September 3rd, 2009 at 6:21
Today I realized the strengths and speed of Lucene, but I also realized something else: Zend_Search_Lucene is painfully slow compared to the Java implementation, and it requires a ton of memory to perform Lucene searches. The ease of development, in this case, is not worth the hassle (and the memory leaks).
run time
The great alternative, albeit a pain in the ass to implement (it is hard to debug and does not play nicely with IDEs), is to run the Lucene search through the PHP/Java Bridge. With Zend’s implementation, I was able to perform a sample search against a Lucene index of 29,000 documents in an average of 0.30 milliseconds (varying by 0.1 milliseconds), with 50 MB of memory allotted to the PHP script. I then swapped out the Zend-specific objects and replaced them with PHP/Java Bridge SOAP/Java methods. I really didn’t think I had much to gain by doing this other than cutting down on my memory usage. Since the bridge has to make a socket connection and then make calls into the Java Lucene JAR file, I assumed I was going to lose time… I was totally wrong on this one. Running the same query through the PHP/Java Bridge, against a vanilla Tomcat server running in local mode, took a consistent 0.01 milliseconds with almost no variance.
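For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness in Java. It is not the code behind the numbers above: the `QueryTimer` class, its `time` and `summarize` methods, and the placeholder workload are all hypothetical names made up for illustration. The idea is simply to warm up, run the query many times, and report the average along with the spread.

```java
final class QueryTimer {

    /** Summary of repeated timings, in milliseconds. */
    static final class Stats {
        final double meanMs;
        final double minMs;
        final double maxMs;

        Stats(double meanMs, double minMs, double maxMs) {
            this.meanMs = meanMs;
            this.minMs = minMs;
            this.maxMs = maxMs;
        }
    }

    /** Converts raw nanosecond samples into mean/min/max in milliseconds. */
    static Stats summarize(long[] nanos) {
        if (nanos.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        double sum = 0;
        for (long n : nanos) {
            sum += n;
            min = Math.min(min, n);
            max = Math.max(max, n);
        }
        return new Stats(sum / nanos.length / 1e6, min / 1e6, max / 1e6);
    }

    /** Runs the query a few times untimed (warm-up), then times each run. */
    static Stats time(Runnable query, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            query.run();
        }
        long[] samples = new long[timedRuns];
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            query.run();
            samples[i] = System.nanoTime() - start;
        }
        return summarize(samples);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): replace with the real search call.
        Runnable query = () -> Math.sqrt(12345.678);
        Stats s = time(query, 1_000, 10_000);
        System.out.printf("mean %.4f ms (min %.4f, max %.4f)%n",
                s.meanMs, s.minMs, s.maxMs);
    }
}
```

The warm-up loop matters on the JVM because the first calls run before the JIT has compiled the hot path, which would otherwise inflate the average.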
server load
How do they compare side by side with a load test? This one shocked me too: at roughly 50 MB per PHP script, the Zend_Search_Lucene implementation would theoretically max out a server with 4 GB of memory at about 80 concurrent connections (80 × 50 MB ≈ 4 GB). By today’s standards, a load of 80 concurrent users shouldn’t max out the server’s memory when the index holds 29,000 documents (with no extraneous binary storage) and takes up ~15 MB on disk. That is totally unacceptable. The PHP/Java Bridge implementation survived well over 1000 concurrent connections without maxing out the memory or the processor. The only problem I ran into was “too many open files”, which a “ulimit” modification (with a Tomcat restart) solved. However, I do not expect there to be thousands of connections open to just this one server at the same time. I just wanted to see how much it could handle … and it turned out to be quite a lot!
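The memory ceiling is just a division, but it is handy to have it as a function when you want to plug in your own numbers. This is a rough sketch: `MemoryBudget` and `maxConcurrentRequests` are hypothetical names, and it assumes every request needs the full 50 MB allotment and that nothing else on the box uses memory.

```java
final class MemoryBudget {

    /**
     * Upper bound on concurrent requests before memory runs out,
     * assuming each request holds perRequestMb for its whole lifetime.
     */
    static long maxConcurrentRequests(long totalMb, long perRequestMb) {
        if (totalMb < 0 || perRequestMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return totalMb / perRequestMb; // integer division rounds down
    }

    public static void main(String[] args) {
        // 4 GB counted as 4000 MB, 50 MB per PHP script
        System.out.println(maxConcurrentRequests(4000, 50));
        // 4 GB counted as 4096 MB
        System.out.println(maxConcurrentRequests(4096, 50));
    }
}
```

Counting 4 GB as 4000 MB gives exactly 80 connections; counting it as 4096 MB gives 81. Either way, the real limit would be lower once the operating system and web server take their share.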
overall
| | Zend_Search_Lucene | PHP/Java Bridge with Lucene |
| Query Execution time | 0.30 ms | 0.01 ms |
| Memory usage | High | Low |
| Ease of Implementation | Easy | Moderate (difficult to debug) |
| Recommended Use | Internal Apps/Speed of development | Production Apps/Speed-critical applications |