<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>satori code &#187; Lucene</title>
	<atom:link href="http://satoricode.com/category/lucene/feed/" rel="self" type="application/rss+xml" />
	<link>http://satoricode.com</link>
	<description>Michael Trelinski&#039;s Technology Blog</description>
	<lastBuildDate>Fri, 26 Feb 2010 19:13:56 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Lucene and PHP</title>
		<link>http://satoricode.com/2009/09/03/lucene-and-php/</link>
		<comments>http://satoricode.com/2009/09/03/lucene-and-php/#comments</comments>
		<pubDate>Thu, 03 Sep 2009 10:21:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Searching]]></category>

		<guid isPermaLink="false">http://satoricode.com/?p=9</guid>
		<description><![CDATA[Today I realized the strengths and speed of Lucene, but I also realized something else: Zend_Search_Lucene is painfully slow compared to the Java implementation, and it requires a ton of memory to perform searches.  The ease of development, in this case, is not worth the hassle (and the memory leaks).
run time
The great [...]]]></description>
			<content:encoded><![CDATA[<p>Today I realized the strengths and speed of <a title="Lucene" href="http://lucene.apache.org">Lucene</a>, but I also realized something else: <a title="Zend_Search_Lucene" href="http://framework.zend.com/manual/en/zend.search.lucene.html" target="_blank">Zend_Search_Lucene</a> is painfully slow compared to the Java implementation, and it requires a ton of memory to perform searches.  The ease of development, in this case, is not worth the hassle (and the memory leaks).</p>
<p><strong>run time</strong></p>
<p>The great alternative, albeit a pain in the ass to implement (it is hard to debug and does not play nicely with IDEs), is the <a title="PHP/Java Bridge" href="http://php-java-bridge.sourceforge.net/pjb/" target="_blank">PHP/Java Bridge</a> method of doing a <a title="Lucene search" href="http://php-java-bridge.sourceforge.net/pjb/examples/source.php?source=lucene_search-old.php" target="_blank">Lucene search</a>.  With Zend&#8217;s implementation, a sample search against an index containing 29,000 documents took an average of 0.30 milliseconds (varying by 0.1 milliseconds) and required 50 MB of memory allotted to the PHP script.  I then swapped out the Zend-specific objects and replaced them with PHP/Java Bridge SOAP/Java methods.  I really didn&#8217;t think I had much to gain by doing this other than cutting down on my memory usage.  Since the bridge has to make a socket connection and then call into the Java Lucene JAR file, I expected to lose time&#8230; I was totally wrong on this one.  The same query, run through the PHP/Java Bridge against a vanilla Tomcat server running in local mode, took a consistent 0.01 milliseconds with almost no variance.</p>
<p><strong>server load</strong></p>
<p>How do they compare side by side under a load test?  This one shocked me too: the Zend_Search_Lucene implementation would theoretically max out a server with 4 GB of memory at 80 concurrent connections.  By today&#8217;s standards, 80 concurrent users shouldn&#8217;t max out a server&#8217;s memory when the index holds 29,000 documents (with no extraneous binary storage) and consumes ~15 MB on disk.  That is totally unacceptable.  The PHP/Java Bridge implementation survived well over 1000 concurrent connections without maxing out the memory or the processor.  The only problem I ran into was &#8220;too many open files&#8221;, which a &#8220;ulimit&#8221; modification (with a Tomcat restart) solved.  However, I do not expect thousands of connections to be open to this one server at the same time &#8211; I just wanted to see how much it could handle&#8230; and it turned out to be quite a lot!</p>
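<p>The theoretical ceiling of roughly 80 connections presumably comes from simple division: 4 GB of server memory divided by the 50 MB each PHP script needs.  Here is a minimal sketch of that arithmetic.  It assumes every request uses its full 50 MB allotment and that nothing else on the box consumes memory; the function name is made up for illustration and is not part of Zend or the PHP/Java Bridge.</p>

```python
# Back-of-the-envelope estimate only. The figures come from the post;
# the helper name and the "fixed memory per request" model are assumptions.
MB_PER_GB = 1024


def max_concurrent_requests(server_memory_mb, memory_per_request_mb):
    """Return how many requests fit if each one needs a fixed amount of memory."""
    return server_memory_mb // memory_per_request_mb


server_memory_mb = 4 * MB_PER_GB  # the 4 GB server from the load test
per_request_mb = 50               # memory allotted to each PHP script

limit = max_concurrent_requests(server_memory_mb, per_request_mb)
print(limit)  # prints 81
```

<p>Under those assumptions the server runs out of memory at about 81 simultaneous searches, which lines up with the figure of 80 above.</p>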
<p><strong>overall</strong></p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="148" valign="top"></td>
<td width="148" valign="top">Zend_Search_Lucene</td>
<td width="148" valign="top">PHP/Java Bridge with Lucene</td>
</tr>
<tr>
<td width="148" valign="top">Query execution time</td>
<td width="148" valign="top">0.30 ms</td>
<td width="148" valign="top">0.01 ms</td>
</tr>
<tr>
<td width="148" valign="top">Memory usage</td>
<td width="148" valign="top">High (50 MB per PHP script)</td>
<td width="148" valign="top">Low</td>
</tr>
<tr>
<td width="148" valign="top">Ease of Implementation</td>
<td width="148" valign="top">Easy</td>
<td width="148" valign="top">Moderate (difficult to debug)</td>
</tr>
<tr>
<td width="148" valign="top">Recommended Use</td>
<td width="148" valign="top">Internal Apps/Speed of development</td>
<td width="148" valign="top">Production Apps/Speed-critical applications</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://satoricode.com/2009/09/03/lucene-and-php/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
