Lucene

Lucene is the leading open source search engine. But Java is only one part of the technology available in open source. Other languages have tried to implement Lucene natively. Most of them failed: the Perl and Python ports are painfully slow compared to the Java version. Python cheats with a SWIG binding to a GCJ-compiled version of Lucene.

Search server

The clean way seems to be a neutral communication channel between Java and the outside world.

Solr does that with classic servlet/XML communication. Thrift uses this tactic too, with a C++ search engine and PHP/Python as clients.

REST

Using REST-like communication is a good idea in a dangerous world like the Internet, but on a private LAN it's not such a good idea. Non-persistent connections are a good way to limit the number of open sockets, but with an event-driven server and well-behaved clients, using one socket per session can be simpler and more efficient.

Event driven server

Mina gives us a framework for building event-driven network servers. HTTP is handled by a subproject, but plain old sockets are available too. This is the return of telnet.
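To show what "one socket per session" looks like on an event-driven server, here is a minimal sketch in Python with asyncio. It is not Mina and not the server described here: the `handle_session` and `demo` names, the `echo:` reply and the line-per-message framing are all assumptions made for the illustration.

```python
import asyncio

async def handle_session(reader, writer):
    """Serve one client: read a line, answer, repeat until the client leaves."""
    while True:
        line = await reader.readline()
        if not line:  # empty bytes means the peer closed the socket
            break
        writer.write(b"echo: " + line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def demo():
    # Port 0 asks the OS for any free port (setup chosen only for this demo).
    server = await asyncio.start_server(handle_session, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    replies = []
    for message in (b"hello\n", b"again\n"):  # two calls over the same socket
        writer.write(message)
        await writer.drain()
        replies.append(await reader.readline())
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return replies

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The event loop runs every session in a single thread, and the client keeps its socket open between calls instead of reconnecting for each request.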

JSON-RPC

JSON-RPC is a tiny specification for building the simplest possible RPC system, over HTTP or over a plain socket. JSON is now the simplest and most useful serialization format in most languages.
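As a rough illustration, a JSON-RPC 1.0 style exchange fits in a few lines of Python. The one-JSON-object-per-line framing, the helper names `make_request` and `parse_response`, and the `echo` method are assumptions for this sketch, not something taken from an existing project.

```python
import itertools
import json

_ids = itertools.count(1)  # simple request id generator

def make_request(method, params):
    """Serialize one JSON-RPC 1.0 request as a single newline-terminated line."""
    payload = {"method": method, "params": list(params), "id": next(_ids)}
    return (json.dumps(payload) + "\n").encode("utf-8")

def parse_response(line):
    """Decode one response line; raise if the server reported an error."""
    reply = json.loads(line.decode("utf-8"))
    if reply.get("error") is not None:
        raise RuntimeError(reply["error"])
    return reply["result"]

if __name__ == "__main__":
    print(make_request("echo", ["hello"]))
    print(parse_response(b'{"result": "hello", "error": null, "id": 1}\n'))
```

A request carries `method`, `params` and `id`. The response echoes the `id` and fills either `result` or `error`.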

Passerelle

I built a small project to bring an object approach to JSON-RPC: Passerelle (small bridge in French). The basic idea is very simple: with just a simple notation, we can express object calls in procedural notation:

instance@class#method

The server keeps instances in a map and throws them away when the session ends. Proxies and dynamic methods give simple access to remote objects. In the future, local objects will be generated from the server version, with useful features like lazy fetching, iterators and other tools.

Clients (PHP and Python for now) use only native code; no compiler is needed.
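The server side of this idea can be sketched in Python as follows. This is only my reading of the notation, not the actual Passerelle code: I assume `instance` is a name in the session map, `class` names a registered class used to create the object on first use, and `method` is the method to call. The `Session` and `Counter` classes are hypothetical.

```python
class Session:
    """Per-connection store of live objects, dropped when the session ends."""

    def __init__(self, classes):
        self.classes = classes    # class name -> Python class (assumed registry)
        self.instances = {}       # instance name -> live object

    def call(self, target, params):
        # Split "instance@class#method" into its three parts.
        instance_name, rest = target.split("@", 1)
        class_name, method_name = rest.split("#", 1)
        if instance_name not in self.instances:
            # First use of this name: build the object on the server side.
            self.instances[instance_name] = self.classes[class_name]()
        return getattr(self.instances[instance_name], method_name)(*params)

    def close(self):
        self.instances.clear()    # forget every object of this session


class Counter:
    """Toy class standing in for a real server-side bean."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


if __name__ == "__main__":
    session = Session({"Counter": Counter})
    print(session.call("c1@Counter#add", [2]))
    print(session.call("c1@Counter#add", [3]))
    session.close()
```

The second call reuses the object created by the first one, so state survives between calls for as long as the session is open. A real server would also restrict which classes and methods clients may reach.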
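A dynamic proxy on the client side needs very little native code. Here is a minimal sketch in Python, assuming a `send` callable that ships the method name and parameters to the server. The constructor signature and `fake_send` are hypothetical and do not reflect the real Passerelle client API.

```python
class Proxy:
    """Forward any method call to a remote object through `send`."""

    def __init__(self, instance, class_name, send):
        self._prefix = f"{instance}@{class_name}#"
        self._send = send  # callable(method, params) -> result (assumed transport)

    def __getattr__(self, name):
        # Called only for attributes that do not exist locally,
        # so every unknown name becomes a remote method.
        def remote_method(*params):
            return self._send(self._prefix + name, list(params))
        return remote_method


if __name__ == "__main__":
    sent = []

    def fake_send(method, params):
        # Stand-in for the network layer: record the call, return a fixed reply.
        sent.append((method, params))
        return "ok"

    session = Proxy("s1", "Session", fake_send)
    print(session.find("php"), sent)
```

Because the methods are resolved at call time, the client needs no generated stubs and no compile step.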

Goniometre

That was the basic idea. Now a real-world example: indexing and searching with Compass from the scripting world, in the Goniometre project (a goniometer is an optical instrument).

Spring is the glue for building Java projects. It can also be used as an ultimate configuration language.

A simple test

The config file is not so ugly. server is the Mina server, with a port; beanFactory dispatches the beans usable from clients; compass is a classic Compass object, with links to other XML files: Book.cpm.xml and library.cmd.xml. Compass loves XML.

From PHP, it's easier:

Book is a basic object with the properties described in Book.cpm.xml. First we create a new book and index it, then we search twice.

[php]
// Load the Passerelle client and connect to the server on localhost.
include('../../src/php/class.passerelle.php');
include('../../src/php/class.proxy.php');
initPasserelle('127.0.0.1', 8042);

// Plain object whose properties match the mapping in Book.cpm.xml.
class Book {
	public $alias = "book";
	public $id = 42;
	public $title;
	public $summary;

	function __construct($title, $summary) {
		$this->title = $title;
		$this->summary = $summary;
	}
}

// First session: index a new book.
$session = new Proxy('session');
$book = new Book("PHP for dummies", "");
$session->saveOrUpdate($book);
$session->close();

// Second session: search twice.
$session = new Proxy('session');
$hits = $session->find("php");
var_dump($hits);

$hits = $session->find("java");
var_dump($hits);
[/php]

This is only a basic example: it leaves out alias restriction (searching only books, not authors), pagination and the rest of the Compass magic.