ParserFactory:
The ParserFactory is a singleton that returns new query parsers. You can create query parsers that inherit their settings from the factory, or query parsers with no starting settings.
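import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
// ...plus imports for the Qsol classes used below

// Get the shared factory and ask it for a new parser.
ParserFactory parserFactory = ParserFactory.getInstance(new QsolConfiguration());
QsolParser parser = parserFactory.getParser(false);
Analyzer analyzer = new StandardAnalyzer();
// Tell the parser that the "date" field holds dates.
parser.markDateField("date");
String query = "test serach";
Query result = null;
try {
    // "allFields" is the field searched when a term names no field.
    result = parser.parse("allFields", query, analyzer);
} catch (QsolSyntaxException e) {
    // The query string could not be parsed.
    throw new RuntimeException(e);
}
System.out.println(result.toString());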
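The singleton-plus-inheritance idea described above can be sketched in plain Java. Everything in this sketch is hypothetical: SketchParserFactory and SketchParser are illustrative stand-ins, not Qsol classes, and the meaning given to the boolean argument (true copies the factory's settings, false starts empty) is this sketch's own choice. Check the Qsol javadoc for what getParser(boolean) actually does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical singleton factory; NOT Qsol's ParserFactory.
final class SketchParserFactory {
    private static SketchParserFactory instance;
    // Settings that "inheriting" parsers start from.
    private final Map<String, String> defaults = new HashMap<>();

    private SketchParserFactory() {}

    // Lazily create the one shared factory; synchronized so two threads
    // cannot each create their own instance.
    static synchronized SketchParserFactory getInstance() {
        if (instance == null) {
            instance = new SketchParserFactory();
        }
        return instance;
    }

    void setDefault(String key, String value) {
        defaults.put(key, value);
    }

    // Assumed semantics for this sketch only:
    // inherit == true copies the factory's settings, false starts empty.
    SketchParser getParser(boolean inherit) {
        Map<String, String> settings = new HashMap<>();
        if (inherit) {
            settings.putAll(defaults);
        }
        return new SketchParser(settings);
    }

    public static void main(String[] args) {
        SketchParserFactory f = SketchParserFactory.getInstance();
        f.setDefault("dateField", "date");
        System.out.println(f.getParser(true).settings);  // {dateField=date}
        System.out.println(f.getParser(false).settings); // {}
    }
}

// Hypothetical stand-in for a parser: it only carries its own settings.
class SketchParser {
    final Map<String, String> settings;

    SketchParser(Map<String, String> settings) {
        this.settings = settings;
    }
}
```

Copying the settings into a fresh map means later changes to one parser's settings do not leak back into the factory or into other parsers.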
Suggested Query: to use the suggested query function, QsolParser.getSuggestedSearch(), the SpellChecker jar from Lucene contrib must be on the classpath.
Find Replace (this example rewrites the word NEAR in a query as the ~10 proximity operator): parser.addFindReplace(new FindReplace("NEAR", "~10", true, true));
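Conceptually, a find/replace rule can be thought of as rewriting the raw query text, so a user can type NEAR and get the ~10 proximity operator. The sketch below is a hypothetical, standalone illustration of that idea using whole-word matching from java.util.regex. It is not Qsol's FindReplace class, and its caseSensitive flag is an assumption of the sketch that says nothing about what FindReplace's two boolean arguments mean.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical whole-word find/replace on a query string; NOT Qsol's FindReplace.
class FindReplaceSketch {
    static String apply(String query, String find, String replace, boolean caseSensitive) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        // \b on both sides so that "NEAR" does not match inside "NEARBY";
        // Pattern.quote treats the search text literally.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(find) + "\\b", flags);
        // quoteReplacement keeps any '$' or '\' in the replacement literal.
        return p.matcher(query).replaceAll(Matcher.quoteReplacement(replace));
    }

    public static void main(String[] args) {
        // prints: horse ~10 gopher
        System.out.println(apply("horse NEAR gopher", "NEAR", "~10", true));
    }
}
```

Matching on word boundaries matters for a rule like this: a plain substring replace would also rewrite words such as NEARBY.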
Field Break Marker: parser.setFieldBreakMarker(String marker);
Thesaurus:
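// uses java.util.Set and java.util.HashSet
Set<String> words = new HashSet<String>();
words.add("test1");
words.add("test2");
words.add("test3");
// Register test1, test2 and test3 as thesaurus entries for "test".
parser.addThesaurusEntry("test", words, false);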
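One way to picture a thesaurus entry: when a query term has an entry, the term is expanded to itself plus its related words, which a parser could then combine with OR. This is a hypothetical sketch (ThesaurusSketch is not a Qsol class), and it does not model the third, boolean argument of addThesaurusEntry, whose meaning is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical thesaurus expansion; NOT Qsol's thesaurus implementation.
class ThesaurusSketch {
    private final Map<String, Set<String>> entries = new HashMap<>();

    void addEntry(String term, Set<String> related) {
        // TreeSet keeps the related words sorted so the output is stable.
        entries.put(term, new TreeSet<>(related));
    }

    // Returns the term followed by its related words,
    // or just the term itself when it has no entry.
    List<String> expand(String term) {
        List<String> out = new ArrayList<>();
        out.add(term);
        out.addAll(entries.getOrDefault(term, Collections.emptySet()));
        return out;
    }

    public static void main(String[] args) {
        ThesaurusSketch t = new ThesaurusSketch();
        t.addEntry("test", new HashSet<>(Arrays.asList("test1", "test2", "test3")));
        System.out.println(t.expand("test"));  // [test, test1, test2, test3]
        System.out.println(t.expand("other")); // [other]
    }
}
```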
Paragraph/Sentence Proximity Searching:
If you have enabled sentence and paragraph proximity searching, the '~' operator may also be used as '~3p' or '~5s' to perform paragraph and sentence proximity searches. Paragraph and sentence proximity searching is implemented using special tokens that must be put into the index at appropriate positions. It is up to you to inject these tokens into the index and then identify them to the QsolParser with setSentenceMarker(String marker) and setParagraphMarker(String marker).
This allows queries like:
horse ~3p gopher
mark ~3 (cat ~3p (tommy ~1s gun))
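To make the sentence form of '~' concrete, here is a brute-force, hypothetical check over a token list that contains sentence marker tokens. It assumes that "a ~Ns b" matches when fewer than N sentence markers lie between the two terms, so '~1s' (as in tommy ~1s gun) means "in the same sentence". Qsol's exact semantics, and the Lucene queries it actually builds, may differ. Paragraph proximity would work the same way with the paragraph marker in place of the sentence marker; the section sign (U+00A7) is used here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sentence-proximity check; NOT how Qsol builds its Lucene query.
class SentenceProximitySketch {
    static final String SENTENCE_MARKER = "\u00A7"; // section sign

    // Assumed semantics: true when some occurrence of a and some occurrence
    // of b are separated by fewer than n sentence marker tokens.
    static boolean withinSentences(List<String> tokens, String a, String b, int n) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(b)) continue;
                // Count the markers strictly between the two occurrences.
                int markers = 0;
                for (int k = Math.min(i, j) + 1; k < Math.max(i, j); k++) {
                    if (tokens.get(k).equals(SENTENCE_MARKER)) markers++;
                }
                if (markers < n) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "tommy", "gun", SENTENCE_MARKER, "horse", SENTENCE_MARKER, "gopher");
        System.out.println(withinSentences(tokens, "tommy", "gun", 1));    // true
        System.out.println(withinSentences(tokens, "tommy", "gopher", 2)); // false
        System.out.println(withinSentences(tokens, "tommy", "gopher", 3)); // true
    }
}
```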
The example QsolAnalyzer code is a replacement for the StandardAnalyzer; use QsolAnalyzer the same way you would use the StandardAnalyzer class. QsolFilter injects sentence and paragraph tokens into the index. QsolFilter also recognizes a field break marker of ->-> (a field break marker keeps proximity queries from crossing the marker). QsolAnalyzer uses a basic regular-expression sentence recognizer and detects paragraphs by keying on the pilcrow (U+00B6) symbol, which is expected to already be in the input text. To use this analyzer, set the paragraph marker on the parser to the pilcrow and set the sentence marker to the section sign (U+00A7). You can change these markers to something that suits your data by editing QsolTokenizer.jj and QsolFilter.java. The injection is simple, so don't be afraid to experiment.
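The injection step described above can be illustrated with a simplified, hypothetical tokenizer in plain Java. It is not QsolTokenizer or QsolFilter: it returns a list of strings rather than Lucene tokens, passes the pilcrow and the ->-> field break through as tokens, and uses a deliberately naive rule that turns every run of '.', '!' or '?' into a section-sign sentence marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified marker injection; NOT QsolTokenizer/QsolFilter.
class MarkerInjectionSketch {
    static final String PARAGRAPH = "\u00B6";  // pilcrow, expected in the input text
    static final String SENTENCE = "\u00A7";   // section sign, injected by us
    static final String FIELD_BREAK = "->->";

    // Field break, pilcrow, sentence-ending punctuation, or a run of letters/digits.
    private static final Pattern TOKEN =
        Pattern.compile("->->|\u00B6|[.!?]+|[\\p{L}\\p{N}]+");

    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String t = m.group();
            if (t.equals(PARAGRAPH) || t.equals(FIELD_BREAK)) {
                out.add(t);                           // markers pass through as tokens
            } else if (t.matches("[.!?]+")) {
                out.add(SENTENCE);                    // naive rule: punctuation ends a sentence
            } else {
                out.add(t.toLowerCase(Locale.ROOT));  // ordinary word
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello Mr. Miller. \u00B6 New paragraph! ->-> other field"));
    }
}
```

For the sample input this yields hello, mr, §, miller, §, ¶, new, paragraph, §, ->->, other, field. The naive rule puts a sentence marker between mr and miller, which is the same kind of false sentence break the note below describes.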
Note: the sentence recognizer is not perfect and will detect a sentence break after "Mr." in the text "hello Mr. Miller". Because a sentence marker takes up a position, Mr is now 2 positions from Miller instead of 1. If this is an issue, you could look at QsolFilter around the line:
} else if (type == SENTENCE_TYPE) {
and, before returning the token, set its position increment to 0:
stok.setPositionIncrement(0);
Of course, this creates a different problem: the sentence marker will occupy the same position as the last word of the sentence, making that word part of both the current sentence and the next one.
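The position arithmetic behind this note can be checked with a small, hypothetical sketch that accumulates position increments the way Lucene does: each token's position is the previous position plus its increment, starting from -1 so the first token lands on 0. It compares the distance from mr to miller when the sentence marker gets an increment of 1 versus 0.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of position increments; plain Java, no Lucene classes.
class PositionIncrementSketch {
    static final String SENTENCE = "\u00A7";

    // Ordinary tokens always get increment 1; sentence markers get markerIncrement.
    static int[] positions(List<String> tokens, int markerIncrement) {
        int[] pos = new int[tokens.size()];
        int p = -1; // first token with increment 1 lands on position 0
        for (int i = 0; i < tokens.size(); i++) {
            p += tokens.get(i).equals(SENTENCE) ? markerIncrement : 1;
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("hello", "mr", SENTENCE, "miller");
        int[] a = positions(tokens, 1); // 0, 1, 2, 3
        int[] b = positions(tokens, 0); // 0, 1, 1, 2
        System.out.println("increment 1: mr to miller = " + (a[3] - a[1])); // 2
        System.out.println("increment 0: mr to miller = " + (b[3] - b[1])); // 1
    }
}
```

With an increment of 0 the marker shares position 1 with mr, which is exactly the trade-off mentioned above: the last word of a sentence then sits at the sentence boundary itself.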
This analyzer is a JavaCC analyzer based on the old Lucene StandardAnalyzer. If you are interested in a JFlex version (much faster), drop me a line.
