OrcaTec Granted Patent on Language Modeling Concept Search

Unique approach makes eDiscovery and information retrieval smarter, more effective.


The recognized father of concept search, Herbert L. Roitblat, Ph.D., today received another patent, this one with OrcaTec, for the use of language modeling as a basis for concept search. Concept search seeks out ideas in context in large data collections rather than relying on the more common but notoriously less effective keyword search method.

Roitblat, CTO and Chief Scientist for OrcaTec and the architect for the OrcaTec Document Decisioning Suite™, built the Suite using the language modeling on which US Patent No. 8,401,841 is based. “OrcaTec’s predictive coding, advanced analytics, review – all the pieces of the Document Decisioning Suite are based on language modeling and using words in context,” said Dr. Roitblat. “The ideas embodied in this patent are the building blocks that make our Suite work so well. We are gratified that the US Patent Office finds it to be markedly unique.”

One of OrcaTec’s commonly used examples of concept search in context is the word “court.” “If you hear the word ‘court,’ you don’t know what that means,” says Roitblat. “If you hear ‘court, blah, blah, basketball’ or ‘court, blah, blah, judge,’ then you know what ‘court’ means. We use the same kind of process to understand what words mean in the documents that we’ve indexed. This eliminates the ambiguity you have in keyword search.”

Language modeling also uses a fill-in-the-blank probabilistic approach. In the patent application, Dr. Roitblat noted that individual documents contain only a limited and fixed set of words, but there are often many other words that could have appeared in the same place. People write sentences like “The boy skateboarded down the street.” People do not write sentences like “The [boy, child, youth, young person, kid] skateboarded down the [street, alley, road, parkway, boulevard, blacktop].” The language modeling method recognizes that any of the words in brackets could have been used, but in any particular sentence only one of them is.

A given query and a given document typically use only one of the alternatives from the distribution of words, but it is difficult to predict which one. As a result, OrcaTec’s language modeling method searches for a distribution of words, rather than just the specific words in the query.
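As a rough illustration of searching for a distribution of words rather than only the literal query terms, consider the minimal Python sketch below. It is not OrcaTec's patented implementation; the function name `score_document` and the weights in `query_distribution` are hypothetical and chosen by hand for the example.

```python
import re
from collections import Counter


def score_document(text, query_distribution):
    """Score a document against a weighted distribution of query-related words.

    Each word in the distribution contributes its weight multiplied by the
    word's relative frequency in the document.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(weight * counts[word] / len(tokens)
               for word, weight in query_distribution.items())


# Hypothetical, hand-picked distribution for the query "boy street".
# A real system would estimate these weights from the document collection.
query_distribution = {
    "boy": 0.30, "kid": 0.10, "child": 0.10,
    "street": 0.30, "road": 0.10, "alley": 0.10,
}

relevant = "The kid skateboarded down the road"
unrelated = "The judge entered the court"

relevant_score = score_document(relevant, query_distribution)
unrelated_score = score_document(unrelated, query_distribution)
```

In this sketch the first sentence receives a positive score even though it contains neither "boy" nor "street," while the second scores zero, which is the behavior a literal keyword match cannot provide.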

Additionally, the language modeling method of concept search that Dr. Roitblat has patented does not use dictionaries, ontologies or thesauri. Instead, it analyzes the way words are used in the context of the entire document set being reviewed, using the probability of word co-occurrence within a paragraph. Thus, it is language agnostic and also works with languages written in non-alphabetic scripts.
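The idea of estimating word relationships from paragraph-level co-occurrence, with no dictionary or thesaurus, can be sketched in a few lines of Python. This is an illustrative assumption about how such probabilities might be counted, not the method described in the patent; the function names are hypothetical, and the simple `\w+` tokenizer would need to be replaced for languages written without spaces between words.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_model(paragraphs):
    """Return a function estimating P(other in paragraph | word in paragraph)."""
    word_counts = defaultdict(int)   # paragraphs containing each word
    pair_counts = defaultdict(int)   # paragraphs containing both words of a pair

    for paragraph in paragraphs:
        words = set(re.findall(r"\w+", paragraph.lower()))
        for word in words:
            word_counts[word] += 1
        for pair in combinations(sorted(words), 2):
            pair_counts[pair] += 1

    def probability(other, word):
        # Words never seen in the collection have no estimate; return 0.
        if word_counts.get(word, 0) == 0:
            return 0.0
        pair = tuple(sorted((other, word)))
        return pair_counts.get(pair, 0) / word_counts[word]

    return probability


# Toy collection echoing the "court" example from the article.
paragraphs = [
    "The judge entered the court",
    "The court ruled for the judge",
    "He dribbled down the basketball court",
]
probability = build_cooccurrence_model(paragraphs)
p_judge = probability("judge", "court")            # 2 of 3 "court" paragraphs
p_basketball = probability("basketball", "court")  # 1 of 3 "court" paragraphs
```

Because the counts come only from the documents themselves, the same code runs unchanged on any collection whose text can be split into words.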

“OrcaTec has successfully used our concept search and predictive coding in many languages, including Arabic, German, Japanese and, of course, English,” he said.

“We are very pleased that the ideas upon which OrcaTec has based its product development have been found to be unique and protection-worthy. And we believe profoundly that our approach to information retrieval makes document decisioning smarter, faster and more accurate.”

About OrcaTec
Atlanta-based OrcaTec is creating the Ultimate Decisioning Machine by combining all-in-one smarter predictive coding, advanced analytics and Computer-Assisted Review (CAR). In the recent Global Aerospace* case, OrcaTec’s predictive coding was the first ever court-ordered predictive coding to pass judicial muster. Beyond keyword searching, the concept-based OrcaTec Document Decisioning Suite™ takes eDiscovery from data ingestion through document production. See how OrcaTec can cut first-pass review time from weeks or months to just days with demonstrably high levels of accuracy and transparency at http://www.OrcaTec.com, or by calling 888-335-2200 x 2.

*Global Aerospace Inc. v. Landow Aviation Limited Partnership, et al., No. 61040 in the 20th Judicial Circuit of Virginia’s Loudoun Circuit Court

Contact Author

Tracie Burns

Herb Roitblat
888.335.2200 229