Without sampling nobody knows the importance of non-textual documents to an organization – it’s all conjecture.
Memphis TN (PRWEB) June 19, 2014
In a series of blog posts, information governance technology provider BeyondRecognition has noted major limitations in the text-based approach used by virtually all major content management and information governance systems, and has proposed new industry metrics and types of sampling to determine the significance of these restrictions to any given organization.
BeyondRecognition (“BR”) founder and CEO John Martin pointed out, “The major information governance systems can only index documents for search and classification purposes if they have text associated with them. This is a major problem for companies like those in the oil and gas industry that have a large proportion of documents that don’t have searchable text, often more than 40% of the total documents, things like image-only PDFs or scanned schematics and well logs. These often-critical documents are literally out of sight, out of mind for text-restricted systems – they simply don’t see them.”
Using the company’s Document U blog as a forum (http://www.beyondrecognition.net/document-u-blog), BeyondRecognition has described the problem and suggested new metrics and sampling methods that an organization can use to determine its significance:
In “Is Your TAR Really Text-Assisted-Review? (And Why It Should Matter to You)” on March 25, 2014, and “’Predictive’ Coding and the Naked Emperor,” on May 8, 2014, BR explored the significance of the text restriction in technology-assisted review and predictive coding technologies currently gaining favor in the electronic discovery world.
In “Measuring Text Bias/Tunnel Vision in Content Search and ECM System,” on May 29, 2014, BR proposed the use of an industry metric, MTV or Maximum Textual Vision, which would be the proportion of all documents that were searchable using text-restricted technologies. The blog posting also pointed out that “recall” measures may often fail to account for documents that were missed because they did not have text layers.
Finally, in “Sampling Resolves Conjecture on Significance of Non-Textual Documents,” on June 17, 2014, BR noted that without sampling, the significance of non-textual documents to any organization is all conjecture, and suggested two ways to sample an organization’s documents to calculate not only MTV but also the significance of the documents that current systems are not able to see or read.
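As an illustration of the MTV arithmetic described in these posts, the Python sketch below estimates MTV from a sample of documents. It is a minimal example under stated assumptions, not BR's published method: it assumes simple random sampling and a normal-approximation confidence interval, and the function names (draw_sample, estimate_mtv) and the per-document "has searchable text" flag are hypothetical.

```python
import math
import random


def draw_sample(document_ids, sample_size, seed=0):
    """Draw a simple random sample of document IDs (assumed sampling scheme)."""
    rng = random.Random(seed)
    return rng.sample(document_ids, sample_size)


def estimate_mtv(has_text_flags, z=1.96):
    """Estimate MTV, the proportion of documents with searchable text.

    has_text_flags: one boolean per sampled document, True if the document
    has a text layer that a text-restricted system could index.
    Returns (estimate, lower, upper) using a normal-approximation interval;
    z=1.96 corresponds to roughly 95% confidence.
    """
    n = len(has_text_flags)
    if n == 0:
        raise ValueError("sample must contain at least one document")
    p = sum(has_text_flags) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical sample: 60 of 100 sampled documents have searchable text.
flags = [True] * 60 + [False] * 40
mtv, low, high = estimate_mtv(flags)
print(f"Estimated MTV: {mtv:.0%} (approx. 95% interval {low:.0%} to {high:.0%})")
```

In this hypothetical sample, the remaining 40% of documents would be invisible to a text-restricted system; assessing how important those documents are would still require reviewing the sampled non-textual items themselves.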
BeyondRecognition (“BR”) provides a suite of integrated tools that help Fortune 500 companies consolidate paper and electronic documents, migrate documents to content management systems, and remediate file shares. BR’s key technologies include the ability to organize content by grouping documents based on visual similarity – native files, PDFs, and scanned documents can all be processed together, even those with no associated text. BR’s visual coding then automates the process of extracting data elements from each cluster. BR’s clients enjoy rapid project start-up and improved accuracy in coding or extracting data elements from documents, and they particularly appreciate finishing in months projects that were originally scheduled to take years.
For more information about BeyondRecognition, visit the BR website at http://www.BeyondRecognition.net, or contact Joe Howie, VP, Corporate Communications, at jhowie(at)beyondrecognition(dot)net, or 918-894-6943.
You can also follow BR on Twitter @BeyondRecog or join the BeyondRecognition group on LinkedIn at http://www.linkedin.com/company/beyondrecognition.