PDFTextStream v2.0 Simplifies the Conversion of Unstructured PDF Content Into Usable Data


The new version of the leading PDF content extraction tool helps large enterprises and governments more easily access critical data held in PDF documents.

Snowtide Informatics Systems, Inc., the leading provider of enterprise-class PDF content extraction solutions, today announced the release of PDFTextStream v2.0, the latest version of its PDF content extraction API. Adding a wealth of new capabilities in response to customer requests, PDFTextStream v2.0 is now available on the Python and .NET platforms as well as for Java; now supports the extraction of Chinese, Japanese, and Korean (CJK) text; provides new tools that simplify content extraction from unstructured PDF documents; adds the ability to recognize and interpret tabular data in PDF documents; now supports v1.9 and v2.0 of the Apache Lucene search engine; and includes other critical performance enhancements.

PDFTextStream enables back-end enterprise systems to extract the text and metadata contained in PDF documents. This latest version is especially suited for large enterprises and government agencies that need to automate and speed the extraction and cataloging of content held in PDF documents, yet demand high extraction accuracy.

“Over the last year, large enterprises and government agencies have been approaching us with increasingly complex PDF content extraction problems revolving around pressing business issues,” said Chas Emerick, the President and Founder of Snowtide Informatics Systems, Inc.

“These problems often present unique technology challenges,” he added. “For example, some require the extraction of data from unstructured content; others the extraction of CJK text, or the ability to interpret and access tabular data in PDF documents so it can be more easily converted into spreadsheets, XML files, or database-ready records. With PDFTextStream v2.0, we can now offer an even more comprehensive API to meet these sophisticated demands.”

Functionality Not Matched by Competitive Offerings

The release of PDFTextStream v2.0 expands the leadership position of PDFTextStream as the most comprehensive and highest-performing set of developer tools for turning unstructured PDF content into structured data. New capabilities include:

  • The ability to use PDFTextStream within Python and .NET environments; it was previously available only on the Java platform.
  • Functionality that enables the recognition and interpretation of tabular data -- along with an API for accessing that data -- so it can be rendered as spreadsheets, XML files, or other usable formats.
  • Full CJK character encoding support built into the standard PDFTextStream distribution, an increasingly important requirement in today's global economy.
  • Added support for v1.9 and v2.0 of the Apache Lucene search engine, necessary to keep PDFTextStream's integration module up to date with the latest Lucene releases.

Other important new features include improved accuracy of extracts sourced from rotated text, the ability to more easily plug text extracts into existing text analysis processes, functionality that enables the merging of two or more PDF documents into a single file, and significantly improved performance.

For additional PDFTextStream v2.0 product details, please visit http://snowtide.com/PDFTextStream

About Snowtide Informatics Systems, Inc.

Snowtide Informatics Systems, Inc. is a privately held software company headquartered in Holyoke, Massachusetts. Its high-performance software and custom development services enable large enterprises and government agencies to automate the extraction, conversion, and cataloging of content held in PDF documents. PDFTextStream, Snowtide Informatics' flagship product, is a software component library for Java, Python, and .NET environments that has been built from the ground up to rapidly and accurately extract text and metadata held in PDF documents. For more information about Snowtide Informatics Systems, visit snowtide.com.

# # #

Contact Author

Charles Emerick