Sorcero Open-Sourcing Ingestum™ Framework For Unstructured-Data Text Extraction

Sorcero Inc., a rapidly growing enterprise AI startup, is releasing Ingestum, a free and open source unified ingestion framework, to make content ingestion extensible, scalable, and easy.

Ingestum: The extensible, scalable, free open source ingestion engine.

“Data and Analytics executives tell us that unstructured documents are full of data they need but can’t access. We want organizations to benefit from AI and ingestion is a significant barrier. We think open-sourcing Ingestum will democratize ingestion,” said Dipanwita Das, Sorcero CEO & Co-founder.

Sorcero Inc., a rapidly growing Washington, DC enterprise AI software startup, announces the release of Ingestum™ (“ingest’em”), a free and open source software (FOSS) unified content ingestion framework that supports sourcing and transformation of a wide variety of data and document types into a uniform document format.

Ingestion of arbitrary and unstructured content formats—PDF files, Microsoft Office® documents, email threads, and so on—presents a challenge in the AI industry. The ingestion market is extremely fragmented. There are many niche players, and most AI firms handle the ingestion of unstructured text in-house. This is the challenge the Ingestum framework meets head-on: it is a methodical, reusable, extensible, and scalable framework for ingesting content, free and open to all.

Written in Python and built around reusable, programmable pipelines, Ingestum (from the Latin word meaning “to ingest” or “to toss in”) is largely agnostic of both source and output formats. It is designed to be extended through plugins, and it can be deployed as a command-line tool or as a web service. Ingestum integrates existing FOSS projects such as PDFMiner, Google’s Tesseract OCR engine, and Mozilla’s DeepSpeech speech-to-text engine.
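
For readers who want a concrete picture of what a pipeline of pluggable steps around a uniform document format can look like, here is a minimal Python sketch. It is an illustration only and does not use Ingestum's actual API: the `Document` class, the `register` decorator, the `TRANSFORMERS` registry, and `run_pipeline` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical uniform document format passed between pipeline steps."""
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A transformer takes one Document and returns a new one.
Transformer = Callable[[Document], Document]

# Hypothetical plugin registry: new steps are added by name, not by editing the pipeline.
TRANSFORMERS: Dict[str, Transformer] = {}


def register(name: str) -> Callable[[Transformer], Transformer]:
    """Decorator that registers a transformer under the given name."""
    def decorator(func: Transformer) -> Transformer:
        TRANSFORMERS[name] = func
        return func
    return decorator


@register("strip_whitespace")
def strip_whitespace(doc: Document) -> Document:
    # Collapse runs of whitespace (including newlines) into single spaces.
    return Document(" ".join(doc.content.split()), dict(doc.metadata))


@register("lowercase")
def lowercase(doc: Document) -> Document:
    # Normalize the text to lowercase, leaving metadata untouched.
    return Document(doc.content.lower(), dict(doc.metadata))


def run_pipeline(step_names: List[str], doc: Document) -> Document:
    """Apply the named transformers in order and return the final Document."""
    for name in step_names:
        doc = TRANSFORMERS[name](doc)
    return doc


result = run_pipeline(
    ["strip_whitespace", "lowercase"],
    Document("  Hello   WORLD \n", {"source": "email"}),
)
print(result.content)
```

Because each step only reads and returns a `Document`, the same list of step names could be reused for content that started life as a PDF, an email, or an audio transcript, which is the general idea behind a source-agnostic pipeline.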

“Ingestum leverages many existing open source projects, so no one has to reinvent the wheel; it can easily integrate existing workflows, or incorporate existing software as plugins,” said Walter Bender, CTO and Co-founder of Sorcero, who revealed Ingestum yesterday at the LibrePlanet FOSS conference.

Sorcero—recently featured at the LOINC and InsurTech NY conferences—invites IT directors, software engineers, and AI researchers to download and use Ingestum today (git clone https://gitlab.com/sorcero/community/ingestum.git).

About Sorcero:

Sorcero was founded in Washington, DC in 2018 by Dipanwita Das, Richard Graves, and Walter Bender. Sorcero’s Language Intelligence Platform uses domain understanding to power mission-critical decision-making across enterprises in life sciences and insurance. The company’s mission is to inform critical decisions to improve lives through access to and understanding of the world’s knowledge. To date, the company has raised $5.4 million in funding from Leawood Venture Capital, WorldQuant Ventures, Castor Ventures (the MIT Alumni fund), and H/L Ventures.

Contact Author

Claiborne Deming
Sorcero
(202) 750-4435