OCR Xpress… extracting text from document images
Use OCR Xpress to add OCR (optical character recognition) to your document imaging, e-discovery, or records management solution. Extract text from scanned and faxed document images for indexing, and redact sensitive personal information, such as social security numbers and credit card numbers, to protect privacy. Building on the OCR Xpress SDK reduces development, implementation, and deployment costs.
OCR Xpress is available as a .NET SDK, an ActiveX SDK, and Activities for Windows Workflow Foundation (WF). It delivers:
- High-accuracy OCR of document images
- Searchable document creation (e.g., PDF with the original image over hidden text)
- Full-page text recognition of scanned or faxed images
- Pattern matching for automatic redaction, indexing, or highlighting of phone numbers, social security numbers, and other data
- Superior automatic binarization, rotation, and deskewing
- Support for 13 languages
- Preservation of photos and graphics
Technical Notes
- Deploys within .NET as a managed control and is fully compliant with .NET Framework 2.0 and above
- ActiveX COM control available for most other development environments
- Sample code included for: VB.NET, C#, VB, Delphi, VC++, HTML
- Object-oriented API for .NET users
- Thread-safe processing for use in multi-threaded environments
- Supports user-specified debug logging levels
- Suitable for client-server Web-based applications
- Handles documents of up to 999 pages
- Free full-featured trial version available for download (trial version watermarks output files)
Language Recognition
- Supports English, French, German, Italian, Spanish, Portuguese, Danish, Dutch, Swedish, Norwegian, Hungarian, Polish, and Finnish
- Includes dictionaries for all supported languages
- Custom dictionary support for user-defined words
File Output Formats
Output can contain unformatted text, formatted text, or formatted text plus images, in these file formats:
- PDF version 1.4 files (Professional Edition only):
  - Original image over hidden text
  - Formatted text and graphics (Normal)
- Microsoft Word-compatible RTF
- Excel v2.x (compatible with later versions)
- WordPerfect 5.0 (compatible with later versions)
- HTML, with a sub-folder containing images
- ASCII text:
  - No line breaks
  - With line breaks
  - Smart-formatted with spaces
  - Comma- or tab-delimited
Pattern Matching using Approximate Regular Expressions
- Search OCR output for occurrences of any defined pattern (such as social security numbers, phone numbers, or dates)
- Approximate matching allows inexact matches to be located
- Located strings can be redacted or highlighted using the included NotateXpress component
- POSIX-compliant regular expression syntax
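As a rough illustration of pattern-based redaction on OCR output, the sketch below finds social security and phone numbers in recognized text and masks them. It is a minimal sketch under stated assumptions, not the OCR Xpress API: the patterns and the `find_matches`/`redact` helpers are hypothetical, Python's `re` syntax is not POSIX, and true approximate matching (which tolerates insertions and deletions) is replaced here by a cruder stand-in: a character class that also accepts letters commonly misread as digits.

```python
import re

# Hypothetical patterns for illustration only; they are not OCR Xpress's
# built-in pattern definitions, and Python's `re` syntax is not POSIX.
# DIGIT also accepts letters that OCR engines often confuse with digits
# (O/o for 0, l/I for 1, S for 5): a crude stand-in for approximate matching.
DIGIT = r"[0-9OolIS]"
PATTERNS = {
    "ssn": re.compile(rf"\b{DIGIT}{{3}}-{DIGIT}{{2}}-{DIGIT}{{4}}\b"),
    "phone": re.compile(rf"\b{DIGIT}{{3}}[-. ]{DIGIT}{{3}}-{DIGIT}{{4}}\b"),
}


def find_matches(text):
    """Return (label, start, end, matched_text) for every pattern hit, in text order."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.start(), m.end(), m.group()))
    return sorted(hits, key=lambda hit: hit[1])


def redact(text, mask="X"):
    """Mask every matched character except separators, keeping the text length unchanged."""
    chars = list(text)
    for _, start, end, _ in find_matches(text):
        for i in range(start, end):
            if chars[i] not in "-. ":
                chars[i] = mask
    return "".join(chars)


sample = "SSN: 123-45-6789, phone 555-123-4567, misread SSN: l23-45-6789"
print(redact(sample))
```

Because the masked text keeps its original length, the match offsets can still be mapped back to per-character positions in the image.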
Image Input and Pre-processing
ImagXpress Document v9 is included with OCR Xpress:
- Opens TIFF, JPEG, GIF, PNG, JBIG2, and many other image formats
- Advanced auto-binarization evaluates color images to optimize conversion
- Deskew, despeckle, and many other image cleanup functions
- Accepts uncompressed in-memory image data for high performance
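Binarization turns a grayscale page into ink and paper before recognition. The product's own auto-binarization algorithm is not described here, so the sketch below swaps in a well-known technique, Otsu's method, which picks the global threshold that best separates the two pixel populations. It assumes a grayscale input stored as a 2-D list of values from 0 to 255; the function names are hypothetical.

```python
def otsu_threshold(gray):
    """Otsu's method: return the threshold t (0-255) that maximizes the
    between-class variance (up to a constant factor) of pixels <= t ("ink")
    and pixels > t ("paper").

    gray: 2-D list of ints in 0..255 (a grayscale image).
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels with value <= t
    sum_low = 0  # sum of those pixel values
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray):
    """Map each pixel to 1 (ink) if it is <= the Otsu threshold, else 0."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]


page = [[20, 30, 200], [25, 210, 220]]
print(otsu_threshold(page), binarize(page))
```

A single global threshold works well for evenly lit scans; uneven backgrounds usually call for adaptive (local) thresholds instead.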
Auto Rotation
- Automatically rotates images by 0, 90, 180, or 270 degrees to correct text orientation
- Returns the applied rotation angle
- Highly optimized for speed
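One simple way to picture auto-rotation is to try all four right-angle orientations, score each, keep the best, and report the angle that was applied. The sketch below assumes exactly that brute-force approach, which is not necessarily how OCR Xpress detects orientation. To stay self-contained it rotates a grid of characters rather than pixels, and the `word_score` function is a hypothetical stand-in for a real "how upright does this text look" measure.

```python
def rotate90(rows):
    """Rotate a grid (list of equal-length strings) 90 degrees clockwise."""
    return ["".join(col) for col in zip(*rows[::-1])]


def auto_rotate(rows, score):
    """Try 0, 90, 180 and 270 degree clockwise rotations and keep the one that
    `score` rates highest. Returns (rotated_rows, applied_angle)."""
    best_score, best_angle, best_rows = score(rows), 0, rows
    candidate = rows
    for angle in (90, 180, 270):
        candidate = rotate90(candidate)
        s = score(candidate)
        if s > best_score:
            best_score, best_angle, best_rows = s, angle, candidate
    return best_rows, best_angle


# Hypothetical scorer: count known words read left to right along each row.
WORDS = {"cat", "dog"}


def word_score(rows):
    return sum(1 for row in rows for word in row.split() if word in WORDS)


scanned = rotate90(["cat", "dog"])  # a "page" that was fed in sideways
print(auto_rotate(scanned, word_score))
```

Returning the applied angle lets an application rotate character positions or annotations to match the corrected page.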
Character Position Information
- Returns the position of every recognized character
- Can be used to redact or highlight text in the original image
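Per-character positions are what connect a text match back to pixels. The sketch below assumes each recognized character comes with a pixel bounding box (the `CharBox` type is hypothetical, not an OCR Xpress class) and blacks out the smallest rectangle covering a run of characters, for example the characters of a pattern match, in an image stored as a 2-D list.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    """A recognized character and its pixel bounding box (right/bottom exclusive)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def union_box(chars):
    """Smallest rectangle (left, top, right, bottom) covering all the given characters."""
    return (
        min(c.left for c in chars),
        min(c.top for c in chars),
        max(c.right for c in chars),
        max(c.bottom for c in chars),
    )


def redact_chars(image, chars, fill=0):
    """Paint the union box of `chars` with `fill` in a 2-D list image; return the box."""
    left, top, right, bottom = union_box(chars)
    for y in range(top, bottom):
        for x in range(left, right):
            image[y][x] = fill
    return left, top, right, bottom


image = [[255] * 8 for _ in range(4)]  # white 8x4 "page"
line = [CharBox("A", 1, 1, 3, 3), CharBox("B", 4, 1, 6, 3), CharBox("C", 6, 1, 7, 3)]
print(redact_chars(image, line[0:2]))  # redact only "AB"
```

A match that wraps onto a second text line should be redacted one line at a time; a single union box would also cover the unrelated text between the two lines.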
Character Confidence Values
- Returns confidence for all recognized characters
- Confidence values can be used to combine results from multiple OCR engines by voting
- Provides alternative character suggestions
- Add text proofing and character replacement functions to applications
- Character reinsertion enables text correction prior to document creation
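Confidence-weighted voting is one common way to combine several recognition engines. The sketch below assumes each engine reports a (character, confidence) pair per position on a 0-100 scale and that the engines' outputs are already aligned position by position; both are assumptions, and real engines may segment text differently and need alignment first. The helper names are hypothetical. Positions whose winning total is low are flagged for proofing.

```python
from collections import defaultdict


def vote(readings):
    """readings: (char, confidence) pairs from several engines for one position.
    Returns the character with the highest summed confidence, and that sum."""
    totals = defaultdict(float)
    for ch, conf in readings:
        totals[ch] += conf
    best = max(totals, key=totals.get)
    return best, totals[best]


def combine(engine_outputs, review_below=100.0):
    """engine_outputs: one list of (char, confidence) per engine, assumed to be
    aligned position by position. Returns the voted text and the positions whose
    winning total falls below `review_below` (candidates for proofing)."""
    text, flagged = [], []
    for i, readings in enumerate(zip(*engine_outputs)):
        ch, total = vote(readings)
        text.append(ch)
        if total < review_below:
            flagged.append(i)
    return "".join(text), flagged


engine_a = [("S", 60), ("S", 90), ("N", 95)]
engine_b = [("5", 70), ("S", 80), ("N", 90)]
engine_c = [("S", 55), ("S", 85), ("M", 40)]
print(combine([engine_a, engine_b, engine_c], review_below=150))
```

Here the first position is decided in favor of "S" (115 against 70) but still flagged, which is where alternate character suggestions and reinsertion would come in.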
Segmentation
- Locates regions of the input image, automatically or manually, and classifies each as either text to be recognized or graphics whose color can be preserved
- Access each region separately, or recombine into fully-formatted documents such as PDF or RTF files
Font Generation
Formatted output based upon recognized text:
- Serif, sans serif, or monospaced font
- Normal, bold, italic, or bold-italic
- Scaled to the closest font size
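Scaling to the closest font size comes down to converting a measured text height from pixels to points (1 point = 1/72 inch, so points = pixels × 72 / dpi) and snapping to the nearest available size. The sketch assumes the measured line height approximates the font size and uses a hypothetical list of standard sizes; OCR Xpress's actual measurement and size table may differ.

```python
# Common point sizes; an assumption, not OCR Xpress's actual size table.
STANDARD_SIZES = (6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 24, 28, 32, 36, 48, 72)


def pixels_to_points(height_px, dpi):
    """1 point = 1/72 inch, so points = pixels * 72 / dpi."""
    return height_px * 72.0 / dpi


def closest_font_size(height_px, dpi, sizes=STANDARD_SIZES):
    """Snap a measured text height to the nearest available point size."""
    points = pixels_to_points(height_px, dpi)
    return min(sizes, key=lambda size: abs(size - points))


print(closest_font_size(50, 300))  # 50 px at 300 dpi is exactly 12 pt
print(closest_font_size(42, 300))  # about 10.08 pt, snapped to 10
```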
Edition Descriptions
- Professional Edition creates PDF output, as well as all other supported formats
- Standard Edition supports all output formats except PDF