Text Synthesis & Recognition
- Emdros
("Engine for MdF Database Retrieval, Organization, and Storage".
Text query-language engine for expressing linguistically relevant queries on an RDBMS)
(ml)
(forum)
- Ellogon
(C-based, multilingual, general-purpose text research
framework. Ellogon includes a KDE-based visualization tool
and API bindings for C++, Perl, Python, and other programming
languages.)
(forum)
- libextractor
(C library for extracting keywords from files)
(source)
- DDC
("Dialing DWDS Concordancer". C++-based linguistic search engine
used by German scientific institutions.)
- Traduki (Python-based
text translation toolkit)
(cvs)
(ml)
- dbacl
(Tool that learns categories from text documents and then
compares other text documents to the learned categories, using
Bayesian filtering
techniques.)
- Alice (QA engine based on AIML, an XML dialect,
designed to pass the Turing Test)
(cvs)
(ml)
- Anna (QA engine based on AIML, an XML dialect,
designed to pass the "Loebner Prize"
Turing Test. Anna is a code fork of Alice.)
(cvs)
- Catty (Chat bot based on the
Google search engine)
- Cack
(English sentence generator that picks 'random' nouns,
verbs, adjectives and adverbs and uses them in the correct context)
- Dadadoo
(Tool to create 'random' sentences from
Markov chains
of word probabilities derived by analyzing a text. Inactive project.)
- Enca ("Extremely
Naive Charset Analyser". Tool to recognize the character encoding of a
text and convert it into another encoding. Inactive project.)
- Snowball (String-processing
language parser for creating
stemming algorithms for use in text query systems. Inactive project.)
(cvs)
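
As an illustration of the Markov-chain approach mentioned in the Dadadoo entry above, here is a minimal sketch in Python. It is not Dadadoo's code; the function names `build_chain` and `generate` and the sample text are assumptions made up for this example. Each word is mapped to the list of words seen after it, so picking uniformly from that list follows the observed word probabilities.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text.

    Repeated followers are kept, so a uniform pick from the list
    reflects how often each follower occurred.
    """
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, max_words=10, rng=None):
    """Walk the chain from `start` until max_words or a dead end."""
    rng = rng or random.Random()
    word = start
    output = [word]
    while len(output) < max_words and word in chain:
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


# Hypothetical sample text, used only for illustration.
sample = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(sample)
sentence = generate(chain, "the", max_words=8, rng=random.Random(0))
print(sentence)
```

Every adjacent word pair in the generated sentence also occurs in the sample text, which is why the output tends to look locally plausible even though the whole sentence is 'random'.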