Open Source projects, categorized.
[19 users on Ohloh]
Tags: research data-mining data java algorithms artificial-intelligence machine-learning analysis
Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.
RapidMiner (YALE): Java Data Mining
[10 users on Ohloh]
RapidMiner (formerly YALE) is the most comprehensive open-source software for intelligent data analysis, data mining, knowledge discovery, machine learning, predictive analytics, forecasting, and analytics in business intelligence (BI). RapidMiner provides more than 400 data mining operators, a graphical user interface (GUI), an online tutorial with hands-on data mining applications, a comprehensive PDF tutorial, many visualization schemes for data sets and data mining results, many different le...
[2 users on Ohloh]
SimMetrics is a similarity metric library offering metrics that range from edit distances (Levenshtein, Gotoh, Jaro, etc.) to other measures (e.g. Soundex, Chapman). The work was provided by the University of Sheffield, UK, and funded by AKT, an IRC sponsored by EPSRC, grant number GR/N15764/01.
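To illustrate the kind of metric listed above, here is a minimal sketch of the Levenshtein edit distance in plain Java. It is not SimMetrics code and does not use the SimMetrics API: the class name `EditDistance` and the methods `levenshtein` and `similarity` are hypothetical names chosen for this example, and normalizing by the longer string's length is an assumption, one common convention among several.

```java
// Illustrative sketch only: NOT the SimMetrics API. All names are hypothetical.
final class EditDistance {

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn a into b (classic dynamic programming,
    // keeping only two rows of the table).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // reuse the two rows instead of reallocating
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical strings, 0.0 when every
    // position differs. Two empty strings count as identical.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

The two-row layout keeps memory proportional to the length of the second string, which matters when comparing long strings.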
Java Data Mining Package (JDMP)
[1 user on Ohloh]
The Java Data Mining Package (JDMP) is an open source Java library for data analysis and machine learning.

It facilitates the access to data sources and machine learning algorithms (e.g. clustering, regression, classification, graphical models, optimization) and provides visualization modules. It includes a matrix library for storing and processing any kind of data, with the ability to handle very large matrices even when they do not fit into memory. Import and export interfaces are provid...

MARF: Modular Audio Recognition Framework
[1 user on Ohloh]
MARF is an open-source research platform and a collection of voice, sound, speech, text and natural language processing (NLP) algorithms written in Java. They are arranged into a modular, extensible framework that makes it easy to add new algorithms. MARF can run in a distributed fashion over the network, and may act as a library in applications or serve as a source for learning and extension.
[1 user on Ohloh]
TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
Feature Extraction plugin API
[1 user on Ohloh]
Tags: information-analysis artificial-intelligence analysis
Easy-to-use platform-independent plugin API for the extraction of low-level features from audio data in PCM format, as required in the context of music information retrieval software.
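As an illustration of what "low-level features from audio data in PCM format" can mean, here is a minimal Java sketch that computes two common ones, RMS energy and zero-crossing rate, over 16-bit signed mono samples. It is not this plugin API: the class `PcmFeatures` and its methods are hypothetical, and the sample format (16-bit signed, one channel, already decoded into a `short[]`) is an assumption.

```java
// Illustrative sketch only: NOT the plugin API described above.
// Assumes 16-bit signed mono PCM already decoded into a short[].
final class PcmFeatures {

    // Root-mean-square energy with samples scaled to [-1.0, 1.0).
    static double rms(short[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double sumOfSquares = 0.0;
        for (short s : samples) {
            double x = s / 32768.0; // normalize the 16-bit range
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares / samples.length);
    }

    // Fraction of adjacent sample pairs whose signs differ
    // (zero is treated as non-negative).
    static double zeroCrossingRate(short[] samples) {
        if (samples.length < 2) {
            return 0.0;
        }
        int crossings = 0;
        for (int i = 1; i < samples.length; i++) {
            boolean previousNonNegative = samples[i - 1] >= 0;
            boolean currentNonNegative = samples[i] >= 0;
            if (previousNonNegative != currentNonNegative) {
                crossings++;
            }
        }
        return (double) crossings / (samples.length - 1);
    }

    public static void main(String[] args) {
        short[] frame = {16384, -16384, 16384, -16384};
        System.out.println(rms(frame));              // prints 0.5
        System.out.println(zeroCrossingRate(frame)); // prints 1.0
    }
}
```

In music information retrieval such features are typically computed per short frame rather than over a whole recording; the frame length is left to the caller here.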
Apolo: user-based music suggestions
[0 users on Ohloh]
Tags: artificial-intelligence analysis
Apolo is a personal music suggestion system based on user behavior analysis.
DataTime Process Framework
[0 users on Ohloh]
Tags: analysis video artificial-intelligence framework distributed-computing
The DataTime Process Framework is intended to support the processing of time-based data in a modular, concurrent, distributed and extensible manner. It is written in C++, using YARP, ACE, Qt and MUSCLE, on Linux, OSX, Windows and Solaris.