Open Source projects, categorized.
Apache Xerces2 J
[177 users on Ohloh]
A Java library for parsing, validating and manipulating XML documents. The latest version released, 2.9.1, provides support for the following standards and APIs:

* XML 1.0 (4th Edition)
* Namespaces in XML 1.0 (2nd Edition)
* XML 1.1 (2nd Edition)
* Namespaces in XML 1.1 (2nd Edition)
* W3C XML Schema 1.0 (2nd Edition)
* XInclude 1.0 (2nd Edition)
* OASIS XML Catalogs 1.1
* SAX 2.0.2 ...

[132 users on Ohloh]
Libxml2 is the XML C parser and toolkit developed for the Gnome project (but usable outside of the Gnome platform).

Includes the xmllint tool for checking documents for well-formedness, validating documents against a DTD or XML Schema, and pretty printing XML input.

dom4j: flexible XML framework for Java
[58 users on Ohloh]
dom4j is an easy to use, open source library for working with XML, XPath, and XSLT on the Java platform, using the Java Collections Framework, and with full support for DOM, SAX, and JAXP.
[53 users on Ohloh]
Tags: scrap html ruby parser gem scraper
Hpricot is a very flexible HTML parser, based on Tanaka Akira's HTree and John Resig's jQuery, but with the scanner recoded in C (using Ragel for scanning). I've borrowed what I believe to be the best ideas from these wares to make Hpricot heaps of fun to use.
Saxon XSLT and XQuery Processor
[50 users on Ohloh]
The SAXON package is a collection of tools for processing XML documents.
Apache Xerces C++
[48 users on Ohloh]
Xerces-C++ is a validating XML parser written in a portable subset of C++. A shared library is provided for parsing, generating, manipulating, and validating XML documents. Xerces-C++ is faithful to the XML 1.0 recommendation and many associated standards. Source code, samples and API documentation are provided with the parser. For portability, care has been taken to make minimal use of templates, no RTTI, and minimal use of #ifdefs.
[38 users on Ohloh]
ANother Tool for Language Recognition (ANTLR) is the name of a parser generator that uses LL(k) parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is professor Terence Parr of the University of San Francisco.
[32 users on Ohloh]
Rome is a set of Atom/RSS Java utilities that make it easy to work in Java with most syndication formats. Today it accepts all flavors of RSS (0.90, 0.91, 0.92, 0.93, 0.94, 1.0 and 2.0) and Atom 0.3 feeds. Rome includes a set of parsers and generators for the various flavors of feeds, as well as converters to convert from one format to another. The parsers can give you back Java objects that are either specific for the format you want to work with, or a generic normalized SyndFeed object that le...
Expat XML Parser
[28 users on Ohloh]
Expat is a fast, non-validating, stream-oriented XML parsing library.
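To illustrate what "stream-oriented" and "non-validating" mean in practice, here is a minimal sketch using Python's standard `xml.parsers.expat` module, which wraps Expat. The helper name `collect_events` and the sample document are made up for illustration. The parser calls handlers as it encounters elements and text, rather than building a tree, and it checks well-formedness only, not conformance to a DTD or schema.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text with Expat and return the callback events in order.

    collect_events is a hypothetical helper written for this example.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Deliver contiguous character data in one callback instead of chunks.
    parser.buffer_text = True

    def on_start(name, attrs):
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # Skip whitespace-only text between elements.
        if data.strip():
            events.append(("text", data.strip()))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text

    # The second argument marks this as the final chunk of input.
    parser.Parse(xml_text, True)
    return events


if __name__ == "__main__":
    for event in collect_events('<feed lang="en"><title>Hello</title></feed>'):
        print(event)
```

Input that is not well-formed, such as `<a><b></a>`, raises `xml.parsers.expat.ExpatError`. Because nothing is retained between callbacks unless the handlers store it, this style suits large documents that should not be held in memory at once.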
[24 users on Ohloh]
Tags: dom parser perl static lexer
Parse, Analyze and Manipulate Perl (without perl)

Reading and manipulating Perl (the language) programmatically, other than with perl (the application), was difficult for a long time.

The cause of this problem was Perl's complex and dynamic grammar. Although there is typically not a huge diversity in the grammar of most Perl code, certain issues cause large problems when it comes to parsing.

Indeed, quite early in Perl's history Tom Christiansen introdu...

[21 users on Ohloh]
Java Compiler Compiler is the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc.
[21 users on Ohloh]
Sparse, the semantic parser, provides a compiler frontend capable of parsing most of ANSI C as well as many GCC extensions, and a collection of sample compiler backends, including a static analyzer also called "sparse". Sparse provides a set of annotations designed to convey semantic information about types, such as what address space pointers point to, or what locks a function acquires or releases.
Universal Feed Parser
[20 users on Ohloh]
Parse RSS and Atom feeds in Python
Natural Language Toolkit (NLTK)
[18 users on Ohloh]
NLTK, the Natural Language Toolkit, is a suite of open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks, with distributions for Windows, Mac OS X and Linux.
[17 users on Ohloh]
Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).
[17 users on Ohloh]
Parsec is designed from scratch as an industrial-strength parser library. It is simple, safe, well documented (on the package homepage), has extensive libraries and good error messages, and is also fast.
[14 users on Ohloh]
SimplePie puts the 'simple' back into 'really simple syndication'. Flexible enough to suit newbies and veterans alike, SimplePie's focus has been two-fold: speed and ease of use. By thinking about the most useful ways to handle blogs, news sites, and podcasts, we've come up with an API that makes it easy to do cool things with your feeds.
The Spirit Parser Library
[13 users on Ohloh]
Spirit is an object-oriented, recursive descent parser generator framework implemented using template meta-programming techniques. Expression templates allow Spirit to approximate the syntax of Extended Backus-Naur Form (EBNF) completely in C++. The Spirit framework enables a target grammar to be written exclusively in C++. EBNF grammar specifications can mix freely with other C++ code and, thanks to the generative power of C++ templates, are immediately executable.
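As a rough illustration of how an EBNF grammar maps onto a recursive descent parser, here is a hand-written sketch in Python. It does not use Spirit or C++ expression templates. The grammar, the `Parser` class and the `evaluate` function are assumptions made up for this example. Each grammar rule becomes one method, and each `{ ... }` repetition becomes a loop.

```python
import re

# Assumed example grammar, in EBNF:
#   expr   = term   { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = number | "(" expr ")" ;

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split text into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each returns the evaluated value."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression."""
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


if __name__ == "__main__":
    print(evaluate("2 + 3 * (4 - 1)"))
```

Operator precedence falls out of the rule nesting: `term` is parsed inside `expr`, so multiplication binds tighter than addition without any explicit precedence table.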
Apache PDFBox
[12 users on Ohloh]
Tags: pdf java lucene library parser
Apache PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. Apache PDFBox also includes several command line utilities.

* PDF to text extraction
* Merge PDF Documents
* PDF Document Encryption/Decryption
* Lucene Search Engine Integration
* Fill in form data FDF and XFDF ...

[10 users on Ohloh]
Texy is one of the most complex lightweight markup languages. It supports images, links, nested lists and tables, and has full support for typography and CSS.

Texy allows you to enter content using an easy-to-read Texy syntax, which is filtered into structurally valid XHTML. No knowledge of HTML is required.