
Poem generator

UKZN COMP 316 Semester project - 2016


Poem_Generator


UKZN COMP 316 (Natural Language Processing) Semester project - 2016
github.io: http://shaherin.github.io/Poem_Generator/

Coded in Eclipse using Java 8. Only the src folder of the Eclipse project is included.

This project requires JWI (Java WordNet Interface) with WordNet 3.1 and the Stanford CoreNLP library (2015). Both libraries need to be present in the project folder; otherwise, edit the file paths in the wrapper classes so that they load without throwing exceptions.

Released under the GNU General Public License v3.0

Status:
The project is now completed and functional, but is not in a desirable state. We are able to generate free verse as well as a close approximation of sonnets.
See "Poem Generation Report.pdf", in particular the concluding statements, for additional info.

NOTE:
I do not find NLP particularly interesting on this level, so the purpose of this project evolved into investigating multithreading and design patterns in Java, with some degree of search optimization. This shows in the poem generation algorithm itself, as many NLP techniques were studied and added as functions but were never used in the final algorithm.
A separate optimization project was done in C++, with the intention of porting the progress to Java and including it in this project. That was never done, and currently the search optimization is limited to a basic multithreaded search.

On Multithreading
The conclusion on multithreading is still undetermined. Java provides features like thread pools and executor services, which remove a lot of responsibility from the programmer but may be prone to misuse and error. Of course, we can write our own thread pools and so on, and forgo the Java implementation. Our own implementations could potentially afford us more control, and may even be better than the Java standard in some cases.
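As an illustration of the thread pool and executor service features mentioned above, here is a minimal sketch of a chunked multithreaded search. It is not the project's actual search code: the class name PoolSearchSketch, the method countMatches, and the suffix-matching task are all hypothetical, chosen only to show the shape of the approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from the project source.
class PoolSearchSketch {

    // Counts the words ending in `suffix`, splitting the list into chunks
    // that are searched in parallel on a fixed-size thread pool.
    static int countMatches(List<String> words, String suffix, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Ceiling division so that at most `threads` chunks are created.
            int chunk = Math.max(1, (words.size() + threads - 1) / threads);
            List<Future<Integer>> parts = new ArrayList<>();
            for (int start = 0; start < words.size(); start += chunk) {
                List<String> slice =
                        words.subList(start, Math.min(words.size(), start + chunk));
                // The lambda returns a value, so it is submitted as a Callable.
                parts.add(pool.submit(() -> {
                    int n = 0;
                    for (String w : slice) {
                        if (w.endsWith(suffix)) {
                            n++;
                        }
                    }
                    return n;
                }));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search task failed", e.getCause());
        } finally {
            // Forgetting this is a common misuse: the pool's non-daemon
            // threads would otherwise keep the JVM alive.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("light", "night", "rose", "bright", "sea");
        System.out.println(countMatches(words, "ight", 2));
    }
}
```

The pool is created and shut down inside the method to keep the sketch self-contained; a longer-lived program would more likely share one pool across searches.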
C++11 has come a long way in terms of its standard library and multithreading. We now have standard mutexes, condition variables, and atomics (among others), as well as a standard thread library; standard semaphores followed later, in C++20. It is no longer necessary to use external libraries like Boost, SFML, Qt, or any other multithreading option.

On Design Patterns
The purpose of design patterns is often to work around issues or oversights in programming languages themselves, as well as to write maintainable, readable code. C++ is outright superior in this respect. With function pointers, lambdas (much more versatile and powerful than the identically named feature in Java, since C++ never requires a functional interface), and std::function in C++11, you can overcome most design issues simply and concisely, leading to shorter, smarter, and even more readable code.
There isn't much in the way of information on updated design pattern coding in C++11, so if you're ever following a design pattern textbook and think to yourself: "Well this seems superfluous, can't the same thing just be done by x language feature?"; the chances are that it can and probably should be.
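The same question applies in Java 8, the language this project is written in. The sketch below contrasts a textbook Strategy pattern with a version that passes a lambda or method reference instead. All names here (LineStyleSketch, LineStyle, UpperCaseStyle, render) are hypothetical and do not come from the project source.

```java
import java.util.function.UnaryOperator;

// Hypothetical names for illustration; not taken from the project source.
class LineStyleSketch {

    // Textbook Strategy: a dedicated interface plus one class per behaviour.
    interface LineStyle {
        String apply(String line);
    }

    static class UpperCaseStyle implements LineStyle {
        @Override
        public String apply(String line) {
            return line.toUpperCase();
        }
    }

    static String renderClassic(String line, LineStyle style) {
        return style.apply(line);
    }

    // Language-feature version: any lambda or method reference that maps
    // String to String can serve as the strategy, with no extra classes.
    static String render(String line, UnaryOperator<String> style) {
        return style.apply(line);
    }

    public static void main(String[] args) {
        String line = "shall i compare thee";
        System.out.println(renderClassic(line, new UpperCaseStyle()));
        System.out.println(render(line, String::toUpperCase));
        System.out.println(render(line, s -> s + ","));
    }
}
```

Note that the lambda version still goes through a functional interface (UnaryOperator), which is the Java requirement the paragraph above contrasts with C++.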

On Optimization
A large part of optimization is optimizing memory usage alongside your preferred optimization algorithm, which means the dreaded manual memory management. C++, being the lower-level language, requires manual memory management, which allows a far greater degree of control over the program. Although Java has improved over the years, there's no question that manual memory management is superior to any garbage collector, if used well.

It's generally accepted that C++ is superior for performance-intensive applications, simply because it's a lower-level language. Hence it is preferable for projects like this. However, the language chosen must match the primary library required for a project. The Stanford CoreNLP library was determined to be superior to its C++ equivalents, which led to Java as the language of choice.