illinois/meta (forked from meta-toolkit/meta)

A Modern C++ Data Sciences Toolkit

License: MIT (LICENSE.mit) and NCSA (LICENSE.ncsa)

MeTA: ModErn Text Analysis

This fork of MeTA aims to simplify maintenance of the project for its use in CS 410 Text Information Systems at the University of Illinois at Urbana-Champaign. To that end, it provides a containerized version of MeTA in place of the original per-platform installation instructions. The container is based on Ubuntu 22.04 LTS and includes a pre-built version of MeTA for the amd64 and arm64 architectures.


Please visit our web page for information and tutorials about MeTA!

Intro

MeTA is a modern C++ data sciences toolkit featuring

  • text tokenization, including deep semantic features like parse trees
  • inverted and forward indexes with compression and various caching strategies
  • a collection of ranking functions for searching the indexes
  • topic models
  • classification algorithms
  • graph algorithms
  • language models
  • a CRF implementation (POS tagging, shallow parsing)
  • wrappers for liblinear and libsvm (including libsvm dataset parsers)
  • UTF-8 support for analysis of text in various languages
  • multithreaded algorithms

Documentation

Doxygen documentation can be found here.

Tutorials

We have walkthroughs for a few different parts of MeTA on the MeTA homepage.

Citing

If you used MeTA in your research, we would greatly appreciate a citation for our ACL demo paper:

@InProceedings{meta-toolkit,
  author    = {Massung, Sean and Geigle, Chase and Zhai, Cheng{X}iang},
  title     = {{MeTA: A Unified Toolkit for Text Retrieval and Analysis}},
  booktitle = {Proceedings of ACL-2016 System Demonstrations},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {91--96},
  url       = {http://anthology.aclweb.org/P16-4016}
}

Project setup

Docker

A Docker image with a pre-built version of MeTA is available on Docker Hub.

docker pull josecols/meta:3.0.2

This Docker image makes the MeTA binaries globally available, so you can run them from any directory in the container. For example, to perform Basic Text Analysis on a document, create a container from the image and then run the profile command inside it:

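# On the host: start an interactive shell in a new container
docker run -it --rm --name meta --mount type=bind,source="$(pwd)",target=/app --entrypoint bash josecols/meta:3.0.2
# Inside the container: run the profile tool on doc.txt
profile /meta/config.toml doc.txt --stop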

The docker command above bind-mounts the host's current working directory at /app in the container, so files in that directory (e.g., doc.txt) are accessible from inside the container. A default config.toml file is also provided in the /meta directory for convenience. To use your own config.toml instead, place it in the current directory of the host machine, where it will appear under /app in the container.
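
For scripted use, the interactive session above can be collapsed into a single non-interactive command. The following is a minimal sketch, not part of the project itself: the helper name meta_profile and the DRY_RUN variable are hypothetical, introduced only for illustration. It assumes that profile is on the container's PATH (as the note about globally available binaries suggests) and sets the working directory to /app explicitly so that relative paths such as doc.txt resolve against the mounted directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MeTA): run the profile tool on one
# file from the current host directory without opening an interactive shell.
meta_profile() {
  local doc="$1"
  shift
  # Same image and bind mount as the interactive example; --workdir makes
  # /app the working directory, and --entrypoint runs profile directly.
  local cmd=(
    docker run --rm
    --mount "type=bind,source=$(pwd),target=/app"
    --workdir /app
    --entrypoint profile
    josecols/meta:3.0.2
    /meta/config.toml "$doc" "$@"
  )
  if [ -n "${DRY_RUN:-}" ]; then
    # Assumption for illustration: DRY_RUN prints the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

After sourcing this file, calling meta_profile doc.txt --stop from the directory that contains doc.txt should be equivalent to the two-step session above. Setting DRY_RUN=1 first prints the docker command without executing it, which is useful for checking the mount path.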

Additionally, a Dockerfile is provided in this repository for building the image locally. The Dockerfile is based on Ubuntu 22.04 LTS and supports ARM (e.g., Apple silicon) and AMD64 architectures, among others. To build the image locally, run the following command:

docker build -t meta:latest .
