Prepare text as tree or list
Tree
The jsngram.dir2 module provides an easy way to list the files in a directory tree.
import jsngram.jsngram
import jsngram.dir2

ix = jsngram.jsngram.JsNgram()
# add every file found under the directory "src"
for file in jsngram.dir2.list_files(src):
    ix.add_file(file)
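To illustrate the general idea of flattening a directory tree into a list of files, here is a minimal sketch using only the standard library. It is not the implementation of jsngram.dir2.list_files, whose exact behavior (ordering, path format, filtering) is not described here; the helper name walk_files is hypothetical.

```python
import os
import tempfile


def walk_files(root):
    """Hypothetical helper: yield the path of every file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


# Build a small throwaway tree and list it.
with tempfile.TemporaryDirectory() as src:
    os.makedirs(os.path.join(src, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(src, rel), "w", encoding="utf-8") as f:
            f.write("sample text")
    found = [os.path.relpath(p, src) for p in walk_files(src)]
```

Any iterable of file paths produced this way could be passed to the loop above in place of the dir2 listing, assuming add_file accepts ordinary file paths.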
List
An explicit list of files can be used instead of a directory tree. A list of (id, content) pairs can also be used instead of files.
ix = jsngram.jsngram.JsNgram()
# add files from the list "files"
for file in files:
    ix.add_file(file)
# add (id, content) pairs from the list "contents"
for doc_id, content in contents:
    ix.add_document(doc_id, content)
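The "contents" list above can come from any source. As a rough sketch, the following builds (id, content) pairs from text files, using each file's path relative to a root directory as its id. The helper read_pairs and the choice of id are assumptions made for illustration; jsngram may expect ids in a different form.

```python
from pathlib import Path
import tempfile


def read_pairs(paths, root):
    """Hypothetical helper: return (id, content) pairs for the given files.

    The id is the file path relative to root, written with forward slashes.
    """
    pairs = []
    for path in paths:
        path = Path(path)
        doc_id = path.relative_to(root).as_posix()
        pairs.append((doc_id, path.read_text(encoding="utf-8")))
    return pairs


# Try it on two small sample files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "one.txt").write_text("first document", encoding="utf-8")
    (root / "two.txt").write_text("second document", encoding="utf-8")
    contents = read_pairs(sorted(root.glob("*.txt")), root)
```

The resulting list can then be fed to the add_document loop shown above.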