Transforming documents into a list of words, taking their sentences into account
Important prerequisites: download the file at the following link, create a directory and extract the file's contents into it.
This example creates a data set in which each record is a single word taken from documents that were previously parsed and stored in an ADaMSoft data set.
Note that the procedure also recognizes the individual sentences that make up each document, so every word can be linked to both its document and its sentence.
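To make the shape of the resulting data set concrete, the following Python sketch mimics the transformation: one output record per word, carrying the document reference and the position of the sentence the word belongs to. It is not ADaMSoft's implementation. The regular-expression sentence splitter is a simple stand-in (an assumption) for the trained sentence detector model (it-sent.bin) that the procedure loads, and the names WordRecord and words_to_records are hypothetical.

```python
import re
from typing import Iterable, NamedTuple


class WordRecord(NamedTuple):
    document: str  # value of the descriptor variable (e.g. document_reference)
    sentence: int  # 1-based position of the sentence inside its document
    word: str      # the word itself


# Naive sentence splitter (assumption): split after ".", "!" or "?"
# when followed by whitespace. The real procedure uses a trained model.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A "word" here is any run of letters, digits or underscores.
_WORD = re.compile(r"\w+")


def words_to_records(docs: Iterable[tuple[str, str]]) -> list[WordRecord]:
    """Turn (reference, text) pairs into one record per word."""
    records = []
    for ref, text in docs:
        # Drop empty fragments, e.g. when the text is empty.
        sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
        for idx, sentence in enumerate(sentences, start=1):
            for word in _WORD.findall(sentence):
                records.append(WordRecord(ref, idx, word))
    return records


if __name__ == "__main__":
    docs = [("doc1", "Il gatto dorme. Il cane abbaia!")]
    for record in words_to_records(docs):
        print(record)
```

With the sample input, the first three records refer to sentence 1 and the last three to sentence 2, which is the document/sentence linkage described above.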
To run this example, paste the following statements into the Command area and press the EXECUTE button. To view the contents of the resulting data set, go to the PATH tab and open the data set named Documents_words.
define directory_treetagger=WRITE HERE THE PATH OF THE DIRECTORY WHERE THE PREREQUISITE FILES WERE EXTRACTED, BY USING THE CHARACTER "/" INSTEAD OF "\";
Proc Words2records dict=Document_content out=Documents_words;
var document_content;
vardescriptor document_reference;
consider_sentence;
sentence_detector_file &directory_treetagger/it-sent.bin;
onlyascii;
run;