Create a term-document matrix
This example creates a data set in which each record holds the reference name of a document (file) and its content, stored in two separate variables.
To run this example, paste the following statements into the Command area and press the EXECUTE button. To view the resulting data set, open the PATH tab and select the data set named Document_content.
More options for running the procedure are available through the ADaMSoft GUI: in the List of procedure tab, select Text mining (interactions with external files...), then the Read documents (paths from external data set...) link, and click the Execute button.
Proc Multidocreader dict=dircontent out=document_content;
varpathfiles name;
vargroupby filenamenoext;
run;
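The procedure itself is specific to ADaMSoft. As a rough illustration of the kind of data set it produces, the following Python sketch (not ADaMSoft's implementation) takes a list of file paths, playing the role of the `name` variable in `dircontent`, and returns one (document name, content) record per document, keyed by the file name without its extension. Merging the contents of files that share the same name without extension is an assumption about what `vargroupby filenamenoext` does; the function name `read_documents`, the newline separator and the UTF-8 encoding are hypothetical choices.

```python
import os


def read_documents(paths, encoding="utf-8"):
    """Return a list of (document_name, content) records.

    Hypothetical sketch: `paths` stands in for the path variable
    (`varpathfiles name`), and the document name is the file name
    without extension (assumed meaning of `vargroupby filenamenoext`).
    """
    grouped = {}  # dicts keep insertion order, so records follow input order
    for path in paths:
        # "reports/a.txt" -> "a"
        name = os.path.splitext(os.path.basename(path))[0]
        with open(path, encoding=encoding) as fh:
            text = fh.read()
        # Assumption: files sharing the same name without extension
        # are treated as parts of one document.
        grouped.setdefault(name, []).append(text)
    # One record per document: its name and its (joined) content
    return [(name, "\n".join(parts)) for name, parts in grouped.items()]
```

For example, calling `read_documents(["docs/a.txt", "docs/a.md", "docs/b.txt"])` would return two records, one for `a` (the contents of both files joined by a newline) and one for `b`.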