*********************
*                   *
* A Quick CQP Guide *
*                   *
*********************

The CWB web-page (soon to be on SourceForge!):
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/

Stefan Evert's CQP tutorial:
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/html
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/cqp-tutorial.pdf

Invoke cqp:

  cqp -e
  cqp -eC

Exit from cqp:

  exit

(If you just did that, please enter again!)

List the available corpora:

  show corpora;

While in cqp, keep in mind that some things work like on the Unix terminal
-- in particular, you can recall previous commands with the upwards-pointing
arrow, and you navigate the kwic results with more/less-like syntax (space to
move to the next page, q to quit, etc.)

Select a corpus (remember the semi-colon at the end of each command), e.g.:

  PSYCHIATRY-EN;
  WEB-EN;

etc.

A quick way to find out how many tokens are in a corpus:

  info;

Simple kwic:

  "compulsive";
  "compulsive" %c;
  "obsessive" "compulsive" "disorder";

If you have problems seeing accented characters (as in vowels with umlaut in
German or with accents in Italian), try:

  set Pager more;

To see the frequency of occurrence of your last query:

  size Last;

Order results by left and right context (funny syntax because these are
"macros" written by Stefan Evert):

  "compulsive";
  /sort_left[];
  /sort_right[];

If you have too many results, it is a good idea to take a look at a random
sample...
First, "save" the query into a variable:

  A = "often";

Then, "reduce" A to the desired number of randomly selected contexts, e.g.:

  reduce A to 20;

Finally, take a look at these contexts:

  cat A;

Change context size:

  set Context 60;
  set Context 5 words;
  set Context s;
  set Context 3 s;
  set Context default;

Other visualization options:

  show +pos;
  show +lem;
  show -pos -lem;
  show -cpos;
  set PrintStructures "text_id";
  set PrintStructures "";

Doing queries using morphosyntactic annotation (if you've been experimenting
with show and set, now is a good moment to go back to a normal-looking
kwic display...):

  [word = "obsessive"] [pos = "NN.*"];
  [word = "obsessive" %c] [pos = "NN.*"];
  [word = "cause"];
  [lem = "cause"];
  [lem = "cause" & pos = "V.*"];
  [pos = "JJ"] [pos = "NN.*"];

For a query like the last one, it is often more meaningful to look at
frequency lists:

  [pos = "JJ"] [pos = "NN.*"];
  count by word %c;

A frequency list for a collocate extracted from a "flexible" context:

  [lem = "cause" & pos = "V.*"] [pos = "DT"]? [pos = "JJ"]* [pos = "NN.*"];
  count by lem %c on matchend;

You can also save the results to an output file:

  cat > "myconc.txt";
  count by word %c > "myfqlist.txt";

Rather advanced, but very useful: construct a frequency list of collocations
from a "flexible" context, e.g., all noun/verb pairs with optionally one
article/determiner and zero or more adjectives in the middle (it is not
strictly necessary to save the query to variable A, but it is handy since we
don't care about seeing the interim kwics):

  A = [pos = "VV.*"] [pos = "DT"]? [pos = "JJ"]* [pos = "NN.*"];
  tabulate A match lem, matchend lem > "pairs.txt";

Now the external file pairs.txt contains all tab-delimited pairs of shape V-N
extracted by the previous query, without the elements in the middle (so both
"meeting deadlines" and "meet a difficult deadline" become "meet deadline"),
ready to be used as input for UCS.
Alternatively, you can collect a frequency list like this:

  tabulate A match lem, matchend lem > "| sort | uniq -c | sort -nrk1 > vn.f.txt";

More fun with cqp:

  set MatchingStrategy longest;

  [lem = "cause" & pos = "V.*"] [pos = "NN.*"]+;

  [lem = "cause" & pos = "V.*"] [pos = "DT"]? [pos = "JJ"]* [pos = "NN.*"]+
    ([word = "of"] | [word = "and"])? [pos = "DT"]? [pos = "JJ"]* [pos = "NN.*"]+;

  "as" []{1,3} "as" within s;
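The sort | uniq -c | sort -nrk1 pipeline that cqp pipes its tabulate output
into can also be run on its own at the Unix shell, e.g. on a pairs.txt file
produced earlier. A minimal sketch (the sample pairs below are made up for
illustration; real pairs come from the tabulate command above):

```shell
# Build a toy tab-delimited pairs.txt, shaped like tabulate's V-N output.
printf 'meet\tdeadline\nmeet\tdeadline\ncause\tproblem\n' > pairs.txt

# Group identical pairs, count them, and sort by frequency, highest first --
# the same pipeline embedded in the tabulate command above.
sort pairs.txt | uniq -c | sort -nrk1 > vn.f.txt

# The "meet deadline" pair (count 2) now comes before "cause problem" (count 1).
cat vn.f.txt
```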