Organising text databases in Linux

Organising Shoebox (Toolbox) databases in Linux

Searching strings.

tr

$ tr -d '\n' < test.wc.tex | wc -m
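The command above deletes every newline before counting, so wc -m reports the number of characters without line breaks. A minimal sketch on a throwaway sample file (the file contents are invented for illustration):

```shell
# Create a two-line sample file (hypothetical contents).
tmp=$(mktemp)
printf 'abc\nde\n' > "$tmp"

# Counting everything: 3 letters + newline + 2 letters + newline = 7.
with_nl=$(wc -m < "$tmp")

# Deleting the newlines first leaves 3 + 2 = 5 characters.
without_nl=$(tr -d '\n' < "$tmp" | wc -m)

echo "with newlines: $with_nl, without: $without_nl"
rm -f "$tmp"
```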

Ptx

Making an index of the lines that contain \mp and kituqa, from all .tex files in the subdirectories of the directory sbxata. Note that the regular expression '.*' matches any sequence of characters of any length.

sbxata$ ptx -AW "\\\<mp\>.*kituqa.*" */*.tex

Making an index of the lines that contain \mp and the whole word kituqa, from all .tex files in the subdirectories of the directory sbxata. As before, '.*' matches any sequence of characters of any length; '\<' matches the beginning of a word and '\>' the end of a word. If '\<kituqa' is used instead of '\<kituqa\>', the command picks up kituqa, kituqasi, etc.; if 'kituqa\>' is used instead, it picks up kituqa, akituqa, ekituqa, etc.

sbxata$ ptx -AW "\\\<mp\>.*\<kituqa\>.*" */*.tex
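The effect of the word anchors can be checked on a short word list. This sketch uses grep rather than ptx, only because the matches are easier to count; it assumes GNU grep, which supports '\<' and '\>', and the word list is made up.

```shell
# Hypothetical word list, one word per line.
words=$(printf 'kituqa\nkituqasi\nakituqa\nekituqa\n')

# '\<kituqa' anchors only the start of the word: kituqa, kituqasi.
start_only=$(printf '%s\n' "$words" | grep -c '\<kituqa')

# 'kituqa\>' anchors only the end of the word: kituqa, akituqa, ekituqa.
end_only=$(printf '%s\n' "$words" | grep -c 'kituqa\>')

# '\<kituqa\>' anchors both ends, so only the bare word matches.
both=$(printf '%s\n' "$words" | grep -c '\<kituqa\>')

echo "start: $start_only  end: $end_only  both: $both"
```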

Grep

Examples:

To extract the lines which contain \mp followed by word from all .tex files in the subdirectories of the directory X on the C drive. The pattern is single-quoted so that the shell passes the backslashes to grep unchanged.

$ grep -rn '\\mp.*word' c:/X/*/*.tex

To extract the \mp lines from all .tex files in the subdirectories of the directory X on the C drive and list only those which contain YYY.

$ grep -rn '\\mp' c:/X/*/*.tex | grep YYY

Extracting necessary lines from Shoebox (Toolbox) files

$ grep -n '^\\vl' [input file name] > [output file name] (extracts the lines which begin with \vl from the input file and saves the result to the output file; -n, --line-number, prefixes each line with its line number)

Remove lines containing % or \ from .tex files

$ grep -v '%' test.tex | grep -v '\\' > result.tex
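A minimal sketch of these extraction and filtering patterns on a made-up Toolbox-style file (markers and contents are invented for illustration). The patterns are single-quoted so that grep, not the shell, sees the doubled backslash.

```shell
# Made-up Toolbox-style sample; markers and contents are invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
\vl first line
\mp kituqa word
% a comment
\mp other
plain text
EOF

# Lines that begin with the \vl marker, with line numbers (-n).
vl_lines=$(grep -n '^\\vl' "$sample")

# \mp lines that also contain "word" further along the line.
mp_word=$(grep '\\mp.*word' "$sample")

# Drop every line that contains % or a backslash.
rest=$(grep -v '%' "$sample" | grep -v '\\')

printf '%s\n' "$vl_lines" "$mp_word" "$rest"
rm -f "$sample"
```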

sed

My reminder:

In the patterns below, \\, \/, \., \[ and \] stand for the literal characters \, /, ., [ and ] respectively.

Add XX (preceded by a space) to the end of each line.
$ sed 's/$/ XX/' file

Replace XX with YY
$ sed 's/XX/YY/g' INPUTFILE > OUTPUT

Examples:
$ sed 's/-3\\ns\.\\gen/{\\his}/g' INPUTFILE > OUTPUT (replacing -3\ns.\gen with {\his})
$ sed 's/\\rb{[^}]*}//g' sed.tex | sed 's/\[[^]]*\]//g' | sed 's/\[\]//g' > sed.1.tex (removing \rb{.*}, [.*] and [])
$ sed 's/\\rb{[^}]*}//g' sed.tex > sed2.tex (removing \rb{.*})
$ sed 's/\[[^]]*\]//g' sed.tex > sed5.tex (removing [.*])
$ sed 's/\[\]//g' sed.tex > sed1.tex (removing [])
$ sed 's/\[//g' sed.tex > sed1.tex (removing [)
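A minimal sketch of the removal patterns on a made-up line (the markup is invented for illustration). Inside a bracket expression a backslash is an ordinary character, so "any character except ]" is written [^]], with the ] placed directly after the ^.

```shell
# Made-up input line; the markup is invented for illustration.
line='stem \rb{gloss} text [note] more [] end'

# Remove \rb{...}: [^}]* matches everything up to the first closing brace.
no_rb=$(printf '%s\n' "$line" | sed 's/\\rb{[^}]*}//g')

# Remove [...] and empty []: [^]]* matches zero or more characters
# other than ], so both forms are covered by one pattern.
no_brackets=$(printf '%s\n' "$no_rb" | sed 's/\[[^]]*\]//g')

echo "$no_brackets"
```

The spaces that surrounded the removed items stay in place, so a further pass may be needed to squeeze runs of spaces.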

perl