Introduction

This page is a key to the features of the etcbc4c dataset: for each feature it gives a short description and, where available, a few example values.

We organize the features in several groups, roughly analogous to the types of objects we have:

Grid features

feature | description | examples
otype | node type | book verse clause phrase word
oslots | slot containment | 1 1-11 2010-2015,2020-2030
otext | textapi | no data, only specifications
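
The oslots examples (1, 1-11, 2010-2015,2020-2030) read as compact slot specifications. The following is a minimal sketch of expanding such a specification into individual slot numbers. It assumes that parts are separated by commas and that a-b denotes an inclusive range; the function name parse_slots is hypothetical and not part of any library.

```python
def parse_slots(spec):
    """Expand a slot specification such as '2010-2015,2020-2030'.

    Assumption: parts are comma-separated and 'a-b' is an inclusive
    range, as the example values in the table suggest.
    """
    slots = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            slots.extend(range(int(start), int(end) + 1))
        else:
            slots.append(int(part))
    return slots


for spec in ("1", "1-11", "2010-2015,2020-2030"):
    print(spec, "covers", len(parse_slots(spec)), "slot(s)")
```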

Sectional features

feature | description | examples
book | name of Bible book | Genesis Psalmi Amos
chapter | number of chapter within book | 3
verse | number of verse within chapter | 4
label | passage indicator | AMOS 03,04
label | key for part within verse | A B C
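
The passage indicator example AMOS 03,04 appears to combine the book, chapter and verse values listed above. A small sketch of splitting such a label, assuming the layout "BOOK chapter,verse" holds in general; parse_passage is a hypothetical helper, not an existing API.

```python
def parse_passage(label):
    """Split a passage label such as 'AMOS 03,04' into its parts.

    Assumption: the label is '<book> <chapter>,<verse>' with
    zero-padded numbers, as in the single example given in the table.
    """
    book, numbers = label.rsplit(" ", 1)
    chapter, verse = numbers.split(",")
    return book, int(chapter), int(verse)


print(parse_passage("AMOS 03,04"))
```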

Lexeme features (on node type lex)

feature | description | examples
lex | lexeme consonantal transliterated | >MR[
voc | lexeme pointed transliterated | R;>CIJT
voc_utf8 | lexeme pointed hebrew | רֵאשִׁית
sp | part of speech | verb subs
ls | lexical set | quot ques
nametype | type of named entity | topo
gloss | gloss | beginning
language | language | Hebrew Aramaic

Word features

Orthography

feature | description | examples
g_cons | word consonantal transliterated | >CR
g_cons_utf8 | word consonantal hebrew | אשׁר
g_word | word pointed transliterated | >:ACER&
g_word_utf8 | word pointed hebrew | אֲשֶׁר
qere | word (qere) pointed transliterated | HAJ:Y;74>
qere_utf8 | word (qere) pointed hebrew | הַיְצֵ֣א
trailer_utf8 | after-word pointed hebrew | ׃ ׆̇
qere_trailer | after-word (qere) pointed hebrew | ׃ ׆̇
qere_trailer_utf8 | after-word (qere) pointed hebrew | ׃ ׆̇
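
Each transliterated feature has a Hebrew-script counterpart with the _utf8 suffix. As an illustration of how the two relate, here is a minimal sketch that converts a consonantal transliteration to Hebrew script. The mapping is deliberately partial: it covers only the consonants that occur in the examples >CR and >MR[ in this key, and the names CONSONANTS and to_hebrew are hypothetical.

```python
# Partial mapping, limited to consonants evidenced by the examples
# in this key (>CR and >MR[); the complete transliteration table is larger.
CONSONANTS = {
    ">": "\u05d0",        # alef
    "C": "\u05e9\u05c1",  # shin followed by the shin dot
    "M": "\u05de",        # mem
    "R": "\u05e8",        # resh
}


def to_hebrew(translit):
    """Convert a consonantal transliteration to Hebrew script.

    Characters outside the partial table (such as the '[' marker)
    are passed through unchanged.
    """
    return "".join(CONSONANTS.get(ch, ch) for ch in translit)


print(to_hebrew(">CR"))
print(to_hebrew(">MR["))
```

Passing unknown characters through unchanged matches the lex_utf8 example in this key, where the trailing [ is kept as-is.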

Lexical (on node type word)

feature | description | examples
lex | word consonantal transliterated | >MR[
lex_utf8 | word consonantal hebrew | אמר[
g_lex | word pointed transliterated | >MER
g_lex_utf8 | word pointed hebrew | אמֶר
language | language | Hebrew Aramaic
sp | part of speech | verb subs
pdp | phrase dependent part of speech | verb subs
ls | lexical set | quot ques

Morphology

features | description | examples
gn prs_gn | gender | m f
nu prs_nu | number | sg pl du
ps prs_ps | person | p1 p2 p3
st | state | a c e
vs | verbal stem | qal piel nif hif
vt | verbal tense | perf impf wayq

Morphemes

features | description | examples
nme g_nme g_nme_utf8 | nominal ending | / /IJM /@H
pfm g_pfm g_pfm_utf8 | preformative | !! !J.I! !TI!
prs g_prs g_prs_utf8 | pronominal suffix | +OW +IJ +HEM
uvf g_uvf g_uvf_utf8 | univalent final | ~@H ~IJ ~OW
vbe g_vbe g_vbe_utf8 | verbal ending | [ [W. [T.IJ
vbs g_vbs g_vbs_utf8 | root formation | ]] ]NI] ]HA]
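
In the example values, each morpheme type starts with its own marker character (/ for nominal endings, ! for preformatives, and so on). A minimal sketch of using that first character to identify the morpheme feature; the table MARKERS is read off the examples in this key, and morpheme_feature is a hypothetical helper.

```python
# Marker characters as they appear at the start of the example values.
MARKERS = {
    "/": "nme",  # nominal ending
    "!": "pfm",  # preformative
    "+": "prs",  # pronominal suffix
    "~": "uvf",  # univalent final
    "[": "vbe",  # verbal ending
    "]": "vbs",  # root formation
}


def morpheme_feature(segment):
    """Return the feature name suggested by the first character, or None."""
    return MARKERS.get(segment[:1])


for segment in ("/IJM", "!J.I!", "+HEM", "~@H", "[W.", "]NI]"):
    print(segment, "->", morpheme_feature(segment))
```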

Statistics

feature | description
freq_lex | frequency of lexeme
freq_occ | frequency of word occurrence
rank_lex | rank of lexeme
rank_occ | rank of word occurrence
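
To make the relation between frequency and rank concrete, here is a rough sketch that derives both from a list of lexemes. It rests on assumptions not stated in this key: rank 1 is taken to be the most frequent lexeme, and lexemes with equal frequency share a rank. The function lexeme_stats and the sample list are invented for illustration.

```python
from collections import Counter


def lexeme_stats(lexemes):
    """Return {lexeme: (frequency, rank)} for a sequence of lexemes.

    Assumption: rank 1 is the most frequent lexeme, and lexemes with
    the same frequency share a rank (one plus the number of distinct
    lexemes that are strictly more frequent).
    """
    freq = Counter(lexemes)
    ordered = sorted(freq.values(), reverse=True)
    # index() finds the first position holding this frequency.
    return {lex: (n, ordered.index(n) + 1) for lex, n in freq.items()}


# Toy input, not real corpus data.
sample = [">MR[", "H", ">MR[", "W", "H", ">MR["]
for lex, (frequency, rank) in lexeme_stats(sample).items():
    print(lex, "freq", frequency, "rank", rank)
```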

Linguistic features

Sentence(-atom) features

No specific features; only the generic number feature applies (see Generic features).

Clause(-atom) features

feature | description | examples
typ | clause type | AjCl WayX WXQt ZImX
kind | rough clause type | VC NC WP
rela | clause constituent relation | Adju Attr Coor
domain | text type | ?? Q N D
txt | text type | NQ NQQ QNQQ NQND
code | clause atom relation | 200 477 999
is_root | ?? |
tab | hierarchical tabulation | 0 3 10 29
pargr | paragraph number | 1 1.2 2.3.4
instruction | instruction | .q .d .. ve
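
The pargr examples (1, 1.2, 2.3.4) look like dot-separated hierarchical numbers. A brief sketch of splitting them into levels, assuming each dot introduces one deeper level of nesting; parse_pargr is a hypothetical name.

```python
def parse_pargr(value):
    """Split a paragraph number such as '2.3.4' into integer levels.

    Assumption: levels are separated by dots, outermost level first.
    """
    return tuple(int(part) for part in value.split("."))


for value in ("1", "1.2", "2.3.4"):
    levels = parse_pargr(value)
    print(value, "->", levels, "depth", len(levels))
```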

Phrase(-atom) features

feature | description | examples
typ | phrase type | VP NP PP AdjP AdvP
rela | phrase atom relation | Appo Para Resu
function | phrase function | Pred Subj
det | determination | det und

Relationships

feature | description
mother | relation of linguistic dependency
distributional_parent | the parent in the distributional hierarchy (-atoms)
functional_parent | the parent in the functional hierarchy (sentence clause phrase)

Generic features

feature | description | examples
number | sequence number in context | 123
dist | distance to mother | -10 0 1 8
dist_unit | unit of measuring distance to mother | clause_atoms phrase_atoms words
mother_object_type | object type of mother | clause phrase subphrase word