- Q: In the English language, what are the most frequently appearing
  1) letters overall?
  2) letters BEGINNING words?
  3) final letters?
  4) digrams (ordered pairs of letters)?
A:
web2  = word list from Webster's Second Unabridged
web2a = hyphenated words and phrases from Webster's Second Unabridged
both  = web2 + web2a
net   = several gigabytes of Usenet traffic
1) Most frequently appearing letters overall:
web2: eiaorn tslcup mdhygb fvkwzx qj
both: eairon tslcud pmhgyb fwvkzx qj
net:  etaoin srhldc umpfgy wbvkxj qz
2) Most frequently appearing letters BEGINNING words:
web2: spcaut mbdrhi eofgnl wvkjqz yx
both: spcatb umdrhf eigowl nvkqjz yx
net:  taisow cmbphd frnelu gyjvkx qz
3) Most frequent final letters:
web2: eysndr ltacmg hkopif xwubzv jq
both: eydsnr tlagcm hkpoiw fxbuzv jq
net:  estndr yolafg mhipuk cwxbvz jq
4) Most frequent digrams (ordered pairs of letters):
web2: er in ti on te al an at ic en is re ra le ri ro st ne ar ...
both: er in te ti on an re al at le en ra ic ar st ri ro ed ne ...
net:  th he in er re an on at te es or en ar ha is ou it to st nd ...
Program to compute this from word list in standard input:
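#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

typedef struct {
    int count;
    char name[3];   /* one or two letters plus the terminating NUL */
} FREQ;

FREQ all[256], initial[256], terminal[256], digram[65536];

/* qsort comparator: larger counts sort first */
int compare(const void *p, const void *q)
{
    const FREQ *a = p, *b = q;
    return (b->count > a->count) - (b->count < a->count);
}

void sort_and_print(FREQ *freq, size_t count, const char *description)
{
    FREQ *p;

    qsort(freq, count, sizeof(*freq), compare);
    puts(description);
    for (p = freq; p < freq + count; p++)
        if (p->count)
            printf("%s %d\n", p->name, p->count);
}

int main(void)
{
    char s[BUFSIZ];
    unsigned char *p;   /* unsigned so characters are safe as array indices */
    int i;

    /* one word per line; fgets replaces the unsafe gets */
    while (fgets(s, sizeof s, stdin) != NULL) {
        s[strcspn(s, "\r\n")] = '\0';   /* fgets keeps the newline, gets did not */
        p = (unsigned char *)s;
        if (islower(*p)) {              /* skip capitalized entries and blank lines */
            initial[*p].count++;
            sprintf(initial[*p].name, "%c", *p);
            for (; *p; p++) {
                if (isalpha(*p)) {
                    all[*p].count++;
                    sprintf(all[*p].name, "%c", *p);
                    if (isalpha(p[1])) {
                        i = p[0] * 256 + p[1];   /* index the pair as a 16-bit number */
                        digram[i].count++;
                        sprintf(digram[i].name, "%c%c", p[0], p[1]);
                    }
                }
            }
            --p;                        /* step back from the NUL to the last character */
            terminal[*p].count++;
            sprintf(terminal[*p].name, "%c", *p);
        }
    }
    sort_and_print(all, 256, "overall character distribution: ");
    sort_and_print(initial, 256, "initial character distribution: ");
    sort_and_print(terminal, 256, "terminal character distribution: ");
    sort_and_print(digram, 65536, "digram distribution: ");
    return 0;
}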
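
The program above expects one lowercase word per line, which suits the web2 lists but not running text such as the net sample. How the net figures were actually produced is not stated here, so the following is only a sketch of one way to collect the same four tallies from free text. It assumes ASCII, folds upper case to lower case, and treats every maximal run of letters as a word (so an apostrophe splits a word in two). The names TALLY and tally_text are hypothetical and do not come from the original program.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical container for the four tallies, indexed 0..25 for a..z. */
typedef struct {
    long overall[26];
    long initial[26];
    long terminal[26];
    long digram[26][26];   /* digram[x][y] counts x immediately followed by y */
} TALLY;

/* Add the letters of free-running text to the tallies in *t.
   Assumes ASCII; any non-letter ends the current word. */
void tally_text(const char *text, TALLY *t)
{
    const unsigned char *p;
    int prev = -1;   /* index of the previous letter, or -1 between words */

    assert(text != NULL && t != NULL);
    for (p = (const unsigned char *)text; *p != '\0'; p++) {
        int c = tolower(*p);
        if (c >= 'a' && c <= 'z') {
            int cur = c - 'a';
            t->overall[cur]++;
            if (prev < 0)
                t->initial[cur]++;          /* first letter of a new word */
            else
                t->digram[prev][cur]++;     /* pair inside the same word */
            prev = cur;
        } else {
            if (prev >= 0)
                t->terminal[prev]++;        /* the word just ended */
            prev = -1;
        }
    }
    if (prev >= 0)
        t->terminal[prev]++;                /* text ended in the middle of a word */
}
```

To use it, zero a TALLY (for example with `TALLY t = {0};`), call tally_text once per input line or buffer, and then sort and print the counts in the same way sort_and_print does. A word that straddles two buffers would be counted as two words, so feeding whole lines is the safer choice.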
