I'm in a C programming class right now, working on a project, and I have a couple of questions. I have to write a program that opens a .txt file and reads through it twice: once to count the number of words and a second time to count the number of sentences. Right now I can get it to count the words and print the result to the terminal, but I'm not sure how to get it to read a sentence and know when to stop. (A sentence starts with a capital letter or a number and ends with a terminator followed by two spaces.) Can anyone help?
I'm not fluent in C, but most languages have string functions that will split strings at a given point. In PHP you have explode, in JS and AS you have split, and there are plenty of others. Probably the easiest way, though, would be to use regular expressions. Once you set up a regex that fits your parameters, you can test against it and have it return the matches that meet the conditions. Hope that helps a bit.
It doesn't really seem to be a "C" problem. If you have gotten as far as counting words, you should have everything you need to count sentences. As you stated, you need to decide what the rules are that define a "sentence". Adding that to the code you currently have for counting words shouldn't be much of a change. There are many programming boards out there that can help with specific questions (which function to use, etc.). To go further would require looking at the source code itself, and that is a no-no if you are doing classwork.
If all sentences end with a period then why not use the period to delineate the end of each sentence?
Code:
#include <stdio.h>

FILE * f;
char szBuff[512];
int nSentences;

/* Assume f is the open file... */
nSentences = 0;
rewind(f);
for(;;)
{
    char * c;
    /* fgets() already leaves room for the terminating '\0' */
    c = fgets(szBuff, sizeof(szBuff), f);
    if(!c)
        break;
    while(*c)
    {
        if(*c++ == '.')
            nSentences++;
    }
}
printf("There are %d sentences.\n", nSentences);
I guess in retrospect you'd need to account for periods used inside sentences.
You'll need separate if statements for things such as question marks. I think it's going to be more of an if-and statement: if there are two spaces before a character, start; end on a period or question mark, etc. It's been a long time, so I might have to break out the books.
Those are library functions (<string.h>), and I'm guessing the point of the class is to show him how to do it himself, and that the libraries are easier... For the question, here's a rough sketch:
Code:
int curChar, lastChar = 0, last2Char = 0;
int inSent = 0, sentCnt = 0;

rewind(fp);   /* set fp back to the start */
while ((curChar = fgetc(fp)) != EOF)
{
    /* if not already in a sentence and the current char is uppercase alpha */
    if (!inSent && (curChar >= 'A') && (curChar <= 'Z'))
    {
        inSent = 1;
        sentCnt++;
    }
    /* a period followed by two spaces ends the sentence */
    if (inSent && (curChar == ' ') && (lastChar == ' ') && (last2Char == '.'))
        inSent = 0;
    last2Char = lastChar;
    lastChar = curChar;
} /* loop over file */
I left out the file management. Your code may look different depending on tons of stuff. Edit: on reread, if the sentence can end in other punctuation, I'd replace the (last2Char == '.') with a function IsPunctuation(last2Char), and use a switch statement in that function to test for the various characters.
Yeah, having a period end every sentence won't really work, because according to this assignment a word can include numbers, so 19.87 could be a word. I don't understand how to make the if-and statement work (this is my first programming class). I know what needs to happen; I'm just not sure how to get it into code. A capital letter or a number can start a sentence. A period, question mark, or exclamation mark ends it when followed by a space and a tab, a double space, or a space and end of line. So far I have code that reads a character and starts a sentence, and I can figure out how to have it pick up a period, question mark, or exclamation mark, but I don't know how to write code that knows there is a period with two spaces after it to end a sentence.
Thanks kz2zx, that does help. So if I understand this correctly, the statement if (inSent && (curChar == ' ') && (lastChar == ' ') && (last2Char == '.')) inSent = 0; will make sure that there are two spaces after the punctuation mark to end a sentence? I should be able to change the code so that it accepts either two spaces, a space and EOL, or a space and a TAB, and I can figure out how to make it accept any of '.', '?', or '!'.
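To make that concrete, here is a minimal sketch of the "terminator, then a space, then a second space, tab, or end of line" test written as a small helper. The names is_terminator and ends_sentence are made up for illustration, and treating '\n' as "end of line" is an assumption about how the assignment defines it.

```c
/* Hypothetical helper: is c one of the three sentence terminators? */
static int is_terminator(int c)
{
    switch (c) {
    case '.':
    case '?':
    case '!':
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: given the last three characters read, oldest first,
 * report whether they close a sentence under the assignment's rules:
 * a terminator, then a space, then a space, tab, or newline.
 * Assumption: "end of line" means '\n'. */
static int ends_sentence(int last2Char, int lastChar, int curChar)
{
    return is_terminator(last2Char)
        && lastChar == ' '
        && (curChar == ' ' || curChar == '\t' || curChar == '\n');
}
```

Inside the loop, the long condition would then shrink to something like if (inSent && ends_sentence(last2Char, lastChar, curChar)) inSent = 0;. A number such as 19.87 is not affected, because the character after the period is a digit, not a space.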
I would listen to kz2zx. He's got the cryptic C style of writing code down, which means he's done a little bit of C programming. (BTW: I hate that style, but like everything... it's a style.) One note: a capital letter doesn't always mean the start of a sentence, as proper names are capitalized mid-sentence. You really need to go off of the punctuation and the spacing. Don't forget, though, that sentences can end with a quote, and you put the sentence punctuation inside the quote, like: "blah blah blah?" I doubt your instructor expects you to get all the corner cases, just the obvious ones. Also, you can write the code to do it all in one loop, and I'd use the switch...case syntax to determine where sentences split. But if you haven't been introduced to that, then just stick with if...else statements.
You're going to want to use the two spaces to begin a sentence as well; otherwise you'd count things more than once. For instance: "At the movies, I bought two drinks and some candy and a box of 100 Goobers." That would count 4 times if you just go off starting a sentence with a capital letter or number.
I can do cryptic C++, too! I can't tell you how many times I've looked at my own C++ code, fully commented (with paragraphs and UML diagrams) and said "WTF just happened here?" It's worse when you use UML and state charts to generate code (Rhapsody...)
I'll tell you what happened....you were drinking too much when you wrote it the first time and when you went to go back and review it...you forgot you had to be drunk to understand it. Next time make sure you put in your comments, // Note: Be sure to consume at least a 6-pack of beer // before reading this next section or it won't make sense
A coworker's Jewish. She put this comment in once, after a method she borrowed: // and a great miracle happened there
In order for the routine to be really accurate and versatile it would have to account for carriage returns and line feeds. Any routine that relies on a period, question mark or exclamation point followed by two spaces to end a sentence is going to miss sentences that are followed by a <cr><lf> pair. It is also possible for sentences to end with a period and only one space. On this BBS, as an example, you'll notice that posts are stripped of their extra spaces after periods. If you were to copy text from this post as you're reading it now, paste it into a text file and run it through kz2zx's routine above, it would never see a sentence end, so it would count only the first one. Not saying mine's better, just pointing out that depending on the complexity of the sentences being evaluated, a parsing routine can get very complicated in a hurry. And leave us not forget the possibility that someone might end a sentence with three exclamation points!!! Or could they toss something as unconventional as exclamation points and question marks into the same sentence ending punctuation nightmare?!? The possibilities could go on and on...
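As a rough illustration of how the looser cases could be handled, here is a minimal sketch that treats any run of '.', '?' or '!' as one terminator and accepts any whitespace (space, tab, <cr>, <lf>) or the end of the text after it. The function name count_sentences_loose is made up, it works on a string instead of a file to keep it short, and it deliberately ignores the assignment's two-space rule. It still gets closing quotes and abbreviations wrong.

```c
#include <ctype.h>

/* Hypothetical helper: count sentence endings in a NUL-terminated string.
 * A sentence ending is a run of terminators followed by whitespace
 * or by the end of the text. */
int count_sentences_loose(const char *text)
{
    int count = 0;
    int pendingTerm = 0;   /* 1 while the last chars seen were terminators */
    const unsigned char *p = (const unsigned char *)text;

    for (; *p != '\0'; p++) {
        if (*p == '.' || *p == '?' || *p == '!') {
            pendingTerm = 1;          /* "!!!" and "?!?" collapse into one */
        } else if (isspace(*p)) {
            if (pendingTerm)
                count++;              /* terminator run closed by whitespace */
            pendingTerm = 0;          /* so <cr><lf> is only counted once */
        } else {
            pendingTerm = 0;          /* e.g. the '8' in 19.87 */
        }
    }
    if (pendingTerm)
        count++;                      /* text ended right after a terminator */
    return count;
}
```

Because the period in 19.87 is followed by a digit, it is not counted, while a period followed by a single space is. That single-space acceptance is exactly what the two-space rule in the assignment is meant to exclude, so this is a different trade-off, not a drop-in answer.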
Dusty, IMO there's no need to over-complicate the solution. All you need to do to count a sentence is recognize that you have two sentence termination characters in a row. You don't have to be concerned with the start of a sentence or whether you're in a sentence, as recognizing the end of a sentence is sufficient. This should work; you can fine-tune it a little as needed for the complete universe of sentence termination character sequences, including multiple spaces, EOL, etc. All it does as presented is recognize that we've run into two sentence termination characters in a row, i.e., a period or a question mark followed immediately by a space. fopen(), etc. obviously left out, but you should be able to follow easily.
Code:
FILE * fpInput;      /* assumed to be open already */
int cChar;           /* int rather than char so it can hold EOF */
int bFirstTermChar;  /* 1 == found first termination character in */
                     /*      a sequence of characters in the input */
                     /* 0 == have not found first term character */
int nSentences;

bFirstTermChar = 0;
nSentences = 0;
while ((cChar = fgetc(fpInput)) != EOF)
{
    switch (cChar)
    {
    case '.':
    case '?':
        bFirstTermChar = 1;
        break;
    case ' ':
        if (bFirstTermChar)
        {
            ++nSentences;
            bFirstTermChar = 0;
        }
        break;
    default:
        bFirstTermChar = 0;
        break;
    }
}
The only reason I'm using a capital letter to start a sentence is that it's in the program description. The way I have been doing it, once you are in a sentence a flag is true, so even though you have 4 starters (A, I, 100, and G), it will not add another sentence until it hits the terminator, because the flag is already true. After the terminator, it needs to set the in-a-sentence flag to FALSE.
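Here is a minimal sketch of that flag idea, to show that the Goobers sentence from earlier in the thread only counts once. The function name count_sentences_flag is made up, it reads from a string instead of a file to keep it short, and it only handles the "terminator plus two spaces" ending.

```c
#include <ctype.h>

/* Hypothetical helper: count sentences in a NUL-terminated string.
 * A sentence starts at a capital letter or digit seen while NOT already
 * in a sentence, and ends at '.', '?' or '!' followed by two spaces. */
int count_sentences_flag(const char *text)
{
    int inSent = 0;            /* 1 while we are inside a sentence */
    int count = 0;
    int last = 0, last2 = 0;   /* the two previously read characters */
    const unsigned char *p = (const unsigned char *)text;

    for (; *p != '\0'; p++) {
        int c = *p;
        if (!inSent && (isupper(c) || isdigit(c))) {
            inSent = 1;        /* later starters are ignored until the end */
            count++;
        } else if (inSent && c == ' ' && last == ' '
                   && (last2 == '.' || last2 == '?' || last2 == '!')) {
            inSent = 0;        /* terminator plus two spaces: sentence over */
        }
        last2 = last;
        last = c;
    }
    return count;
}
```

With this, the A, I, 100, and G in that sentence do not each add a sentence, because the flag is already set when they are read. Note that a sentence is counted when it starts, so a last sentence with no trailing spaces is still counted.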
Here's an approach that handles the requirements of starting on a capital letter or number, not breaking on punctuation in a sentence, and only counting sentences that end with 2 spaces.
Code:
#include <ctype.h>
#include <stdio.h>

#define START_SEARCH_STATE      0
#define TERMINATOR_SEARCH_STATE 1
#define FIRST_SPACE_STATE       2
#define SECOND_SPACE_STATE      3
....
int state;
FILE * fp;           /* assumed to be open already */
int currentChar;     /* int rather than char so it can hold EOF */
int numSentences;

state = START_SEARCH_STATE;
numSentences = 0;
currentChar = fgetc(fp);
while(currentChar != EOF)
{
    if (state == START_SEARCH_STATE)
    {
        /* A sentence starts on a capital letter or a number */
        if (isupper(currentChar) || isdigit(currentChar))
        {
            state = TERMINATOR_SEARCH_STATE;
        }
    }
    else if (state == TERMINATOR_SEARCH_STATE)
    {
        switch(currentChar)
        {
        case '.' :
        case '!' :
        case '?' :
            state = FIRST_SPACE_STATE;
            break;
        default :
            /* Do nothing */
            break;
        }
    }
    else if (state == FIRST_SPACE_STATE)
    {
        if (currentChar == ' ')
        {
            state = SECOND_SPACE_STATE;
        }
        else
        {
            /* Handle the punctuation in middle of sentence case */
            state = TERMINATOR_SEARCH_STATE;
        }
    }
    else if (state == SECOND_SPACE_STATE)
    {
        if (currentChar == ' ')
        {
            numSentences++;
            state = START_SEARCH_STATE;
        }
        else
        {
            /* It's not a sentence terminator if there's only 1 space */
            state = TERMINATOR_SEARCH_STATE;
        }
    }
    currentChar = fgetc(fp);
}
printf("There were %d sentences\n", numSentences);