Anyone out there fluent in C programming?

Discussion in 'General' started by dusty20, Nov 3, 2008.

  1. dusty20

    dusty20 #97 North Central Ex.

    I'm in a C programming class right now, working on a project, and have a couple of questions. I have to make a program that will open a txt document and read through it twice: once to count the number of words and a second time to count the number of sentences. Right now I can get it to count the words and print the result to the terminal screen, but I am not sure how I can get it to read a sentence and know when to stop (a sentence starts with a capital letter or a number and ends with a terminator with two spaces after it).

    Can anyone help???
     
  2. darylbowden

    darylbowden Well-Known Member

    I'm not fluent in C, but most languages have string functions that will split strings at a given point. In PHP you have explode, in JS and AS you have split, and there are tons of others.

    Probably the easiest way though would be to use regular expressions. Once you set up a regex that fits your parameters, you can test against that and have it return those that meet the conditions.
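
    Standard C has no regex support built in (POSIX adds <regex.h>), but <string.h> does offer strstr for finding one string inside another. As a rough sketch only, and assuming the whole text is already in memory and that a sentence ends with a period plus two spaces, counting endings could look like this. The name count_occurrences is made up for illustration.

```c
#include <string.h>

/* Count non-overlapping occurrences of `pattern` in `text`.
   Hypothetical helper, not part of any library. */
int count_occurrences(const char *text, const char *pattern)
{
    int count = 0;
    size_t step = strlen(pattern);
    const char *p = text;

    if (step == 0)
        return 0;               /* an empty pattern would loop forever */

    while ((p = strstr(p, pattern)) != NULL) {
        count++;
        p += step;              /* continue searching after this match */
    }
    return count;
}
```

    Calling count_occurrences("One.  Two.  Pi is 3.14 here.  ", ".  ") gives 3, because the period inside 3.14 is not followed by two spaces. Other terminators such as '?' would each need their own call.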

    Hope that helps a bit.
     
  3. Smilodon

    Smilodon Wannabe

    It doesn't really seem to be a "C" problem. If you are as far as counting words, you should have everything you need to count sentences. As you stated, you need to decide what the rules are to define a "sentence".

    Implementing it with the code you currently have to count words shouldn't be much of a change.

    There are many programming boards out there that can help with specific questions (which function to use, etc.). To go further would require looking at the source code itself, and that is a no-no if you are doing classwork.
     
  4. gixxernaut

    gixxernaut Hold my beer & watch this

    If all sentences end with a period then why not use the period to delineate the end of each sentence?

    Code:
    FILE * f;
    char szBuff[512];
    int nLen;
    int nSentences;
    
    // Assume f is the open file...
    
    nSentences = 0;
    rewind(f);
    
    for(;;)
    {
        char * c;
        c = fgets(szBuff, sizeof(szBuff), f);  /* fgets already leaves room for the '\0' */
        if(c)
        {
            while(*c)
            {
                if(*c++ == '.')
                    nSentences++;
            }
        }
        else
        {
            break;
        }
    }
    
    printf("There are %d sentences.\n", nSentences);
    I guess in retrospect you'd need to account for periods used inside sentences.
     
  5. peakpowersports

    peakpowersports Well-Known Member

    You'll need separate if statements for things such as question marks. I think it's going to be more of an if-and statement: if there are two spaces before a character, start a sentence; end on a period or question mark, etc. It's been a long time, might have to break out the books ;)
     
  6. kz2zx

    kz2zx zx2gsxr2zx

    Those are library functions (<string.h>), and I'm guessing the point of the class is to show him how to do it himself, and that the libraries are easier...

    for the question, here's the pseudocode:

    Rewind the file (set fp to start);
    inSent = sentCnt = 0;
    /* curChar must be an int so it can hold EOF */
    while ((curChar = fgetc(fp)) != EOF) {
        if (!inSent && (curChar >= 'A') && (curChar <= 'Z')) {
            inSent = 1;
            sentCnt++;
        } /* if not already in a sentence and the current char is uppercase Alpha */

        if (inSent && (curChar == ' ') && (lastChar == ' ') && (last2Char == '.')) inSent = 0;

        last2Char = lastChar;
        lastChar = curChar;
    } /* while loop over file */

    I left out the variable initialization and file management.

    your code may look different depending on tons of stuff.


    Edit: on reread, if the sentence can end in other punctuation, I'd replace the (last2Char == '.') with a function IsPunctuation(last2Char), and use a switch statement in that function to test for various characters.
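
    A minimal sketch of what such an IsPunctuation function might look like. The set of characters ('.', '?', '!') is an assumption; adjust it to whatever the assignment counts as a sentence terminator.

```c
/* Returns 1 if c can end a sentence, 0 otherwise.
   The three characters tested here are an assumed set. */
int IsPunctuation(int c)
{
    switch (c) {
    case '.':
    case '?':
    case '!':
        return 1;
    default:
        return 0;
    }
}
```

    With that in place, the test in the loop becomes IsPunctuation(last2Char) instead of (last2Char == '.').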
     
  7. dusty20

    dusty20 #97 North Central Ex.

    Yeah, using the period to end all of the sentences won't really work, because according to this assignment a word can include numbers, so 19.87 could be a word. I don't understand how to make the if-and statement work (this is my first programming class). I know what needs to happen; I am just not all that sure how to get it into code.

    A capital letter or number can start a sentence.
    A period, question mark, or exclamation mark followed by a space and tab, or a double space, or a space and end of line, ends a sentence.

    So far I have code to read a character and start a sentence, and I can figure out how to have it pick up a period, question mark, or exclamation mark, but I do not know how to write code that knows there is a period with two spaces after it to end a sentence.
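
    One way to "know" what came before is to remember the previous two characters while reading. The sketch below is only an illustration of the rules listed above, using nothing beyond stdio.h. The names is_terminator, ends_sentence and count_endings are made up, and "end of line" is assumed to mean a '\n' character.

```c
#include <stdio.h>

/* 1 if c is one of the assumed terminators '.', '?' or '!' */
int is_terminator(int c)
{
    return c == '.' || c == '?' || c == '!';
}

/* 1 if the last three characters read are: terminator, space,
   then a second space, a tab, or a newline */
int ends_sentence(int prev2, int prev1, int cur)
{
    return is_terminator(prev2) && prev1 == ' ' &&
           (cur == ' ' || cur == '\t' || cur == '\n');
}

/* Count sentence endings in an already opened file */
int count_endings(FILE *fp)
{
    int prev2 = 0, prev1 = 0, cur;   /* int so cur can hold EOF */
    int count = 0;

    rewind(fp);
    while ((cur = fgetc(fp)) != EOF) {
        if (ends_sentence(prev2, prev1, cur))
            count++;
        prev2 = prev1;               /* slide the three-character window */
        prev1 = cur;
    }
    return count;
}
```

    A number like 19.87 is not counted, because the period is followed by '8' and '7' rather than by a space.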
     
  8. dusty20

    dusty20 #97 North Central Ex.

    Thanks kz2zx, that does help.

    So if I understand this correctly, the statement

    if (inSent && (curChar == ' ') && (lastChar == ' ') && (last2Char == '.')) inSent = 0;

    will make sure that there are two spaces after the punctuation mark to end a sentence?
    I should be able to change the code so that it accepts either two spaces or space and EOL or space and TAB, and I can figure out how to make it accept either a '.', '?', or '!'.
     
  9. dusty20

    dusty20 #97 North Central Ex.

    Yeah, the only libraries we are using right now are stdio.h and math.h.
     
  10. kz2zx

    kz2zx zx2gsxr2zx


    Yeah. Have fun!
     
  11. Mr Sunshine

    Mr Sunshine Banned

    I would listen to kz2zx. He's got the cryptic C style of writing code down, which means he's done a little bit of C programming. :) (btw: I hate that style but like everything... it's a style)

    One note...a sentence doesn't always start with a capital letter as proper names are capitalized mid sentence. You really need to go off of the punctuation and the spacing. Don't forget though sentences can end with a quote and you put the sentence punctuation inside the quote like: "blah blah blah?"

    I doubt your instructor expects you to get all the corner cases, though, just the obvious ones.

    Also you can write the code to do it all in one loop and I'd use the switch...case syntax to determine where sentences split. But if you haven't been introduced to that then just stick with if...else statements.
     
  12. peakpowersports

    peakpowersports Well-Known Member

    You're going to want to use the two spaces to begin a sentence as well, because otherwise you'd count things more than once. For instance: "At the movies, I bought two drinks and some candy and a box of 100 Goobers." That would count 4 times if you just go off starting a sentence with a capital letter or number.
     
  13. kz2zx

    kz2zx zx2gsxr2zx

    I can do cryptic C++, too!

    I can't tell you how many times I've looked at my own C++ code, fully commented (with paragraphs and UML diagrams) and said "WTF just happened here?"

    It's worse when you use UML and state charts to generate code (Rhapsody...)
     
  14. Mr Sunshine

    Mr Sunshine Banned

    I'll tell you what happened....you were drinking too much when you wrote it the first time and when you went to go back and review it...you forgot you had to be drunk to understand it. Next time make sure you put in your comments,

    // Note: Be sure to consume at least a 6-pack of beer
    // before reading this next section or it won't make sense


    :up:
     
  15. Schitzo42

    Schitzo42 dweeb

    Or some of my code ....

    // magic happens...

    ...code code code ...

    // end magic happens..


    -steve
     
  16. kz2zx

    kz2zx zx2gsxr2zx

    A coworker's Jewish. She put this comment in once, after a method she borrowed:

    // and a great miracle happened there
     
  17. gixxernaut

    gixxernaut Hold my beer & watch this

    In order for the routine to be really accurate and versatile it would have to account for carriage returns and line feeds.

    Any routine that relies on a period, question mark or exclamation point followed by two spaces ending a sentence is going to miss sentences that are followed by a <cr><lf> pair. It is also possible for sentences to end with a period and only one space. On this BBS as an example you'll notice that posts are stripped of their extra spaces after periods. If you were to copy text from this post as you're reading it now, paste it to a text file and run it through kz2zx's routine above it would count zero sentences.

    Not saying mine's better, just pointing out that depending on the complexity of the sentences being evaluated a parsing routine can get very complicated in a hurry.

    And leave us not forget the possibility that someone might end a sentence with three exclamation points!!! Or could they toss something as unconventional as exclamation points and question marks into the same sentence ending punctuation nightmare?!? The possibilities could go on and on... ;)
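
    To give an idea of how those cases might be handled, here is a deliberately lenient sketch: any run of terminators ("!!!", "?!?") followed by whitespace of any kind, or by the end of the text, counts as one sentence. It works on a string in memory to stay short, the function names are invented, and it will still be fooled by things like abbreviations.

```c
#include <stddef.h>

int is_end_mark(int c)
{
    return c == '.' || c == '?' || c == '!';
}

int is_blank(int c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Count runs of end marks that are followed by whitespace or end of text */
int count_lenient(const char *text)
{
    int count = 0;
    size_t i = 0;

    while (text[i] != '\0') {
        if (is_end_mark(text[i])) {
            /* swallow the whole run, e.g. "!!!" or "?!?" */
            while (is_end_mark(text[i]))
                i++;
            if (text[i] == '\0' || is_blank(text[i]))
                count++;
        } else {
            i++;
        }
    }
    return count;
}
```

    For "Wow!!! Really?!? It cost 19.87 today.\r\nDone." this returns 4: the period inside 19.87 is skipped because a digit follows it, and the <cr><lf> pair after "today." is treated like any other whitespace.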
     
  18. Knarf Legna

    Knarf Legna I am not Gary Hoover

    Dusty, IMO no need to over-complicate the solution - all you need to do to count a sentence is recognize that you have two sentence termination characters in a row. Don't have to be concerned with the start of a sentence or whether you're in a sentence, as recognizing the end of a sentence is sufficient.

    This should work, you can fine tune it a little as needed for the complete universe of sentence termination character sequences including multiple spaces, EOL, etc. All it does as presented is recognize that we've run into two sentence termination characters in a row, i.e., a period or a question mark followed immediately by a space. fopen(), etc. obviously left out, but you should be able to follow easily.

    Code:
    FILE * fpInput;
    int cChar;              /* int rather than char so EOF can be detected  */
    int bFirstTermChar;     /* 1 == found first termination character in    */
                            /* a sequence of characters in the input        */
                            /* 0 == have not found first term character     */
    int nSentences;
    
    bFirstTermChar = 0;
    nSentences = 0;
    do {
        cChar = fgetc(fpInput);
        switch (cChar)
        {
            case '.':
            case '?':
                bFirstTermChar = 1;
                break;
            case ' ':
                if (bFirstTermChar) {
                    ++nSentences;
                    bFirstTermChar = 0;
                }
                break;
            default:
                bFirstTermChar = 0;
        }
    } while (cChar != EOF);
    
     
  19. dusty20

    dusty20 #97 North Central Ex.

    The only reason that I am using a capital to start it is because it is in the program description. The way that I have been doing it, once you are in a sentence the flag is true, so even though you have 4 starters (A, I, 100, and G), since it is already true it will not add a sentence until it hits the terminator, and after the terminator it needs to set the in-a-sentence flag to FALSE.
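
    Roughly, that idea could be sketched like this. It is only an illustration: it works on a string instead of a file to keep it short, the function names are invented, and it assumes the rules given earlier in the thread (capital letter or digit starts a sentence; terminator, space, then space, tab or newline ends it).

```c
#include <stddef.h>

/* 1 if c is a capital letter or a digit */
int starts_sentence(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
}

/* 1 if c is '.', '?' or '!' */
int is_term(int c)
{
    return c == '.' || c == '?' || c == '!';
}

int count_sentences(const char *text)
{
    int in_sentence = 0;          /* the TRUE/FALSE flag */
    int count = 0;
    int prev2 = 0, prev1 = 0;     /* the two characters before cur */
    size_t i;

    for (i = 0; text[i] != '\0'; i++) {
        int cur = (unsigned char)text[i];

        if (!in_sentence) {
            if (starts_sentence(cur))
                in_sentence = 1;  /* later starters are ignored */
        } else if (is_term(prev2) && prev1 == ' ' &&
                   (cur == ' ' || cur == '\t' || cur == '\n')) {
            count++;              /* sentence is added at its end */
            in_sentence = 0;
        }
        prev2 = prev1;
        prev1 = cur;
    }
    return count;
}
```

    With the Goobers sentence followed by two spaces, count_sentences returns 1, not 4, because A, I, 100 and G are all seen while the flag is already true. A final sentence with nothing after its terminator is not counted by this sketch.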
     
  20. hairu

    hairu Well-Known Member

    Here's an approach that handles the requirements of starting on a capital letter, not breaking on punctuation in a sentence, and only counting sentences that end with 2 spaces.

    Code:
    #include <ctype.h>
    #include <stdio.h>
    #define START_SEARCH_STATE 0
    #define TERMINATOR_SEARCH_STATE 1
    #define FIRST_SPACE_STATE 2
    #define SECOND_SPACE_STATE 3
    
    ....
    int state;
    FILE * fp;
    int currentChar;	/* int, not char, so it can be compared with EOF */
    int numSentences;
    
    state = START_SEARCH_STATE;
    numSentences = 0;
    currentChar = fgetc(fp);
    
    while(currentChar != EOF)
    {
    	
    	if (state == START_SEARCH_STATE)
    	{
    		if (isupper(currentChar))	/* a capital letter starts a sentence */
    		{
    			state = TERMINATOR_SEARCH_STATE;
    		}
    	}
    	else if (state == TERMINATOR_SEARCH_STATE)
    	{
    		switch(currentChar)
    		{
    		case '.' :
    		case '!' :
    		case '?' :
    			state = FIRST_SPACE_STATE;
    			break;
    		default :
    			/* Do nothing */
    			break;
    		}	
    	}
    	else if (state == FIRST_SPACE_STATE)
    	{
    		if (currentChar == ' ')
    		{
    			state = SECOND_SPACE_STATE;
    		}
    		else
    		{
    			/* Handle the punctuation in middle of sentence case */
    			state = TERMINATOR_SEARCH_STATE;
    		}
    	}
    	else if (state == SECOND_SPACE_STATE)
    	{
    		if (currentChar == ' ')
    		{
    			numSentences++;
    			state = START_SEARCH_STATE;
    		}
    		else
    		{
    			/* It's not a sentence terminator if there's only 1 space */
    			state = TERMINATOR_SEARCH_STATE;
    		}
    	}
    
    	currentChar = fgetc(fp);
    
    }
    
    
    printf("There were %d sentences\n", numSentences);
    
    
     
