Building an Index

In yet another instance of “It’s free for a reason”, CSharpQuery doesn’t come with any kind of tool to build your index for you. Instead, you have to write a small program to build it. This can also be an advantage, because you decide exactly what text goes into the index. For example, in my C# code I combine columns from several tables related to a track into a single indexed value. This way, even though the composer is stored in a different table, typing their name will show all of the tracks for that composer.

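// Requires: using System.Data.SqlClient; using System.Globalization;
// plus the CSharpQuery namespace that contains SQLServerIndexCreator.
public static void UpdateCSharpFullTextSearchIndex(
    IDataStore con, int langId, CultureInfo cultureInfo,
    string cSharpQueryIndexDirectory) {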
    
    // Quicksearch SQL
    string quicksearchSql = @"
        SELECT 
            t.TrackID,
            p1.Text + ' ' + -- TrackName
            p2.Text + ' ' + -- TrackDescription
            p3.Text + ' ' + -- AlbumName
            p4.Text + ' ' + -- LibraryName
            ar.ArtistName + ' ' +
            t.Publisher as IndexText
        FROM Track t 
        INNER JOIN Album a 
            ON t.AlbumID = a.AlbumID
        INNER JOIN RecordLabel r 
            ON a.RecordLabelID = r.RecordLabelID
        INNER JOIN Artist ar 
            ON t.ArtistID = ar.ArtistID

        INNER JOIN Phrase p1 
            ON t.Title_Dict = p1.DictionaryID 
            AND p1.LanguageID=@LangID
        INNER JOIN Phrase p2 
            ON t.Description_Dict = p2.DictionaryID 
            AND p2.LanguageID=@LangID
        INNER JOIN Phrase p3 
            ON a.AlbumName_Dict = p3.DictionaryID 
            AND p3.LanguageID=@LangID
        INNER JOIN Phrase p4 
            ON r.RecordLabelName_Dict = p4.DictionaryID 
            AND p4.LanguageID=@LangID";

    // Con(...) is an application helper (not shown here) that exposes the connection
    SqlConnection conn = Con(con).Connection as SqlConnection;
    if (conn == null)
        throw new System.InvalidOperationException(
            "The data store does not expose a SqlConnection.");

    try {
        conn.Open();
        using (SqlCommand cmd = new SqlCommand(quicksearchSql, conn)) {
            cmd.CommandType = System.Data.CommandType.Text;
            cmd.Parameters.AddWithValue("@LangID", langId);

            // The using block closes the reader even if CreateIndex throws
            using (SqlDataReader rdr = cmd.ExecuteReader()) {
                SQLServerIndexCreator creator = new SQLServerIndexCreator();

                // Quicksearch Index
                creator.CreateIndex(
                    "QuickSearch", // The name of the index
                    cSharpQueryIndexDirectory, // Index Dir
                    rdr, // An open Data Reader
                    cultureInfo, // The culture info for this index
                    "TrackID", // The [Key] (int) column of the index
                    "IndexText"); // The [Value] column of the index
            }
        }
    } finally {
        if (conn.State == System.Data.ConnectionState.Open)
            conn.Close();
    }
}

Index Creation Settings Files


As you can see in this index creation example, we concatenate the track name, track description, album name, library name, artist name, and publisher for the quick search index. This creates an .index file such as “IndexQuickSearch.en-US.index”. You will notice the file name follows the format “Index[index name].[culture code].index”. It is important to have a few things in place before you try this. The following files should exist by default:
  • Invalid Chars.txt – Contains invalid chars like [tab]~{}[CR][LF], etc.
  • Noisewords.global.txt – Contains words that are not useful to index like [a-z], “and”, “the”, etc.
  • Substitutions.global.txt – Contains a list of substitutions you wish to make. This is usually used to indicate which symbols break words and which ones do not. For example, “:= ” means that the “:” sign is replaced with a blank space.
  • Thesaurus.global.xml – The thesaurus contains synonyms and compound words. NOTE: if you use the compound word functionality, the compound term must come second.
  • WhiteSpace.global.txt – This file tells the word breaker which chars are whitespace so those can be safely used to split the text into word terms.
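
Before running the index build, it can help to confirm that the files listed above are actually present. The following is a minimal sketch of such a check, not part of CSharpQuery itself: the class name IndexSettingsCheck, the method FindMissingFiles, and the settingsDirectory parameter are hypothetical, and it assumes you know which directory your copy of CSharpQuery reads these files from.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of CSharpQuery.
public static class IndexSettingsCheck
{
    // File names exactly as listed above; adjust if your copy names them differently.
    public static readonly string[] RequiredFiles =
    {
        "Invalid Chars.txt",
        "Noisewords.global.txt",
        "Substitutions.global.txt",
        "Thesaurus.global.xml",
        "WhiteSpace.global.txt"
    };

    // Returns the names of the required files that are missing from settingsDirectory.
    // Assumption: all settings files live in one directory that you pass in.
    public static List<string> FindMissingFiles(string settingsDirectory)
    {
        var missing = new List<string>();
        foreach (string name in RequiredFiles)
        {
            if (!File.Exists(Path.Combine(settingsDirectory, name)))
                missing.Add(name);
        }
        return missing;
    }
}
```

If FindMissingFiles returns an empty list, every file named above was found; otherwise the list tells you which ones to restore before building.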

These files all work together to help create a better index. You will also notice that most of these files follow a “.global.txt” naming convention. That is because you will want to specialize these files for each culture you index. So you can have the files “WhiteSpace.en.txt” and “WhiteSpace.en-us.txt”, etc. The global lists are merged with the list for the specific language, so the global entries apply to every language.
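
To make the naming convention concrete, here is a small sketch that lists the file names you might provide for a given culture, from most general to most specific. It only illustrates the convention described above; it is not how CSharpQuery locates or merges the files internally. The class CultureFileNames and the method Candidates are hypothetical names. Note that .NET reports the culture name as “en-US”, so the sketch produces that casing.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of CSharpQuery.
public static class CultureFileNames
{
    // Builds names such as "WhiteSpace.global.txt", "WhiteSpace.en.txt",
    // "WhiteSpace.en-US.txt" for the culture "en-US".
    public static List<string> Candidates(
        string baseName, string extension, CultureInfo culture)
    {
        var names = new List<string> { baseName + ".global" + extension };

        // Walk from the specific culture up to (but excluding) the invariant
        // culture, whose Name is the empty string.
        var chain = new List<string>();
        for (CultureInfo c = culture; !string.IsNullOrEmpty(c.Name); c = c.Parent)
            chain.Add(c.Name);

        // Reverse so the most general culture comes first.
        chain.Reverse();
        foreach (string cultureName in chain)
            names.Add(baseName + "." + cultureName + extension);

        return names;
    }
}
```

For example, Candidates("WhiteSpace", ".txt", new CultureInfo("en-US")) returns the global file name first, then the “en” file name, then the “en-US” file name.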

Congratulations, you now have an index!
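
If you want your build program to confirm that the index was written, one option is to look for the file using the naming format described earlier. This is a sketch under that assumption only: IndexFileLocator and ExpectedIndexPath are hypothetical names, and the path is derived purely from the “Index[index name].[culture code].index” pattern rather than from any CSharpQuery API.

```csharp
using System.Globalization;
using System.IO;

// Hypothetical helper, not part of CSharpQuery.
public static class IndexFileLocator
{
    // Builds the path where the index file is expected, based on the
    // documented naming pattern: Index[index name].[culture code].index
    public static string ExpectedIndexPath(
        string indexDirectory, string indexName, CultureInfo culture)
    {
        string fileName = "Index" + indexName + "." + culture.Name + ".index";
        return Path.Combine(indexDirectory, fileName);
    }
}
```

After calling UpdateCSharpFullTextSearchIndex, you could check File.Exists(IndexFileLocator.ExpectedIndexPath(cSharpQueryIndexDirectory, "QuickSearch", cultureInfo)) and log a warning if the file is not there.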

Last edited Apr 1, 2010 at 9:04 PM by NathanZaugg, version 2
