creating array of bad names to check and replace in c#
I'm looking to create a method that loops through an list and replaces with matched values with a new value. I have something working below but it really doesnt follow the DRY principal and looks ugly.
How could I create a dictionary of value pairs that would hold my data of values to match and replace?
var match = acreData.data;
foreach(var i in match)
{
if (i.county_name == "DE KALB")
{
i.county_name = "DEKALB";
}
if (i.county_name == "DU PAGE")
{
i.county_name = "DUPAGE";
}
}
c# .net
add a comment |
I'm looking to create a method that loops through an list and replaces with matched values with a new value. I have something working below but it really doesnt follow the DRY principal and looks ugly.
How could I create a dictionary of value pairs that would hold my data of values to match and replace?
var match = acreData.data;
foreach(var i in match)
{
if (i.county_name == "DE KALB")
{
i.county_name = "DEKALB";
}
if (i.county_name == "DU PAGE")
{
i.county_name = "DUPAGE";
}
}
c# .net
6
Start withnew Dictionary..
, add values to it, and then useTryGetValue
.
– user2864740
Nov 21 '18 at 0:03
That said, in this case, would it not be sufficient to "remove white space"? If such is valid/sufficient depends on the entire problem space. One issue with the dictionary (or if/else approach) is that it is a whitelisting whereas removing white space is, for better or worse, more generalized.. (and it might be the case that the two can be used in tandem: eg. remove white space and perform alias mappings).
– user2864740
Nov 21 '18 at 0:04
2
How could I create a dictionary of value pairs MSDN is an excellent resource.. Dictionary classes are thoroughly documented and have example uses.
– radarbob
Nov 21 '18 at 0:08
Depending on where all the data resides, I have sometimes employed an Alias system where there is an official name and then as many aliases as needed. Then at as low of level as possible, a detected alias is replaced with the official name
– None of the Above
Nov 21 '18 at 0:08
1
Also future proof this, by storing this in a db, or at least a file, so you dont have to rebuild your code all the time, load it up in to memory as needed
– Michael Randall
Nov 21 '18 at 0:18
add a comment |
I'm looking to create a method that loops through an list and replaces with matched values with a new value. I have something working below but it really doesnt follow the DRY principal and looks ugly.
How could I create a dictionary of value pairs that would hold my data of values to match and replace?
var match = acreData.data;
foreach(var i in match)
{
if (i.county_name == "DE KALB")
{
i.county_name = "DEKALB";
}
if (i.county_name == "DU PAGE")
{
i.county_name = "DUPAGE";
}
}
c# .net
I'm looking to create a method that loops through an list and replaces with matched values with a new value. I have something working below but it really doesnt follow the DRY principal and looks ugly.
How could I create a dictionary of value pairs that would hold my data of values to match and replace?
var match = acreData.data;
foreach(var i in match)
{
if (i.county_name == "DE KALB")
{
i.county_name = "DEKALB";
}
if (i.county_name == "DU PAGE")
{
i.county_name = "DUPAGE";
}
}
c# .net
c# .net
asked Nov 21 '18 at 0:02
dbolligdbollig
413
413
6
Start withnew Dictionary..
, add values to it, and then useTryGetValue
.
– user2864740
Nov 21 '18 at 0:03
That said, in this case, would it not be sufficient to "remove white space"? If such is valid/sufficient depends on the entire problem space. One issue with the dictionary (or if/else approach) is that it is a whitelisting whereas removing white space is, for better or worse, more generalized.. (and it might be the case that the two can be used in tandem: eg. remove white space and perform alias mappings).
– user2864740
Nov 21 '18 at 0:04
2
How could I create a dictionary of value pairs MSDN is an excellent resource.. Dictionary classes are thoroughly documented and have example uses.
– radarbob
Nov 21 '18 at 0:08
Depending on where all the data resides, I have sometimes employed an Alias system where there is an official name and then as many aliases as needed. Then at as low of level as possible, a detected alias is replaced with the official name
– None of the Above
Nov 21 '18 at 0:08
1
Also future proof this, by storing this in a db, or at least a file, so you dont have to rebuild your code all the time, load it up in to memory as needed
– Michael Randall
Nov 21 '18 at 0:18
add a comment |
6
Start withnew Dictionary..
, add values to it, and then useTryGetValue
.
– user2864740
Nov 21 '18 at 0:03
That said, in this case, would it not be sufficient to "remove white space"? If such is valid/sufficient depends on the entire problem space. One issue with the dictionary (or if/else approach) is that it is a whitelisting whereas removing white space is, for better or worse, more generalized.. (and it might be the case that the two can be used in tandem: eg. remove white space and perform alias mappings).
– user2864740
Nov 21 '18 at 0:04
2
How could I create a dictionary of value pairs MSDN is an excellent resource.. Dictionary classes are thoroughly documented and have example uses.
– radarbob
Nov 21 '18 at 0:08
Depending on where all the data resides, I have sometimes employed an Alias system where there is an official name and then as many aliases as needed. Then at as low of level as possible, a detected alias is replaced with the official name
– None of the Above
Nov 21 '18 at 0:08
1
Also future proof this, by storing this in a db, or at least a file, so you dont have to rebuild your code all the time, load it up in to memory as needed
– Michael Randall
Nov 21 '18 at 0:18
6
6
Start with
new Dictionary..
, add values to it, and then use TryGetValue
.– user2864740
Nov 21 '18 at 0:03
Start with
new Dictionary..
, add values to it, and then use TryGetValue
.– user2864740
Nov 21 '18 at 0:03
That said, in this case, would it not be sufficient to "remove white space"? If such is valid/sufficient depends on the entire problem space. One issue with the dictionary (or if/else approach) is that it is a whitelisting whereas removing white space is, for better or worse, more generalized.. (and it might be the case that the two can be used in tandem: eg. remove white space and perform alias mappings).
– user2864740
Nov 21 '18 at 0:04
That said, in this case, would it not be sufficient to "remove white space"? If such is valid/sufficient depends on the entire problem space. One issue with the dictionary (or if/else approach) is that it is a whitelisting whereas removing white space is, for better or worse, more generalized.. (and it might be the case that the two can be used in tandem: eg. remove white space and perform alias mappings).
– user2864740
Nov 21 '18 at 0:04
2
2
How could I create a dictionary of value pairs MSDN is an excellent resource.. Dictionary classes are thoroughly documented and have example uses.
– radarbob
Nov 21 '18 at 0:08
How could I create a dictionary of value pairs MSDN is an excellent resource.. Dictionary classes are thoroughly documented and have example uses.
– radarbob
Nov 21 '18 at 0:08
Depending on where all the data resides, I have sometimes employed an Alias system where there is an official name and then as many aliases as needed. Then at as low of level as possible, a detected alias is replaced with the official name
– None of the Above
Nov 21 '18 at 0:08
Depending on where all the data resides, I have sometimes employed an Alias system where there is an official name and then as many aliases as needed. Then at as low of level as possible, a detected alias is replaced with the official name
– None of the Above
Nov 21 '18 at 0:08
1
1
Also future proof this, by storing this in a db, or at least a file, so you dont have to rebuild your code all the time, load it up in to memory as needed
– Michael Randall
Nov 21 '18 at 0:18
Also future proof this, by storing this in a db, or at least a file, so you dont have to rebuild your code all the time, load it up in to memory as needed
– Michael Randall
Nov 21 '18 at 0:18
add a comment |
5 Answers
5
active
oldest
votes
In your question, you can try to use linq and Replace
to make it.
var match = acreData.data.ToList();
match.ForEach(x =>
x.county_name = x.county_name.Replace(" ", "")
);
or you can try to create a mapper table to let your data mapper with your value. as @user2864740 say.
Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add("DE KALB", "DEKALB");
dict.Add("DU PAGE", "DUPAGE");
var match = acreData.data;
string val = string.Empty;
foreach (var i in match)
{
if (dict.TryGetValue(i.county_name, out val))
i.county_name = val;
}
the mapper table worked perfect!
– dbollig
Nov 21 '18 at 13:02
add a comment |
If this were my problem and it is possible a county could have more than one common misspelling I would create a class to hold the correct name and the common misspellings. The you could easily determine if the misspelling exists and correct if. Something like this:
public class County
{
public string CountyName { get; set; }
public List<string> CommonMisspellings { get; set; }
public County()
{
CommonMisspellings = new List<string>();
}
}
Usage:
//most likely populate from db
var counties = new List<County>();
var dekalb = new County { CountyName = "DEKALB" };
dekalb.CommonMisspellings.Add("DE KALB");
dekalb.CommonMisspellings.Add("DE_KALB");
var test = "DE KALB";
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
test = counties.First(c => c.CommonMisspellings.Contains(test)).CountyName;
}
This does not scale well. Hopefully number of misspellings is low and it would be less than thousand times slower than necessary.
– Antonín Lejsek
Nov 21 '18 at 1:44
That's like your opinion man - it would work just fine.
– Kevin
Nov 21 '18 at 13:59
add a comment |
If you are simply replacing all words in a list containing space without space, then can use below:
var newList = match.ConvertAll(word => word.Replace(" ", ""));
ConvertAll returns a new list.
Also, I suggest not to use variable names like i, j, k etc..but use temp etc.
Sample code below:
var oldList = new List<string> {"DE KALB", "DE PAGE"};
var newList = oldList.ConvertAll(word => word.Replace(" ", ""));
add a comment |
We can try removing all the characters but letters and apostroph (Cote d'Ivoire
has it)
...
i.country_name = String.Concat(i.country_name
.Where(c => char.IsLetter(c) || c == '''));
...
add a comment |
I made a comment under answer of @Kevin and it seems it needs further explanation. Sequential searching in list does not scale well and unfortunately for Kevin, that is not my opinion, asymptotic computational complexity is math. While searching in dictionary is more or less O(1), searching in list is O(n). To show a practical impact for solution with 100 countries with 100 misspellings each, lets make a test
public class Country
{
public string CountryName { get; set; }
public List<string> CommonMisspellings { get; set; }
public Country()
{
CommonMisspellings = new List<string>();
}
}
static void Main()
{
var counties = new List<Country>();
Dictionary<string, string> dict = new Dictionary<string, string>();
Random rnd = new Random();
List<string> allCountryNames = new List<string>();
List<string> allMissNames = new List<string>();
for (int state = 0; state < 100; ++state)
{
string countryName = state.ToString() + rnd.NextDouble();
allCountryNames.Add(countryName);
var country = new Country { CountryName = countryName };
counties.Add(country);
for (int miss = 0; miss < 100; ++miss)
{
string missname = countryName + miss;
allMissNames.Add(missname);
country.CommonMisspellings.Add(missname);
dict.Add(missname, countryName);
}
}
List<string> testNames = new List<string>();
for (int i = 0; i < 100000; ++i)
{
if (rnd.Next(20) == 1)
{
testNames.Add(allMissNames[rnd.Next(allMissNames.Count)]);
}
else
{
testNames.Add(allCountryNames[rnd.Next(allCountryNames.Count)]);
}
}
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairs = new List<string>();
foreach (var test in testNames)
{
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
repairs.Add(counties.First(c => c.CommonMisspellings.Contains(test)).CountryName);
}
}
st.Stop();
Console.WriteLine("List approach: " + st.ElapsedMilliseconds.ToString() + "ms");
st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairsDict = new List<string>();
foreach (var test in testNames)
{
if (dict.TryGetValue(test, out var val))
{
repairsDict.Add(val);
}
}
st.Stop();
Console.WriteLine("Dict approach: " + st.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine("Repaired count: " + repairs.Count
+ ", check " + (repairs.SequenceEqual(repairsDict) ? "OK" : "ERROR"));
Console.ReadLine();
}
And the result is
List approach: 7264ms
Dict approach: 4ms
Repaired count: 4968, check OK
List approach is about 1800x slower, actually more the thousand times slower in this case. The results are as expected. If that is a problem is another question, it depends on concrete usage pattern in concrete application and is out of scope of this post.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53403438%2fcreating-array-of-bad-names-to-check-and-replace-in-c-sharp%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
In your question, you can try to use linq and Replace
to make it.
var match = acreData.data.ToList();
match.ForEach(x =>
x.county_name = x.county_name.Replace(" ", "")
);
or you can try to create a mapper table to let your data mapper with your value. as @user2864740 say.
Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add("DE KALB", "DEKALB");
dict.Add("DU PAGE", "DUPAGE");
var match = acreData.data;
string val = string.Empty;
foreach (var i in match)
{
if (dict.TryGetValue(i.county_name, out val))
i.county_name = val;
}
the mapper table worked perfect!
– dbollig
Nov 21 '18 at 13:02
add a comment |
In your question, you can try to use linq and Replace
to make it.
var match = acreData.data.ToList();
match.ForEach(x =>
x.county_name = x.county_name.Replace(" ", "")
);
or you can try to create a mapper table to let your data mapper with your value. as @user2864740 say.
Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add("DE KALB", "DEKALB");
dict.Add("DU PAGE", "DUPAGE");
var match = acreData.data;
string val = string.Empty;
foreach (var i in match)
{
if (dict.TryGetValue(i.county_name, out val))
i.county_name = val;
}
the mapper table worked perfect!
– dbollig
Nov 21 '18 at 13:02
add a comment |
In your question, you can try to use linq and Replace
to make it.
var match = acreData.data.ToList();
match.ForEach(x =>
x.county_name = x.county_name.Replace(" ", "")
);
or you can try to create a mapper table to let your data mapper with your value. as @user2864740 say.
Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add("DE KALB", "DEKALB");
dict.Add("DU PAGE", "DUPAGE");
var match = acreData.data;
string val = string.Empty;
foreach (var i in match)
{
if (dict.TryGetValue(i.county_name, out val))
i.county_name = val;
}
In your question, you can try to use linq and Replace
to make it.
var match = acreData.data.ToList();
match.ForEach(x =>
x.county_name = x.county_name.Replace(" ", "")
);
or you can try to create a mapper table to let your data mapper with your value. as @user2864740 say.
Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add("DE KALB", "DEKALB");
dict.Add("DU PAGE", "DUPAGE");
var match = acreData.data;
string val = string.Empty;
foreach (var i in match)
{
if (dict.TryGetValue(i.county_name, out val))
i.county_name = val;
}
edited Nov 21 '18 at 0:24
answered Nov 21 '18 at 0:17


D-ShihD-Shih
25.8k61531
25.8k61531
the mapper table worked perfect!
– dbollig
Nov 21 '18 at 13:02
add a comment |
the mapper table worked perfect!
– dbollig
Nov 21 '18 at 13:02
the mapper table worked perfect!
– dbollig
Nov 21 '18 at 13:02
the mapper table worked perfect!
– dbollig
Nov 21 '18 at 13:02
add a comment |
If this were my problem and it is possible a county could have more than one common misspelling I would create a class to hold the correct name and the common misspellings. The you could easily determine if the misspelling exists and correct if. Something like this:
public class County
{
public string CountyName { get; set; }
public List<string> CommonMisspellings { get; set; }
public County()
{
CommonMisspellings = new List<string>();
}
}
Usage:
//most likely populate from db
var counties = new List<County>();
var dekalb = new County { CountyName = "DEKALB" };
dekalb.CommonMisspellings.Add("DE KALB");
dekalb.CommonMisspellings.Add("DE_KALB");
var test = "DE KALB";
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
test = counties.First(c => c.CommonMisspellings.Contains(test)).CountyName;
}
This does not scale well. Hopefully number of misspellings is low and it would be less than thousand times slower than necessary.
– Antonín Lejsek
Nov 21 '18 at 1:44
That's like your opinion man - it would work just fine.
– Kevin
Nov 21 '18 at 13:59
add a comment |
If this were my problem and it is possible a county could have more than one common misspelling I would create a class to hold the correct name and the common misspellings. The you could easily determine if the misspelling exists and correct if. Something like this:
public class County
{
public string CountyName { get; set; }
public List<string> CommonMisspellings { get; set; }
public County()
{
CommonMisspellings = new List<string>();
}
}
Usage:
//most likely populate from db
var counties = new List<County>();
var dekalb = new County { CountyName = "DEKALB" };
dekalb.CommonMisspellings.Add("DE KALB");
dekalb.CommonMisspellings.Add("DE_KALB");
var test = "DE KALB";
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
test = counties.First(c => c.CommonMisspellings.Contains(test)).CountyName;
}
This does not scale well. Hopefully number of misspellings is low and it would be less than thousand times slower than necessary.
– Antonín Lejsek
Nov 21 '18 at 1:44
That's like your opinion man - it would work just fine.
– Kevin
Nov 21 '18 at 13:59
add a comment |
If this were my problem and it is possible a county could have more than one common misspelling I would create a class to hold the correct name and the common misspellings. The you could easily determine if the misspelling exists and correct if. Something like this:
public class County
{
public string CountyName { get; set; }
public List<string> CommonMisspellings { get; set; }
public County()
{
CommonMisspellings = new List<string>();
}
}
Usage:
//most likely populate from db
var counties = new List<County>();
var dekalb = new County { CountyName = "DEKALB" };
dekalb.CommonMisspellings.Add("DE KALB");
dekalb.CommonMisspellings.Add("DE_KALB");
var test = "DE KALB";
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
test = counties.First(c => c.CommonMisspellings.Contains(test)).CountyName;
}
If this were my problem and it is possible a county could have more than one common misspelling I would create a class to hold the correct name and the common misspellings. The you could easily determine if the misspelling exists and correct if. Something like this:
public class County
{
public string CountyName { get; set; }
public List<string> CommonMisspellings { get; set; }
public County()
{
CommonMisspellings = new List<string>();
}
}
Usage:
//most likely populate from db
var counties = new List<County>();
var dekalb = new County { CountyName = "DEKALB" };
dekalb.CommonMisspellings.Add("DE KALB");
dekalb.CommonMisspellings.Add("DE_KALB");
var test = "DE KALB";
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
test = counties.First(c => c.CommonMisspellings.Contains(test)).CountyName;
}
answered Nov 21 '18 at 0:29
KevinKevin
1,5201711
1,5201711
This does not scale well. Hopefully number of misspellings is low and it would be less than thousand times slower than necessary.
– Antonín Lejsek
Nov 21 '18 at 1:44
That's like your opinion man - it would work just fine.
– Kevin
Nov 21 '18 at 13:59
add a comment |
This does not scale well. Hopefully number of misspellings is low and it would be less than thousand times slower than necessary.
– Antonín Lejsek
Nov 21 '18 at 1:44
That's like your opinion man - it would work just fine.
– Kevin
Nov 21 '18 at 13:59
This does not scale well. Hopefully number of misspellings is low and it would be less than thousand times slower than necessary.
– Antonín Lejsek
Nov 21 '18 at 1:44
This does not scale well. Hopefully number of misspellings is low and it would be less than thousand times slower than necessary.
– Antonín Lejsek
Nov 21 '18 at 1:44
That's like your opinion man - it would work just fine.
– Kevin
Nov 21 '18 at 13:59
That's like your opinion man - it would work just fine.
– Kevin
Nov 21 '18 at 13:59
add a comment |
If you are simply replacing all words in a list containing space without space, then can use below:
var newList = match.ConvertAll(word => word.Replace(" ", ""));
ConvertAll returns a new list.
Also, I suggest not to use variable names like i, j, k etc..but use temp etc.
Sample code below:
var oldList = new List<string> {"DE KALB", "DE PAGE"};
var newList = oldList.ConvertAll(word => word.Replace(" ", ""));
add a comment |
If you are simply replacing all words in a list containing space without space, then can use below:
var newList = match.ConvertAll(word => word.Replace(" ", ""));
ConvertAll returns a new list.
Also, I suggest not to use variable names like i, j, k etc..but use temp etc.
Sample code below:
var oldList = new List<string> {"DE KALB", "DE PAGE"};
var newList = oldList.ConvertAll(word => word.Replace(" ", ""));
add a comment |
If you are simply replacing all words in a list containing space without space, then can use below:
var newList = match.ConvertAll(word => word.Replace(" ", ""));
ConvertAll returns a new list.
Also, I suggest not to use variable names like i, j, k etc..but use temp etc.
Sample code below:
var oldList = new List<string> {"DE KALB", "DE PAGE"};
var newList = oldList.ConvertAll(word => word.Replace(" ", ""));
If you are simply replacing all words in a list containing space without space, then can use below:
var newList = match.ConvertAll(word => word.Replace(" ", ""));
ConvertAll returns a new list.
Also, I suggest not to use variable names like i, j, k etc..but use temp etc.
Sample code below:
var oldList = new List<string> {"DE KALB", "DE PAGE"};
var newList = oldList.ConvertAll(word => word.Replace(" ", ""));
edited Nov 21 '18 at 0:41
answered Nov 21 '18 at 0:34
GauravsaGauravsa
3,0621817
3,0621817
add a comment |
add a comment |
We can try removing all the characters but letters and apostroph (Cote d'Ivoire
has it)
...
i.country_name = String.Concat(i.country_name
.Where(c => char.IsLetter(c) || c == '''));
...
add a comment |
We can try removing all the characters but letters and apostroph (Cote d'Ivoire
has it)
...
i.country_name = String.Concat(i.country_name
.Where(c => char.IsLetter(c) || c == '''));
...
add a comment |
We can try removing all the characters but letters and apostroph (Cote d'Ivoire
has it)
...
i.country_name = String.Concat(i.country_name
.Where(c => char.IsLetter(c) || c == '''));
...
We can try removing all the characters but letters and apostroph (Cote d'Ivoire
has it)
...
i.country_name = String.Concat(i.country_name
.Where(c => char.IsLetter(c) || c == '''));
...
answered Nov 21 '18 at 8:23


Dmitry BychenkoDmitry Bychenko
107k1093133
107k1093133
add a comment |
add a comment |
I made a comment under answer of @Kevin and it seems it needs further explanation. Sequential searching in list does not scale well and unfortunately for Kevin, that is not my opinion, asymptotic computational complexity is math. While searching in dictionary is more or less O(1), searching in list is O(n). To show a practical impact for solution with 100 countries with 100 misspellings each, lets make a test
public class Country
{
public string CountryName { get; set; }
public List<string> CommonMisspellings { get; set; }
public Country()
{
CommonMisspellings = new List<string>();
}
}
static void Main()
{
var counties = new List<Country>();
Dictionary<string, string> dict = new Dictionary<string, string>();
Random rnd = new Random();
List<string> allCountryNames = new List<string>();
List<string> allMissNames = new List<string>();
for (int state = 0; state < 100; ++state)
{
string countryName = state.ToString() + rnd.NextDouble();
allCountryNames.Add(countryName);
var country = new Country { CountryName = countryName };
counties.Add(country);
for (int miss = 0; miss < 100; ++miss)
{
string missname = countryName + miss;
allMissNames.Add(missname);
country.CommonMisspellings.Add(missname);
dict.Add(missname, countryName);
}
}
List<string> testNames = new List<string>();
for (int i = 0; i < 100000; ++i)
{
if (rnd.Next(20) == 1)
{
testNames.Add(allMissNames[rnd.Next(allMissNames.Count)]);
}
else
{
testNames.Add(allCountryNames[rnd.Next(allCountryNames.Count)]);
}
}
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairs = new List<string>();
foreach (var test in testNames)
{
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
repairs.Add(counties.First(c => c.CommonMisspellings.Contains(test)).CountryName);
}
}
st.Stop();
Console.WriteLine("List approach: " + st.ElapsedMilliseconds.ToString() + "ms");
st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairsDict = new List<string>();
foreach (var test in testNames)
{
if (dict.TryGetValue(test, out var val))
{
repairsDict.Add(val);
}
}
st.Stop();
Console.WriteLine("Dict approach: " + st.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine("Repaired count: " + repairs.Count
+ ", check " + (repairs.SequenceEqual(repairsDict) ? "OK" : "ERROR"));
Console.ReadLine();
}
And the result is
List approach: 7264ms
Dict approach: 4ms
Repaired count: 4968, check OK
List approach is about 1800x slower, actually more the thousand times slower in this case. The results are as expected. If that is a problem is another question, it depends on concrete usage pattern in concrete application and is out of scope of this post.
add a comment |
I made a comment under answer of @Kevin and it seems it needs further explanation. Sequential searching in list does not scale well and unfortunately for Kevin, that is not my opinion, asymptotic computational complexity is math. While searching in dictionary is more or less O(1), searching in list is O(n). To show a practical impact for solution with 100 countries with 100 misspellings each, lets make a test
public class Country
{
public string CountryName { get; set; }
public List<string> CommonMisspellings { get; set; }
public Country()
{
CommonMisspellings = new List<string>();
}
}
static void Main()
{
var counties = new List<Country>();
Dictionary<string, string> dict = new Dictionary<string, string>();
Random rnd = new Random();
List<string> allCountryNames = new List<string>();
List<string> allMissNames = new List<string>();
for (int state = 0; state < 100; ++state)
{
string countryName = state.ToString() + rnd.NextDouble();
allCountryNames.Add(countryName);
var country = new Country { CountryName = countryName };
counties.Add(country);
for (int miss = 0; miss < 100; ++miss)
{
string missname = countryName + miss;
allMissNames.Add(missname);
country.CommonMisspellings.Add(missname);
dict.Add(missname, countryName);
}
}
List<string> testNames = new List<string>();
for (int i = 0; i < 100000; ++i)
{
if (rnd.Next(20) == 1)
{
testNames.Add(allMissNames[rnd.Next(allMissNames.Count)]);
}
else
{
testNames.Add(allCountryNames[rnd.Next(allCountryNames.Count)]);
}
}
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairs = new List<string>();
foreach (var test in testNames)
{
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
repairs.Add(counties.First(c => c.CommonMisspellings.Contains(test)).CountryName);
}
}
st.Stop();
Console.WriteLine("List approach: " + st.ElapsedMilliseconds.ToString() + "ms");
st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairsDict = new List<string>();
foreach (var test in testNames)
{
if (dict.TryGetValue(test, out var val))
{
repairsDict.Add(val);
}
}
st.Stop();
Console.WriteLine("Dict approach: " + st.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine("Repaired count: " + repairs.Count
+ ", check " + (repairs.SequenceEqual(repairsDict) ? "OK" : "ERROR"));
Console.ReadLine();
}
And the result is
List approach: 7264ms
Dict approach: 4ms
Repaired count: 4968, check OK
List approach is about 1800x slower, actually more the thousand times slower in this case. The results are as expected. If that is a problem is another question, it depends on concrete usage pattern in concrete application and is out of scope of this post.
add a comment |
I made a comment under answer of @Kevin and it seems it needs further explanation. Sequential searching in list does not scale well and unfortunately for Kevin, that is not my opinion, asymptotic computational complexity is math. While searching in dictionary is more or less O(1), searching in list is O(n). To show a practical impact for solution with 100 countries with 100 misspellings each, lets make a test
public class Country
{
public string CountryName { get; set; }
public List<string> CommonMisspellings { get; set; }
public Country()
{
CommonMisspellings = new List<string>();
}
}
static void Main()
{
var counties = new List<Country>();
Dictionary<string, string> dict = new Dictionary<string, string>();
Random rnd = new Random();
List<string> allCountryNames = new List<string>();
List<string> allMissNames = new List<string>();
for (int state = 0; state < 100; ++state)
{
string countryName = state.ToString() + rnd.NextDouble();
allCountryNames.Add(countryName);
var country = new Country { CountryName = countryName };
counties.Add(country);
for (int miss = 0; miss < 100; ++miss)
{
string missname = countryName + miss;
allMissNames.Add(missname);
country.CommonMisspellings.Add(missname);
dict.Add(missname, countryName);
}
}
List<string> testNames = new List<string>();
for (int i = 0; i < 100000; ++i)
{
if (rnd.Next(20) == 1)
{
testNames.Add(allMissNames[rnd.Next(allMissNames.Count)]);
}
else
{
testNames.Add(allCountryNames[rnd.Next(allCountryNames.Count)]);
}
}
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairs = new List<string>();
foreach (var test in testNames)
{
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
repairs.Add(counties.First(c => c.CommonMisspellings.Contains(test)).CountryName);
}
}
st.Stop();
Console.WriteLine("List approach: " + st.ElapsedMilliseconds.ToString() + "ms");
st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairsDict = new List<string>();
foreach (var test in testNames)
{
if (dict.TryGetValue(test, out var val))
{
repairsDict.Add(val);
}
}
st.Stop();
Console.WriteLine("Dict approach: " + st.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine("Repaired count: " + repairs.Count
+ ", check " + (repairs.SequenceEqual(repairsDict) ? "OK" : "ERROR"));
Console.ReadLine();
}
And the result is
List approach: 7264ms
Dict approach: 4ms
Repaired count: 4968, check OK
List approach is about 1800x slower, actually more the thousand times slower in this case. The results are as expected. If that is a problem is another question, it depends on concrete usage pattern in concrete application and is out of scope of this post.
I made a comment under answer of @Kevin and it seems it needs further explanation. Sequential searching in list does not scale well and unfortunately for Kevin, that is not my opinion, asymptotic computational complexity is math. While searching in dictionary is more or less O(1), searching in list is O(n). To show a practical impact for solution with 100 countries with 100 misspellings each, lets make a test
public class Country
{
public string CountryName { get; set; }
public List<string> CommonMisspellings { get; set; }
public Country()
{
CommonMisspellings = new List<string>();
}
}
static void Main()
{
var counties = new List<Country>();
Dictionary<string, string> dict = new Dictionary<string, string>();
Random rnd = new Random();
List<string> allCountryNames = new List<string>();
List<string> allMissNames = new List<string>();
for (int state = 0; state < 100; ++state)
{
string countryName = state.ToString() + rnd.NextDouble();
allCountryNames.Add(countryName);
var country = new Country { CountryName = countryName };
counties.Add(country);
for (int miss = 0; miss < 100; ++miss)
{
string missname = countryName + miss;
allMissNames.Add(missname);
country.CommonMisspellings.Add(missname);
dict.Add(missname, countryName);
}
}
List<string> testNames = new List<string>();
for (int i = 0; i < 100000; ++i)
{
if (rnd.Next(20) == 1)
{
testNames.Add(allMissNames[rnd.Next(allMissNames.Count)]);
}
else
{
testNames.Add(allCountryNames[rnd.Next(allCountryNames.Count)]);
}
}
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairs = new List<string>();
foreach (var test in testNames)
{
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
repairs.Add(counties.First(c => c.CommonMisspellings.Contains(test)).CountryName);
}
}
st.Stop();
Console.WriteLine("List approach: " + st.ElapsedMilliseconds.ToString() + "ms");
st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairsDict = new List<string>();
foreach (var test in testNames)
{
if (dict.TryGetValue(test, out var val))
{
repairsDict.Add(val);
}
}
st.Stop();
Console.WriteLine("Dict approach: " + st.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine("Repaired count: " + repairs.Count
+ ", check " + (repairs.SequenceEqual(repairsDict) ? "OK" : "ERROR"));
Console.ReadLine();
}
And the result is
List approach: 7264ms
Dict approach: 4ms
Repaired count: 4968, check OK
List approach is about 1800x slower, actually more the thousand times slower in this case. The results are as expected. If that is a problem is another question, it depends on concrete usage pattern in concrete application and is out of scope of this post.
answered Nov 22 '18 at 1:36
Antonín LejsekAntonín Lejsek
4,14721118
4,14721118
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53403438%2fcreating-array-of-bad-names-to-check-and-replace-in-c-sharp%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
6
Start with
new Dictionary..
, add values to it, and then useTryGetValue
.– user2864740
Nov 21 '18 at 0:03
That said, in this case, would it not be sufficient to "remove white space"? If such is valid/sufficient depends on the entire problem space. One issue with the dictionary (or if/else approach) is that it is a whitelisting whereas removing white space is, for better or worse, more generalized.. (and it might be the case that the two can be used in tandem: eg. remove white space and perform alias mappings).
– user2864740
Nov 21 '18 at 0:04
2
How could I create a dictionary of value pairs MSDN is an excellent resource.. Dictionary classes are thoroughly documented and have example uses.
– radarbob
Nov 21 '18 at 0:08
Depending on where all the data resides, I have sometimes employed an Alias system where there is an official name and then as many aliases as needed. Then at as low of level as possible, a detected alias is replaced with the official name
– None of the Above
Nov 21 '18 at 0:08
1
Also future proof this, by storing this in a db, or at least a file, so you dont have to rebuild your code all the time, load it up in to memory as needed
– Michael Randall
Nov 21 '18 at 0:18