What is the best data structure for storing a set of four (or more) values?
Say I have the following variables and their corresponding values, which together represent a record.



name = 'abc'
age = 23
weight = 60
height = 174


Please note that the value could be of different types (string, integer, float, reference-to-any-other-object, etc).



There will be many records (at least 100,000). Each record is unique when all four values are taken together; in other words, no two records share the same values for all four variables.



I am trying to find an efficient data structure in Python which will allow me to (store and) retrieve records based on any one of these variables in log(n) time complexity.



For example:



def retrieve(name=None, age=None, weight=None, height=None):
    if name is not None and age is None and weight is None and height is None:
        ...  # get all records with the given name
    if name is None and age is not None and weight is None and height is None:
        ...  # get all records with the given age
    ...
    return records


The way the retrieve should be called is as follows:



retrieve(name='abc') 


The above should return [{name:'abc', age:23, weight:50, height:175}, {name:'abc', age:28, weight:55, height:170}, etc.]



retrieve(age=23) 


The above should return [{name:'abc', age:23, weight:50, height:175}, {name:'def', age:23, weight:65, height:180}, etc.]



And I may need to add one or two more variables to this record in the future, for example sex = 'm'. So the retrieve function must scale to additional fields.



So, in short: is there a data structure in Python which will allow storing a record with n columns (name, age, sex, weight, height, etc.) and retrieving records based on any one of those columns in logarithmic (or, ideally, constant O(1)) look-up time?










  • Could you please justify the -1? It is a genuine programming question.
    – Sangeeth Saravanaraj
    Mar 14 '13 at 19:27










  • Maybe this will help you - wiki.python.org/moin/TimeComplexity ?
    – kgr
    Mar 14 '13 at 19:35










  • Why not use SQL for this? It seems more appropriate. Python has built-in support for SQLite.
    – MGP
    Mar 14 '13 at 19:40










  • SQL will be slow; given that the OP mentions time complexity, he's probably not happy about being bound by I/O.
    – kgr
    Mar 14 '13 at 19:50










  • Is this question about providing an implementation of such data structure, or is it about whether or not a ready-to-use implementation exists?
    – moooeeeep
    Mar 14 '13 at 19:55
















python data-structures lookup logarithm






edited Jul 24 '16 at 1:44 by martineau

asked Mar 14 '13 at 19:23 by Sangeeth Saravanaraj


4 Answers
There isn't a single data structure built into Python that does everything you want, but it's fairly easy to use a combination of the ones it does have to achieve your goals and do so fairly efficiently.



For example, say your input was the following data in a comma-separated-value file called employees.csv with field names defined as shown by the first line:



name,age,weight,height
Bob Barker,25,175,6ft 2in
Ted Kingston,28,163,5ft 10in
Mary Manson,27,140,5ft 6in
Sue Sommers,27,132,5ft 8in
Alice Toklas,24,124,5ft 6in


The following is working code which illustrates how to read and store this data into a list of records and automatically create separate look-up tables for finding the records associated with the values contained in each field of those records.



The records are instances of a class created by namedtuple, which is very memory-efficient because each instance lacks the __dict__ attribute that class instances normally contain. Using them makes it possible to access the fields of each record by name using dot syntax, like record.fieldname.



The look-up tables are defaultdict(list) instances, which provide dictionary-like O(1) look-up times on average and also allow multiple values to be associated with each key. The look-up key is the field value being sought, and the data associated with it is a list of the integer indices of the Person records stored in the records list that contain that value, so the tables all stay relatively small.



Note that the code for the class is completely data-driven: it doesn't contain any hardcoded field names, which are instead all taken from the first row of the csv input file when it's read in. Of course, when using an instance, all retrieve() method calls must supply valid field names.



Update



Modified so that a lookup table is no longer created for every field when the data file is first read. Now the retrieve() method "lazily" creates a table only when one is needed (and caches the result for future use). Also modified to work in Python 2.7+, including 3.x.



from collections import defaultdict, namedtuple
import csv

class DataBase(object):
    def __init__(self, csv_filename, recordname):
        # Read data from csv format file into a list of namedtuples.
        with open(csv_filename, 'r') as inputfile:
            csv_reader = csv.reader(inputfile, delimiter=',')
            self.fields = next(csv_reader)  # Read header row.
            self.Record = namedtuple(recordname, self.fields)
            self.records = [self.Record(*row) for row in csv_reader]
            self.valid_fieldnames = set(self.fields)

        # Empty container of lookup tables; one will be created per field
        # name, mapping each unique field value to a list of record-list
        # indices of the records that contain it.
        self.lookup_tables = {}

    def retrieve(self, **kwargs):
        """ Fetch a list of records having a field with the value supplied
            as a keyword arg (or return None if there aren't any).
        """
        if len(kwargs) != 1:
            raise ValueError(
                'Exactly one fieldname keyword argument required for '
                'retrieve function (%s specified)'
                % ', '.join(repr(k) for k in kwargs.keys()))
        field, value = kwargs.popitem()  # Keyword arg's name and value.
        if field not in self.valid_fieldnames:
            raise ValueError('keyword arg %r is an invalid fieldname' % field)
        if field not in self.lookup_tables:  # Need to create a lookup table?
            lookup_table = self.lookup_tables[field] = defaultdict(list)
            for index, record in enumerate(self.records):
                lookup_table[getattr(record, field)].append(index)
        # Return list of matching records, or None if there are none.
        matches = [self.records[index]
                   for index in self.lookup_tables[field].get(value, [])]
        return matches or None

if __name__ == '__main__':
    empdb = DataBase('employees.csv', 'Person')

    print("retrieve(name='Ted Kingston'): {}".format(empdb.retrieve(name='Ted Kingston')))
    print("retrieve(age='27'): {}".format(empdb.retrieve(age='27')))
    print("retrieve(weight='150'): {}".format(empdb.retrieve(weight='150')))
    try:
        print("retrieve(hight='5ft 6in'): {}".format(empdb.retrieve(hight='5ft 6in')))
    except ValueError as e:
        print("ValueError({!r}) raised as expected".format(str(e)))
    else:
        raise type('NoExceptionError', (Exception,), {})(
            "No exception raised from retrieve(hight='5ft 6in') call.")


Output:



retrieve(name='Ted Kingston'): [Person(name='Ted Kingston', age='28', weight='163', height='5ft 10in')]
retrieve(age='27'): [Person(name='Mary Manson', age='27', weight='140', height='5ft 6in'),
                     Person(name='Sue Sommers', age='27', weight='132', height='5ft 8in')]
retrieve(weight='150'): None
ValueError("keyword arg 'hight' is an invalid fieldname") raised as expected





    Is there a data structure in Python which will allow storing a record with n number of columns (name, age, sex, weigh, height, etc) and retrieving records based on any (one) of the column in logarithmic (or ideally constant - O(1) look-up time) complexity?




    No, there is none. But you could try to implement one on the basis of one dictionary per value dimension. As long as your values are hashable of course. If you implement a custom class for your records, each dictionary will contain references to the same objects. This will save you some memory.
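A minimal sketch of that idea (the Record and MultiIndex names here are illustrative, not from any library): one defaultdict(list) per field maps each field value to the records that hold it, so a single-field lookup is an average-case O(1) dictionary access.

```python
from collections import defaultdict

class Record(object):
    """A plain record class; every per-field dictionary then stores
    references to the same Record objects rather than copies."""
    def __init__(self, name, age, weight, height):
        self.name, self.age = name, age
        self.weight, self.height = weight, height
    def __repr__(self):
        return 'Record(name=%r, age=%r, weight=%r, height=%r)' % (
            self.name, self.age, self.weight, self.height)

class MultiIndex(object):
    """One dictionary per value dimension: field value -> list of records."""
    def __init__(self, fields):
        self.fields = fields
        self.indexes = {field: defaultdict(list) for field in fields}
    def add(self, record):
        # Register the same record object under every field's index.
        for field in self.fields:
            self.indexes[field][getattr(record, field)].append(record)
    def retrieve(self, **kwargs):
        (field, value), = kwargs.items()  # exactly one keyword argument
        return self.indexes[field].get(value, [])

db = MultiIndex(['name', 'age', 'weight', 'height'])
db.add(Record('abc', 23, 60, 174))
db.add(Record('def', 23, 65, 180))
print(db.retrieve(age=23))  # both records, via one dict lookup
```

Because every index stores references to the same Record objects, the extra memory cost is roughly one dictionary entry per field per record, not a copy of the data.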






      Given http://wiki.python.org/moin/TimeComplexity how about this:




      • Have a dictionary for every column you're interested in - AGE, NAME, etc.

      • Have the keys of those dictionaries (AGE, NAME) be the possible values for the given column (35 or "m").

      • Have a list of lists representing values for one "collection", e.g. VALUES = [ [35, "m"], ...]

      • Have the value of column dictionaries (AGE, NAME) be lists of indices from the VALUES list.

      • Have a dictionary which maps column name to index within the lists in VALUES, so that you know the first column is age and the second is sex (you could avoid that and use dictionaries instead, but they introduce a large memory footprint, and with over 100K objects this may or may not be a problem).


      Then the retrieve function could look like this:



      def retrieve(column_name, column_value):
          if column_name == "age":
              return [VALUES[index] for index in AGE[column_value]]
          elif ...:  # repeat for other "columns"


      Then, this is what you get



      VALUES = [[35, "m"], [20, "f"]]
      AGE = {35:[0], 20:[1]}
      SEX = {"m":[0], "f":[1]}
      KEYS = ["age", "sex"]

      retrieve("age", 35)
      # [[35, 'm']]


      If you want a dictionary, you can do the following:



      [dict(zip(KEYS, values)) for values in retrieve("age", 35)]
      # [{'age': 35, 'sex': 'm'}]


      but again, dictionaries are a little heavy on the memory side, so if you can go with lists of values it might be better.



      Both dictionary and list retrieval are O(1) on average - worst case for a dictionary is O(n) - so this should be pretty fast. Maintaining it will be a bit of a pain, but not too much. To "write", you'd just append to the VALUES list and then append the new index to each of the dictionaries.



      Of course, then best would be to benchmark your actual implementation and look for potential improvements, but hopefully this make sense and will get you going :)



      EDIT:



      Please note that as @moooeeeep said, this will only work if your values are hashable and therefore can be used as dictionary keys.
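A quick, hedged illustration of that caveat: hashable values (strings, numbers, tuples of hashables) can serve as dictionary keys, while mutable containers such as lists cannot.

```python
index = {}
index[('abc', 23)] = 'ok'        # tuple of hashables: a valid key
index['abc'] = 'ok'              # strings and numbers work too
try:
    index[['abc', 23]] = 'boom'  # a list is mutable, hence unhashable
except TypeError as exc:
    print(exc)                   # "unhashable type: 'list'"
```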






        You could achieve logarithmic time complexity in a relational database using indexes (O(log(n)**k) with single-column indexes). Then, to retrieve data, just construct the appropriate SQL:



        names = {'name', 'age', 'weight', 'height'}

        def retrieve(c, **params):
            if not (params and names.issuperset(params)):
                raise ValueError(params)
            where = ' and '.join(map('{0}=:{0}'.format, params))
            return c.execute('select * from records where ' + where, params)


        Example:



        import sqlite3

        c = sqlite3.connect(':memory:')
        c.row_factory = sqlite3.Row  # to provide key access

        # create table
        c.execute("""create table records
                     (name text, age integer, weight real, height real)""")

        # insert data
        records = (('abc', 23, 60, 174 + i) for i in range(2))
        c.executemany('insert into records VALUES (?,?,?,?)', records)

        # create indexes
        for name in names:
            c.execute("create index idx_{0} on records ({0})".format(name))

        try:
            retrieve(c, naame='abc')  # deliberately misspelled field name
        except ValueError:
            pass
        else:
            assert 0

        for record in retrieve(c, name='abc', weight=60):
            print(record['height'])


        Output:



        174.0
        175.0
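If you want to verify that such queries actually use the indexes rather than scanning the whole table, SQLite's EXPLAIN QUERY PLAN statement can be inspected. A small sketch along the lines of the example above (the exact plan wording varies between SQLite versions, but it names the index used):

```python
import sqlite3

c = sqlite3.connect(':memory:')
c.execute("create table records (name text, age integer)")
c.execute("create index idx_name on records (name)")

# The query plan should mention idx_name instead of a full table scan.
plan = c.execute(
    "explain query plan select * from records where name = ?", ('abc',)
).fetchall()
print(plan)
```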





        • Could you tell me the name of the following syntax? names = {'name', 'age', 'weight', 'height'}
          – LED Fantom
          Feb 4 '17 at 0:33










          @LEDFantom: It is a set display (a literal that creates a set() object). It is available on both Python 2.7 and Python 3.
          – jfs
          Feb 4 '17 at 0:53











          'No exception raised from "retrieve(hight='5ft')" call.')


          Output:



          retrieve(name='Ted Kingston'): [Person(name='Ted Kingston', age='28', weight='163', height='5ft 10in')]
          retrieve(age='27'): [Person(name='Mary Manson', age='27', weight='140', height='5ft 6in'),
          Person(name='Sue Sommers', age='27', weight='132', height='5ft 8in')]
          retrieve(weight='150'): None
          retrieve(hight='5ft 6in'): ValueError('keyword arg "hight" is an invalid fieldname')
          raised as expected





          share|improve this answer


























            7












            7








            7






            There isn't a single data structure built into Python that does everything you want, but it's fairly easy to use a combination of the ones it does have to achieve your goals and do so fairly efficiently.



            For example, say your input was the following data in a comma-separated-value file called employees.csv with field names defined as shown by the first line:



            name,age,weight,height
            Bob Barker,25,175,6ft 2in
            Ted Kingston,28,163,5ft 10in
            Mary Manson,27,140,5ft 6in
            Sue Sommers,27,132,5ft 8in
            Alice Toklas,24,124,5ft 6in


            The following is working code which illustrates how to read and store this data into a list of records, and automatically create separate look-up tables for finding records associated with the values contained in the fields of each of these records.



            The records are instances of a class created by namedtuple, which is very memory-efficient because its instances lack the __dict__ attribute that class instances normally contain. Using them makes it possible to access the fields of each record by name using dot syntax, like record.fieldname.
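As a quick illustration of that dot-syntax access (the Person record type here is a made-up example, not the one the DataBase class below builds from the CSV header):

```python
from collections import namedtuple

# Hypothetical record type for illustration only; the DataBase class
# below creates its own type from the CSV header row.
Person = namedtuple('Person', ['name', 'age'])

p = Person(name='abc', age=23)
print(p.name)       # access a field via dot syntax
print(p._asdict())  # view the record as a dict when needed
```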



            The look-up tables are defaultdict(list) instances, which provide dictionary-like O(1) look-up times on average, and also allow multiple values to be associated with each key. The look-up key is the field value being sought, and the data associated with it is a list of the integer indices of the Person records stored in the records list that contain that value, so they'll all be relatively small.



            Note that the code for the class is completely data-driven: it contains no hardcoded field names, which are instead all taken from the first row of the csv input file when it's read in. Of course, when using an instance, all retrieve() method calls must supply valid field names.



            Update



            Modified so that a lookup table is no longer created for every unique value of every field when the data file is first read. The retrieve() method now "lazily" creates a table only when one is needed (and saves/caches it for future use). Also modified to work in Python 2.7+, including 3.x.



            from collections import defaultdict, namedtuple
            import csv

            class DataBase(object):
                def __init__(self, csv_filename, recordname):
                    # Read data from csv format file into a list of namedtuples.
                    with open(csv_filename, 'r') as inputfile:
                        csv_reader = csv.reader(inputfile, delimiter=',')
                        self.fields = next(csv_reader)  # Read header row.
                        self.Record = namedtuple(recordname, self.fields)
                        self.records = [self.Record(*row) for row in csv_reader]
                        self.valid_fieldnames = set(self.fields)

                    # Create an empty table of lookup tables for each field name that maps
                    # each unique field value to a list of record-list indices of the ones
                    # that contain it.
                    self.lookup_tables = {}

                def retrieve(self, **kwargs):
                    """ Fetch a list of records with a field name with the value supplied
                        as a keyword arg (or return an empty tuple if there aren't any).
                    """
                    if len(kwargs) != 1:
                        raise ValueError(
                            'Exactly one fieldname keyword argument required for retrieve function '
                            '(%s specified)' % ', '.join(repr(k) for k in kwargs.keys()))
                    field, value = kwargs.popitem()  # Keyword arg's name and value.
                    if field not in self.valid_fieldnames:
                        raise ValueError('keyword arg "%s" is not a valid field name' % field)
                    if field not in self.lookup_tables:  # Need to create a lookup table?
                        lookup_table = self.lookup_tables[field] = defaultdict(list)
                        for index, record in enumerate(self.records):
                            field_value = getattr(record, field)
                            lookup_table[field_value].append(index)
                    # Return (possibly empty) sequence of matching records.
                    return tuple(self.records[index]
                                 for index in self.lookup_tables[field].get(value, []))

            if __name__ == '__main__':
                empdb = DataBase('employees.csv', 'Person')

                print("retrieve(name='Ted Kingston'): {}".format(empdb.retrieve(name='Ted Kingston')))
                print("retrieve(age='27'): {}".format(empdb.retrieve(age='27')))
                print("retrieve(weight='150'): {}".format(empdb.retrieve(weight='150')))
                try:
                    print("retrieve(hight='5ft 6in'): {}".format(empdb.retrieve(hight='5ft 6in')))
                except ValueError as e:
                    print("ValueError('{}') raised as expected".format(e))
                else:
                    raise type('NoExceptionError', (Exception,), {})(
                        "No exception raised from \"retrieve(hight='5ft 6in')\" call.")


            Output:



            retrieve(name='Ted Kingston'): (Person(name='Ted Kingston', age='28', weight='163', height='5ft 10in'),)
            retrieve(age='27'): (Person(name='Mary Manson', age='27', weight='140', height='5ft 6in'),
                                 Person(name='Sue Sommers', age='27', weight='132', height='5ft 8in'))
            retrieve(weight='150'): ()
            ValueError('keyword arg "hight" is not a valid field name') raised as expected





            edited Nov 19 '18 at 19:35 · answered Mar 14 '13 at 21:39 by martineau

























                Is there a data structure in Python which will allow storing a record with n number of columns (name, age, sex, weigh, height, etc) and retrieving records based on any (one) of the column in logarithmic (or ideally constant - O(1) look-up time) complexity?




                No, there is none. But you could implement one yourself on the basis of one dictionary per value dimension, as long as your values are hashable, of course. If you implement a custom class for your records, each dictionary will contain references to the same objects. This will save you some memory.
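A minimal sketch of that idea (the Record class and the two index dictionaries are illustrative names, not from the question):

```python
from collections import defaultdict

class Record(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

# One index dictionary per dimension; each maps a field value to the
# list of Record objects having that value (shared references, not copies).
by_name = defaultdict(list)
by_age = defaultdict(list)

def store(record):
    by_name[record.name].append(record)
    by_age[record.age].append(record)

store(Record('abc', 23))
store(Record('abc', 28))
store(Record('def', 23))

# O(1) average-time retrieval by either dimension:
print(len(by_name['abc']))  # 2 records named 'abc'
print(len(by_age[23]))      # 2 records aged 23
```

Because both dictionaries hold references to the same Record objects, a record found via one index is the very same object reachable via the other.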






                    answered Mar 14 '13 at 19:54 by moooeeeep























                        Given http://wiki.python.org/moin/TimeComplexity, how about this:




                        • Have a dictionary for every column you're interested in - AGE, NAME, etc.

                        • Have the keys of those dictionaries (AGE, NAME) be the possible values for the given column (35 or "m").

                        • Have a list of lists representing the values for one "collection", e.g. VALUES = [[35, "m"], ...]

                        • Have the values of the column dictionaries (AGE, NAME) be lists of indices into the VALUES list.

                        • Have a dictionary which maps each column name to its index within the inner lists of VALUES, so that you know the first column is age and the second is sex (you could avoid this and use dictionaries instead, but they introduce a large memory footprint, and with over 100K objects that may or may not be a problem).


                        Then the retrieve function could look like this:



                        def retrieve(column_name, column_value):
                            if column_name == "age":
                                return [VALUES[index] for index in AGE[column_value]]
                            elif ...:  # repeat for other "columns"


                        Then this is what you get:



                        VALUES = [[35, "m"], [20, "f"]]
                        AGE = {35:[0], 20:[1]}
                        SEX = {"m":[0], "f":[1]}
                        KEYS = ["age", "sex"]

                        retrieve("age", 35)
                        # [[35, 'm']]


                        If you want a dictionary, you can do the following:



                        [dict(zip(KEYS, values)) for values in retrieve("age", 35)]
                        # [{'age': 35, 'sex': 'm'}]


                        but again, dictionaries are a little heavy on the memory side, so if you can go with lists of values it might be better.



                        Both dictionary and list retrieval are O(1) on average (worst case for a dictionary is O(n)), so this should be pretty fast. Maintaining it will be a bit of a pain, but not too much. To "write", you'd just have to append to the VALUES list and then append that entry's index in VALUES to each of the dictionaries.
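That "write" path can be sketched as follows, reusing the VALUES/AGE/SEX/KEYS structures from above (the add_record helper and the INDEXES mapping are my own names, added for illustration):

```python
VALUES = [[35, "m"], [20, "f"]]
AGE = {35: [0], 20: [1]}
SEX = {"m": [0], "f": [1]}
KEYS = ["age", "sex"]

# Map each column name to its index dictionary so writes stay generic.
INDEXES = {"age": AGE, "sex": SEX}

def add_record(values):
    """Append a new row to VALUES and update every per-column index."""
    index = len(VALUES)
    VALUES.append(values)
    for key, value in zip(KEYS, values):
        INDEXES[key].setdefault(value, []).append(index)

add_record([35, "f"])
print(AGE[35])   # the two records sharing age 35
print(SEX["f"])  # the two records sharing sex "f"
```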



                        Of course, it would be best to benchmark your actual implementation and look for potential improvements, but hopefully this makes sense and will get you going :)



                        EDIT:



                        Please note that as @moooeeeep said, this will only work if your values are hashable and therefore can be used as dictionary keys.






                            edited Mar 14 '13 at 20:38 · answered Mar 14 '13 at 19:49 by kgr























                                You could achieve logarithmic time complexity in a relational database using indexes (O(log(n)**k) with single-column indexes). Then, to retrieve data, just construct the appropriate SQL:



                                names = {'name', 'age', 'weight', 'height'}

                                def retrieve(c, **params):
                                    if not (params and names.issuperset(params)):
                                        raise ValueError(params)
                                    where = ' and '.join(map('{0}=:{0}'.format, params))
                                    return c.execute('select * from records where ' + where, params)


                                Example:



                                import sqlite3

                                c = sqlite3.connect(':memory:')
                                c.row_factory = sqlite3.Row  # to provide key access

                                # create table
                                c.execute("""create table records
                                             (name text, age integer, weight real, height real)""")

                                # insert data
                                records = (('abc', 23, 60, 174+i) for i in range(2))
                                c.executemany('insert into records VALUES (?,?,?,?)', records)

                                # create indexes
                                for name in names:
                                    c.execute("create index idx_{0} on records ({0})".format(name))

                                try:
                                    retrieve(c, naame='abc')
                                except ValueError:
                                    pass
                                else:
                                    assert 0

                                for record in retrieve(c, name='abc', weight=60):
                                    print(record['height'])


                                Output:



                                174.0
                                175.0
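As a side check (not part of the answer above), sqlite can report whether a query actually uses one of those indexes via EXPLAIN QUERY PLAN; the index name here follows the idx_{column} naming scheme used in the answer, and the exact plan wording varies across sqlite versions:

```python
import sqlite3

c = sqlite3.connect(':memory:')
c.execute("""create table records
             (name text, age integer, weight real, height real)""")
c.execute("create index idx_name on records (name)")

# Ask sqlite how it would execute an indexed equality query; the last
# column of each plan row describes the strategy and names any index used.
plan = c.execute("explain query plan select * from records where name=?",
                 ('abc',)).fetchall()
print(plan)
```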





                                • Could you tell me the name of the following syntax? names = {'name', 'age', 'weight', 'height'}
                                  – LED Fantom
                                  Feb 4 '17 at 0:33






                                  @LEDFantom: It is a set display (a literal that creates a set object). It is available in both Python 2.7 and Python 3.
                                  – jfs
                                  Feb 4 '17 at 0:53
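For reference, a quick illustration of that set display:

```python
# A set display: a literal that creates a set object directly.
names = {'name', 'age', 'weight', 'height'}

print(type(names).__name__)  # set
print('age' in names)        # True; membership tests are O(1) on average
```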
















edited Mar 14 '13 at 20:45
answered Mar 14 '13 at 20:40
jfs











