Optimize and speed up MySQL query selection
I'm trying to figure out the best way to optimize my current selection query on a MySQL database.
I have two MySQL tables with a one-to-many relationship. One is the user table, which contains the unique list of users and has around 22k rows. The other is the linedata table, which contains all the coordinates for each user and has around 490k rows.
We can assume the foreign key between the two tables is the id value. In the user table the id is also the auto-increment primary key, while in the linedata table it is not a primary key, because a user can have several rows there.
The CREATE TABLE statements
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`isActive` tinyint(4) NOT NULL,
`userId` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`gender` varchar(45) COLLATE utf8_unicode_ci NOT NULL,
`age` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=21938 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `linedata` (
`id` int(11) NOT NULL,
`userId` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`timestamp` datetime NOT NULL,
`x` float NOT NULL,
`y` float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The selection query
SELECT u.id,
       u.isActive,
       u.userId,
       u.name,
       u.gender,
       u.age,
       GROUP_CONCAT(CONCAT_WS(', ', timestamp, x, y)
                    ORDER BY timestamp ASC SEPARATOR '; ') AS linedata_0
FROM user u
JOIN linedata l ON u.id = l.id
WHERE DATEDIFF(l.timestamp, '2018-02-28T20:00:00.000Z') >= 0
  AND DATEDIFF(l.timestamp, '2018-11-20T09:20:08.218Z') <= 0
GROUP BY userId;
The EXPLAIN output
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
| 1 | SIMPLE | l | ALL | NULL | NULL | NULL | NULL | 491157 | "Using where; Using temporary; Using filesort" |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | l.id | 1 | NULL |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
The selection query works if, for example, I add another WHERE condition to filter specific users. Let's say I want to select just 200 users: then I get around 14 seconds of execution time, and around 7 seconds if I select just the first 100 users. But with only the datetime range condition the query seems to run without ever finishing. Any suggestions?
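(For illustration, the kind of extra per-user filter meant above might look like the following; the exact condition is not shown in the question, so the id range here is hypothetical.)
SELECT u.id, u.isActive, u.userId, u.name, u.gender, u.age,
       GROUP_CONCAT(CONCAT_WS(', ', timestamp, x, y)
                    ORDER BY timestamp ASC SEPARATOR '; ') AS linedata_0
FROM user u
JOIN linedata l ON u.id = l.id
WHERE u.id BETWEEN 1 AND 200          -- hypothetical filter on the first 200 users
  AND DATEDIFF(l.timestamp, '2018-02-28T20:00:00.000Z') >= 0
  AND DATEDIFF(l.timestamp, '2018-11-20T09:20:08.218Z') <= 0
GROUP BY userId;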
UPDATE
After following Rick's suggestions, the query now runs in around 14 seconds. Below is the EXPLAIN EXTENDED output:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,PRIMARY,u,index,PRIMARY,PRIMARY,4,NULL,21959,100.00,NULL
1,PRIMARY,l,ref,id_timestamp_index,id_timestamp_index,4,u.id,14,100.00,"Using index condition"
2,"DEPENDENT SUBQUERY",NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,"No tables used"
I have changed some of the tables' columns. The id in the user table can now be joined with userId in the linedata table, and both are integers. Only the userId column in the user table is still a string, because it is a long identifier like 0000309ab2912b2fd34350d7e6c079846bb6c5e1f97d3ccb053d15061433e77a_0.
As a quick example, the user and linedata tables now look like this:
+----+----------+--------+------+--------+-----+
| id | isActive | userId | name | gender | age |
+----+----------+--------+------+--------+-----+
| 1  | 1        | x4by4d | john | m      | 22  |
| 2  | 1        | 3ub3ub | bob  | m      | 50  |
+----+----------+--------+------+--------+-----+
+----+--------+-----------+----+----+
| id | userId | timestamp | x  | y  |
+----+--------+-----------+----+----+
| 1  | 1      | somedate  | 30 | 10 |
| 2  | 1      | somedate  | 45 | 15 |
| 3  | 1      | somedate  | 50 | 20 |
| 4  | 2      | somedate  | 20 | 5  |
| 5  | 2      | somedate  | 25 | 10 |
+----+--------+-----------+----+----+
I have added a compound index made of userId and timestamp in the linedata table.
Instead of keeping an auto-increment id as the primary key of the linedata table, what if I add a composite primary key made of userId + timestamp? Would that increase performance or not? (A sketch of that alternative is shown below.)
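For reference, a sketch of what that composite-primary-key change might look like; this is assumed DDL, not taken from the actual schema, and it presumes no two rows share the same userId and timestamp:
-- drop the surrogate auto-increment key and promote (userId, timestamp) to primary key
ALTER TABLE linedata
  MODIFY id INT NOT NULL,               -- remove AUTO_INCREMENT from the surrogate key first
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (userId, timestamp);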
mysql select query-optimization rdbms
asked Nov 20 '18 at 13:26 – UgoL
edited Nov 21 '18 at 21:36 – Rick James
1 Answer
I need to help you fix several bugs before discussing performance.
First of all, '2018-02-28T20:00:00.000Z' won't work in MySQL. It needs to be '2018-02-28 20:00:00.000', and something needs to be done about the timezone.
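One way to handle the timezone part could be to convert the UTC boundary values ('Z' suffix) before comparing them with the local DATETIME column. A sketch, where the target offset '+01:00' is an assumption standing in for the server's actual zone:
-- shift the UTC boundaries into the server's local offset before using them in the WHERE clause
SELECT CONVERT_TZ('2018-02-28 20:00:00', '+00:00', '+01:00') AS local_from,
       CONVERT_TZ('2018-11-20 09:20:08', '+00:00', '+01:00') AS local_to;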
Then, don't "hide a column in a function". That is DATEDIFF(l.timestamp ...)
cannot use any indexing on timestamp
.
So, instead of
WHERE DATEDIFF(l.timestamp, '2018-02-28T20:00:00.000Z') >= 0
AND DATEDIFF(l.timestamp, '2018-11-20T09:20:08.218Z') <= 0
do something like
WHERE l.timestamp >= '2018-02-28 20:00:00.000'
AND l.timestamp < '2018-11-20 09:20:08.218'
I'm confused about the two tables. Both have id and userId, yet you join on id. Perhaps instead of
CREATE TABLE `linedata` (
`id` int(11) NOT NULL,
`userId` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
...
you meant
CREATE TABLE `linedata` (
`id` int(11) NOT NULL AUTO_INCREMENT, -- (the id for `linedata`)
`userId` int NOT NULL, -- to link to the other table
...
PRIMARY KEY(id)
...
Then there could be several linedata rows for each user.
At that point, this
JOIN linedata l ON u.id=l.id
becomes
JOIN linedata l ON u.id=l.userid
Now, for performance: linedata needs INDEX(userid, timestamp) - in that order.
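A minimal sketch of adding that index (the index name idx_userid_timestamp is an assumption):
-- compound index so the join on userId can also range-scan timestamp
ALTER TABLE linedata
  ADD INDEX idx_userid_timestamp (userId, timestamp);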
Now, think about the output. You are asking for up to 22K rows, with possibly hundreds of "ts,x,y" strung together in one of the columns. What will receive this much data? Will it choke on it?
And GROUP_CONCAT has a default limit of 1024 bytes. That will allow for about 50 points. If a 'user' can be in more than 50 spots in 9 days, consider increasing group_concat_max_len before running the query.
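For example, a session-level setting (the 1 MB value is just an illustrative choice):
-- raise the per-result length cap for GROUP_CONCAT in this session only
SET SESSION group_concat_max_len = 1024 * 1024;   -- 1 MB; the default is 1024 bytes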
To make it work even faster, reformulate it this way:
SELECT u.id, u.isActive, u.userId, u.name, u.gender, u.age,
( SELECT GROUP_CONCAT(CONCAT_WS(', ',timestamp, x, y)
ORDER BY timestamp ASC
SEPARATOR '; ')
) as linedata_0
FROM user u
JOIN linedata l ON u.id = l.userid
WHERE l.timestamp >= '2018-02-28 20:00:00.000'
AND l.timestamp < '2018-11-20 09:20:08.218';
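As printed above, the subquery has no FROM clause and the join stays in the outer query, so the exact intent is a bit ambiguous. A completed form of the same correlated-subquery idea, returning one row per user, might look like this (a sketch of one plausible reading, not the answerer's exact statement):
SELECT  u.id, u.isActive, u.userId, u.name, u.gender, u.age,
        ( SELECT GROUP_CONCAT(CONCAT_WS(', ', l.timestamp, l.x, l.y)
                              ORDER BY l.timestamp ASC
                              SEPARATOR '; ')
            FROM linedata l
           WHERE l.userId = u.id
             AND l.timestamp >= '2018-02-28 20:00:00.000'
             AND l.timestamp <  '2018-11-20 09:20:08.218'
        ) AS linedata_0
FROM user u;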
Another thing. You probably want to be able to look up a user by name; so add INDEX(name).
Oh, what the heck is the VARCHAR(255) for userID?? Ids are normally integers.
This covers most of what I was going to suggest. Note, I think the OP included their accurate table structures, it's just that there is no PK in the linedata table, and the id column there does in fact contain the same value as the id column in the user table. The userId looks like an alternative identifier, for some purpose. Likely, your suggested index of (userid, timestamp) could instead be handled by just making a PRIMARY KEY out of (id, timestamp), as a user is unlikely to be allowed to have more than one location at a time.
– Willem Renzema
Nov 21 '18 at 5:41
What @WillemRenzema says is right. I didn't have a PK for the linedata table, but I have now created an auto-increment PK called id. And I joined the id of the user table (auto-increment PK) with the userId value, which is now an integer too and matches correctly.
– UgoL
Nov 21 '18 at 10:22
@Rick James thank you for your answer. I have corrected certain points, especially the personId type. I have generated an integer value that corresponds to the unique id of the user table now. What I think cannot work properly is your query suggestion, because I get just one result row, corresponding to just one user. In theory I should get around 21k rows with the concatenation of each timestamp and coordinates. I think you are missing the final group by u.id. Am I right? And if I add it, the benchmark for getting all the data is around 15 secs. I have updated the EXPLAIN statement.
– UgoL
Nov 21 '18 at 10:31
@UgoL - Good start. But before I go any further, please clarify personId versus userId -- are they the same? Do they uniquely identify each user? Why 255? The EXPLAIN and timing are still bad because of a lack of index(es) on linedata; please fix. (And maybe some other things that Willem or I mentioned.)
– Rick James
Nov 21 '18 at 17:36
@RickJames yes, forgive me for the confusion.. personId is userId. Let me update the question with an example to show everything better, especially the relationship between userId and id, because I made some changes. What do you mean by lack of indexes on linedata? It should be correct, I suppose, but I'll show that part too in the updated question.
– UgoL
Nov 21 '18 at 17:44