Wednesday, October 26, 2005
PDC05 Sessions Online
In case you - just like me - were unable to attend PDC05: Microsoft has now put all sessions online, including streaming and downloadable presentation videos, transcripts, and PowerPoint slides.
Sunday, October 23, 2005
Correlated Subqueries
Just last week we faced a SQL performance issue in one of our applications during ongoing development work. The reason was a correlated subquery - that is, a query that can't be evaluated independently, but depends on the outer query for its results - something like:
select A1,
       (select min(B1)
        from TableB
        where TableB.B2 = TableA.A2) as MinB1
from TableA
or
select A1
from TableA
where exists (select 1
              from TableB
              where TableB.B2 = TableA.A2)
Please note that those are just examples for better understanding, and that database optimizers can do a better job on simple subqueries like that. Our real-life SQL included several inserts-by-select and deletes-by-select with inverse exists / not exists operations on complex subqueries.
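Just to give a flavor of that pattern (a made-up sketch on the hypothetical tables from above, not our actual schema), a delete-by-select with a not exists condition might look like:
delete from TableA
where not exists (select 1
                  from TableB
                  where TableB.B2 = TableA.A2)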
I know correlated subqueries are sometimes unavoidable, but often there are alternatives. In many cases they can be replaced by joins (which may or may not be faster - e.g. there are situations when exists / not exists operators are superior to joins, because the database can stop looping as soon as one row is found - but that of course depends on the underlying data). Uncorrelated subqueries, on the other hand (in contrast to correlated ones), should generally outperform joins under normal circumstances.
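For illustration, the exists example from above could be rewritten as a join on the same hypothetical tables. This is only a sketch, not a drop-in replacement: the distinct suppresses duplicate output rows when several TableB rows match, and it assumes duplicate A1 values are not wanted:
select distinct TableA.A1
from TableA
inner join TableB on TableB.B2 = TableA.A2
Note that the distinct itself costs sorting or hashing effort - which is one reason the exists variant can come out ahead.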
SQL Server Pro Robert Vieira writes in "Professional SQL Server 2000 Programming": "Internally, a correlated subquery is going to create a nested loop situation. This can create quite a bit of overhead. Subqueries are substantially faster than cursors in most instances, but slower than other options that might be available."
When a join is not an option, it might be a good idea to take a step back and do some redesign in a larger context. We managed to achieve a many-fold performance boost on a typical workload by replacing the correlated subqueries with a different approach that produced the same results in the end.
Do not expect the database engine to optimize away all tradeoffs caused by a certain query design. Even the most advanced optimizer can't take over complete responsibility from the developer. It's important to be aware of which kinds of tuning your database can do during execution, and which it cannot. Creating an index on the correlated subquery's attribute that connects it to the outer query is a good starting point.
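In the hypothetical examples above, that would mean an index on TableB.B2 (the index name is made up):
create index IX_TableB_B2 on TableB (B2)
With such an index in place, the database can look up the matching TableB rows per outer row instead of scanning TableB on every iteration of the nested loop.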
Sunday, October 16, 2005
So, The Internet Started In 1995?
Maybe I am just too harsh on IT journalism, or on people writing about technical issues they just don't understand, but sometimes popular scientific articles are simply so far from the truth that it drives me crazy. This also makes you wonder about all other kinds of popular scientific publishing. I know about software, and most popular software articles are plain nonsense. That is where I notice. But what about topics regarding medicine, biology or physics? Is the same happening there as well?
E.g. Dan Brown's book Digital Fortress was cheered by the press, but littered with nonsense whenever it touched on cryptographic issues.
Today, Austrian Broadcasting Corporation's tech portal ORF Futurezone published an article about the history of the Internet, stating that "Europe accepted TCP/IP and the Internet architecture in 1995" and "the ISO/OSI committees and telephone companies were opposing TCP/IP, as they were favoring the ISO/OSI networking model". You can't be serious! I understand that the average Internet user today might think the Internet started in the mid-90's with the first graphical browsers Mosaic (1993) and Netscape (1994), but IT journalists should know better.
Error in reasoning #1: Equating the Internet with the World Wide Web. One might argue that the WWW took off in 1994, with the first graphical browsers available on widespread operating systems like Windows 3 or MacOS. But the WWW was and is just a subset of Internet protocol standards (mainly HTTP and HTML) plus some server and browser implementations. When I took Internet classes at university in 1994, the WWW was mentioned as one of several applications next to Gopher, FTP, Mail, Usenet, Archie, WAIS and the like. In 1992, when I connected to the Internet for the first time, I did so to read Usenet. The predecessor network ARPANet had been introduced in 1969, and soon after, the first mails were sent and files were transferred.
Error in reasoning #2: Putting the ISO/OSI network model and TCP/IP in conflict. A short answer to that one: ISO/OSI is a reference model for classifying network protocols, while TCP/IP is a concrete network protocol standard and implementation, where TCP is located at ISO/OSI layer 4 (the transport layer) and IP at layer 3 (the network layer).
Error in reasoning #3: Taking one interviewee's word for granted that there was European opposition (e.g. from telephone companies) to the Internet and TCP/IP until 1995. The truth is: Norway was the first European country to connect its research institutions to the ARPANet, as early as 1973. Austria followed quite late and joined NSFNet in 1989. The first commercial providers in Austria were founded in the early 90's, when the Internet was opened for commercial usage. If there was opposition from telephone companies, that was true for the US as well. Bob Taylor, the ARPANet project lead, was tilting at windmills at AT&T for years, because packet switching just did not fit into their circuit-switching-infested brains.
Saturday, October 08, 2005
New Office 12 User Interface
Revolutionary news from Microsoft - Office usability will never be the same again. Office 12 comes with a completely refurbished user interface. Folks, it was about time! E.g. all Word for Windows releases since at least version 2.0 (1992) have essentially followed the same user experience paradigm.
For me, the most annoying feature ever was "personalized menus" (you know, only the most recently used menu items stayed visible, so you certainly wouldn't find what you were looking for). That one was introduced in, what was it, Office 2000? And don't let me start moaning about Clippy.
But from what I have seen, Office 12 will be a real step forward. It's a bit risky: just imagine the install base - millions and millions of secretaries who have been ill-conditioned over the last decade to find the 1% of overall functionality they actually use. But only the bold win! Don't let Apple take away all the user interface glory, Microsofties!
Monday, October 03, 2005
Five Things Every Win32 Programmer Needs To Know
I was really looking forward to Raymond Chen's PDC talk "Five Things Every Win32 Programmer Needs To Know" (partly because I was curious which five things he would pick, partly because I was wondering whether I would "pass the test" - although I don't really consider myself a Win32 programmer at heart). Unfortunately, Channel 9's handheld recording was unwatchable because of the shaky picture and poor audio quality. Luckily, Raymond's PDC slides are available on commnet.microsoftpdc.com.